imapext-2007: docs/locking.txt annotate

imapext-2007

annotate docs/locking.txt @ 0:ada5e610ab86

imap-2007e

author	yuuji@gentei.org
date	Mon, 14 Sep 2009 15:17:45 +0900
parents
children

rev	line source
yuuji@0	1 /* ========================================================================
yuuji@0	2 * Copyright 1988-2006 University of Washington
yuuji@0	3 *
yuuji@0	4 * Licensed under the Apache License, Version 2.0 (the "License");
yuuji@0	5 * you may not use this file except in compliance with the License.
yuuji@0	6 * You may obtain a copy of the License at
yuuji@0	7 *
yuuji@0	8 * http://www.apache.org/licenses/LICENSE-2.0
yuuji@0	9 *
yuuji@0	10 *
yuuji@0	11 * ========================================================================
yuuji@0	12 */
yuuji@0	13
yuuji@0	14 UNIX Advisory File Locking Implications on c-client
yuuji@0	15 Mark Crispin, 28 November 1995
yuuji@0	16
yuuji@0	17
yuuji@0	18 THIS DOCUMENT HAS BEEN UPDATED TO REFLECT THE FACT THAT
yuuji@0	19 LINUX SUPPORTS BOTH flock() AND fcntl() AND THAT OSF/1
yuuji@0	20 HAS BEEN BROKEN SO THAT IT ONLY SUPPORTS fcntl().
yuuji@0	21 -- JUNE 15, 2004
yuuji@0	22
yuuji@0	23 THIS DOCUMENT HAS BEEN UPDATED TO REFLECT THE CODE IN THE
yuuji@0	24 IMAP-4 TOOLKIT AS OF NOVEMBER 28, 1995. SOME STATEMENTS
yuuji@0	25 IN THIS DOCUMENT DO NOT APPLY TO EARLIER VERSIONS OF THE
yuuji@0	26 IMAP TOOLKIT.
yuuji@0	27
yuuji@0	28 INTRODUCTION
yuuji@0	29
yuuji@0	30 Advisory locking is a mechanism by which cooperating processes
yuuji@0	31 can signal to each other their usage of a resource and whether or not
yuuji@0	32 that usage is critical. It is not a mechanism to protect against
yuuji@0	33 processes which do not cooperate in the locking.
yuuji@0	34
yuuji@0	35 The most basic form of locking involves a counter. This counter
yuuji@0	36 is -1 when the resource is available. If a process wants the lock, it
yuuji@0	37 executes an atomic increment-and-test-if-zero. If the value is zero,
yuuji@0	38 the process has the lock and can execute the critical code that needs
yuuji@0	39 exclusive usage of a resource. When it is finished, it sets the lock
yuuji@0	40 back to -1. In C terms:
yuuji@0	41
yuuji@0	42 while (++lock) /* try to get lock */
yuuji@0	43 invoke_other_threads (); /* failed, try again */
yuuji@0	44 .
yuuji@0	45 . /* critical code here */
yuuji@0	46 .
yuuji@0	47 lock = -1; /* release lock */
yuuji@0	48
yuuji@0	49 This particular form of locking appears most commonly in
yuuji@0	50 multi-threaded applications such as operating system kernels. It
yuuji@0	51 makes several presumptions:
yuuji@0	52 (1) it is alright to keep testing the lock (no overflow)
yuuji@0	53 (2) the critical resource is single-access only
yuuji@0	54 (3) there is shared writeable memory between the two threads
yuuji@0	55 (4) the threads can be trusted to release the lock when finished
yuuji@0	56
yuuji@0	57 In applications programming on multi-user systems, most commonly
yuuji@0	58 the other threads are in an entirely different process, which may even
yuuji@0	59 be logged in as a different user. Few operating systems offer shared
yuuji@0	60 writeable memory between such processes.
yuuji@0	61
yuuji@0	62 A means of communicating this is by use of a file with a mutually
yuuji@0	63 agreed upon name. A binary semaphore can be passed by means of the
yuuji@0	64 existance or non-existance of that file, provided that there is an
yuuji@0	65 atomic means to create a file if and only if that file does not exist.
yuuji@0	66 In C terms:
yuuji@0	67
yuuji@0	68 /* try to get lock */
yuuji@0	69 while ((fd = open ("lockfile",O_WRONLY\|O_CREAT\|O_EXCL,0666)) < 0)
yuuji@0	70 sleep (1); /* failed, try again */
yuuji@0	71 close (fd); /* got the lock */
yuuji@0	72 .
yuuji@0	73 . /* critical code here */
yuuji@0	74 .
yuuji@0	75 unlink ("lockfile"); /* release lock */
yuuji@0	76
yuuji@0	77 This form of locking makes fewer presumptions, but it still is
yuuji@0	78 guilty of presumptions (2) and (4) above. Presumption (2) limits the
yuuji@0	79 ability to have processes sharing a resource in a non-conflicting
yuuji@0	80 fashion (e.g. reading from a file). Presumption (4) leads to
yuuji@0	81 deadlocks should the process crash while it has a resource locked.
yuuji@0	82
yuuji@0	83 Most modern operating systems provide a resource locking system
yuuji@0	84 call that has none of these presumptions. In particular, a mechanism
yuuji@0	85 is provided for identifying shared locks as opposed to exclusive
yuuji@0	86 locks. A shared lock permits other processes to obtain a shared lock,
yuuji@0	87 but denies exclusive locks. In other words:
yuuji@0	88
yuuji@0	89 current state want shared want exclusive
yuuji@0	90 ------------- ----------- --------------
yuuji@0	91 unlocked YES YES
yuuji@0	92 locked shared YES NO
yuuji@0	93 locked exclusive NO NO
yuuji@0	94
yuuji@0	95 Furthermore, the operating system automatically relinquishes all
yuuji@0	96 locks held by that process when it terminates.
yuuji@0	97
yuuji@0	98 A useful operation is the ability to upgrade a shared lock to
yuuji@0	99 exclusive (provided there are no other shared users of the lock) and
yuuji@0	100 to downgrade an exclusive lock to shared. It is important that at no
yuuji@0	101 time is the lock ever removed; a process upgrading to exclusive must
yuuji@0	102 not relenquish its shared lock.
yuuji@0	103
yuuji@0	104 Most commonly, the resources being locked are files. Shared
yuuji@0	105 locks are particularly important with files; multiple simultaneous
yuuji@0	106 processes can read from a file, but only one can safely write at a
yuuji@0	107 time. Some writes may be safer than others; an append to the end of
yuuji@0	108 the file is safer than changing existing file data. In turn, changing
yuuji@0	109 a file record in place is safer than rewriting the file with an
yuuji@0	110 entirely different structure.
yuuji@0	111
yuuji@0	112
yuuji@0	113 FILE LOCKING ON UNIX
yuuji@0	114
yuuji@0	115 In the oldest versions of UNIX, the use of a semaphore lockfile
yuuji@0	116 was the only available form of locking. Advisory locking system calls
yuuji@0	117 were not added to UNIX until after the BSD vs. System V split. Both
yuuji@0	118 of these system calls deal with file resources only.
yuuji@0	119
yuuji@0	120 Most systems only have one or the other form of locking. AIX
yuuji@0	121 and newer versions of OSF/1 emulate the BSD form of locking as a jacket
yuuji@0	122 into the System V form. Ultrix and Linux implement both forms.
yuuji@0	123
yuuji@0	124 BSD
yuuji@0	125
yuuji@0	126 BSD added the flock() system call. It offers capabilities to
yuuji@0	127 acquire shared lock, acquire exclusive lock, and unlock. Optionally,
yuuji@0	128 the process can request an immediate error return instead of blocking
yuuji@0	129 when the lock is unavailable.
yuuji@0	130
yuuji@0	131
yuuji@0	132 FLOCK() BUGS
yuuji@0	133
yuuji@0	134 flock() advertises that it permits upgrading of shared locks to
yuuji@0	135 exclusive and downgrading of exclusive locks to shared, but it does so
yuuji@0	136 by releasing the former lock and then trying to acquire the new lock.
yuuji@0	137 This creates a window of vulnerability in which another process can
yuuji@0	138 grab the exclusive lock. Therefore, this capability is not useful,
yuuji@0	139 although many programmers have been deluded by incautious reading of
yuuji@0	140 the flock() man page to believe otherwise. This problem can be
yuuji@0	141 programmed around, once the programmer is aware of it.
yuuji@0	142
yuuji@0	143 flock() always returns as if it succeeded on NFS files, when in
yuuji@0	144 fact it is a no-op. There is no way around this.
yuuji@0	145
yuuji@0	146 Leaving aside these two problems, flock() works remarkably well,
yuuji@0	147 and has shown itself to be robust and trustworthy.
yuuji@0	148
yuuji@0	149 SYSTEM V/POSIX
yuuji@0	150
yuuji@0	151 System V added new functions to the fnctl() system call, and a
yuuji@0	152 simple interface through the lockf() subroutine. This was
yuuji@0	153 subsequently included in POSIX. Both offer the facility to apply the
yuuji@0	154 lock to a particular region of the file instead of to the entire file.
yuuji@0	155 lockf() only supports exclusive locks, and calls fcntl() internally;
yuuji@0	156 hence it won't be discussed further.
yuuji@0	157
yuuji@0	158 Functionally, fcntl() locking is a superset of flock(); it is
yuuji@0	159 possible to implement a flock() emulator using fcntl(), with one minor
yuuji@0	160 exception: it is not possible to acquire an exclusive lock if the file
yuuji@0	161 is not open for write.
yuuji@0	162
yuuji@0	163 The fcntl() locking functions are: query lock station of a file
yuuji@0	164 region, lock/unlock a region, and lock/unlock a region and block until
yuuji@0	165 have the lock. The locks may be shared or exclusive. By means of the
yuuji@0	166 statd and lockd daemons, fcntl() locking is available on NFS files.
yuuji@0	167
yuuji@0	168 When statd is started at system boot, it reads its /etc/state
yuuji@0	169 file (which contains the number of times it has been invoked) and
yuuji@0	170 /etc/sm directory (which contains a list of all remote sites which are
yuuji@0	171 client or server locking with this site), and notifies the statd on
yuuji@0	172 each of these systems that it has been restarted. Each statd then
yuuji@0	173 notifies the local lockd of the restart of that system.
yuuji@0	174
yuuji@0	175 lockd receives fcntl() requests for NFS files. It communicates
yuuji@0	176 with the lockd at the server and requests it to apply the lock, and
yuuji@0	177 with the statd to request it for notification when the server goes
yuuji@0	178 down. It blocks until all these requests are completed.
yuuji@0	179
yuuji@0	180 There is quite a mythos about fcntl() locking.
yuuji@0	181
yuuji@0	182 One religion holds that fcntl() locking is the best thing since
yuuji@0	183 sliced bread, and that programs which use flock() should be converted
yuuji@0	184 to fcntl() so that NFS locking will work. However, as noted above,
yuuji@0	185 very few systems support both calls, so such an exercise is pointless
yuuji@0	186 except on Ultrix and Linux.
yuuji@0	187
yuuji@0	188 Another religion, which I adhere to, has the opposite viewpoint.
yuuji@0	189
yuuji@0	190
yuuji@0	191 FCNTL() BUGS
yuuji@0	192
yuuji@0	193 For all of the hairy code to do individual section locking of a
yuuji@0	194 file, it's clear that the designers of fcntl() locking never
yuuji@0	195 considered some very basic locking operations. It's as if all they
yuuji@0	196 knew about locking they got out of some CS textbook with not
yuuji@0	197 investigation of real-world needs.
yuuji@0	198
yuuji@0	199 It is not possible to acquire an exclusive lock unless the file
yuuji@0	200 is open for write. You could have append with shared read, and thus
yuuji@0	201 you could have a case in which a read-only access may need to go
yuuji@0	202 exclusive. This problem can be programmed around once the programmer
yuuji@0	203 is aware of it.
yuuji@0	204
yuuji@0	205 If the file is opened on another file designator in the same
yuuji@0	206 process, the file is unlocked even if no attempt is made to do any
yuuji@0	207 form of locking on the second designator. This is a very bad bug. It
yuuji@0	208 means that an application must keep track of all the files that it has
yuuji@0	209 opened and locked.
yuuji@0	210
yuuji@0	211 If there is no statd/lockd on the NFS server, fcntl() will hang
yuuji@0	212 forever waiting for them to appear. This is a bad bug. It means that
yuuji@0	213 any attempt to lock on a server that doesn't run these daemons will
yuuji@0	214 hang. There is no way for an application to request flock() style
yuuji@0	215 ``try to lock, but no-op if the mechanism ain't there''.
yuuji@0	216
yuuji@0	217 There is a rumor to the effect that fcntl() will hang forever on
yuuji@0	218 local files too if there is no local statd/lockd. These daemons are
yuuji@0	219 running on mailer.u, although they appear not to have much CPU time.
yuuji@0	220 A useful experiment would be to kill them and see if imapd is affected
yuuji@0	221 in any way, but I decline to do so without an OK from UCS! ;-) If
yuuji@0	222 killing statd/lockd can be done without breaking fcntl() on local
yuuji@0	223 files, this would become one of the primary means of dealing with this
yuuji@0	224 problem.
yuuji@0	225
yuuji@0	226 The statd and lockd daemons have quite a reputation for extreme
yuuji@0	227 fragility. There have been numerous reports about the locking
yuuji@0	228 mechanism being wedged on a systemwide or even clusterwide basis,
yuuji@0	229 requiring a reboot to clear. It is rumored that this wedge, once it
yuuji@0	230 happens, also blocks local locking. Presumably killing and restarting
yuuji@0	231 statd would suffice to clear the wedge, but I haven't verified this.
yuuji@0	232
yuuji@0	233 There appears to be a limit to how many locks may be in use at a
yuuji@0	234 time on the system, although the documentation only mentions it in
yuuji@0	235 passing. On some of their systems, UCS has increased lockd's ``size
yuuji@0	236 of the socket buffer'', whatever that means.
yuuji@0	237
yuuji@0	238 C-CLIENT USAGE
yuuji@0	239
yuuji@0	240 c-client uses flock(). On System V systems, flock() is simulated
yuuji@0	241 by an emulator that calls fcntl().
yuuji@0	242
yuuji@0	243
yuuji@0	244 BEZERK AND MMDF
yuuji@0	245
yuuji@0	246 Locking in the traditional UNIX formats was largely dictated by
yuuji@0	247 the status quo in other applications; however, additional protection
yuuji@0	248 is added against inadvertantly running multiple instances of a
yuuji@0	249 c-client application on the same mail file.
yuuji@0	250
yuuji@0	251 (1) c-client attempts to create a .lock file (mail file name with
yuuji@0	252 ``.lock'' appended) whenever it reads from, or writes to, the mail
yuuji@0	253 file. This is an exclusive lock, and is held only for short periods
yuuji@0	254 of time while c-client is actually doing the I/O. There is a 5-minute
yuuji@0	255 timeout for this lock, after which it is broken on the presumption
yuuji@0	256 that it is a stale lock. If it can not create the .lock file due to
yuuji@0	257 an EACCES (protection failure) error, it once silently proceeded
yuuji@0	258 without this lock; this was for systems which protect /usr/spool/mail
yuuji@0	259 from unprivileged processes creating files. Today, c-client reports
yuuji@0	260 an error unless it is built otherwise. The purpose of this lock is to
yuuji@0	261 prevent against unfavorable interactions with mail delivery.
yuuji@0	262
yuuji@0	263 (2) c-client applies a shared flock() to the mail file whenever
yuuji@0	264 it reads from the mail file, and an exclusive flock() whenever it
yuuji@0	265 writes to the mail file. This lock is freed as soon as it finishes
yuuji@0	266 reading. The purpose of this lock is to prevent against unfavorable
yuuji@0	267 interactions with mail delivery.
yuuji@0	268
yuuji@0	269 (3) c-client applies an exclusive flock() to a file on /tmp
yuuji@0	270 (whose name represents the device and inode number of the file) when
yuuji@0	271 it opens the mail file. This lock is maintained throughout the
yuuji@0	272 session, although c-client has a feature (called ``kiss of death'')
yuuji@0	273 which permits c-client to forcibly and irreversibly seize the lock
yuuji@0	274 from a cooperating c-client application that surrenders the lock on
yuuji@0	275 demand. The purpose of this lock is to prevent against unfavorable
yuuji@0	276 interactions with other instances of c-client (rewriting the mail
yuuji@0	277 file).
yuuji@0	278
yuuji@0	279 Mail delivery daemons use lock (1), (2), or both. Lock (1) works
yuuji@0	280 over NFS; lock (2) is the only one that works on sites that protect
yuuji@0	281 /usr/spool/mail against unprivileged file creation. Prudent mail
yuuji@0	282 delivery daemons use both forms of locking, and of course so does
yuuji@0	283 c-client.
yuuji@0	284
yuuji@0	285 If only lock (2) is used, then multiple processes can read from
yuuji@0	286 the mail file simultaneously, although in real life this doesn't
yuuji@0	287 really change things. The normal state of locks (1) and (2) is
yuuji@0	288 unlocked except for very brief periods.
yuuji@0	289
yuuji@0	290
yuuji@0	291 TENEX AND MTX
yuuji@0	292
yuuji@0	293 The design of the locking mechanism of these formats was
yuuji@0	294 motivated by a design to enable multiple simultaneous read/write
yuuji@0	295 access. It is almost the reverse of how locking works with
yuuji@0	296 bezerk/mmdf.
yuuji@0	297
yuuji@0	298 (1) c-client applies a shared flock() to the mail file when it
yuuji@0	299 opens the mail file. It upgrades this lock to exclusive whenever it
yuuji@0	300 tries to expunge the mail file. Because of the flock() bug that
yuuji@0	301 upgrading a lock actually releases it, it will not do so until it has
yuuji@0	302 acquired an exclusive lock (2) first. The purpose of this lock is to
yuuji@0	303 prevent against expunge taking place while some other c-client has the
yuuji@0	304 mail file open (and thus knows where all the messages are).
yuuji@0	305
yuuji@0	306 (2) c-client applies a shared flock() to a file on /tmp (whose
yuuji@0	307 name represents the device and inode number of the file) when it
yuuji@0	308 parses the mail file. It applies an exclusive flock() to this file
yuuji@0	309 when it appends new mail to the mail file, as well as before it
yuuji@0	310 attempts to upgrade lock (1) to exclusive. The purpose of this lock
yuuji@0	311 is to prevent against data being appended while some other c-client is
yuuji@0	312 parsing mail in the file (to prevent reading of incomplete messages).
yuuji@0	313 It also protects against the lock-releasing timing race on lock (1).
yuuji@0	314
yuuji@0	315 OBSERVATIONS
yuuji@0	316
yuuji@0	317 In a perfect world, locking works. You are protected against
yuuji@0	318 unfavorable interactions with the mailer and against your own mistake
yuuji@0	319 by running more than one instance of your mail reader. In tenex/mtx
yuuji@0	320 formats, you have the additional benefit that multiple simultaneous
yuuji@0	321 read/write access works, with the sole restriction being that you
yuuji@0	322 can't expunge if there are any sharers of the mail file.
yuuji@0	323
yuuji@0	324 If the mail file is NFS-mounted, then flock() locking is a silent
yuuji@0	325 no-op. This is the way BSD implements flock(), and c-client's
yuuji@0	326 emulation of flock() through fcntl() tests for NFS files and
yuuji@0	327 duplicates this functionality. There is no locking protection for
yuuji@0	328 tenex/mtx mail files at all, and only protection against the mailer
yuuji@0	329 for bezerk/mmdf mail files. This has been the accepted state of
yuuji@0	330 affairs on UNIX for many sad years.
yuuji@0	331
yuuji@0	332 If you can not create .lock files, it should not affect locking,
yuuji@0	333 since the flock() locks suffice for all protection. This is, however,
yuuji@0	334 not true if the mailer does not check for flock() locking, or if the
yuuji@0	335 the mail file is NFS-mounted.
yuuji@0	336
yuuji@0	337 What this means is that there is no locking protection at all
yuuji@0	338 in the case of a client using an NFS-mounted /usr/spool/mail that does
yuuji@0	339 not permit file creation by unprivileged programs. It is impossible,
yuuji@0	340 under these circumstances, for an unprivileged program to do anything
yuuji@0	341 about it. Worse, if EACCES errors on .lock file creation are no-op'ed
yuuji@0	342 , the user won't even know about it. This is arguably a site
yuuji@0	343 configuration error.
yuuji@0	344
yuuji@0	345 The problem with not being able to create .lock files exists on
yuuji@0	346 System V as well, but the failure modes for flock() -- which is
yuuji@0	347 implemented via fcntl() -- are different.
yuuji@0	348
yuuji@0	349 On System V, if the mail file is NFS-mounted and either the
yuuji@0	350 client or the server lacks a functioning statd/lockd pair, then the
yuuji@0	351 lock attempt would have hung forever if it weren't for the fact that
yuuji@0	352 c-client tests for NFS and no-ops the flock() emulator in this case.
yuuji@0	353 Systemwide or clusterwide failures of statd/lockd have been known to
yuuji@0	354 occur which cause all locks in all processes to hang (including
yuuji@0	355 local?). Without the special NFS test made by c-client, there would
yuuji@0	356 be no way to request BSD-style no-op behavior, nor is there any way to
yuuji@0	357 determine that this is happening other than the system being hung.
yuuji@0	358
yuuji@0	359 The additional locking introduced by c-client was shown to cause
yuuji@0	360 much more stress on the System V locking mechanism than has
yuuji@0	361 traditionally been placed upon it. If it was stressed too far, all
yuuji@0	362 hell broke loose. Fortunately, this is now past history.
yuuji@0	363
yuuji@0	364 TRADEOFFS
yuuji@0	365
yuuji@0	366 c-client based applications have a reasonable chance of winning
yuuji@0	367 as long as you don't use NFS for remote access to mail files. That's
yuuji@0	368 what IMAP is for, after all. It is, however, very important to
yuuji@0	369 realize that you can not use the lock-upgrade feature by itself
yuuji@0	370 because it releases the lock as an interim step -- you need to have
yuuji@0	371 lock-upgrading guarded by another lock.
yuuji@0	372
yuuji@0	373 If you have the misfortune of using System V, you are likely to
yuuji@0	374 run into problems sooner or later having to do with statd/lockd. You
yuuji@0	375 basically end up with one of three unsatisfactory choices:
yuuji@0	376 1) Grit your teeth and live with it.
yuuji@0	377 2) Try to make it work:
yuuji@0	378 a) avoid NFS access so as not to stress statd/lockd.
yuuji@0	379 b) try to understand the code in statd/lockd and hack it
yuuji@0	380 to be more robust.
yuuji@0	381 c) hunt out the system limit of locks, if there is one,
yuuji@0	382 and increase it. Figure on at least two locks per
yuuji@0	383 simultaneous imapd process and four locks per Pine
yuuji@0	384 process. Better yet, make the limit be 10 times the
yuuji@0	385 maximum number of processes.
yuuji@0	386 d) increase the socket buffer (-S switch to lockd) if
yuuji@0	387 it is offered. I don't know what this actually does,
yuuji@0	388 but giving lockd more resources to do its work can't
yuuji@0	389 hurt. Maybe.
yuuji@0	390 3) Decide that it can't possibly work, and turn off the
yuuji@0	391 fcntl() calls in your program.
yuuji@0	392 4) If nuking statd/lockd can be done without breaking local
yuuji@0	393 locking, then do so. This would make SVR4 have the same
yuuji@0	394 limitations as BSD locking, with a couple of additional
yuuji@0	395 bugs.
yuuji@0	396 5) Check for NFS, and don't do the fcntl() in the NFS case.
yuuji@0	397 This is what c-client does.
yuuji@0	398
yuuji@0	399 Note that if you are going to use NFS to access files on a server
yuuji@0	400 which does not have statd/lockd running, your only choice is (3), (4),
yuuji@0	401 or (5). Here again, IMAP can bail you out.
yuuji@0	402
yuuji@0	403 These problems aren't unique to c-client applications; they have
yuuji@0	404 also been reported with Elm, Mediamail, and other email tools.
yuuji@0	405
yuuji@0	406 Of the other two SVR4 locking bugs:
yuuji@0	407
yuuji@0	408 Programmer awareness is necessary to deal with the bug that you
yuuji@0	409 can not get an exclusive lock unless the file is open for write. I
yuuji@0	410 believe that c-client has fixed all of these cases.
yuuji@0	411
yuuji@0	412 The problem about opening a second designator smashing any
yuuji@0	413 current locks on the file has not been addressed satisfactorily yet.
yuuji@0	414 This is not an easy problem to deal with, especially in c-client which
yuuji@0	415 really doesn't know what other files/streams may be open by Pine.
yuuji@0	416
yuuji@0	417 Aren't you so happy that you bought an System V system?