imap-2007e: diff of docs/locking.txt @ 0:ada5e610ab86
author: yuuji@gentei.org
date:   Mon, 14 Sep 2009 15:17:45 +0900
/* ========================================================================
 * Copyright 1988-2006 University of Washington
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 *
 * ========================================================================
 */

           UNIX Advisory File Locking Implications on c-client
                   Mark Crispin, 28 November 1995


        THIS DOCUMENT HAS BEEN UPDATED TO REFLECT THE FACT THAT
        LINUX SUPPORTS BOTH flock() AND fcntl() AND THAT OSF/1
        HAS BEEN BROKEN SO THAT IT ONLY SUPPORTS fcntl().
             -- JUNE 15, 2004

        THIS DOCUMENT HAS BEEN UPDATED TO REFLECT THE CODE IN THE
        IMAP-4 TOOLKIT AS OF NOVEMBER 28, 1995.  SOME STATEMENTS
        IN THIS DOCUMENT DO NOT APPLY TO EARLIER VERSIONS OF THE
        IMAP TOOLKIT.

INTRODUCTION

     Advisory locking is a mechanism by which cooperating processes
can signal to each other their usage of a resource and whether or not
that usage is critical.  It is not a mechanism to protect against
processes which do not cooperate in the locking.

     The most basic form of locking involves a counter.  This counter
is -1 when the resource is available.  If a process wants the lock, it
executes an atomic increment-and-test-if-zero.  If the value is zero,
the process has the lock and can execute the critical code that needs
exclusive usage of a resource.  When it is finished, it sets the lock
back to -1.
In C terms:

  while (++lock)                /* try to get lock */
    invoke_other_threads ();    /* failed, try again */
   .
   .                            /* critical code here */
   .
  lock = -1;                    /* release lock */

     This particular form of locking appears most commonly in
multi-threaded applications such as operating system kernels.  It
makes several presumptions:
 (1) it is all right to keep testing the lock (no overflow)
 (2) the critical resource is single-access only
 (3) there is shared writable memory between the two threads
 (4) the threads can be trusted to release the lock when finished

     In applications programming on multi-user systems, most commonly
the other threads are in an entirely different process, which may even
be logged in as a different user.  Few operating systems offer shared
writable memory between such processes.

     A means of communicating this is by use of a file with a mutually
agreed upon name.  A binary semaphore can be passed by means of the
existence or non-existence of that file, provided that there is an
atomic means to create a file if and only if that file does not exist.
In C terms:

  /* try to get lock */
  while ((fd = open ("lockfile", O_WRONLY|O_CREAT|O_EXCL, 0666)) < 0)
    sleep (1);                  /* failed, try again */
  close (fd);                   /* got the lock */
   .
   .                            /* critical code here */
   .
  unlink ("lockfile");          /* release lock */

     This form of locking makes fewer presumptions, but it is still
guilty of presumptions (2) and (4) above.  Presumption (2) limits the
ability of processes to share a resource in a non-conflicting fashion
(e.g. reading from a file).  Presumption (4) leads to deadlocks should
the process crash while it has a resource locked.
     Most modern operating systems provide a resource locking system
call that has none of these presumptions.  In particular, a mechanism
is provided for identifying shared locks as opposed to exclusive
locks.  A shared lock permits other processes to obtain a shared lock,
but denies exclusive locks.  In other words:

  current state      want shared    want exclusive
  -------------      -----------    --------------
  unlocked               YES             YES
  locked shared          YES             NO
  locked exclusive       NO              NO

     Furthermore, the operating system automatically relinquishes all
locks held by a process when that process terminates.

     A useful operation is the ability to upgrade a shared lock to
exclusive (provided there are no other shared users of the lock) and
to downgrade an exclusive lock to shared.  It is important that at no
time is the lock ever removed; a process upgrading to exclusive must
not relinquish its shared lock.

     Most commonly, the resources being locked are files.  Shared
locks are particularly important with files; multiple simultaneous
processes can read from a file, but only one can safely write at a
time.  Some writes may be safer than others; an append to the end of
the file is safer than changing existing file data.  In turn, changing
a file record in place is safer than rewriting the file with an
entirely different structure.


FILE LOCKING ON UNIX

     In the oldest versions of UNIX, the use of a semaphore lockfile
was the only available form of locking.  Advisory locking system calls
were not added to UNIX until after the BSD vs. System V split.  Both
of these system calls deal with file resources only.

     Most systems have only one or the other form of locking.
AIX
and newer versions of OSF/1 emulate the BSD form of locking as a
jacket into the System V form.  Ultrix and Linux implement both forms.

BSD

     BSD added the flock() system call.  It offers capabilities to
acquire a shared lock, acquire an exclusive lock, and unlock.
Optionally, the process can request an immediate error return instead
of blocking when the lock is unavailable.


FLOCK() BUGS

     flock() advertises that it permits upgrading of shared locks to
exclusive and downgrading of exclusive locks to shared, but it does so
by releasing the former lock and then trying to acquire the new lock.
This creates a window of vulnerability in which another process can
grab the exclusive lock.  Therefore, this capability is not useful,
although many programmers have been deluded by incautious reading of
the flock() man page to believe otherwise.  This problem can be
programmed around, once the programmer is aware of it.

     flock() always returns as if it succeeded on NFS files, when in
fact it is a no-op.  There is no way around this.

     Leaving aside these two problems, flock() works remarkably well,
and has shown itself to be robust and trustworthy.

SYSTEM V/POSIX

     System V added new functions to the fcntl() system call, and a
simple interface through the lockf() subroutine.  This was
subsequently included in POSIX.  Both offer the facility to apply the
lock to a particular region of the file instead of to the entire file.
lockf() only supports exclusive locks, and calls fcntl() internally;
hence it won't be discussed further.
     Functionally, fcntl() locking is a superset of flock(); it is
possible to implement a flock() emulator using fcntl(), with one minor
exception: it is not possible to acquire an exclusive lock if the file
is not open for write.

     The fcntl() locking functions are: query the lock status of a
file region, lock/unlock a region, and lock/unlock a region and block
until the lock is obtained.  The locks may be shared or exclusive.  By
means of the statd and lockd daemons, fcntl() locking is available on
NFS files.

     When statd is started at system boot, it reads its /etc/state
file (which contains the number of times it has been invoked) and the
/etc/sm directory (which contains a list of all remote sites which are
client or server locking with this site), and notifies the statd on
each of these systems that it has been restarted.  Each statd then
notifies the local lockd of the restart of that system.

     lockd receives fcntl() requests for NFS files.  It communicates
with the lockd at the server to request that it apply the lock, and
with the statd to request notification when the server goes down.  It
blocks until all these requests are completed.

     There is quite a mythos about fcntl() locking.

     One religion holds that fcntl() locking is the best thing since
sliced bread, and that programs which use flock() should be converted
to fcntl() so that NFS locking will work.  However, as noted above,
very few systems support both calls, so such an exercise is pointless
except on Ultrix and Linux.

     Another religion, which I adhere to, has the opposite viewpoint.
FCNTL() BUGS

     For all of the hairy code to do individual section locking of a
file, it's clear that the designers of fcntl() locking never
considered some very basic locking operations.  It's as if all they
knew about locking they got out of some CS textbook, with no
investigation of real-world needs.

     It is not possible to acquire an exclusive lock unless the file
is open for write.  You could have append with shared read, and thus
you could have a case in which a read-only access may need to go
exclusive.  This problem can be programmed around once the programmer
is aware of it.

     If the file is opened on another file designator in the same
process, the file is unlocked even if no attempt is made to do any
form of locking on the second designator.  This is a very bad bug.  It
means that an application must keep track of all the files that it has
opened and locked.

     If there is no statd/lockd on the NFS server, fcntl() will hang
forever waiting for them to appear.  This is a bad bug.  It means that
any attempt to lock on a server that doesn't run these daemons will
hang.  There is no way for an application to request flock() style
``try to lock, but no-op if the mechanism ain't there''.

     There is a rumor to the effect that fcntl() will hang forever on
local files too if there is no local statd/lockd.  These daemons are
running on mailer.u, although they appear not to have much CPU time.
A useful experiment would be to kill them and see if imapd is affected
in any way, but I decline to do so without an OK from UCS!  ;-)  If
killing statd/lockd can be done without breaking fcntl() on local
files, this would become one of the primary means of dealing with this
problem.
     The statd and lockd daemons have quite a reputation for extreme
fragility.  There have been numerous reports about the locking
mechanism being wedged on a systemwide or even clusterwide basis,
requiring a reboot to clear.  It is rumored that this wedge, once it
happens, also blocks local locking.  Presumably killing and restarting
statd would suffice to clear the wedge, but I haven't verified this.

     There appears to be a limit to how many locks may be in use at a
time on the system, although the documentation only mentions it in
passing.  On some of their systems, UCS has increased lockd's ``size
of the socket buffer'', whatever that means.

C-CLIENT USAGE

     c-client uses flock().  On System V systems, flock() is simulated
by an emulator that calls fcntl().


BEZERK AND MMDF

     Locking in the traditional UNIX formats was largely dictated by
the status quo in other applications; however, additional protection
is added against inadvertently running multiple instances of a
c-client application on the same mail file.

     (1) c-client attempts to create a .lock file (the mail file name
with ``.lock'' appended) whenever it reads from, or writes to, the
mail file.  This is an exclusive lock, and is held only for short
periods of time while c-client is actually doing the I/O.  There is a
5-minute timeout for this lock, after which it is broken on the
presumption that it is a stale lock.  If c-client could not create the
.lock file due to an EACCES (protection failure) error, it once
silently proceeded without this lock; this was for systems which
protect /usr/spool/mail from unprivileged processes creating files.
Today, c-client reports an error unless it is built otherwise.
The purpose of this lock is to
protect against unfavorable interactions with mail delivery.

     (2) c-client applies a shared flock() to the mail file whenever
it reads from the mail file, and an exclusive flock() whenever it
writes to the mail file.  This lock is freed as soon as the I/O
finishes.  The purpose of this lock is to protect against unfavorable
interactions with mail delivery.

     (3) c-client applies an exclusive flock() to a file on /tmp
(whose name represents the device and inode number of the mail file)
when it opens the mail file.  This lock is maintained throughout the
session, although c-client has a feature (called ``kiss of death'')
which permits c-client to forcibly and irreversibly seize the lock
from a cooperating c-client application that surrenders the lock on
demand.  The purpose of this lock is to protect against unfavorable
interactions with other instances of c-client (rewriting the mail
file).

     Mail delivery daemons use lock (1), (2), or both.  Lock (1) works
over NFS; lock (2) is the only one that works at sites that protect
/usr/spool/mail against unprivileged file creation.  Prudent mail
delivery daemons use both forms of locking, and of course so does
c-client.

     If only lock (2) is used, then multiple processes can read from
the mail file simultaneously, although in real life this doesn't
really change things.  The normal state of locks (1) and (2) is
unlocked except for very brief periods.


TENEX AND MTX

     The design of the locking mechanism of these formats was
motivated by a desire to enable multiple simultaneous read/write
access.  It is almost the reverse of how locking works with
bezerk/mmdf.
     (1) c-client applies a shared flock() to the mail file when it
opens the mail file.  It upgrades this lock to exclusive whenever it
tries to expunge the mail file.  Because of the flock() bug that
upgrading a lock actually releases it, it will not do so until it has
acquired an exclusive lock (2) first.  The purpose of this lock is to
protect against an expunge taking place while some other c-client has
the mail file open (and thus knows where all the messages are).

     (2) c-client applies a shared flock() to a file on /tmp (whose
name represents the device and inode number of the mail file) when it
parses the mail file.  It applies an exclusive flock() to this file
when it appends new mail to the mail file, as well as before it
attempts to upgrade lock (1) to exclusive.  The purpose of this lock
is to protect against data being appended while some other c-client is
parsing mail in the file (to prevent reading of incomplete messages).
It also protects against the lock-releasing timing race on lock (1).

OBSERVATIONS

     In a perfect world, locking works.  You are protected against
unfavorable interactions with the mailer and against your own mistake
of running more than one instance of your mail reader.  In tenex/mtx
formats, you have the additional benefit that multiple simultaneous
read/write access works, with the sole restriction being that you
can't expunge if there are any sharers of the mail file.

     If the mail file is NFS-mounted, then flock() locking is a silent
no-op.  This is the way BSD implements flock(), and c-client's
emulation of flock() through fcntl() tests for NFS files and
duplicates this functionality.  There is no locking protection for
tenex/mtx mail files at all, and only protection against the mailer
for bezerk/mmdf mail files.
This has been the accepted state of
affairs on UNIX for many sad years.

     If you can not create .lock files, it should not affect locking,
since the flock() locks suffice for all protection.  This is, however,
not true if the mailer does not check for flock() locking, or if the
mail file is NFS-mounted.

     What this means is that there is *no* locking protection at all
in the case of a client using an NFS-mounted /usr/spool/mail that does
not permit file creation by unprivileged programs.  It is impossible,
under these circumstances, for an unprivileged program to do anything
about it.  Worse, if EACCES errors on .lock file creation are
no-op'ed, the user won't even know about it.  This is arguably a site
configuration error.

     The problem with not being able to create .lock files exists on
System V as well, but the failure modes for flock() -- which is
implemented via fcntl() -- are different.

     On System V, if the mail file is NFS-mounted and either the
client or the server lacks a functioning statd/lockd pair, then the
lock attempt would hang forever if it weren't for the fact that
c-client tests for NFS and no-ops the flock() emulator in this case.
Systemwide or clusterwide failures of statd/lockd have been known to
occur which cause all locks in all processes to hang (possibly
including local ones).  Without the special NFS test made by c-client,
there would be no way to request BSD-style no-op behavior, nor is
there any way to determine that this is happening other than the
system being hung.

     The additional locking introduced by c-client was shown to cause
much more stress on the System V locking mechanism than has
traditionally been placed upon it.  When it was stressed too far, all
hell broke loose.  Fortunately, this is now past history.
TRADEOFFS

     c-client based applications have a reasonable chance of winning
as long as you don't use NFS for remote access to mail files.  That's
what IMAP is for, after all.  It is, however, very important to
realize that you can *not* use the lock-upgrade feature by itself
because it releases the lock as an interim step -- you need to have
lock-upgrading guarded by another lock.

     If you have the misfortune of using System V, you are likely to
run into problems sooner or later having to do with statd/lockd.  You
basically end up with one of five unsatisfactory choices:
 1) Grit your teeth and live with it.
 2) Try to make it work:
    a) avoid NFS access so as not to stress statd/lockd.
    b) try to understand the code in statd/lockd and hack it
       to be more robust.
    c) hunt out the system limit of locks, if there is one,
       and increase it.  Figure on at least two locks per
       simultaneous imapd process and four locks per Pine
       process.  Better yet, make the limit be 10 times the
       maximum number of processes.
    d) increase the socket buffer (-S switch to lockd) if
       it is offered.  I don't know what this actually does,
       but giving lockd more resources to do its work can't
       hurt.  Maybe.
 3) Decide that it can't possibly work, and turn off the
    fcntl() calls in your program.
 4) If nuking statd/lockd can be done without breaking local
    locking, then do so.  This would make SVR4 have the same
    limitations as BSD locking, with a couple of additional
    bugs.
 5) Check for NFS, and don't do the fcntl() in the NFS case.
    This is what c-client does.

     Note that if you are going to use NFS to access files on a
server which does not have statd/lockd running, your only choice is
(3), (4), or (5).
Here again, IMAP can bail you out.

     These problems aren't unique to c-client applications; they have
also been reported with Elm, Mediamail, and other email tools.

     Of the other two SVR4 locking bugs:

     Programmer awareness is necessary to deal with the bug that you
can not get an exclusive lock unless the file is open for write.  I
believe that c-client has fixed all of these cases.

     The problem of opening a second designator smashing any current
locks on the file has not been addressed satisfactorily yet.  This is
not an easy problem to deal with, especially in c-client, which really
doesn't know what other files/streams may be open by Pine.

     Aren't you so happy that you bought a System V system?