imap-2007e: diff of docs/locking.txt @ 0:ada5e610ab86
author: yuuji@gentei.org
date:   Mon, 14 Sep 2009 15:17:45 +0900
/* ========================================================================
 * Copyright 1988-2006 University of Washington
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 *
 * ========================================================================
 */

           UNIX Advisory File Locking Implications on c-client
                   Mark Crispin, 28 November 1995


        THIS DOCUMENT HAS BEEN UPDATED TO REFLECT THE FACT THAT
        LINUX SUPPORTS BOTH flock() AND fcntl() AND THAT OSF/1
        HAS BEEN BROKEN SO THAT IT ONLY SUPPORTS fcntl().
             -- JUNE 15, 2004

        THIS DOCUMENT HAS BEEN UPDATED TO REFLECT THE CODE IN THE
        IMAP-4 TOOLKIT AS OF NOVEMBER 28, 1995.  SOME STATEMENTS
        IN THIS DOCUMENT DO NOT APPLY TO EARLIER VERSIONS OF THE
        IMAP TOOLKIT.

INTRODUCTION

     Advisory locking is a mechanism by which cooperating processes
can signal to each other their usage of a resource and whether or not
that usage is critical.  It is not a mechanism to protect against
processes which do not cooperate in the locking.

     The most basic form of locking involves a counter.  This counter
is -1 when the resource is available.  If a process wants the lock, it
executes an atomic increment-and-test-if-zero.  If the value is zero,
the process has the lock and can execute the critical code that needs
exclusive usage of a resource.  When it is finished, it sets the lock
back to -1.
In C terms:

  while (++lock)                /* try to get lock */
    invoke_other_threads ();    /* failed, try again */
   .
   .                            /* critical code here */
   .
  lock = -1;                    /* release lock */

     This particular form of locking appears most commonly in
multi-threaded applications such as operating system kernels.  It
makes several presumptions:
 (1) it is all right to keep testing the lock (no overflow)
 (2) the critical resource is single-access only
 (3) there is shared writable memory between the two threads
 (4) the threads can be trusted to release the lock when finished

     In applications programming on multi-user systems, most commonly
the other threads are in an entirely different process, which may even
be logged in as a different user.  Few operating systems offer shared
writable memory between such processes.

     A means of communicating this is by use of a file with a mutually
agreed upon name.  A binary semaphore can be passed by means of the
existence or non-existence of that file, provided that there is an
atomic means to create a file if and only if that file does not exist.
In C terms:

  /* try to get lock */
  while ((fd = open ("lockfile", O_WRONLY|O_CREAT|O_EXCL, 0666)) < 0)
    sleep (1);                  /* failed, try again */
  close (fd);                   /* got the lock */
   .
   .                            /* critical code here */
   .
  unlink ("lockfile");          /* release lock */

     This form of locking makes fewer presumptions, but it is still
guilty of presumptions (2) and (4) above.  Presumption (2) limits the
ability of processes to share a resource in a non-conflicting fashion
(e.g. reading from a file).  Presumption (4) leads to deadlocks should
the process crash while it has a resource locked.
     Most modern operating systems provide a resource locking system
call that has none of these presumptions.  In particular, a mechanism
is provided for identifying shared locks as opposed to exclusive
locks.  A shared lock permits other processes to obtain a shared lock,
but denies exclusive locks.  In other words:

  current state      want shared    want exclusive
  -------------      -----------    --------------
  unlocked               YES             YES
  locked shared          YES             NO
  locked exclusive       NO              NO

     Furthermore, the operating system automatically relinquishes all
locks held by a process when that process terminates.

     A useful operation is the ability to upgrade a shared lock to
exclusive (provided there are no other shared users of the lock) and
to downgrade an exclusive lock to shared.  It is important that at no
time is the lock ever removed; a process upgrading to exclusive must
not relinquish its shared lock.

     Most commonly, the resources being locked are files.  Shared
locks are particularly important with files; multiple simultaneous
processes can read from a file, but only one can safely write at a
time.  Some writes may be safer than others; an append to the end of
the file is safer than changing existing file data.  In turn, changing
a file record in place is safer than rewriting the file with an
entirely different structure.


FILE LOCKING ON UNIX

     In the oldest versions of UNIX, the use of a semaphore lockfile
was the only available form of locking.  Advisory locking system calls
were not added to UNIX until after the BSD vs. System V split.  Both
of these system calls deal with file resources only.

     Most systems have only one or the other form of locking.
AIX
and newer versions of OSF/1 emulate the BSD form of locking as a
jacket into the System V form.  Ultrix and Linux implement both forms.

BSD

     BSD added the flock() system call.  It offers capabilities to
acquire a shared lock, acquire an exclusive lock, and unlock.
Optionally, the process can request an immediate error return instead
of blocking when the lock is unavailable.


FLOCK() BUGS

     flock() advertises that it permits upgrading of shared locks to
exclusive and downgrading of exclusive locks to shared, but it does so
by releasing the former lock and then trying to acquire the new lock.
This creates a window of vulnerability in which another process can
grab the exclusive lock.  Therefore, this capability is not useful,
although many programmers have been deluded by incautious reading of
the flock() man page to believe otherwise.  This problem can be
programmed around, once the programmer is aware of it.

     flock() always returns as if it succeeded on NFS files, when in
fact it is a no-op.  There is no way around this.

     Leaving aside these two problems, flock() works remarkably well,
and has shown itself to be robust and trustworthy.

SYSTEM V/POSIX

     System V added new functions to the fcntl() system call, and a
simple interface through the lockf() subroutine.  This was
subsequently included in POSIX.  Both offer the facility to apply the
lock to a particular region of the file instead of to the entire file.
lockf() only supports exclusive locks, and calls fcntl() internally;
hence it won't be discussed further.
     Functionally, fcntl() locking is a superset of flock(); it is
possible to implement a flock() emulator using fcntl(), with one minor
exception: it is not possible to acquire an exclusive lock if the file
is not open for write.

     The fcntl() locking functions are: query the lock status of a
file region, lock/unlock a region, and lock/unlock a region and block
until the lock is obtained.  The locks may be shared or exclusive.  By
means of the statd and lockd daemons, fcntl() locking is available on
NFS files.

     When statd is started at system boot, it reads its /etc/state
file (which contains the number of times it has been invoked) and the
/etc/sm directory (which contains a list of all remote sites which are
client or server locking with this site), and notifies the statd on
each of these systems that it has been restarted.  Each statd then
notifies the local lockd of the restart of that system.

     lockd receives fcntl() requests for NFS files.  It communicates
with the lockd at the server to request that it apply the lock, and
with the statd to request notification when the server goes down.  It
blocks until all these requests are completed.

     There is quite a mythos about fcntl() locking.

     One religion holds that fcntl() locking is the best thing since
sliced bread, and that programs which use flock() should be converted
to fcntl() so that NFS locking will work.  However, as noted above,
very few systems support both calls, so such an exercise is pointless
except on Ultrix and Linux.

     Another religion, which I adhere to, has the opposite viewpoint.
FCNTL() BUGS

     For all of the hairy code to do individual section locking of a
file, it's clear that the designers of fcntl() locking never
considered some very basic locking operations.  It's as if all they
knew about locking they got out of some CS textbook, with no
investigation of real-world needs.

     It is not possible to acquire an exclusive lock unless the file
is open for write.  You could have append with shared read, and thus
you could have a case in which a read-only access may need to go
exclusive.  This problem can be programmed around once the programmer
is aware of it.

     If the file is opened on another file designator in the same
process, the file is unlocked even if no attempt is made to do any
form of locking on the second designator.  This is a very bad bug.  It
means that an application must keep track of all the files that it has
opened and locked.

     If there is no statd/lockd on the NFS server, fcntl() will hang
forever waiting for them to appear.  This is a bad bug.  It means that
any attempt to lock on a server that doesn't run these daemons will
hang.  There is no way for an application to request flock() style
``try to lock, but no-op if the mechanism ain't there''.

     There is a rumor to the effect that fcntl() will hang forever on
local files too if there is no local statd/lockd.  These daemons are
running on mailer.u, although they appear not to have much CPU time.
A useful experiment would be to kill them and see if imapd is affected
in any way, but I decline to do so without an OK from UCS!  ;-)  If
killing statd/lockd can be done without breaking fcntl() on local
files, this would become one of the primary means of dealing with this
problem.
     The statd and lockd daemons have quite a reputation for extreme
fragility.  There have been numerous reports about the locking
mechanism being wedged on a systemwide or even clusterwide basis,
requiring a reboot to clear.  It is rumored that this wedge, once it
happens, also blocks local locking.  Presumably killing and restarting
statd would suffice to clear the wedge, but I haven't verified this.

     There appears to be a limit to how many locks may be in use at a
time on the system, although the documentation only mentions it in
passing.  On some of their systems, UCS has increased lockd's ``size
of the socket buffer'', whatever that means.

C-CLIENT USAGE

     c-client uses flock().  On System V systems, flock() is simulated
by an emulator that calls fcntl().


BEZERK AND MMDF

     Locking in the traditional UNIX formats was largely dictated by
the status quo in other applications; however, additional protection
is added against inadvertently running multiple instances of a
c-client application on the same mail file.

     (1) c-client attempts to create a .lock file (the mail file name
with ``.lock'' appended) whenever it reads from, or writes to, the
mail file.  This is an exclusive lock, and is held only for short
periods of time while c-client is actually doing the I/O.  There is a
5-minute timeout for this lock, after which it is broken on the
presumption that it is a stale lock.  If c-client could not create the
.lock file due to an EACCES (protection failure) error, it once
silently proceeded without this lock; this was for systems which
protect /usr/spool/mail from unprivileged processes creating files.
Today, c-client reports an error unless it is built otherwise.
The purpose of this lock is to
protect against unfavorable interactions with mail delivery.

     (2) c-client applies a shared flock() to the mail file whenever
it reads from the mail file, and an exclusive flock() whenever it
writes to the mail file.  This lock is freed as soon as the I/O
finishes.  The purpose of this lock is to protect against unfavorable
interactions with mail delivery.

     (3) c-client applies an exclusive flock() to a file on /tmp
(whose name represents the device and inode number of the mail file)
when it opens the mail file.  This lock is maintained throughout the
session, although c-client has a feature (called ``kiss of death'')
which permits c-client to forcibly and irreversibly seize the lock
from a cooperating c-client application that surrenders the lock on
demand.  The purpose of this lock is to protect against unfavorable
interactions with other instances of c-client (rewriting the mail
file).

     Mail delivery daemons use lock (1), (2), or both.  Lock (1) works
over NFS; lock (2) is the only one that works at sites that protect
/usr/spool/mail against unprivileged file creation.  Prudent mail
delivery daemons use both forms of locking, and of course so does
c-client.

     If only lock (2) is used, then multiple processes can read from
the mail file simultaneously, although in real life this doesn't
really change things.  The normal state of locks (1) and (2) is
unlocked except for very brief periods.


TENEX AND MTX

     The design of the locking mechanism of these formats was
motivated by a desire to enable multiple simultaneous read/write
access.  It is almost the reverse of how locking works with
bezerk/mmdf.
     (1) c-client applies a shared flock() to the mail file when it
opens the mail file.  It upgrades this lock to exclusive whenever it
tries to expunge the mail file.  Because of the flock() bug that
upgrading a lock actually releases it, it will not do so until it has
acquired an exclusive lock (2) first.  The purpose of this lock is to
protect against an expunge taking place while some other c-client has
the mail file open (and thus knows where all the messages are).

     (2) c-client applies a shared flock() to a file on /tmp (whose
name represents the device and inode number of the mail file) when it
parses the mail file.  It applies an exclusive flock() to this file
when it appends new mail to the mail file, as well as before it
attempts to upgrade lock (1) to exclusive.  The purpose of this lock
is to protect against data being appended while some other c-client is
parsing mail in the file (to prevent reading of incomplete messages).
It also protects against the lock-releasing timing race on lock (1).

OBSERVATIONS

     In a perfect world, locking works.  You are protected against
unfavorable interactions with the mailer and against your own mistake
of running more than one instance of your mail reader.  In tenex/mtx
formats, you have the additional benefit that multiple simultaneous
read/write access works, with the sole restriction being that you
can't expunge if there are any sharers of the mail file.

     If the mail file is NFS-mounted, then flock() locking is a silent
no-op.  This is the way BSD implements flock(), and c-client's
emulation of flock() through fcntl() tests for NFS files and
duplicates this functionality.  There is no locking protection for
tenex/mtx mail files at all, and only protection against the mailer
for bezerk/mmdf mail files.
This has been the accepted state of
affairs on UNIX for many sad years.

     If you can not create .lock files, it should not affect locking,
since the flock() locks suffice for all protection.  This is, however,
not true if the mailer does not check for flock() locking, or if the
mail file is NFS-mounted.

     What this means is that there is *no* locking protection at all
in the case of a client using an NFS-mounted /usr/spool/mail that does
not permit file creation by unprivileged programs.  It is impossible,
under these circumstances, for an unprivileged program to do anything
about it.  Worse, if EACCES errors on .lock file creation are
no-op'ed, the user won't even know about it.  This is arguably a site
configuration error.

     The problem with not being able to create .lock files exists on
System V as well, but the failure modes for flock() -- which is
implemented via fcntl() -- are different.

     On System V, if the mail file is NFS-mounted and either the
client or the server lacks a functioning statd/lockd pair, then the
lock attempt would hang forever if it weren't for the fact that
c-client tests for NFS and no-ops the flock() emulator in this case.
Systemwide or clusterwide failures of statd/lockd have been known to
occur which cause all locks in all processes to hang (possibly
including local ones).  Without the special NFS test made by c-client,
there would be no way to request BSD-style no-op behavior, nor is
there any way to determine that this is happening other than the
system being hung.

     The additional locking introduced by c-client was shown to cause
much more stress on the System V locking mechanism than has
traditionally been placed upon it.  When it was stressed too far, all
hell broke loose.  Fortunately, this is now past history.
TRADEOFFS

     c-client based applications have a reasonable chance of winning
as long as you don't use NFS for remote access to mail files.  That's
what IMAP is for, after all.  It is, however, very important to
realize that you can *not* use the lock-upgrade feature by itself
because it releases the lock as an interim step -- you need to have
lock-upgrading guarded by another lock.

     If you have the misfortune of using System V, you are likely to
run into problems sooner or later having to do with statd/lockd.  You
basically end up with one of five unsatisfactory choices:
 1) Grit your teeth and live with it.
 2) Try to make it work:
    a) avoid NFS access so as not to stress statd/lockd.
    b) try to understand the code in statd/lockd and hack it
       to be more robust.
    c) hunt out the system limit of locks, if there is one,
       and increase it.  Figure on at least two locks per
       simultaneous imapd process and four locks per Pine
       process.  Better yet, make the limit be 10 times the
       maximum number of processes.
    d) increase the socket buffer (-S switch to lockd) if
       it is offered.  I don't know what this actually does,
       but giving lockd more resources to do its work can't
       hurt.  Maybe.
 3) Decide that it can't possibly work, and turn off the
    fcntl() calls in your program.
 4) If nuking statd/lockd can be done without breaking local
    locking, then do so.  This would make SVR4 have the same
    limitations as BSD locking, with a couple of additional
    bugs.
 5) Check for NFS, and don't do the fcntl() in the NFS case.
    This is what c-client does.

     Note that if you are going to use NFS to access files on a
server which does not have statd/lockd running, your only choice is
(3), (4), or (5).
Here again, IMAP can bail you out.

     These problems aren't unique to c-client applications; they have
also been reported with Elm, Mediamail, and other email tools.

     Of the other two SVR4 locking bugs:

     Programmer awareness is necessary to deal with the bug that you
can not get an exclusive lock unless the file is open for write.  I
believe that c-client has fixed all of these cases.

     The problem of opening a second designator smashing any current
locks on the file has not been addressed satisfactorily yet.  This is
not an easy problem to deal with, especially in c-client, which really
doesn't know what other files/streams may be open by Pine.

     Aren't you so happy that you bought a System V system?