imapext-2007

annotate docs/locking.txt @ 0:ada5e610ab86

imap-2007e
author yuuji@gentei.org
date Mon, 14 Sep 2009 15:17:45 +0900
parents
children
rev   line source
yuuji@0 1 /* ========================================================================
yuuji@0 2 * Copyright 1988-2006 University of Washington
yuuji@0 3 *
yuuji@0 4 * Licensed under the Apache License, Version 2.0 (the "License");
yuuji@0 5 * you may not use this file except in compliance with the License.
yuuji@0 6 * You may obtain a copy of the License at
yuuji@0 7 *
yuuji@0 8 * http://www.apache.org/licenses/LICENSE-2.0
yuuji@0 9 *
yuuji@0 10 *
yuuji@0 11 * ========================================================================
yuuji@0 12 */
yuuji@0 13
yuuji@0 14 UNIX Advisory File Locking Implications on c-client
yuuji@0 15 Mark Crispin, 28 November 1995
yuuji@0 16
yuuji@0 17
yuuji@0 18 THIS DOCUMENT HAS BEEN UPDATED TO REFLECT THE FACT THAT
yuuji@0 19 LINUX SUPPORTS BOTH flock() AND fcntl() AND THAT OSF/1
yuuji@0 20 HAS BEEN BROKEN SO THAT IT ONLY SUPPORTS fcntl().
yuuji@0 21 -- JUNE 15, 2004
yuuji@0 22
yuuji@0 23 THIS DOCUMENT HAS BEEN UPDATED TO REFLECT THE CODE IN THE
yuuji@0 24 IMAP-4 TOOLKIT AS OF NOVEMBER 28, 1995. SOME STATEMENTS
yuuji@0 25 IN THIS DOCUMENT DO NOT APPLY TO EARLIER VERSIONS OF THE
yuuji@0 26 IMAP TOOLKIT.
yuuji@0 27
yuuji@0 28 INTRODUCTION
yuuji@0 29
yuuji@0 30 Advisory locking is a mechanism by which cooperating processes
yuuji@0 31 can signal to each other their usage of a resource and whether or not
yuuji@0 32 that usage is critical. It is not a mechanism to protect against
yuuji@0 33 processes which do not cooperate in the locking.
yuuji@0 34
yuuji@0 35 The most basic form of locking involves a counter. This counter
yuuji@0 36 is -1 when the resource is available. If a process wants the lock, it
yuuji@0 37 executes an atomic increment-and-test-if-zero. If the value is zero,
yuuji@0 38 the process has the lock and can execute the critical code that needs
yuuji@0 39 exclusive usage of a resource. When it is finished, it sets the lock
yuuji@0 40 back to -1. In C terms:
yuuji@0 41
yuuji@0 42 while (++lock) /* try to get lock */
yuuji@0 43 invoke_other_threads (); /* failed, try again */
yuuji@0 44 .
yuuji@0 45 . /* critical code here */
yuuji@0 46 .
yuuji@0 47 lock = -1; /* release lock */
yuuji@0 48
yuuji@0 49 This particular form of locking appears most commonly in
yuuji@0 50 multi-threaded applications such as operating system kernels. It
yuuji@0 51 makes several presumptions:
yuuji@0 52 (1) it is alright to keep testing the lock (no overflow)
yuuji@0 53 (2) the critical resource is single-access only
yuuji@0 54 (3) there is shared writeable memory between the two threads
yuuji@0 55 (4) the threads can be trusted to release the lock when finished
yuuji@0 56
yuuji@0 57 In applications programming on multi-user systems, most commonly
yuuji@0 58 the other threads are in an entirely different process, which may even
yuuji@0 59 be logged in as a different user. Few operating systems offer shared
yuuji@0 60 writeable memory between such processes.
yuuji@0 61
yuuji@0 62 A means of communicating this is by use of a file with a mutually
yuuji@0 63 agreed upon name. A binary semaphore can be passed by means of the
yuuji@0 64 existance or non-existance of that file, provided that there is an
yuuji@0 65 atomic means to create a file if and only if that file does not exist.
yuuji@0 66 In C terms:
yuuji@0 67
yuuji@0 68 /* try to get lock */
yuuji@0 69 while ((fd = open ("lockfile",O_WRONLY|O_CREAT|O_EXCL,0666)) < 0)
yuuji@0 70 sleep (1); /* failed, try again */
yuuji@0 71 close (fd); /* got the lock */
yuuji@0 72 .
yuuji@0 73 . /* critical code here */
yuuji@0 74 .
yuuji@0 75 unlink ("lockfile"); /* release lock */
yuuji@0 76
yuuji@0 77 This form of locking makes fewer presumptions, but it still is
yuuji@0 78 guilty of presumptions (2) and (4) above. Presumption (2) limits the
yuuji@0 79 ability to have processes sharing a resource in a non-conflicting
yuuji@0 80 fashion (e.g. reading from a file). Presumption (4) leads to
yuuji@0 81 deadlocks should the process crash while it has a resource locked.
yuuji@0 82
yuuji@0 83 Most modern operating systems provide a resource locking system
yuuji@0 84 call that has none of these presumptions. In particular, a mechanism
yuuji@0 85 is provided for identifying shared locks as opposed to exclusive
yuuji@0 86 locks. A shared lock permits other processes to obtain a shared lock,
yuuji@0 87 but denies exclusive locks. In other words:
yuuji@0 88
yuuji@0 89 current state want shared want exclusive
yuuji@0 90 ------------- ----------- --------------
yuuji@0 91 unlocked YES YES
yuuji@0 92 locked shared YES NO
yuuji@0 93 locked exclusive NO NO
yuuji@0 94
yuuji@0 95 Furthermore, the operating system automatically relinquishes all
yuuji@0 96 locks held by that process when it terminates.
yuuji@0 97
yuuji@0 98 A useful operation is the ability to upgrade a shared lock to
yuuji@0 99 exclusive (provided there are no other shared users of the lock) and
yuuji@0 100 to downgrade an exclusive lock to shared. It is important that at no
yuuji@0 101 time is the lock ever removed; a process upgrading to exclusive must
yuuji@0 102 not relenquish its shared lock.
yuuji@0 103
yuuji@0 104 Most commonly, the resources being locked are files. Shared
yuuji@0 105 locks are particularly important with files; multiple simultaneous
yuuji@0 106 processes can read from a file, but only one can safely write at a
yuuji@0 107 time. Some writes may be safer than others; an append to the end of
yuuji@0 108 the file is safer than changing existing file data. In turn, changing
yuuji@0 109 a file record in place is safer than rewriting the file with an
yuuji@0 110 entirely different structure.
yuuji@0 111
yuuji@0 112
yuuji@0 113 FILE LOCKING ON UNIX
yuuji@0 114
yuuji@0 115 In the oldest versions of UNIX, the use of a semaphore lockfile
yuuji@0 116 was the only available form of locking. Advisory locking system calls
yuuji@0 117 were not added to UNIX until after the BSD vs. System V split. Both
yuuji@0 118 of these system calls deal with file resources only.
yuuji@0 119
yuuji@0 120 Most systems only have one or the other form of locking. AIX
yuuji@0 121 and newer versions of OSF/1 emulate the BSD form of locking as a jacket
yuuji@0 122 into the System V form. Ultrix and Linux implement both forms.
yuuji@0 123
yuuji@0 124 BSD
yuuji@0 125
yuuji@0 126 BSD added the flock() system call. It offers capabilities to
yuuji@0 127 acquire shared lock, acquire exclusive lock, and unlock. Optionally,
yuuji@0 128 the process can request an immediate error return instead of blocking
yuuji@0 129 when the lock is unavailable.
yuuji@0 130
yuuji@0 131
yuuji@0 132 FLOCK() BUGS
yuuji@0 133
yuuji@0 134 flock() advertises that it permits upgrading of shared locks to
yuuji@0 135 exclusive and downgrading of exclusive locks to shared, but it does so
yuuji@0 136 by releasing the former lock and then trying to acquire the new lock.
yuuji@0 137 This creates a window of vulnerability in which another process can
yuuji@0 138 grab the exclusive lock. Therefore, this capability is not useful,
yuuji@0 139 although many programmers have been deluded by incautious reading of
yuuji@0 140 the flock() man page to believe otherwise. This problem can be
yuuji@0 141 programmed around, once the programmer is aware of it.
yuuji@0 142
yuuji@0 143 flock() always returns as if it succeeded on NFS files, when in
yuuji@0 144 fact it is a no-op. There is no way around this.
yuuji@0 145
yuuji@0 146 Leaving aside these two problems, flock() works remarkably well,
yuuji@0 147 and has shown itself to be robust and trustworthy.
yuuji@0 148
yuuji@0 149 SYSTEM V/POSIX
yuuji@0 150
yuuji@0 151 System V added new functions to the fnctl() system call, and a
yuuji@0 152 simple interface through the lockf() subroutine. This was
yuuji@0 153 subsequently included in POSIX. Both offer the facility to apply the
yuuji@0 154 lock to a particular region of the file instead of to the entire file.
yuuji@0 155 lockf() only supports exclusive locks, and calls fcntl() internally;
yuuji@0 156 hence it won't be discussed further.
yuuji@0 157
yuuji@0 158 Functionally, fcntl() locking is a superset of flock(); it is
yuuji@0 159 possible to implement a flock() emulator using fcntl(), with one minor
yuuji@0 160 exception: it is not possible to acquire an exclusive lock if the file
yuuji@0 161 is not open for write.
yuuji@0 162
yuuji@0 163 The fcntl() locking functions are: query lock station of a file
yuuji@0 164 region, lock/unlock a region, and lock/unlock a region and block until
yuuji@0 165 have the lock. The locks may be shared or exclusive. By means of the
yuuji@0 166 statd and lockd daemons, fcntl() locking is available on NFS files.
yuuji@0 167
yuuji@0 168 When statd is started at system boot, it reads its /etc/state
yuuji@0 169 file (which contains the number of times it has been invoked) and
yuuji@0 170 /etc/sm directory (which contains a list of all remote sites which are
yuuji@0 171 client or server locking with this site), and notifies the statd on
yuuji@0 172 each of these systems that it has been restarted. Each statd then
yuuji@0 173 notifies the local lockd of the restart of that system.
yuuji@0 174
yuuji@0 175 lockd receives fcntl() requests for NFS files. It communicates
yuuji@0 176 with the lockd at the server and requests it to apply the lock, and
yuuji@0 177 with the statd to request it for notification when the server goes
yuuji@0 178 down. It blocks until all these requests are completed.
yuuji@0 179
yuuji@0 180 There is quite a mythos about fcntl() locking.
yuuji@0 181
yuuji@0 182 One religion holds that fcntl() locking is the best thing since
yuuji@0 183 sliced bread, and that programs which use flock() should be converted
yuuji@0 184 to fcntl() so that NFS locking will work. However, as noted above,
yuuji@0 185 very few systems support both calls, so such an exercise is pointless
yuuji@0 186 except on Ultrix and Linux.
yuuji@0 187
yuuji@0 188 Another religion, which I adhere to, has the opposite viewpoint.
yuuji@0 189
yuuji@0 190
yuuji@0 191 FCNTL() BUGS
yuuji@0 192
yuuji@0 193 For all of the hairy code to do individual section locking of a
yuuji@0 194 file, it's clear that the designers of fcntl() locking never
yuuji@0 195 considered some very basic locking operations. It's as if all they
yuuji@0 196 knew about locking they got out of some CS textbook with not
yuuji@0 197 investigation of real-world needs.
yuuji@0 198
yuuji@0 199 It is not possible to acquire an exclusive lock unless the file
yuuji@0 200 is open for write. You could have append with shared read, and thus
yuuji@0 201 you could have a case in which a read-only access may need to go
yuuji@0 202 exclusive. This problem can be programmed around once the programmer
yuuji@0 203 is aware of it.
yuuji@0 204
yuuji@0 205 If the file is opened on another file designator in the same
yuuji@0 206 process, the file is unlocked even if no attempt is made to do any
yuuji@0 207 form of locking on the second designator. This is a very bad bug. It
yuuji@0 208 means that an application must keep track of all the files that it has
yuuji@0 209 opened and locked.
yuuji@0 210
yuuji@0 211 If there is no statd/lockd on the NFS server, fcntl() will hang
yuuji@0 212 forever waiting for them to appear. This is a bad bug. It means that
yuuji@0 213 any attempt to lock on a server that doesn't run these daemons will
yuuji@0 214 hang. There is no way for an application to request flock() style
yuuji@0 215 ``try to lock, but no-op if the mechanism ain't there''.
yuuji@0 216
yuuji@0 217 There is a rumor to the effect that fcntl() will hang forever on
yuuji@0 218 local files too if there is no local statd/lockd. These daemons are
yuuji@0 219 running on mailer.u, although they appear not to have much CPU time.
yuuji@0 220 A useful experiment would be to kill them and see if imapd is affected
yuuji@0 221 in any way, but I decline to do so without an OK from UCS! ;-) If
yuuji@0 222 killing statd/lockd can be done without breaking fcntl() on local
yuuji@0 223 files, this would become one of the primary means of dealing with this
yuuji@0 224 problem.
yuuji@0 225
yuuji@0 226 The statd and lockd daemons have quite a reputation for extreme
yuuji@0 227 fragility. There have been numerous reports about the locking
yuuji@0 228 mechanism being wedged on a systemwide or even clusterwide basis,
yuuji@0 229 requiring a reboot to clear. It is rumored that this wedge, once it
yuuji@0 230 happens, also blocks local locking. Presumably killing and restarting
yuuji@0 231 statd would suffice to clear the wedge, but I haven't verified this.
yuuji@0 232
yuuji@0 233 There appears to be a limit to how many locks may be in use at a
yuuji@0 234 time on the system, although the documentation only mentions it in
yuuji@0 235 passing. On some of their systems, UCS has increased lockd's ``size
yuuji@0 236 of the socket buffer'', whatever that means.
yuuji@0 237
yuuji@0 238 C-CLIENT USAGE
yuuji@0 239
yuuji@0 240 c-client uses flock(). On System V systems, flock() is simulated
yuuji@0 241 by an emulator that calls fcntl().
yuuji@0 242
yuuji@0 243
yuuji@0 244 BEZERK AND MMDF
yuuji@0 245
yuuji@0 246 Locking in the traditional UNIX formats was largely dictated by
yuuji@0 247 the status quo in other applications; however, additional protection
yuuji@0 248 is added against inadvertantly running multiple instances of a
yuuji@0 249 c-client application on the same mail file.
yuuji@0 250
yuuji@0 251 (1) c-client attempts to create a .lock file (mail file name with
yuuji@0 252 ``.lock'' appended) whenever it reads from, or writes to, the mail
yuuji@0 253 file. This is an exclusive lock, and is held only for short periods
yuuji@0 254 of time while c-client is actually doing the I/O. There is a 5-minute
yuuji@0 255 timeout for this lock, after which it is broken on the presumption
yuuji@0 256 that it is a stale lock. If it can not create the .lock file due to
yuuji@0 257 an EACCES (protection failure) error, it once silently proceeded
yuuji@0 258 without this lock; this was for systems which protect /usr/spool/mail
yuuji@0 259 from unprivileged processes creating files. Today, c-client reports
yuuji@0 260 an error unless it is built otherwise. The purpose of this lock is to
yuuji@0 261 prevent against unfavorable interactions with mail delivery.
yuuji@0 262
yuuji@0 263 (2) c-client applies a shared flock() to the mail file whenever
yuuji@0 264 it reads from the mail file, and an exclusive flock() whenever it
yuuji@0 265 writes to the mail file. This lock is freed as soon as it finishes
yuuji@0 266 reading. The purpose of this lock is to prevent against unfavorable
yuuji@0 267 interactions with mail delivery.
yuuji@0 268
yuuji@0 269 (3) c-client applies an exclusive flock() to a file on /tmp
yuuji@0 270 (whose name represents the device and inode number of the file) when
yuuji@0 271 it opens the mail file. This lock is maintained throughout the
yuuji@0 272 session, although c-client has a feature (called ``kiss of death'')
yuuji@0 273 which permits c-client to forcibly and irreversibly seize the lock
yuuji@0 274 from a cooperating c-client application that surrenders the lock on
yuuji@0 275 demand. The purpose of this lock is to prevent against unfavorable
yuuji@0 276 interactions with other instances of c-client (rewriting the mail
yuuji@0 277 file).
yuuji@0 278
yuuji@0 279 Mail delivery daemons use lock (1), (2), or both. Lock (1) works
yuuji@0 280 over NFS; lock (2) is the only one that works on sites that protect
yuuji@0 281 /usr/spool/mail against unprivileged file creation. Prudent mail
yuuji@0 282 delivery daemons use both forms of locking, and of course so does
yuuji@0 283 c-client.
yuuji@0 284
yuuji@0 285 If only lock (2) is used, then multiple processes can read from
yuuji@0 286 the mail file simultaneously, although in real life this doesn't
yuuji@0 287 really change things. The normal state of locks (1) and (2) is
yuuji@0 288 unlocked except for very brief periods.
yuuji@0 289
yuuji@0 290
yuuji@0 291 TENEX AND MTX
yuuji@0 292
yuuji@0 293 The design of the locking mechanism of these formats was
yuuji@0 294 motivated by a design to enable multiple simultaneous read/write
yuuji@0 295 access. It is almost the reverse of how locking works with
yuuji@0 296 bezerk/mmdf.
yuuji@0 297
yuuji@0 298 (1) c-client applies a shared flock() to the mail file when it
yuuji@0 299 opens the mail file. It upgrades this lock to exclusive whenever it
yuuji@0 300 tries to expunge the mail file. Because of the flock() bug that
yuuji@0 301 upgrading a lock actually releases it, it will not do so until it has
yuuji@0 302 acquired an exclusive lock (2) first. The purpose of this lock is to
yuuji@0 303 prevent against expunge taking place while some other c-client has the
yuuji@0 304 mail file open (and thus knows where all the messages are).
yuuji@0 305
yuuji@0 306 (2) c-client applies a shared flock() to a file on /tmp (whose
yuuji@0 307 name represents the device and inode number of the file) when it
yuuji@0 308 parses the mail file. It applies an exclusive flock() to this file
yuuji@0 309 when it appends new mail to the mail file, as well as before it
yuuji@0 310 attempts to upgrade lock (1) to exclusive. The purpose of this lock
yuuji@0 311 is to prevent against data being appended while some other c-client is
yuuji@0 312 parsing mail in the file (to prevent reading of incomplete messages).
yuuji@0 313 It also protects against the lock-releasing timing race on lock (1).
yuuji@0 314
yuuji@0 315 OBSERVATIONS
yuuji@0 316
yuuji@0 317 In a perfect world, locking works. You are protected against
yuuji@0 318 unfavorable interactions with the mailer and against your own mistake
yuuji@0 319 by running more than one instance of your mail reader. In tenex/mtx
yuuji@0 320 formats, you have the additional benefit that multiple simultaneous
yuuji@0 321 read/write access works, with the sole restriction being that you
yuuji@0 322 can't expunge if there are any sharers of the mail file.
yuuji@0 323
yuuji@0 324 If the mail file is NFS-mounted, then flock() locking is a silent
yuuji@0 325 no-op. This is the way BSD implements flock(), and c-client's
yuuji@0 326 emulation of flock() through fcntl() tests for NFS files and
yuuji@0 327 duplicates this functionality. There is no locking protection for
yuuji@0 328 tenex/mtx mail files at all, and only protection against the mailer
yuuji@0 329 for bezerk/mmdf mail files. This has been the accepted state of
yuuji@0 330 affairs on UNIX for many sad years.
yuuji@0 331
yuuji@0 332 If you can not create .lock files, it should not affect locking,
yuuji@0 333 since the flock() locks suffice for all protection. This is, however,
yuuji@0 334 not true if the mailer does not check for flock() locking, or if the
yuuji@0 335 the mail file is NFS-mounted.
yuuji@0 336
yuuji@0 337 What this means is that there is *no* locking protection at all
yuuji@0 338 in the case of a client using an NFS-mounted /usr/spool/mail that does
yuuji@0 339 not permit file creation by unprivileged programs. It is impossible,
yuuji@0 340 under these circumstances, for an unprivileged program to do anything
yuuji@0 341 about it. Worse, if EACCES errors on .lock file creation are no-op'ed
yuuji@0 342 , the user won't even know about it. This is arguably a site
yuuji@0 343 configuration error.
yuuji@0 344
yuuji@0 345 The problem with not being able to create .lock files exists on
yuuji@0 346 System V as well, but the failure modes for flock() -- which is
yuuji@0 347 implemented via fcntl() -- are different.
yuuji@0 348
yuuji@0 349 On System V, if the mail file is NFS-mounted and either the
yuuji@0 350 client or the server lacks a functioning statd/lockd pair, then the
yuuji@0 351 lock attempt would have hung forever if it weren't for the fact that
yuuji@0 352 c-client tests for NFS and no-ops the flock() emulator in this case.
yuuji@0 353 Systemwide or clusterwide failures of statd/lockd have been known to
yuuji@0 354 occur which cause all locks in all processes to hang (including
yuuji@0 355 local?). Without the special NFS test made by c-client, there would
yuuji@0 356 be no way to request BSD-style no-op behavior, nor is there any way to
yuuji@0 357 determine that this is happening other than the system being hung.
yuuji@0 358
yuuji@0 359 The additional locking introduced by c-client was shown to cause
yuuji@0 360 much more stress on the System V locking mechanism than has
yuuji@0 361 traditionally been placed upon it. If it was stressed too far, all
yuuji@0 362 hell broke loose. Fortunately, this is now past history.
yuuji@0 363
yuuji@0 364 TRADEOFFS
yuuji@0 365
yuuji@0 366 c-client based applications have a reasonable chance of winning
yuuji@0 367 as long as you don't use NFS for remote access to mail files. That's
yuuji@0 368 what IMAP is for, after all. It is, however, very important to
yuuji@0 369 realize that you can *not* use the lock-upgrade feature by itself
yuuji@0 370 because it releases the lock as an interim step -- you need to have
yuuji@0 371 lock-upgrading guarded by another lock.
yuuji@0 372
yuuji@0 373 If you have the misfortune of using System V, you are likely to
yuuji@0 374 run into problems sooner or later having to do with statd/lockd. You
yuuji@0 375 basically end up with one of three unsatisfactory choices:
yuuji@0 376 1) Grit your teeth and live with it.
yuuji@0 377 2) Try to make it work:
yuuji@0 378 a) avoid NFS access so as not to stress statd/lockd.
yuuji@0 379 b) try to understand the code in statd/lockd and hack it
yuuji@0 380 to be more robust.
yuuji@0 381 c) hunt out the system limit of locks, if there is one,
yuuji@0 382 and increase it. Figure on at least two locks per
yuuji@0 383 simultaneous imapd process and four locks per Pine
yuuji@0 384 process. Better yet, make the limit be 10 times the
yuuji@0 385 maximum number of processes.
yuuji@0 386 d) increase the socket buffer (-S switch to lockd) if
yuuji@0 387 it is offered. I don't know what this actually does,
yuuji@0 388 but giving lockd more resources to do its work can't
yuuji@0 389 hurt. Maybe.
yuuji@0 390 3) Decide that it can't possibly work, and turn off the
yuuji@0 391 fcntl() calls in your program.
yuuji@0 392 4) If nuking statd/lockd can be done without breaking local
yuuji@0 393 locking, then do so. This would make SVR4 have the same
yuuji@0 394 limitations as BSD locking, with a couple of additional
yuuji@0 395 bugs.
yuuji@0 396 5) Check for NFS, and don't do the fcntl() in the NFS case.
yuuji@0 397 This is what c-client does.
yuuji@0 398
yuuji@0 399 Note that if you are going to use NFS to access files on a server
yuuji@0 400 which does not have statd/lockd running, your only choice is (3), (4),
yuuji@0 401 or (5). Here again, IMAP can bail you out.
yuuji@0 402
yuuji@0 403 These problems aren't unique to c-client applications; they have
yuuji@0 404 also been reported with Elm, Mediamail, and other email tools.
yuuji@0 405
yuuji@0 406 Of the other two SVR4 locking bugs:
yuuji@0 407
yuuji@0 408 Programmer awareness is necessary to deal with the bug that you
yuuji@0 409 can not get an exclusive lock unless the file is open for write. I
yuuji@0 410 believe that c-client has fixed all of these cases.
yuuji@0 411
yuuji@0 412 The problem about opening a second designator smashing any
yuuji@0 413 current locks on the file has not been addressed satisfactorily yet.
yuuji@0 414 This is not an easy problem to deal with, especially in c-client which
yuuji@0 415 really doesn't know what other files/streams may be open by Pine.
yuuji@0 416
yuuji@0 417 Aren't you so happy that you bought an System V system?

UW-IMAP'd extensions by yuuji