/* ========================================================================
 * Copyright 1988-2006 University of Washington
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 *
 * ========================================================================
 */

            UNIX Advisory File Locking Implications on c-client
                     Mark Crispin, 28 November 1995

        THIS DOCUMENT HAS BEEN UPDATED TO REFLECT THE FACT THAT
        LINUX SUPPORTS BOTH flock() AND fcntl() AND THAT OSF/1
        HAS BEEN BROKEN SO THAT IT ONLY SUPPORTS fcntl().
        -- JUNE 15, 2004

        THIS DOCUMENT HAS BEEN UPDATED TO REFLECT THE CODE IN THE
        IMAP-4 TOOLKIT AS OF NOVEMBER 28, 1995.  SOME STATEMENTS
        IN THIS DOCUMENT DO NOT APPLY TO EARLIER VERSIONS OF THE
        IMAP TOOLKIT.

INTRODUCTION

     Advisory locking is a mechanism by which cooperating processes
can signal to each other their usage of a resource and whether or not
that usage is critical.  It is not a mechanism to protect against
processes which do not cooperate in the locking.

     The most basic form of locking involves a counter.  This counter
is -1 when the resource is available.  If a process wants the lock, it
executes an atomic increment-and-test-if-zero.  If the value is zero,
the process has the lock and can execute the critical code that needs
exclusive usage of a resource.  When it is finished, it sets the lock
back to -1.  In C terms:

    while (++lock)              /* try to get lock */
      invoke_other_threads ();  /* failed, try again */
     .
     .                          /* critical code here */
     .
    lock = -1;                  /* release lock */

     This particular form of locking appears most commonly in
multi-threaded applications such as operating system kernels.  It
makes several presumptions:
 (1) it is all right to keep testing the lock (no overflow)
 (2) the critical resource is single-access only
 (3) there is shared writeable memory between the two threads
 (4) the threads can be trusted to release the lock when finished

     In applications programming on multi-user systems, most commonly
the other threads are in an entirely different process, which may even
be logged in as a different user.  Few operating systems offer shared
writeable memory between such processes.

     A means of communicating this is by use of a file with a mutually
agreed upon name.  A binary semaphore can be passed by means of the
existence or non-existence of that file, provided that there is an
atomic means to create a file if and only if that file does not exist.
In C terms:

    /* try to get lock */
    while ((fd = open ("lockfile",O_WRONLY|O_CREAT|O_EXCL,0666)) < 0)
      sleep (1);                /* failed, try again */
    close (fd);                 /* got the lock */
     .
     .                          /* critical code here */
     .
    unlink ("lockfile");        /* release lock */

     This form of locking makes fewer presumptions, but it still is
guilty of presumptions (2) and (4) above.  Presumption (2) limits the
ability to have processes sharing a resource in a non-conflicting
fashion (e.g. reading from a file).  Presumption (4) leads to
deadlocks should the process crash while it has a resource locked.

     Most modern operating systems provide a resource locking system
call that has none of these presumptions.  In particular, a mechanism
is provided for identifying shared locks as opposed to exclusive
locks.  A shared lock permits other processes to obtain a shared lock,
but denies exclusive locks.  In other words:

    current state         want shared    want exclusive
    -------------         -----------    --------------
    unlocked                  YES             YES
    locked shared             YES             NO
    locked exclusive          NO              NO

     Furthermore, the operating system automatically relinquishes all
locks held by that process when it terminates.

     A useful operation is the ability to upgrade a shared lock to
exclusive (provided there are no other shared users of the lock) and
to downgrade an exclusive lock to shared.  It is important that at no
time is the lock ever removed; a process upgrading to exclusive must
not relinquish its shared lock.

     Most commonly, the resources being locked are files.  Shared
locks are particularly important with files; multiple simultaneous
processes can read from a file, but only one can safely write at a
time.  Some writes may be safer than others; an append to the end of
the file is safer than changing existing file data.  In turn, changing
a file record in place is safer than rewriting the file with an
entirely different structure.

FILE LOCKING ON UNIX

     In the oldest versions of UNIX, the use of a semaphore lockfile
was the only available form of locking.  Advisory locking system calls
were not added to UNIX until after the BSD vs. System V split.  Both
of these system calls deal with file resources only.

     Most systems only have one or the other form of locking.  AIX
and newer versions of OSF/1 emulate the BSD form of locking as a jacket
into the System V form.  Ultrix and Linux implement both forms.

BSD

     BSD added the flock() system call.  It offers capabilities to
acquire shared lock, acquire exclusive lock, and unlock.  Optionally,
the process can request an immediate error return instead of blocking
when the lock is unavailable.

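     For illustration only (this is not code from the toolkit), a
minimal sketch of these operations, assuming the usual BSD flags
LOCK_SH, LOCK_EX, LOCK_UN, and LOCK_NB from <sys/file.h>:

    #include <sys/file.h>       /* flock() and the LOCK_* flags */
    #include <fcntl.h>
    #include <unistd.h>

    void example (void)
    {
      int fd = open ("mailfile",O_RDWR);
      if (fd < 0) return;       /* can't open the file */
      flock (fd,LOCK_SH);       /* shared lock; blocks until granted */
      /* ... read from the file ... */
      if (!flock (fd,LOCK_EX|LOCK_NB)) {
                                /* exclusive lock granted without blocking;
                                 * note that this "upgrade" really releases
                                 * the shared lock first (see below) */
        /* ... write to the file ... */
      }
      flock (fd,LOCK_UN);       /* explicit unlock; also freed on close */
      close (fd);
    }
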
FLOCK() BUGS

     flock() advertises that it permits upgrading of shared locks to
exclusive and downgrading of exclusive locks to shared, but it does so
by releasing the former lock and then trying to acquire the new lock.
This creates a window of vulnerability in which another process can
grab the exclusive lock.  Therefore, this capability is not useful,
although many programmers have been deluded by incautious reading of
the flock() man page to believe otherwise.  This problem can be
programmed around, once the programmer is aware of it.

     flock() always returns as if it succeeded on NFS files, when in
fact it is a no-op.  There is no way around this.

     Leaving aside these two problems, flock() works remarkably well,
and has shown itself to be robust and trustworthy.

SYSTEM V/POSIX

     System V added new functions to the fcntl() system call, and a
simple interface through the lockf() subroutine.  This was
subsequently included in POSIX.  Both offer the facility to apply the
lock to a particular region of the file instead of to the entire file.
lockf() only supports exclusive locks, and calls fcntl() internally;
hence it won't be discussed further.

     Functionally, fcntl() locking is a superset of flock(); it is
possible to implement a flock() emulator using fcntl(), with one minor
exception: it is not possible to acquire an exclusive lock if the file
is not open for write.

     The fcntl() locking functions are: query the lock status of a file
region, lock/unlock a region, and lock/unlock a region and block until
it has the lock.  The locks may be shared or exclusive.  By means of the
statd and lockd daemons, fcntl() locking is available on NFS files.

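     Again for illustration only (not toolkit code), a minimal sketch
of region locking through the POSIX struct flock interface; F_SETLKW
blocks until the lock is granted, while F_SETLK returns an error
instead:

    #include <fcntl.h>
    #include <unistd.h>

    void example (void)
    {
      struct flock fl;
      int fd = open ("mailfile",O_RDWR);
      if (fd < 0) return;       /* can't open the file */
      fl.l_type = F_RDLCK;      /* shared (read) lock... */
      fl.l_whence = SEEK_SET;   /* ...starting at the beginning... */
      fl.l_start = 0;           /* ...of the file... */
      fl.l_len = 0;             /* ...zero length meaning "to end of file" */
      fcntl (fd,F_SETLKW,&fl);  /* block until the region is locked */
      /* ... read from the file ... */
      fl.l_type = F_UNLCK;      /* now release the region */
      fcntl (fd,F_SETLK,&fl);
      close (fd);
    }
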
     When statd is started at system boot, it reads its /etc/state
file (which contains the number of times it has been invoked) and
/etc/sm directory (which contains a list of all remote sites which are
client or server locking with this site), and notifies the statd on
each of these systems that it has been restarted.  Each statd then
notifies the local lockd of the restart of that system.

     lockd receives fcntl() requests for NFS files.  It communicates
with the lockd at the server and requests it to apply the lock, and
with the statd to request it for notification when the server goes
down.  It blocks until all these requests are completed.

     There is quite a mythos about fcntl() locking.

     One religion holds that fcntl() locking is the best thing since
sliced bread, and that programs which use flock() should be converted
to fcntl() so that NFS locking will work.  However, as noted above,
very few systems support both calls, so such an exercise is pointless
except on Ultrix and Linux.

     Another religion, which I adhere to, has the opposite viewpoint.

FCNTL() BUGS

     For all of the hairy code to do individual section locking of a
file, it's clear that the designers of fcntl() locking never
considered some very basic locking operations.  It's as if all they
knew about locking they got out of some CS textbook, with no
investigation of real-world needs.

     It is not possible to acquire an exclusive lock unless the file
is open for write.  You could have append with shared read, and thus
you could have a case in which a read-only access may need to go
exclusive.  This problem can be programmed around once the programmer
is aware of it.

     If the file is opened on another file designator in the same
process, the file is unlocked even if no attempt is made to do any
form of locking on the second designator.  This is a very bad bug.  It
means that an application must keep track of all the files that it has
opened and locked.

     If there is no statd/lockd on the NFS server, fcntl() will hang
forever waiting for them to appear.  This is a bad bug.  It means that
any attempt to lock on a server that doesn't run these daemons will
hang.  There is no way for an application to request flock() style
``try to lock, but no-op if the mechanism ain't there''.

     There is a rumor to the effect that fcntl() will hang forever on
local files too if there is no local statd/lockd.  These daemons are
running on mailer.u, although they appear not to have much CPU time.
A useful experiment would be to kill them and see if imapd is affected
in any way, but I decline to do so without an OK from UCS!  ;-) If
killing statd/lockd can be done without breaking fcntl() on local
files, this would become one of the primary means of dealing with this
problem.

     The statd and lockd daemons have quite a reputation for extreme
fragility.  There have been numerous reports about the locking
mechanism being wedged on a systemwide or even clusterwide basis,
requiring a reboot to clear.  It is rumored that this wedge, once it
happens, also blocks local locking.  Presumably killing and restarting
statd would suffice to clear the wedge, but I haven't verified this.

     There appears to be a limit to how many locks may be in use at a
time on the system, although the documentation only mentions it in
passing.  On some of their systems, UCS has increased lockd's ``size
of the socket buffer'', whatever that means.

C-CLIENT USAGE

     c-client uses flock().  On System V systems, flock() is simulated
by an emulator that calls fcntl().

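     The emulator in the toolkit is considerably more elaborate (among
other things it has to cope with NFS, as discussed below), but the
basic mapping can be sketched roughly as follows.  This is a
hypothetical bare-bones version, and it inherits the fcntl()
restriction that LOCK_EX needs the file to be open for write:

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/file.h>       /* for the LOCK_* flag values */

    int flock_emulate (int fd,int op)   /* hypothetical minimal emulator */
    {
      struct flock fl;
      fl.l_whence = SEEK_SET;   /* always lock the entire file */
      fl.l_start = fl.l_len = 0;
      if (op & LOCK_SH) fl.l_type = F_RDLCK;
      else if (op & LOCK_EX) fl.l_type = F_WRLCK;
      else fl.l_type = F_UNLCK; /* LOCK_UN */
      return fcntl (fd,(op & LOCK_NB) ? F_SETLK : F_SETLKW,&fl);
    }
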
BEZERK AND MMDF

     Locking in the traditional UNIX formats was largely dictated by
the status quo in other applications; however, additional protection
is added against inadvertently running multiple instances of a
c-client application on the same mail file.

     (1) c-client attempts to create a .lock file (mail file name with
``.lock'' appended) whenever it reads from, or writes to, the mail
file.  This is an exclusive lock, and is held only for short periods
of time while c-client is actually doing the I/O.  There is a 5-minute
timeout for this lock, after which it is broken on the presumption
that it is a stale lock.  If it can not create the .lock file due to
an EACCES (protection failure) error, it once silently proceeded
without this lock; this was for systems which protect /usr/spool/mail
from unprivileged processes creating files.  Today, c-client reports
an error unless it is built otherwise.  The purpose of this lock is to
prevent against unfavorable interactions with mail delivery.

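     The idea behind lock (1) can be sketched as follows; this is not
the actual c-client code (which also handles retries, configuration,
and error reporting), and get_dotlock() is a hypothetical name:

    #include <fcntl.h>
    #include <unistd.h>
    #include <errno.h>
    #include <sys/stat.h>
    #include <time.h>

    int get_dotlock (char *lock)        /* lock = mail file name + ".lock" */
    {
      int fd;
      struct stat sbuf;
      while ((fd = open (lock,O_WRONLY|O_CREAT|O_EXCL,0666)) < 0) {
        if (errno == EACCES) return -1;  /* can't create files here */
        if (!stat (lock,&sbuf) &&        /* lock file exists and is */
            ((time (0) - sbuf.st_mtime) > 5*60))
          unlink (lock);                 /* older than 5 minutes: break it */
        else sleep (1);                  /* otherwise wait and try again */
      }
      close (fd);               /* the lock is held by the file's existence */
      return 0;                 /* caller unlink()s the file to release it */
    }
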
     (2) c-client applies a shared flock() to the mail file whenever
it reads from the mail file, and an exclusive flock() whenever it
writes to the mail file.  This lock is freed as soon as it finishes
reading.  The purpose of this lock is to prevent against unfavorable
interactions with mail delivery.

     (3) c-client applies an exclusive flock() to a file on /tmp
(whose name represents the device and inode number of the file) when
it opens the mail file.  This lock is maintained throughout the
session, although c-client has a feature (called ``kiss of death'')
which permits c-client to forcibly and irreversibly seize the lock
from a cooperating c-client application that surrenders the lock on
demand.  The purpose of this lock is to prevent against unfavorable
interactions with other instances of c-client (rewriting the mail
file).

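     Lock (3) can be sketched roughly as follows; the /tmp file name
format shown here is illustrative only, and the kiss-of-death
machinery is not shown:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/stat.h>
    #include <sys/file.h>

    int get_session_lock (int mailfd)   /* hypothetical sketch */
    {
      char name[64];
      struct stat sbuf;
      int fd;
      if (fstat (mailfd,&sbuf)) return -1;  /* need the device and inode */
      sprintf (name,"/tmp/.%lx.%lx",        /* illustrative name format only */
               (unsigned long) sbuf.st_dev,(unsigned long) sbuf.st_ino);
      if ((fd = open (name,O_RDWR|O_CREAT,0666)) < 0) return -1;
      flock (fd,LOCK_EX);       /* held for the whole session */
      return fd;                /* keep it open; the lock dies with it */
    }
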
     Mail delivery daemons use lock (1), (2), or both.  Lock (1) works
over NFS; lock (2) is the only one that works on sites that protect
/usr/spool/mail against unprivileged file creation.  Prudent mail
delivery daemons use both forms of locking, and of course so does
c-client.

     If only lock (2) is used, then multiple processes can read from
the mail file simultaneously, although in real life this doesn't
really change things.  The normal state of locks (1) and (2) is
unlocked except for very brief periods.

TENEX AND MTX

     The design of the locking mechanism of these formats was
motivated by a desire to enable multiple simultaneous read/write
access.  It is almost the reverse of how locking works with
bezerk/mmdf.

     (1) c-client applies a shared flock() to the mail file when it
opens the mail file.  It upgrades this lock to exclusive whenever it
tries to expunge the mail file.  Because of the flock() bug that
upgrading a lock actually releases it, it will not do so until it has
acquired an exclusive lock (2) first.  The purpose of this lock is to
prevent against expunge taking place while some other c-client has the
mail file open (and thus knows where all the messages are).

     (2) c-client applies a shared flock() to a file on /tmp (whose
name represents the device and inode number of the file) when it
parses the mail file.  It applies an exclusive flock() to this file
when it appends new mail to the mail file, as well as before it
attempts to upgrade lock (1) to exclusive.  The purpose of this lock
is to prevent against data being appended while some other c-client is
parsing mail in the file (to prevent reading of incomplete messages).
It also protects against the lock-releasing timing race on lock (1).

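     The guarded upgrade for expunge can be sketched as follows, where
tmpfd is the descriptor holding lock (2) and mailfd is the mail file
holding shared lock (1); this illustrates the ordering only and is not
the actual c-client code:

    #include <sys/file.h>

    void expunge_guarded (int tmpfd,int mailfd)  /* hypothetical sketch */
    {
      flock (tmpfd,LOCK_EX);    /* lock (2) exclusive: no parser or
                                 * appender can be active now */
      flock (mailfd,LOCK_EX);   /* upgrade lock (1); the window in which
                                 * flock() drops the shared lock is
                                 * covered by holding lock (2) */
      /* ... expunge the mail file ... */
      flock (mailfd,LOCK_SH);   /* downgrade back to shared */
      flock (tmpfd,LOCK_UN);    /* release lock (2) */
    }
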
OBSERVATIONS

     In a perfect world, locking works.  You are protected against
unfavorable interactions with the mailer and against your own mistake
of running more than one instance of your mail reader.  In tenex/mtx
formats, you have the additional benefit that multiple simultaneous
read/write access works, with the sole restriction being that you
can't expunge if there are any sharers of the mail file.

     If the mail file is NFS-mounted, then flock() locking is a silent
no-op.  This is the way BSD implements flock(), and c-client's
emulation of flock() through fcntl() tests for NFS files and
duplicates this functionality.  There is no locking protection for
tenex/mtx mail files at all, and only protection against the mailer
for bezerk/mmdf mail files.  This has been the accepted state of
affairs on UNIX for many sad years.

     If you can not create .lock files, it should not affect locking,
since the flock() locks suffice for all protection.  This is, however,
not true if the mailer does not check for flock() locking, or if the
mail file is NFS-mounted.

     What this means is that there is *no* locking protection at all
in the case of a client using an NFS-mounted /usr/spool/mail that does
not permit file creation by unprivileged programs.  It is impossible,
under these circumstances, for an unprivileged program to do anything
about it.  Worse, if EACCES errors on .lock file creation are
no-op'ed, the user won't even know about it.  This is arguably a site
configuration error.

     The problem with not being able to create .lock files exists on
System V as well, but the failure modes for flock() -- which is
implemented via fcntl() -- are different.

     On System V, if the mail file is NFS-mounted and either the
client or the server lacks a functioning statd/lockd pair, then the
lock attempt would have hung forever if it weren't for the fact that
c-client tests for NFS and no-ops the flock() emulator in this case.
Systemwide or clusterwide failures of statd/lockd have been known to
occur which cause all locks in all processes to hang (including
local?).  Without the special NFS test made by c-client, there would
be no way to request BSD-style no-op behavior, nor is there any way to
determine that this is happening other than the system being hung.

     The additional locking introduced by c-client was shown to cause
much more stress on the System V locking mechanism than has
traditionally been placed upon it.  If it was stressed too far, all
hell broke loose.  Fortunately, this is now past history.

TRADEOFFS

     c-client based applications have a reasonable chance of winning
as long as you don't use NFS for remote access to mail files.  That's
what IMAP is for, after all.  It is, however, very important to
realize that you can *not* use the lock-upgrade feature by itself
because it releases the lock as an interim step -- you need to have
lock-upgrading guarded by another lock.

     If you have the misfortune of using System V, you are likely to
run into problems sooner or later having to do with statd/lockd.  You
basically end up with one of several unsatisfactory choices:
 1) Grit your teeth and live with it.
 2) Try to make it work:
     a) avoid NFS access so as not to stress statd/lockd.
     b) try to understand the code in statd/lockd and hack it
        to be more robust.
     c) hunt out the system limit of locks, if there is one,
        and increase it.  Figure on at least two locks per
        simultaneous imapd process and four locks per Pine
        process.  Better yet, make the limit be 10 times the
        maximum number of processes.
     d) increase the socket buffer (-S switch to lockd) if
        it is offered.  I don't know what this actually does,
        but giving lockd more resources to do its work can't
        hurt.  Maybe.
 3) Decide that it can't possibly work, and turn off the
    fcntl() calls in your program.
 4) If nuking statd/lockd can be done without breaking local
    locking, then do so.  This would make SVR4 have the same
    limitations as BSD locking, with a couple of additional
    bugs.
 5) Check for NFS, and don't do the fcntl() in the NFS case.
    This is what c-client does.

     Note that if you are going to use NFS to access files on a server
which does not have statd/lockd running, your only choice is (3), (4),
or (5).  Here again, IMAP can bail you out.

     These problems aren't unique to c-client applications; they have
also been reported with Elm, Mediamail, and other email tools.

     Of the other two SVR4 locking bugs:

     Programmer awareness is necessary to deal with the bug that you
can not get an exclusive lock unless the file is open for write.  I
believe that c-client has fixed all of these cases.

     The problem about opening a second designator smashing any
current locks on the file has not been addressed satisfactorily yet.
This is not an easy problem to deal with, especially in c-client which
really doesn't know what other files/streams may be open by Pine.

     Aren't you so happy that you bought a System V system?
