/* ========================================================================
 * Copyright 1988-2006 University of Washington
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 *
 * ========================================================================
 */

Last update: 18 December 2006

INTRODUCTION

This file is the descendant of a design document used to specify the
mix format. An attempt is being made to keep this document more or
less current with the way the mix format actually works.


1. Mix mailbox naming

Mailbox names correspond to directory names; thus mix format mailboxes
are "dual-use" (lack both \NoInferiors and \NoSelect). This will
satisfy some long-standing requests.


2. Mailbox files

A mix format mailbox is a directory containing regular files with the
following names:
    .mixmeta        mailbox metadata file
    .mixindex       message index file (message static data)
    .mixstatus      message status file (message dynamic data)
    .mix########    (where ######## is a <hex8>) secondary message
                    data files
    .mix            primary message data file (used in experimental
                    versions, supported for compatibility only)

2.1 Metadata, index, and status files

The mailbox metadata, index, and status files contain a sequence of
CRLF-terminated lines. These files have an update sequence, which is
a strictly-ascending sequence value. Any time the file is changed,
the update sequence is increased; this allows easy detection of
whether the file has been changed by another process. For now, this
update sequence is a modseq (see below).

2.1.1 Metadata file

The mailbox metadata file is called ".mixmeta". It contains a series
of CRLF-terminated lines. The first character of the line is a key that
identifies the payload of the line, and the remainder of the line is the
payload.
    Key    Payload
    ---    -------
     S     <hex8>                  ;; update sequence
     V     <hex8>                  ;; UIDVALIDITY
     L     <hex8>                  ;; UIDLAST
     N     <hex8>                  ;; current new message file
     K     [atom 0*(SP atom)]      ;; keyword list

All other keys are reserved for future assignment and must be ignored
(and may be discarded) by software which does not recognize them. The
mailbox metadata file is rewritten as part of new mail delivery (so
APPENDUID/COPYUID can work) and when new keywords are added.

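As an illustration only, the key/payload layout above can be read with a
few lines of Python. This is a sketch, not the c-client code: the function
name parse_mixmeta, the dictionary field names, and the sample values are
all invented here, and it assumes the payload follows the key character
with no separator.

```python
import re

HEX8 = re.compile(r"^[0-9a-fA-F]{8}$")

# Hypothetical names for the four <hex8> keys described above.
HEX_KEYS = {"S": "sequence", "V": "uidvalidity", "L": "uidlast", "N": "newfile"}


def parse_mixmeta(data: bytes) -> dict:
    """Parse .mixmeta contents; unrecognized keys are silently ignored."""
    meta = {"keywords": []}
    for line in data.decode("ascii").split("\r\n"):
        if not line:
            continue  # empty string left over after the final CRLF
        key, payload = line[0], line[1:]
        if key in HEX_KEYS:
            if not HEX8.match(payload):
                raise ValueError(f"bad <hex8> payload for key {key!r}")
            meta[HEX_KEYS[key]] = int(payload, 16)
        elif key == "K":
            meta["keywords"] = payload.split(" ") if payload else []
        # Any other key is reserved for future assignment: skip it.
    return meta


# Made-up sample file, including one reserved key ("X") that must be ignored.
sample = (b"S4587a2f0\r\nV4587a000\r\nL0000002a\r\nN4587a100\r\n"
          b"Kwork urgent\r\nXfuture\r\n")
meta = parse_mixmeta(sample)
```

Skipping the "X" line rather than rejecting it mirrors the rule that
reserved keys must be ignored by software that does not recognize them.
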
2.1.2 Message static index file

The mailbox message static index file is called ".mixindex". It contains
a series of CRLF-terminated lines. The first character of the line is a
key that identifies the payload of the line, and the remainder of the line
is the payload.
    Key    Payload
    ---    -------
     S     <hex8>                  ;; update sequence
     :     <uid>:<date>:<size>:<file>:<pos>:<isiz>:<hsiz>
                                   ;; per-message record

The per-message records contain the following data:
    <uid>  = <hex8>                ;; message UID
    <date> = <yyyymmddhhmmss+zzzz> ;; internal date
    <size> = <hex8>                ;; rfc822.size
    <file> = <hex8>                ;; message data file (0 = .mix file)
    <pos>  = <hex8>                ;; message position in file
    <isiz> = <hex8>                ;; message internal data size
    <hsiz> = <hex8>                ;; header size (offset to body)

All other keys, and subsequent fields in per-message records, are
reserved for future assignment and must be ignored (and may be
discarded) by software which does not recognize them. The mailbox
message static index file is appended to by new mail delivery and
rewritten by expunge "burping", and otherwise is not altered.

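A minimal Python sketch of reading one index line, assuming the ":" key is
followed directly by the colon-separated fields listed above. IndexRecord
and parse_index_line are names invented for this illustration, and the
sample record is made up.

```python
from typing import NamedTuple, Optional


class IndexRecord(NamedTuple):
    uid: int
    date: str   # kept as the raw yyyymmddhhmmss+zzzz string
    size: int
    file: int
    pos: int
    isiz: int
    hsiz: int


def parse_index_line(line: str) -> Optional[IndexRecord]:
    """Return the per-message record on this line, or None for other keys."""
    if not line.startswith(":"):
        return None  # "S" or a reserved key; not a per-message record
    fields = line[1:].split(":")
    if len(fields) < 7:
        raise ValueError("truncated per-message record")
    uid, date, size, file, pos, isiz, hsiz = fields[:7]  # extra fields ignored
    return IndexRecord(int(uid, 16), date, int(size, 16), int(file, 16),
                       int(pos, 16), int(isiz, 16), int(hsiz, 16))


rec = parse_index_line(
    ":0000002a:20061218093000-0800:00000400:4587a100:00000000:0000002d:00000200:")
```

Only the first seven fields are consumed, so a record that gains further
fields in a later version still parses.
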
2.1.3 Message dynamic status file

The mailbox message dynamic status file is called ".mixstatus". It contains
a series of CRLF-terminated lines. The first character of the line is a
key that identifies the payload of the line, and the remainder of the line
is the payload.
    Key    Payload
    ---    -------
     S     <hex8>                  ;; update sequence
     :     <uid>:<uf>:<sf>:<mod>:  ;; per-message record

The per-message records contain the following data:
    <uid>  = <hex8>                ;; message UID
    <uf>   = <hex8>                ;; keyword flags
    <sf>   = <hex4>                ;; system flags
    <mod>  = <hex8>                ;; date/time last modified (modseq)

All other keys, and subsequent fields in per-message records, are
reserved for future assignment and must be ignored (and may be
discarded) by software which does not recognize them. The mailbox
message dynamic status file is rewritten by flag changes (or any future
change that alters dynamic data) and is re-read when a session sees that
the mtime has changed (atime and ctime are not used).

The modseq is an unsigned 32-bit date/time, along with a guarantee
that this value cannot go backwards. It currently corresponds to the
time from time(); however, since it is unsigned, it won't run out until
the year 2106. In the future, this may be used as a basis for implementing
the IMAP CONDSTORE extension.

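One way to obtain a time-based value that never goes backwards is sketched
below in Python. It is an assumption about how such a guarantee could be
met, not a description of the actual code; next_modseq and
format_status_line are hypothetical names, and the flag values passed in
are arbitrary.

```python
import time
from datetime import datetime, timedelta, timezone


def next_modseq(previous: int, now: int = None) -> int:
    """Return the current time, or previous + 1 if the clock has not advanced."""
    if now is None:
        now = int(time.time())
    return now if now > previous else previous + 1


def format_status_line(uid: int, keyword_flags: int, system_flags: int,
                       mod: int) -> str:
    """Build one per-message status record: hex8, hex8, hex4, hex8."""
    return f":{uid:08x}:{keyword_flags:08x}:{system_flags:04x}:{mod:08x}:\r\n"


# An unsigned 32-bit count of seconds since 1970 wraps in early 2106.
rollover = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=2**32)
line = format_status_line(42, 1, 0x10, 0x4587a2f0)
```

If the clock steps backwards, or two changes land in the same second,
next_modseq still returns a strictly larger value than the one before.
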
2.2 Message data files

A mix message file is a regular file with filename starting with
".mix" followed by a <hex8> suffix which indicates the file number. It
contains a series of CRLF-terminated lines. By special dispensation, the
filename ".mix" is used for file number 0, which was used in experimental
versions of mix as a "primary" file (this concept no longer exists).

A file number is set to the current modseq when it is created. If a copy
or append causes the file to exceed the compiled-in file size limit, a new
file is started and the metadata is updated accordingly.

Preceding each message is a per-message record with the following format:
    Key    Payload
    ---    -------
     :     :<code>:<uid>:<date>:<size>:
                                   ;; per-message record

The per-message records contain the following data:
    <code> = "msg"                 ;; fixed code
    <uid>  = <hex8>                ;; message UID
    <date> = <yyyymmddhhmmss+zzzz> ;; internal date
    <size> = <hex8>                ;; rfc822.size
The message data begins on the next line.

Subsequent fields are reserved for future assignment and must be ignored.

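The mapping between file numbers and data file names can be sketched in
Python as follows. The helper names are invented for illustration, and the
use of lowercase hex digits when building a name is an assumption.

```python
import re
from typing import Optional

# ".mix" alone, or ".mix" plus exactly eight hex digits.
_DATA_NAME = re.compile(r"^\.mix([0-9a-fA-F]{8})?$")


def data_file_name(number: int) -> str:
    """File number 0 maps to the legacy ".mix" name."""
    return ".mix" if number == 0 else f".mix{number:08x}"


def data_file_number(name: str) -> Optional[int]:
    """Return the file number, or None if name is not a message data file."""
    match = _DATA_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1), 16) if match.group(1) else 0
```

Because the suffix must be exactly eight hex digits, ".mixmeta", ".mixindex"
and ".mixstatus" are never mistaken for data files.
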

3. New mail delivery

To deliver a new message, it is necessary to acquire a shared lock on the
destination metadata file, then an exclusive lock on the destination index
and status files. Once this is done, the new message data is appended to
the new message file. The metadata (UIDLAST value), index, and status
files are all updated to add the new message.

Then all the destination mailbox files are closed.

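The locking order for delivery might look like the following Python sketch.
It assumes flock()-style advisory locks on a UNIX system; the helper names
are hypothetical, and the metadata, index, and status updates are left as a
comment rather than implemented.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def locked(path, exclusive):
    """Open path and hold a flock on it for the duration of the block."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        yield fd
    finally:
        os.close(fd)  # closing the descriptor also releases the lock


def deliver(mailbox_dir, data_name, message):
    """Append message to a data file under the delivery locks; return its offset."""
    def path(name):
        return os.path.join(mailbox_dir, name)

    # Shared lock on metadata first, then exclusive locks on index and status.
    with locked(path(".mixmeta"), exclusive=False), \
         locked(path(".mixindex"), exclusive=True), \
         locked(path(".mixstatus"), exclusive=True):
        with open(path(data_name), "ab") as data:
            data.seek(0, os.SEEK_END)
            pos = data.tell()
            data.write(message)
        # A real implementation would now update UIDLAST in the metadata
        # and add the new message to the index and status files.
    return pos
```

All three descriptors are closed when the with block exits, which matches
the final step of closing the destination mailbox files.
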

4. Mailbox pinging

The index and status files are share locked. Initially, sequences are
remembered as zero, so at open time they are always "altered".

The sequence from the index file is checked; if it is altered the index
file is read and processed as follows:
 . If expunge is permitted, then any messages that are not in the index
   are reported as having been expunged via mm_expunged().
 . New messages are announced via mm_exists()/mm_recent().

Next, the sequence from the status file is checked. If it is altered,
the status file is read and the status updated for any message which is
new or has an altered modseq in the status file. Altered modseq messages
are announced via mm_flags().

Then the index and status files are closed.

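The comparisons made during a ping can be sketched in Python over plain UID
lists and modseq dictionaries. The function names are invented here, and
the callbacks mm_expunged(), mm_exists() and mm_flags() are represented
only by the lists returned.

```python
def altered(remembered_seq, file_seq):
    """A remembered sequence of zero never matches, so the first ping always reads."""
    return remembered_seq != file_seq


def index_changes(known_uids, index_uids, expunge_permitted=True):
    """Return (expunged, new) UIDs after re-reading the index file."""
    in_index = set(index_uids)
    known = set(known_uids)
    expunged = ([u for u in known_uids if u not in in_index]
                if expunge_permitted else [])
    new = [u for u in index_uids if u not in known]
    return expunged, new


def flag_changes(session_mod, file_mod):
    """Return UIDs already known to the session whose modseq differs in the file."""
    return [uid for uid, mod in file_mod.items()
            if uid in session_mod and session_mod[uid] != mod]
```

For example, a session that knows UIDs 1, 2 and 3 and then reads an index
holding 1, 3, 4 and 5 would report UID 2 as expunged and UIDs 4 and 5 as new.
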

4. Flag alteration

The status file is exclusive locked.

The sequence from the status file is checked. If it is altered, the
status file is read and the status updated for any message which is
new or has an altered modseq in the status file. Altered modseq
messages are announced via mm_flags().

The alterations are then applied for all requested messages, updating
the modseq for each requested message which changes flags as a result
of the alteration (alterations which do not result in a change do not
alter the modseq). Then the status file is rewritten with a new
sequence, but only if the flags of at least one message were changed.

Then the status file is closed.

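The rule that only effective changes touch the modseq can be sketched in
Python. The in-memory status table, the function name alter_flags, and the
bit values used in the example are all hypothetical.

```python
def alter_flags(status, uids, set_bits, clear_bits, new_modseq):
    """Apply a flag change to the requested UIDs.

    status maps uid -> (system_flags, modseq). Returns True when at least
    one message changed, meaning the status file must be rewritten.
    """
    changed = False
    for uid in uids:
        flags, _mod = status[uid]
        new_flags = (flags | set_bits) & ~clear_bits
        if new_flags != flags:
            status[uid] = (new_flags, new_modseq)  # only real changes get a new modseq
            changed = True
    return changed


status = {1: (0x0, 100), 2: (0x1, 100)}
first = alter_flags(status, [1, 2], set_bits=0x1, clear_bits=0x0, new_modseq=200)
second = alter_flags(status, [1, 2], set_bits=0x1, clear_bits=0x0, new_modseq=300)
```

In this example the first call changes only message 1, and the second call
changes nothing, so no rewrite would be needed after it.
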

5. Checkpoint and expunge

Checkpoint is identical to expunge, except that it skips the step of
expunging deleted messages.

The index and status files are locked exclusive. If expunging, all
deleted messages are expunged from the index and announced via
mm_expunged(). The message data is not removed at this time.

If a checkpoint was requested, or if any messages were expunged, or if
it is remembered that a "burp" was needed, then:
 . the metadata file is locked exclusive. If this fails, remember that
   a burp is needed. Otherwise perform a burp:
    . calculate the file byte ranges occupied by expunged messages
    . for each file needing "burping", open and slide down subsequent file
      data on top of the expunged messages
    . update the index and status files

Then the index and status files are closed.

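The "slide down" step of a burp can be sketched in Python on an in-memory
buffer standing in for one data file. This is an illustration under
simplifying assumptions: each surviving message is described by a
hypothetical (uid, pos, length) triple, where length covers everything
stored for that message, and the function name burp is invented here.

```python
def burp(data: bytearray, live):
    """Compact data in place, keeping only the live regions.

    live is a list of (uid, pos, length) for surviving messages. Returns a
    dict of uid -> new position, which is what the index update would need.
    """
    write = 0
    new_pos = {}
    for uid, pos, length in sorted(live, key=lambda r: r[1]):
        if pos != write:
            # Slide this message down on top of the expunged gap before it.
            data[write:write + length] = data[pos:pos + length]
        new_pos[uid] = write
        write += length
    del data[write:]  # drop the now-unused tail of the file
    return new_pos


buf = bytearray(b"AAAABBBCCCCC")          # message 2 ("BBB") was expunged
moved = burp(buf, [(1, 0, 4), (3, 7, 5)])
```

Overwriting in place, rather than copying to a new file, avoids needing
extra space for a second copy of the data.
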
5.1 More details on expunging and "burping"

Shared expunge presents a problem due to the requirements of the IMAP
protocol. You can't "burp" away a message until you are certain that
no sharers have a pointer to it any longer. Consequently, for the nonce,
"burping" out expunged data is deferred to an exclusive expunge, as in
mbx format.

If shared burping is ever implemented, then care will be needed not to
burp data that a session still relies upon. It's easy enough to burp
the index files; just create new index files, deleting the old, and
require that you look for a new one appearing at mailbox ping time
(when it's safe). The data files are a problem, since we
intentionally don't want to keep them open and do want to avoid quota
problems by overwriting in place. Also, when you burp you have to
change the pointers in the index file.

Bottom line: shared burping is too hairy right now, so the first
version will do exclusive-only burping and not worry about it. If
shared burping is really needed, then that routine will need to be
rewritten.

Shared burping has been a problem for every other IMAP server. Most
get it wrong, and cause terrible confusion to clients (including
client crashes).


6. Message data file roll out strategy

The current new message file is finalized, and a new one started, when
an append or copy is done that would cause the file to grow to larger
than a preconfigured size (MIXDATAROLL). A multi-message copy or
append is written in its entirety to a single new message file. In
the case of multi-copy, the new message file is switched when the sum
of the sizes of all messages to be copied would cause the current new
message file to exceed MIXDATAROLL. In the case of multi-append, only
the first message is considered; this is due to technical limitations.

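The roll-out decision described above can be sketched in Python. The limit
value below is an assumption chosen for the example (the real MIXDATAROLL
is compiled in), and must_roll is a hypothetical name.

```python
MIXDATAROLL = 1024 * 1024  # assumed value, for illustration only


def must_roll(current_size, incoming_sizes, multi_append=False,
              limit=MIXDATAROLL):
    """Decide whether to start a new message file before writing.

    For a copy, the sum of all incoming message sizes is considered; for a
    multi-append, only the first message is.
    """
    if multi_append:
        considered = sum(incoming_sizes[:1])
    else:
        considered = sum(incoming_sizes)
    return current_size + considered > limit
```

With a 900-byte file and a 1000-byte limit, copying messages of 50 and 100
bytes rolls to a new file, while appending the same two messages does not,
because only the first 50 bytes are counted.
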
7. Error detection

Mix detects bad data in the metadata, index, and status files, and
declares the stream dead. It does not unilaterally reassign
UIDVALIDITY the way that the flat file formats do.

When mix reads a header from the message file, it also reads the
per-message record and verifies that there is a per-message record there.
This is a simple test for message file corruption. It doesn't declare
the stream dead; it simply issues an error message and returns a
zero-length string for the message header. This makes it possible for
the user to fix the mailbox simply by deleting and expunging any messages
that are in this state.

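The header-read check can be sketched in Python as below. Several things
here are assumptions made for the sake of the example: that <isiz> is the
length of the per-message record, that the header starts immediately after
it, and that a record is recognized by the "msg" code once leading colons
are stripped. The function name read_header is hypothetical.

```python
import sys


def read_header(data: bytes, pos: int, isiz: int, hsiz: int) -> bytes:
    """Return the message header, or b"" if no per-message record is found."""
    record = data[pos:pos + isiz].decode("ascii", "replace")
    fields = record.lstrip(":").split(":")
    if not fields or fields[0] != "msg":
        # Corruption is reported but the stream is not declared dead.
        print(f"missing per-message record at offset {pos}", file=sys.stderr)
        return b""
    start = pos + isiz
    return data[start:start + hsiz]


record = b":msg:0000002a:20061218093000-0800:00000013:\r\n"
header = b"Subject: hi\r\n\r\n"
good = read_header(record + header + b"body", 0, len(record), len(header))
bad = read_header(b"garbage " * 10, 0, len(record), len(header))
```

Returning an empty header instead of failing the whole mailbox leaves the
damaged message visible, so it can still be deleted and expunged.
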

8. Reconstruct tool

[None of this is implemented yet.]

The layout of these files is designed to make the reconstruct tool be
as simple as possible. Much of the need for the reconstruct tool is
eliminated since the mix format has a much more limited scope of
writing than the flat file formats; thus there is "less collateral
damage."

If the metadata file is lost or corrupted, then all keywords are lost;
if the mailbox has any keywords used in the .mixstatus file, it'll be
necessary to create some placeholder names. Otherwise, a new
UIDVALIDITY can be assigned, and a good UIDLAST value calculated by
the reconstruct tool. Since this file is very small, it's not likely
to be damaged.

If the index file is lost or corrupted, it is possible to reconstruct
it with no loss by reading all the data files. However, this could
cause expunged but not yet burped messages to reappear.

If the status file is lost or corrupted, then flags are lost and
will revert to a default state of no flags set. Just deleting the
corrupted file is good enough.

The reconstruct tool can use the per-message record in the message
file to locate messages if the recorded sizes and/or messages are
corrupt. If that happens, it will need to rebuild the index file
(with associated changes to the metadata file to change the
UIDVALIDITY). That should probably be a manual operation and not be
part of the default operation or auto-reconstruct.


9. Locking strategy

The mix format does not use the traditional c-client /tmp file locking.

The metadata file is open and locked whenever the mailbox is open.
Normally this is a shared lock, but it will be upgraded to exclusive
if the mailbox is expunged. As a guard (since there is no true
lock-upgrade/downgrade on UNIX), the index exclusive lock must be
acquired before upgrading to exclusive.

The index file is shared locked when reading the index, and exclusive
locked (and read) when appending new messages to the index or when
expunging (note that expunging also requires an exclusive lock on
metadata). Normally, the index file is not open or locked.

The status file is shared locked when reading status, and exclusive
locked (and read) when updating status. Normally, the status file is
not open or locked.

It isn't necessary to lock any of the data files as long as we only
have exclusive burping.

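The guarded upgrade of the metadata lock might be sketched in Python as
follows, assuming flock()-style locks. The function name lock_for_expunge
is invented here. Because converting a flock is not atomic, a failed
non-blocking upgrade can leave the descriptor with no lock at all, so the
sketch re-takes the shared lock in that case.

```python
import fcntl


def lock_for_expunge(meta_fd, index_fd):
    """Try to upgrade the metadata lock to exclusive; return True on success.

    meta_fd is expected to hold a shared lock already. The index exclusive
    lock is taken first as the guard described above.
    """
    fcntl.flock(index_fd, fcntl.LOCK_EX)
    try:
        fcntl.flock(meta_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        # Another session still has the mailbox open. Restore the shared
        # lock, since the failed conversion may have dropped it.
        fcntl.flock(meta_fd, fcntl.LOCK_SH)
        return False
```

A False result corresponds to the case where an expunge cannot burp yet and
has to remember that a burp is needed.
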

10. Memory usage

The mix format returns a file stringstruct, which is the modern
c-client behavior. This prevents imapd from growing to enormous sizes
due to a godzillagram (how it affects other programs depends upon what
they do with the returned stringstruct).


11. Future extensions

Cached ENVELOPE, BODYSTRUCTURE. Cyrus does this, and it will eliminate
most of the reason to access the data files. Possibly cached overviews,
a la NNTP, instead?

Support for ANNOTATION.


12. RENAME issues

Mix currently makes no attempt to address the IMAP RENAME problem.
The problem occurs when a mailbox is deleted and another mailbox is then
renamed to that name in its place: no attempt is made to reassign
UIDVALIDITY for this mailbox and all the inferior mailboxes. This can
potentially cause problems for a disconnected-use client that has cached
status for the old mailbox which had that name.

The RENAME problem is a well-known flaw in the IMAP protocol. Few
servers correctly handle it (among other things, not only do all the
UIDVALIDITY values have to be changed but this has to be done
atomically!). It was a mistake to add RENAME into IMAP, but it's much
too late to remove it now.