imapext-2007

view docs/mixfmt.txt @ 0:ada5e610ab86

imap-2007e
author yuuji@gentei.org
date Mon, 14 Sep 2009 15:17:45 +0900
1 /* ========================================================================
2 * Copyright 1988-2006 University of Washington
3 *
4 * Licensed under the Apache License, Version 2.0 (the "License");
5 * you may not use this file except in compliance with the License.
6 * You may obtain a copy of the License at
7 *
8 * http://www.apache.org/licenses/LICENSE-2.0
9 *
10 *
11 * ========================================================================
12 */
14 Last update: 18 December 2006
16 INTRODUCTION
18 This file is the descendant of a design document used to specify the
19 mix format. An attempt is being made to keep this document more or
20 less current with the way the mix format actually works.
23 1. Mix mailbox naming
25 Mailbox names correspond to directory names; thus mix format mailboxes
26 are "dual-use" (lack both \NoInferiors and \NoSelect). This will
27 satisfy some long-standing requests.
30 2. Mailbox files
32 A mix format mailbox is a directory with regular files with filenames
33 of:
34 .mixmeta       mailbox metadata file
35 .mixindex      message index file (message static data)
36 .mixstatus     message status file (message dynamic data)
37 .mix########   (where ######## is a <hex8>) secondary message
38                data files.
39 .mix           primary message data file (used in experimental
40                versions, supported for compatibility only)
42 2.1 Metadata, index, and status files
44 The mailbox metadata, index, and status files contain a sequence of
45 CRLF-terminated lines. These files have an update sequence, which is
46 a strictly-ascending sequence value. Any time the file is changed,
47 the update sequence is increased; this allows easy detection of
48 whether the file has been changed by another process. For now, this
49 update sequence is a modseq (see below).
51 2.1.1 Metadata file
53 The mailbox metadata file is called ".mixmeta". It contains a series
54 of CRLF-terminated lines. The first character of the line is a key that
55 identifies the payload of the line, and the remainder of the line is the
56 payload.
57 Key Payload
58 --- -------
59 S <hex8> ;; update sequence
60 V <hex8> ;; UIDVALIDITY
61 L <hex8> ;; UIDLAST
62 N <hex8> ;; current new message file
63 K [atom 0*(SP atom)] ;; keyword list
65 All other keys are reserved for future assignment and must be ignored
66 (and may be discarded) by software which does not recognize them. The
67 mailbox metadata file is rewritten as part of new mail delivery (so
68 APPENDUID/COPYUID can work) and when new keywords are added.
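
As an illustration of the key/payload layout above, here is a minimal
Python sketch of a metadata reader. It is not taken from c-client; the
function name is hypothetical, and it assumes the payload follows the
key character directly and that unknown keys are simply skipped.

```python
def parse_mixmeta(text):
    """Parse .mixmeta content into a dict keyed by the one-letter key."""
    meta = {}
    for line in text.split("\r\n"):
        if not line:
            continue
        key, payload = line[0], line[1:]
        if key in ("S", "V", "L", "N"):
            # update sequence, UIDVALIDITY, UIDLAST, current new message file
            meta[key] = int(payload, 16)
        elif key == "K":
            # keyword list: zero or more space-separated atoms
            meta[key] = payload.split()
        # any other key is reserved and is ignored here
    return meta
```

A caller would typically compare the returned "S" value with the one it
remembered in order to decide whether anything changed.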
70 2.1.2 Message static index file
72 The mailbox message static index file is called ".mixindex". It contains
73 a series of CRLF-terminated lines. The first character of the line is a
74 key that identifies the payload of the line, and the remainder of the line
75 is the payload.
76 Key Payload
77 --- -------
78 S <hex8> ;; update sequence
79 : <uid>:<date>:<size>:<file>:<pos>:<isiz>:<hsiz>
80 ;; per-message record
82 The per-message records contain the following data:
83 <uid> = <hex8> ;; message UID
84 <date> = <yyyymmddhhmmss+zzzz> ;; internal date
85 <size> = <hex8> ;; rfc822.size
86 <file> = <hex8> ;; message data file (0 = .mix file)
87 <pos> = <hex8> ;; message position in file
88 <isiz> = <hex8> ;; message internal data size
89 <hsiz> = <hex8> ;; header size (offset to body)
91 All other keys, and subsequent fields in per-message records, are
92 reserved for future assignment and must be ignored (and may be
93 discarded) by software which does not recognize them. The mailbox
94 index file is appended to by new mail delivery and rewritten by
95 expunge "burping", and otherwise is not altered.
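
The per-message index record can be split on ":" as sketched below.
This is an illustrative Python reader, not the c-client code; the
function name and the dict field names are assumptions, and any fields
after <hsiz> are ignored as the text requires.

```python
from datetime import datetime

def parse_index_record(line):
    """Parse one ':'-keyed .mixindex line into a dict."""
    fields = line.rstrip("\r\n").split(":")
    # fields[0] is the empty text before the ":" key; extras are ignored
    uid, date, size, file_no, pos, isiz, hsiz = fields[1:8]
    return {
        "uid": int(uid, 16),
        # internal date in yyyymmddhhmmss+zzzz form
        "date": datetime.strptime(date, "%Y%m%d%H%M%S%z"),
        "size": int(size, 16),
        "file": int(file_no, 16),   # 0 means the legacy .mix file
        "pos": int(pos, 16),
        "isiz": int(isiz, 16),
        "hsiz": int(hsiz, 16),
    }
```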
97 2.1.3 Message dynamic status file
99 The mailbox message dynamic status file is called ".mixstatus". It contains
100 a series of CRLF-terminated lines. The first character of the line is a
101 key that identifies the payload of the line, and the remainder of the line
102 is the payload.
103 Key Payload
104 --- -------
105 S <hex8> ;; update sequence
106 : <uid>:<keys>:<flag>:<mod>: ;; per-message record
108 The per-message records contain the following data:
109 <uid> = <hex8> ;; message UID
110 <keys> = <hex8> ;; keyword flags
111 <flag> = <hex4> ;; system flags
112 <mod> = <hex8> ;; date/time last modified (modseq)
114 All other keys, and subsequent fields in per-message records, are
115 reserved for future assignment and must be ignored (and may be
116 discarded) by software which does not recognize them. The mailbox
117 dynamic status file is rewritten by flag changes (or any future change
118 that alters dynamic data) and is re-read when a session sees that the
119 mtime has changed (atime and ctime are not used).
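
A status record can be read the same way. The Python sketch below is
illustrative only. The function name is hypothetical, and the mapping
of keyword bits to names (bit i selects the i-th keyword of the
metadata K line) is an assumption that this document does not state.

```python
def parse_status_record(line, keywords):
    """Parse one ':'-keyed .mixstatus line into a dict."""
    fields = line.rstrip("\r\n").split(":")
    # fields[0] is the empty text before the ":" key; extras are ignored
    uid, keys, flag, mod = (int(f, 16) for f in fields[1:5])
    # Assumption: bit i of the keyword mask refers to keywords[i]
    names = [k for i, k in enumerate(keywords) if keys & (1 << i)]
    return {"uid": uid, "keywords": names, "flag": flag, "mod": mod}
```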
121 The modseq is an unsigned 32-bit date/time, along with a guarantee
122 that this value can not go backwards. It currently corresponds to the
123 time from time(); however, since it is unsigned, it won't run out until
124 the year 2106. In the future, this may be used as a basis for implementing
125 the IMAP CONDSTORE extension.
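
One way to obtain a time-based value that never goes backwards is
sketched below in Python. Bumping the previous value by one when the
clock has not advanced is an assumption made for the sketch, not
something this document specifies; the second function only checks the
2106 arithmetic for an unsigned 32-bit count of seconds.

```python
import time
from datetime import datetime, timezone

def next_modseq(last, now=None):
    """Return a modseq strictly greater than `last`, normally the time."""
    now = int(time.time()) if now is None else now
    # If the clock stalled or stepped back, fall back to last + 1.
    return max(now, last + 1)

def modseq_limit():
    """UTC date/time of the largest unsigned 32-bit time value."""
    return datetime.fromtimestamp(2**32 - 1, tz=timezone.utc)
```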
127 2.2 Message data files
129 A mix message file is a regular file with filename starting with
130 ".mix" followed by a <hex8> suffix which indicates the file number. It
131 contains a series of CRLF-terminated lines. By special dispensation, the
132 filename ".mix" is used for file number 0, which was used in experimental
133 versions of mix as a "primary" file (this concept no longer exists).
135 A file number is set to the current modseq when it is created. If a copy
136 or append causes the file to exceed the compiled-in file size limit, a new
137 file is started and the metadata is updated accordingly.
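
The naming rule can be sketched as follows. This is a hypothetical
Python helper; lowercase hexadecimal digits are an assumption.

```python
def mix_data_filename(file_no):
    """Return the data file name for a file number."""
    if file_no == 0:
        return ".mix"               # legacy "primary" file
    return ".mix%08x" % file_no     # ".mix" plus a <hex8> suffix
```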
139 Preceding each message is a per-message record with the following format:
140 Key Payload
141 --- -------
142 ;; per-message record
143 : :<code>:<uid>:<date>:<size>:
145 The per-message records contain the following data:
146 <code> = "msg" ;; fixed code
147 <uid> = <hex8> ;; message UID
148 <date> = <yyyymmddhhmmss+zzzz> ;; internal date
149 <size> = <hex8> ;; rfc822.size
150 The message data begins on the next line.
152 Subsequent fields are reserved for future assignment and must be ignored.
155 3. New mail delivery
157 To deliver a new message, it is necessary to share lock the destination
158 metadata file, then get an exclusive lock on the destination index and
159 status files. Once this is done, the new message data is appended to the
160 new message file. The metadata (UIDLAST value), index, and status
161 files are all updated to add the new message.
163 Then all the destination mailbox files are closed.
166 4. Mailbox pinging
168 The index and status files are share locked. Initially, sequences are
169 remembered as zero, so at open time they are always "altered".
171 The sequence from the index file is checked; if it is altered the index
172 file is read and processed as follows:
173 . If expunge is permitted, then any messages that are not in the index
174 are reported as having been expunged via mm_expunged().
175 . New messages are announced via mm_exists()/mm_recent().
177 Next, the sequence from the status file is checked. If it is altered,
178 the status file is read and the status updated for any message which is
179 new or has an altered modseq in the status file. Altered modseq messages
180 are announced via mm_flags().
182 Then the index and status files are closed.
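
The index half of a ping can be sketched as a comparison of the
remembered state with what is now in the file. The Python below is
illustrative only; it returns lists instead of calling mm_expunged()
and mm_exists(), and the function and parameter names are hypothetical.

```python
def ping_index(known_seq, known_uids, file_seq, file_uids, expunge_ok=True):
    """Return (expunged, new) UID lists, standing in for the callbacks."""
    if file_seq == known_seq:
        return [], []               # index unchanged since the last ping
    in_file = set(file_uids)
    known = set(known_uids)
    # Messages we knew about that are no longer in the index
    expunged = [u for u in known_uids if u not in in_file] if expunge_ok else []
    # Messages in the index that we have not seen before
    new = [u for u in file_uids if u not in known]
    return expunged, new
```

Because the remembered sequence starts at zero, the first ping after
open always takes the "altered" path and reports every message as new.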
185 4. Flag alteration
187 The status file is exclusive locked.
189 The sequence from the status file is checked. If it is altered, the
190 status file is read and the status updated for any message which is
191 new or has an altered modseq in the status file. Altered modseq
192 messages are announced via mm_flags().
194 The alterations are then applied for all requested messages, updating
195 the modseq for each requested message which changes flags as a result
196 of the alteration (alterations which do not result in a change do not
197 alter the modseq). Then the status file is rewritten with a new
198 sequence, but only if the flags of at least one message were changed.
200 Then the status file is closed.
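
The rule that only real changes touch the modseq can be sketched as
below. This is an illustrative Python fragment over an in-memory dict;
the function name and record layout are hypothetical. The return value
tells the caller whether the status file needs to be rewritten.

```python
def alter_flags(status, uids, set_bits, clear_bits, modseq):
    """Apply flag changes; return True if any message actually changed."""
    changed = False
    for uid in uids:
        rec = status[uid]
        new_flag = (rec["flag"] | set_bits) & ~clear_bits
        if new_flag != rec["flag"]:
            rec["flag"] = new_flag
            rec["mod"] = modseq     # only changed messages get a new modseq
            changed = True
    return changed
```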
203 5. Checkpoint and expunge
205 Checkpoint is identical to expunge, however it skips the step of expunging
206 deleted messages.
208 The index and status files are locked exclusive. If expunging, all
209 deleted messages are expunged from the index and announced via
210 mm_expunged(). The message data is not removed at this time.
212 If a checkpoint was requested, or if any messages were expunged, or if
213 it is remembered that a "burp" was needed, then:
214 . the metadata file is locked exclusive. If this fails, remember that
215 a burp is needed. Otherwise perform a burp:
216 . calculate the file byte ranges occupied by expunged messages
217 . for each file needing "burping", open and slide down subsequent file
218 data on top of the expunged messages
219 . update the index and status files
221 Then the index and status files are closed.
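
The "slide down" step can be pictured with the following Python sketch.
It works on an in-memory copy of one data file rather than overwriting
in place as the text describes, and the function name and the tuple
layout are hypothetical. It returns the compacted data and the new
position of each surviving message, which is what the index update
needs.

```python
def burp(data, live):
    """Compact `data`, keeping only the live (uid, pos, length) ranges."""
    out = bytearray()
    new_pos = {}
    for uid, pos, length in sorted(live, key=lambda r: r[1]):
        new_pos[uid] = len(out)             # where this message lands
        out += data[pos:pos + length]       # slide it down over the gap
    return bytes(out), new_pos
```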
223 5.1 More details on expunging and "burping"
225 Shared expunge presents a problem due to the requirements of the IMAP
226 protocol. You can't "burp" away a message until you are certain that
227 no sharers have a pointer to it any longer. Consequently, for the nonce
228 "burping" out expunged data will be deferred to an exclusive expunge as in
229 mbx format.
231 If shared burping is ever implemented, then care will be needed not to
232 burp data that a session still relies upon. It's easy enough to burp
233 the index files; just create new index files, deleting the old, and
234 require that you look for a new one appearing at mailbox ping time
235 (when it's safe). The data files are a problem, since we
236 intentionally don't want to keep them open and do want to avoid quota
237 problems by overwriting in place. Also, when you burp you have to
238 change the pointers in the index file.
240 Bottom line: shared burping is too hairy right now, so the first
241 version will do exclusive-only burping and not worry about it. If
242 shared burping is really needed, then that routine will need to be
243 rewritten.
245 Shared burping has been a problem for every other IMAP server. Most
246 get it wrong, and cause terrible confusion to clients (including
247 client crashes).
250 6. Message data file roll out strategy
252 The current new message file is finalized, and a new one started, when
253 an append or copy is done that would cause the file to grow to larger
254 than a preconfigured size (MIXDATAROLL). A multi-message copy or
255 append is written in its entirety to a single new message file. In
256 the case of multi-copy, the new message file is switched when the sum
257 of the sizes of all messages to be copied would cause the current new
258 message file to exceed MIXDATAROLL. In the case of multi-append, only
259 the first message is considered; this is due to technical limitations.
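
The roll-out decision can be sketched as below. This is a hypothetical
Python helper: roll_limit stands in for MIXDATAROLL, whose value is not
given here, and the sizes are taken as plain message sizes without any
per-message record overhead.

```python
def needs_new_file(current_size, msg_sizes, roll_limit, multi_append=False):
    """Decide whether to switch to a new message file before writing."""
    # Multi-copy considers every message; multi-append only the first one.
    incoming = msg_sizes[0] if multi_append else sum(msg_sizes)
    return current_size + incoming > roll_limit
```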
261 7. Error detection
263 Mix detects bad data in the metadata, index, and status files, and
264 declares the stream dead. It does not unilaterally reassign
265 UIDVALIDITY the way that the flat file formats do.
267 When mix reads a header from the message file, it also reads the
268 per-message record and verifies that there is a per-message record there.
269 This is a simple test for message file corruption. It doesn't declare
270 the stream dead; it simply issues an error message and returns a
271 zero-length string for the message header. This makes it possible for
272 the user to fix the mailbox simply by deleting and expunging any messages
273 that are in this state.
276 8. Reconstruct tool
278 [None of this is implemented yet.]
280 The layout of these files is designed to make the reconstruct tool be
281 as simple as possible. Much of the need for the reconstruct tool is
282 eliminated since the mix format has a much more limited scope of
283 writing than the flat file formats; thus there is "less collateral
284 damage."
286 If the metadata file is lost or corrupted, then all keywords are lost;
287 if the mailbox has any keywords used in the .mixstatus file, it'll be
288 necessary to create some placeholder names. Otherwise, a new
289 UIDVALIDITY can be assigned, and a good UIDLAST value calculated by
290 the reconstruct tool. Since this file is very small, it's not likely
291 to be damaged.
293 If the index file is lost or corrupted, it is possible to reconstruct
294 it with no loss by reading all the data files. However, this could
295 cause expunged but not yet burped messages to reappear.
297 If the status file is lost or corrupted, then flags are lost and
298 will revert to a default state of no flags set. Just deleting the
299 corrupted file is good enough.
301 The reconstruct tool can use the per-message record in the message
302 file to locate messages if the recorded sizes and/or messages are
303 corrupt. If that happens, it will need to rebuild the index file
304 (with associated changes to the metadata file to change the
305 UIDVALIDITY). That should probably be a manual operation and not be
306 part of the default operation or auto-reconstruct.
309 9. Locking strategy
311 The mix format does not use the traditional c-client /tmp file locking.
313 The metadata file is open and locked whenever the mailbox is open.
314 Normally this is a shared lock, but it will be upgraded to exclusive
315 if the mailbox is expunged. As a guard (since there is no true
316 lock-upgrade/downgrade on UNIX), the index exclusive lock must be
317 acquired first before upgrading to exclusive.
319 The index file is shared locked when reading the index, and exclusive
320 locked (and read) when appending new messages to the index or when
321 expunging (note that expunging also requires an exclusive lock on
322 metadata). Normally, the index file is not open or locked.
324 The status file is shared locked when reading status, and exclusive
325 locked (and read) when updating status. Normally, the status file is
326 not open or locked.
328 It isn't necessary to lock any of the data files as long as we only
329 have exclusive burping.
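
The lock ordering for expunge can be sketched with flock() as below.
This Python fragment is illustrative only and is not how c-client is
written. The function name is hypothetical, it is UNIX-only, and it
assumes the session already holds a shared lock on the metadata
descriptor. The non-blocking upgrade mirrors the "if this fails,
remember that a burp is needed" rule in the expunge section.

```python
import fcntl
import os

def lock_for_expunge(meta_fd, index_path):
    """Return (index_fd, can_burp); the caller closes index_fd to unlock."""
    index_fd = os.open(index_path, os.O_RDWR)
    # Guard: take the index exclusive lock before touching the metadata lock.
    fcntl.flock(index_fd, fcntl.LOCK_EX)
    try:
        # Try to upgrade the session's shared metadata lock without blocking.
        fcntl.flock(meta_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        return index_fd, False      # another session is open; burp later
    return index_fd, True
```

Taking the index lock first means two sessions cannot both be part way
through an upgrade of the metadata lock at the same time.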
332 10. Memory usage
334 The mix format returns a file stringstruct, which is the modern
335 c-client behavior. This prevents imapd from growing to enormous sizes
336 due to a godzillagram (how it affects other programs depends upon what
337 they do with the returned stringstruct).
340 11. Future extensions
342 Cached ENVELOPE, BODYSTRUCTURE. Cyrus does, and this will eliminate
343 most of the reason to access the data files. Possibly cached overviews,
344 a la NNTP, instead?
347 Support for ANNOTATION.
350 12. RENAME issues
352 Mix currently makes no attempt to address the IMAP RENAME problem.
353 This occurs when a mailbox is deleted and another mailbox is renamed
354 with that name in its place; no attempt is made to reassign UIDVALIDITY
355 for this mailbox and all the inferior mailboxes. This potentially can
356 cause problems for a disconnected-use client that has cached status
357 for the old mailbox which had that name.
359 The RENAME problem is a well known flaw in the IMAP protocol. Few
360 servers correctly handle it (among other things, not only do all the
361 UIDVALIDITY values have to be changed but this has to be done
362 atomically!). It was a mistake to add RENAME into IMAP, but it's much
363 too late to remove it now.

UW-IMAP'd extensions by yuuji