imapext-2007

annotate docs/mixfmt.txt @ 0:ada5e610ab86

imap-2007e
author yuuji@gentei.org
date Mon, 14 Sep 2009 15:17:45 +0900
yuuji@0 1 /* ========================================================================
yuuji@0 2 * Copyright 1988-2006 University of Washington
yuuji@0 3 *
yuuji@0 4 * Licensed under the Apache License, Version 2.0 (the "License");
yuuji@0 5 * you may not use this file except in compliance with the License.
yuuji@0 6 * You may obtain a copy of the License at
yuuji@0 7 *
yuuji@0 8 * http://www.apache.org/licenses/LICENSE-2.0
yuuji@0 9 *
yuuji@0 10 *
yuuji@0 11 * ========================================================================
yuuji@0 12 */
yuuji@0 13
yuuji@0 14 Last update: 18 December 2006
yuuji@0 15
yuuji@0 16 INTRODUCTION
yuuji@0 17
yuuji@0 18 This file is the descendant of a design document used to specify the
yuuji@0 19 mix format. An attempt is being made to keep this document more or
yuuji@0 20 less current with the way the mix format actually works.
yuuji@0 21
yuuji@0 22
yuuji@0 23 1. Mix mailbox naming
yuuji@0 24
yuuji@0 25 Mailbox names correspond to directory names; thus mix format mailboxes
yuuji@0 26 are "dual-use" (lack both \NoInferiors and \NoSelect). This will
yuuji@0 27 satisfy some long-standing requests.
yuuji@0 28
yuuji@0 29
yuuji@0 30 2. Mailbox files
yuuji@0 31
yuuji@0 32 A mix format mailbox is a directory with regular files with filenames
yuuji@0 33 of:
yuuji@0 34 .mixmeta mailbox metadata file
yuuji@0 35 .mixindex message index file (message static data)
yuuji@0 36 .mixstatus message status file (message dynamic data)
yuuji@0 37 .mix######## (where ######## is a <hex8>) secondary message
yuuji@0 38 data files.
yuuji@0 39 .mix primary message data file (used in experimental
yuuji@0 40 versions, supported for compatibility only)
yuuji@0 41
yuuji@0 42 2.1 Metadata, index, and status files
yuuji@0 43
yuuji@0 44 The mailbox metadata, index, and status files contain a sequence of
yuuji@0 45 CRLF-terminated lines. These files have an update sequence, which is
yuuji@0 46 a strictly-ascending sequence value. Any time the file is changed,
yuuji@0 47 the update sequence is increased; this allows easy detection of
yuuji@0 48 whether the file has been changed by another process. For now, this
yuuji@0 49 update sequence is a modseq (see below).
yuuji@0 50
yuuji@0 51 2.1.1 Metadata file
yuuji@0 52
yuuji@0 53 The mailbox metadata file is called ".mixmeta". It contains a series
yuuji@0 54 of CRLF-terminated lines. The first character of the line is a key that
yuuji@0 55 identifies the payload of the line, and the remainder of the line is the
yuuji@0 56 payload.
yuuji@0 57 Key Payload
yuuji@0 58 --- -------
yuuji@0 59 S <hex8> ;; update sequence
yuuji@0 60 V <hex8> ;; UIDVALIDITY
yuuji@0 61 L <hex8> ;; UIDLAST
yuuji@0 62 N <hex8> ;; current new message file
yuuji@0 63 K [atom 0*(SP atom)] ;; keyword list
yuuji@0 64
yuuji@0 65 All other keys are reserved for future assignment and must be ignored
yuuji@0 66 (and may be discarded) by software which does not recognize them. The
yuuji@0 67 mailbox metadata file is rewritten as part of new mail delivery (so
yuuji@0 68 APPENDUID/COPYUID can work) and when new keywords are added.
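
As an illustration of the key/payload layout above, here is a minimal
Python sketch of a .mixmeta reader. It assumes the payload starts
immediately after the one-character key and that <hex8> is eight
hexadecimal digits; the function name and dictionary field names are
invented for this example and are not taken from c-client.

```python
def parse_mixmeta(data: bytes) -> dict:
    """Parse .mixmeta content; keys that are not recognized are ignored."""
    hex_keys = {"S": "sequence", "V": "uidvalidity",
                "L": "uidlast", "N": "newmsg_file"}
    meta = {"keywords": []}
    for raw in data.split(b"\r\n"):
        if not raw:
            continue  # nothing follows the final CRLF
        line = raw.decode("ascii")
        key, payload = line[0], line[1:]
        if key in hex_keys:
            meta[hex_keys[key]] = int(payload, 16)
        elif key == "K":
            meta["keywords"] = payload.split()
        # any other key is reserved for future assignment: skip it
    return meta


# Hypothetical file content, including one key this reader does not know.
sample = (b"S4aade231\r\nV4aade000\r\nL0000002a\r\n"
          b"N4aade100\r\nKwork todo\r\nXfuture\r\n")
meta = parse_mixmeta(sample)
```

Skipping unknown keys rather than rejecting the file follows the rule
that reserved keys must be ignored.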
yuuji@0 69
yuuji@0 70 2.1.2 Message static index file
yuuji@0 71
yuuji@0 72 The mailbox message static index file is called ".mixindex". It contains
yuuji@0 73 a series of CRLF-terminated lines. The first character of the line is a
yuuji@0 74 key that identifies the payload of the line, and the remainder of the line
yuuji@0 75 is the payload.
yuuji@0 76 Key Payload
yuuji@0 77 --- -------
yuuji@0 78 S <hex8> ;; update sequence
yuuji@0 79 : <uid>:<date>:<size>:<file>:<pos>:<isiz>:<hsiz>
yuuji@0 80 ;; per-message record
yuuji@0 81
yuuji@0 82 The per-message records contain the following data:
yuuji@0 83 <uid> = <hex8> ;; message UID
yuuji@0 84 <date> = <yyyymmddhhmmss+zzzz> ;; internal date
yuuji@0 85 <size> = <hex8> ;; rfc822.size
yuuji@0 86 <file> = <hex8> ;; message data file (0 = .mix file)
yuuji@0 87 <pos> = <hex8> ;; message position in file
yuuji@0 88 <isiz> = <hex8> ;; message internal data size
yuuji@0 89 <hsiz> = <hex8> ;; header size (offset to body)
yuuji@0 90
yuuji@0 91 All other keys, and subsequent fields in per-message records, are
yuuji@0 92 reserved for future assignment and must be ignored (and may be
yuuji@0 93 discarded) by software which does not recognize them. The mailbox
yuuji@0 94 index file is appended to by new mail delivery and rewritten by
yuuji@0 95 expunge "burping", and otherwise is not altered.
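
The per-message record can be decoded as in the following Python
sketch. It assumes the fields are separated by single colons with no
surrounding whitespace, and that the date is fourteen digits followed
by a signed four-digit zone. The class and function names are made up
for illustration.

```python
from datetime import datetime
from typing import NamedTuple


class IndexRecord(NamedTuple):
    uid: int
    date: datetime
    size: int
    file: int
    pos: int
    isiz: int
    hsiz: int


def parse_index_record(line: str) -> IndexRecord:
    """Decode one ':' record; fields after <hsiz> are reserved and ignored."""
    if not line.startswith(":"):
        raise ValueError("not a per-message record")
    fields = line[1:].split(":")
    if len(fields) < 7:
        raise ValueError("truncated per-message record")
    uid, date, size, file, pos, isiz, hsiz = fields[:7]
    return IndexRecord(
        uid=int(uid, 16),
        date=datetime.strptime(date, "%Y%m%d%H%M%S%z"),
        size=int(size, 16),
        file=int(file, 16),
        pos=int(pos, 16),
        isiz=int(isiz, 16),
        hsiz=int(hsiz, 16),
    )


# Hypothetical record with an extra, unrecognized trailing field.
rec = parse_index_record(
    ":0000002a:20090914151745+0900:00000400:4aade100:"
    "00000000:00000024:00000200:future"
)
```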
yuuji@0 96
yuuji@0 97 2.1.3 Message dynamic status file
yuuji@0 98
yuuji@0 99 The mailbox message dynamic status file is called ".mixstatus". It contains
yuuji@0 100 a series of CRLF-terminated lines. The first character of the line is a
yuuji@0 101 key that identifies the payload of the line, and the remainder of the line
yuuji@0 102 is the payload.
yuuji@0 103 Key Payload
yuuji@0 104 --- -------
yuuji@0 105 S <hex8> ;; update sequence
yuuji@0 106 : <uid>:<uf>:<sf>:<mod>: ;; per-message record
yuuji@0 107
yuuji@0 108 The per-message records contain the following data:
yuuji@0 109 <uid> = <hex8> ;; message UID
yuuji@0 110 <uf> = <hex8> ;; keyword (user) flags
yuuji@0 111 <sf> = <hex4> ;; system flags
yuuji@0 112 <mod> = <hex8> ;; date/time last modified (modseq)
yuuji@0 113
yuuji@0 114 All other keys, and subsequent fields in per-message records, are
yuuji@0 115 reserved for future assignment and must be ignored (and may be
yuuji@0 116 discarded) by software which does not recognize them. The mailbox
yuuji@0 117 dynamic status file is rewritten by flag changes (or any future change
yuuji@0 118 that alters dynamic data) and is re-read when a session sees that the
yuuji@0 119 mtime has changed (atime and ctime are not used).
yuuji@0 120
yuuji@0 121 The modseq is an unsigned 32-bit date/time, along with a guarantee
yuuji@0 122 that this value cannot go backwards. It currently corresponds to the
yuuji@0 123 time from time(); since it is unsigned, it won't run out until
yuuji@0 124 the year 2106. In the future, this may be used as a basis for implementing
yuuji@0 125 the IMAP CONDSTORE extension.
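
One way to get a time-based value that never goes backwards is to
fall back to "previous + 1" whenever the clock has not advanced. The
Python sketch below shows that idea only; it is an assumption about
how the guarantee could be met, and next_modseq is a hypothetical
name.

```python
import time
from datetime import datetime, timezone

MODSEQ_MAX = 0xFFFFFFFF  # largest unsigned 32-bit value


def next_modseq(previous: int, now=None) -> int:
    """Return a modseq strictly greater than `previous`."""
    current = int(time.time()) if now is None else now
    candidate = current if current > previous else previous + 1
    if candidate > MODSEQ_MAX:
        raise OverflowError("modseq no longer fits in 32 bits")
    return candidate


# The unsigned 32-bit range ends in the year 2106.
last_year = datetime.fromtimestamp(MODSEQ_MAX, tz=timezone.utc).year
```

The fallback also covers a clock that was stepped backwards, at the
cost of the modseq running ahead of wall-clock time for a while.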
yuuji@0 126
yuuji@0 127 2.2 Message data files
yuuji@0 128
yuuji@0 129 A mix message file is a regular file with filename starting with
yuuji@0 130 ".mix" followed by a <hex8> suffix which indicates the file number. It
yuuji@0 131 contains a series of CRLF-terminated lines. By special dispensation, the
yuuji@0 132 filename ".mix" is used for file number 0, which was used in experimental
yuuji@0 133 versions of mix as a "primary" file (this concept no longer exists).
yuuji@0 134
yuuji@0 135 A file number is set to the current modseq when it is created. If a copy
yuuji@0 136 or append causes the file to exceed the compiled-in file size limit, a new
yuuji@0 137 file is started and the metadata is updated accordingly.
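
The mapping from a file number to a file name can be sketched as
follows. Lowercase hexadecimal digits are an assumption here, since
the text only says <hex8>, and data_file_name is an illustrative name.

```python
def data_file_name(file_number: int) -> str:
    """Map a file number to its message data file name."""
    if file_number == 0:
        return ".mix"  # legacy "primary" file, kept for compatibility
    if not 0 < file_number <= 0xFFFFFFFF:
        raise ValueError("file number does not fit in <hex8>")
    return ".mix%08x" % file_number


legacy = data_file_name(0)
current = data_file_name(0x4AADE100)
```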
yuuji@0 138
yuuji@0 139 Preceding each message is a per-message record with the following format:
yuuji@0 140 Key Payload
yuuji@0 141 --- -------
yuuji@0 142 ;; per-message record
yuuji@0 143 : :<code>:<uid>:<date>:<size>:
yuuji@0 144
yuuji@0 145 The per-message records contain the following data:
yuuji@0 146 <code> = "msg" ;; fixed code
yuuji@0 147 <uid> = <hex8> ;; message UID
yuuji@0 148 <date> = <yyyymmddhhmmss+zzzz> ;; internal date
yuuji@0 149 <size> = <hex8> ;; rfc822.size
yuuji@0 150 The message data begins on the next line
yuuji@0 151
yuuji@0 152 Subsequent fields are reserved for future assignment and must be ignored.
yuuji@0 153
yuuji@0 154
yuuji@0 155 3. New mail delivery
yuuji@0 156
yuuji@0 157 To deliver a new message, it is necessary to share lock the destination
yuuji@0 158 metadata file, then get an exclusive lock on the destination index and
yuuji@0 159 status files. Once this is done, the new message data is appended to the
yuuji@0 160 new message file. The metadata (UIDLAST value), index, and status
yuuji@0 161 files are all updated to add the new message.
yuuji@0 162
yuuji@0 163 Then all the destination mailbox files are closed.
yuuji@0 164
yuuji@0 165
yuuji@0 166 4. Mailbox pinging
yuuji@0 167
yuuji@0 168 The index and status files are share locked. Initially, sequences are
yuuji@0 169 remembered as zero, so at open time they are always "altered".
yuuji@0 170
yuuji@0 171 The sequence from the index file is checked; if it is altered the index
yuuji@0 172 file is read and processed as follows:
yuuji@0 173 . If expunge is permitted, then any messages that are not in the index
yuuji@0 174 are reported as having been expunged via mm_expunged().
yuuji@0 175 . new messages are announced via mm_exists()/mm_recent().
yuuji@0 176
yuuji@0 177 Next, the sequence from the status file is checked. If it is altered,
yuuji@0 178 the status file is read and the status updated for any message which is
yuuji@0 179 new or has an altered modseq in the status file. Altered modseq messages
yuuji@0 180 are announced via mm_flags().
yuuji@0 181
yuuji@0 182 Then the index and status files are closed.
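
The index comparison done at ping time amounts to a set difference in
each direction. In this Python sketch the returned lists stand in for
the mm_expunged() and mm_exists() announcements; the function name
and the use of plain UID lists are assumptions made for the example.

```python
def diff_index(known_uids, index_uids, expunge_permitted=True):
    """Compare the session's view with a freshly read index.

    Returns (expunged, new): UIDs the session knows that are gone from
    the index, and UIDs in the index the session has not seen yet.
    """
    on_disk = set(index_uids)
    known = set(known_uids)
    expunged = []
    if expunge_permitted:
        expunged = [uid for uid in known_uids if uid not in on_disk]
    new = [uid for uid in index_uids if uid not in known]
    return expunged, new


expunged, new = diff_index([1, 2, 3], [1, 3, 4, 5])
```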
yuuji@0 183
yuuji@0 184
yuuji@0 185 4. Flag alteration
yuuji@0 186
yuuji@0 187 The status file is exclusive locked.
yuuji@0 188
yuuji@0 189 The sequence from the status file is checked. If it is altered, the
yuuji@0 190 status file is read and the status updated for any message which is
yuuji@0 191 new or has an altered modseq in the status file. Altered modseq
yuuji@0 192 messages are announced via mm_flags().
yuuji@0 193
yuuji@0 194 The alterations are then applied for all requested messages, updating
yuuji@0 195 the modseq for each requested message which changes flags as a result
yuuji@0 196 of the alteration (alterations which do not result in a change do not
yuuji@0 197 alter the modseq). Then the status file is rewritten with a new
yuuji@0 198 sequence, but only if the flags of at least one message were changed.
yuuji@0 199
yuuji@0 200 Then the status file is closed.
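
The rule that only real changes touch the modseq can be sketched in
Python as below. The flag bit values, the dictionary layout, and the
function name are hypothetical; only the "change the modseq when the
flags change, rewrite when something changed" logic comes from the
description above.

```python
SEEN, DELETED = 0x1, 0x2  # hypothetical system flag bits


def alter_flags(status, uids, set_mask, clear_mask, new_modseq):
    """Apply a flag alteration and return the UIDs that really changed.

    `status` maps UID -> {"sf": system flags, "mod": modseq}.
    """
    changed = []
    for uid in uids:
        record = status[uid]
        new_sf = (record["sf"] | set_mask) & ~clear_mask
        if new_sf != record["sf"]:
            record["sf"] = new_sf
            record["mod"] = new_modseq  # only real changes get a new modseq
            changed.append(uid)
    return changed


status = {1: {"sf": SEEN, "mod": 100}, 2: {"sf": 0, "mod": 100}}
changed = alter_flags(status, [1, 2], set_mask=SEEN, clear_mask=0,
                      new_modseq=200)
rewrite_needed = bool(changed)  # rewrite the status file only if True
```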
yuuji@0 201
yuuji@0 202
yuuji@0 203 5. Checkpoint and expunge
yuuji@0 204
yuuji@0 205 Checkpoint is identical to expunge, except that it skips the step of
yuuji@0 206 expunging deleted messages.
yuuji@0 207
yuuji@0 208 The index and status files are locked exclusive. If expunging, all
yuuji@0 209 deleted messages are expunged from the index and announced via
yuuji@0 210 mm_expunged(). The message data is not removed at this time.
yuuji@0 211
yuuji@0 212 If a checkpoint was requested, or if any messages were expunged, or if
yuuji@0 213 it was remembered that a "burp" was needed, then:
yuuji@0 214 . the metadata file is locked exclusive. If this fails, remember that
yuuji@0 215 a burp is needed. Otherwise perform a burp:
yuuji@0 216 . calculate the file byte ranges occupied by expunged messages
yuuji@0 217 . for each file needing "burping", open and slide down subsequent file
yuuji@0 218 data on top of the expunged messages
yuuji@0 219 . update the index and status files
yuuji@0 220
yuuji@0 221 Then the index and status files are closed.
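
The "slide down" step can be pictured with an in-memory buffer. This
Python sketch assumes each surviving message is described by a
(position, length) pair covering everything stored for it, sorted by
position; a real implementation would work on the data file itself
and then write the new positions into the index. The burp name is
illustrative.

```python
def burp(data: bytearray, kept):
    """Slide surviving messages down over expunged ones, in place.

    `kept` is a list of (pos, length) pairs sorted by pos. Returns the
    new position of each surviving message, in the same order.
    """
    write = 0
    new_positions = []
    for pos, length in kept:
        if pos != write:
            # The source slice is copied before assignment, so this is
            # safe even when the two ranges overlap.
            data[write:write + length] = data[pos:pos + length]
        new_positions.append(write)
        write += length
    del data[write:]  # truncate the space freed at the end
    return new_positions


# Three messages; the middle one (6 bytes at offset 4) was expunged.
blob = bytearray(b"AAAABBBBBBCCC")
positions = burp(blob, [(0, 4), (10, 3)])
```

Overwriting in place, instead of writing a fresh copy of the file,
keeps the operation from needing extra quota.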
yuuji@0 222
yuuji@0 223 5.1 More details on expunging and "burping"
yuuji@0 224
yuuji@0 225 Shared expunge presents a problem due to the requirements of the IMAP
yuuji@0 226 protocol. You can't "burp" away a message until you are certain that
yuuji@0 227 no sharers have a pointer to it any longer. Consequently, for the nonce
yuuji@0 228 "burping" out expunged data will be deferred to an exclusive expunge as in
yuuji@0 229 mbx format.
yuuji@0 230
yuuji@0 231 If shared burping is ever implemented, then care will be needed not to
yuuji@0 232 burp data that a session still relies upon. It's easy enough to burp
yuuji@0 233 the index files; just create new index files, deleting the old, and
yuuji@0 234 require that you look for a new one appearing at mailbox ping time
yuuji@0 235 (when it's safe). The data files are a problem, since we
yuuji@0 236 intentionally don't want to keep them open and do want to avoid quota
yuuji@0 237 problems by overwriting in place. Also, when you burp you have to
yuuji@0 238 change the pointers in the index file.
yuuji@0 239
yuuji@0 240 Bottom line: shared burping is too hairy right now, so the first
yuuji@0 241 version will do exclusive-only burping and not worry about it. If
yuuji@0 242 shared burping is really needed, then that routine will need to be
yuuji@0 243 rewritten.
yuuji@0 244
yuuji@0 245 Shared burping has been a problem for every other IMAP server. Most
yuuji@0 246 get it wrong, and cause terrible confusion to clients (including
yuuji@0 247 client crashes).
yuuji@0 248
yuuji@0 249
yuuji@0 250 6. Message data file roll out strategy
yuuji@0 251
yuuji@0 252 The current new message file is finalized, and a new one started, when
yuuji@0 253 an append or copy is done that would cause the file to grow to larger
yuuji@0 254 than a preconfigured size (MIXDATAROLL). A multi-message copy or
yuuji@0 255 append is written in its entirety to a single new message file. In
yuuji@0 256 the case of multi-copy, the new message file is switched when the sum
yuuji@0 257 of the sizes of all messages to be copied would cause the current new
yuuji@0 258 message file to exceed MIXDATAROLL. In the case of multi-append, only
yuuji@0 259 the first message is considered; this is due to technical limitations.
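
The roll out decision described above can be written as a small
predicate. In this Python sketch the limit value is a placeholder and
not the compiled-in MIXDATAROLL, and should_roll is a hypothetical
name.

```python
ROLL_LIMIT = 1 << 20  # placeholder value standing in for MIXDATAROLL


def should_roll(current_size, message_sizes, multi_append=False,
                limit=ROLL_LIMIT):
    """Decide whether to start a new message file before writing."""
    if not message_sizes:
        return False
    # Multi-copy knows every size up front; multi-append only the first.
    incoming = message_sizes[0] if multi_append else sum(message_sizes)
    return current_size + incoming > limit


copy_rolls = should_roll(900_000, [100_000, 100_000])
append_rolls = should_roll(900_000, [100_000, 100_000], multi_append=True)
```

With these numbers the copy switches files (1,100,000 bytes would
exceed the limit) while the append does not, because only its first
message is counted.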
yuuji@0 260
yuuji@0 261 7. Error detection
yuuji@0 262
yuuji@0 263 Mix detects bad data in the metadata, index, and status files, and
yuuji@0 264 declares the stream dead. It does not unilaterally reassign
yuuji@0 265 UIDVALIDITY the way that the flat file formats do.
yuuji@0 266
yuuji@0 267 When mix reads a header from the message file, it also reads the
yuuji@0 268 per-message record and verifies that there is a per-message record there.
yuuji@0 269 This is a simple test for message file corruption. It doesn't declare
yuuji@0 270 the stream dead; it simply issues an error message and returns a
yuuji@0 271 zero-length string for the message header. This makes it possible for
yuuji@0 272 the user to fix the mailbox simply by deleting and expunging any messages
yuuji@0 273 that are in this state.
yuuji@0 274
yuuji@0 275
yuuji@0 276 8. Reconstruct tool
yuuji@0 277
yuuji@0 278 [None of this is implemented yet.]
yuuji@0 279
yuuji@0 280 The layout of these files is designed to make the reconstruct tool be
yuuji@0 281 as simple as possible. Much of the need for the reconstruct tool is
yuuji@0 282 eliminated since the mix format has a much more limited scope of
yuuji@0 283 writing than the flat file formats; thus there is "less collateral
yuuji@0 284 damage."
yuuji@0 285
yuuji@0 286 If the metadata file is lost or corrupted, then all keywords are lost;
yuuji@0 287 if the mailbox has any keywords used in the .mixstatus file, it'll be
yuuji@0 288 necessary to create some placeholder names. Otherwise, a new
yuuji@0 289 UIDVALIDITY can be assigned, and a good UIDLAST value calculated by
yuuji@0 290 the reconstruct tool. Since this file is very small, it's not likely
yuuji@0 291 to be damaged.
yuuji@0 292
yuuji@0 293 If the index file is lost or corrupted, it is possible to reconstruct
yuuji@0 294 it with no loss by reading all the data files. However, this could
yuuji@0 295 cause expunged but not yet burped messages to reappear.
yuuji@0 296
yuuji@0 297 If the status file is lost or corrupted, then flags are lost and
yuuji@0 298 will revert to a default state of no flags set. Just deleting the
yuuji@0 299 corrupted file is good enough.
yuuji@0 300
yuuji@0 301 The reconstruct tool can use the per-message record in the message
yuuji@0 302 file to locate messages if the recorded sizes and/or messages are
yuuji@0 303 corrupt. If that happens, it will need to rebuild the index file
yuuji@0 304 (with associated changes to the metadata file to change the
yuuji@0 305 UIDVALIDITY). That should probably be a manual operation and not be
yuuji@0 306 part of the default operation or auto-reconstruct.
yuuji@0 307
yuuji@0 308
yuuji@0 309 9. Locking strategy
yuuji@0 310
yuuji@0 311 The mix format does not use the traditional c-client /tmp file locking.
yuuji@0 312
yuuji@0 313 The metadata file is open and locked whenever the mailbox is open.
yuuji@0 314 Normally this is a shared lock, but it will be upgraded to exclusive
yuuji@0 315 if the mailbox is expunged. As a guard (since there is no true
yuuji@0 316 lock-upgrade/downgrade on UNIX), the index exclusive lock must be
yuuji@0 317 acquired first before upgrading to exclusive.
yuuji@0 318
yuuji@0 319 The index file is shared locked when reading the index, and exclusive
yuuji@0 320 locked (and read) when appending new messages to the index or when
yuuji@0 321 expunging (note that expunging also requires an exclusive lock on
yuuji@0 322 metadata). Normally, the index file is not open or locked.
yuuji@0 323
yuuji@0 324 The status file is shared locked when reading status, and exclusive
yuuji@0 325 locked (and read) when updating status. Normally, the status file is
yuuji@0 326 not open or locked.
yuuji@0 327
yuuji@0 328 It isn't necessary to lock any of the data files as long as we only
yuuji@0 329 have exclusive burping.
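
The lock ordering for an expunge can be sketched with flock(). Using
flock() through Python's fcntl module is an assumption made for this
example (the text does not name the locking primitive), as are the
function names and the non-blocking upgrade attempt. It runs on UNIX
systems only.

```python
import fcntl
import os
import tempfile


def lock_file(path, exclusive):
    """Open `path` and take a blocking shared or exclusive lock on it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
    return fd


def try_exclusive_metadata(mailbox_dir, meta_fd):
    """Take the index lock first, then try to upgrade the metadata lock.

    Returns the locked index descriptor on success, or None if another
    session still shares the metadata file.
    """
    index_fd = lock_file(os.path.join(mailbox_dir, ".mixindex"), True)
    try:
        fcntl.flock(meta_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return index_fd
    except BlockingIOError:
        # A failed conversion may have dropped our shared lock: retake it.
        fcntl.flock(meta_fd, fcntl.LOCK_SH)
        os.close(index_fd)  # closing the descriptor releases the index lock
        return None


# Two sessions share the mailbox, so the first upgrade attempt fails.
mailbox = tempfile.mkdtemp()
meta_path = os.path.join(mailbox, ".mixmeta")
session_a = lock_file(meta_path, exclusive=False)
session_b = lock_file(meta_path, exclusive=False)
blocked = try_exclusive_metadata(mailbox, session_a)
os.close(session_b)  # the other session goes away
granted = try_exclusive_metadata(mailbox, session_a)
```

Holding the index lock across the upgrade means no other session can
start its own upgrade in the window where the metadata lock is being
converted.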
yuuji@0 330
yuuji@0 331
yuuji@0 332 10. Memory usage
yuuji@0 333
yuuji@0 334 The mix format returns a file stringstruct, which is the modern
yuuji@0 335 c-client behavior. This prevents imapd from growing to enormous sizes
yuuji@0 336 due to a godzillagram (how it affects other programs depends upon what
yuuji@0 337 they do with the returned stringstruct).
yuuji@0 338
yuuji@0 339
yuuji@0 340 11. Future extensions
yuuji@0 341
yuuji@0 342 Cached ENVELOPE, BODYSTRUCTURE. Cyrus does, and this will eliminate
yuuji@0 343 most of the reason to access the data files. Possibly cached overviews,
yuuji@0 344 ala NNTP, instead?
yuuji@0 345
yuuji@0 346
yuuji@0 347 Support for ANNOTATION.
yuuji@0 348
yuuji@0 349
yuuji@0 350 12. RENAME issues
yuuji@0 351
yuuji@0 352 Mix currently makes no attempt to address the IMAP RENAME problem.
yuuji@0 353 This occurs when a mailbox is deleted and another mailbox is renamed
yuuji@0 354 to that name in its place: no attempt is made to reassign UIDVALIDITY
yuuji@0 355 for this mailbox and all the inferior mailboxes. This potentially can
yuuji@0 356 cause problems for a disconnected-use client that has cached status
yuuji@0 357 for the old mailbox which had that name.
yuuji@0 358
yuuji@0 359 The RENAME problem is a well known flaw in the IMAP protocol. Few
yuuji@0 360 servers correctly handle it (among other things, not only do all the
yuuji@0 361 UIDVALIDITY values have to be changed but this has to be done
yuuji@0 362 atomically!). It was a mistake to add RENAME into IMAP, but it's much
yuuji@0 363 too late to remove it now.

UW-IMAP'd extensions by yuuji