imapext-2007
diff docs/mixfmt.txt @ 0:ada5e610ab86
imap-2007e
author | yuuji@gentei.org |
---|---|
date | Mon, 14 Sep 2009 15:17:45 +0900 |
/* ========================================================================
 * Copyright 1988-2006 University of Washington
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 *
 * ========================================================================
 */

Last update: 18 December 2006

INTRODUCTION

This file is the descendant of a design document used to specify the
mix format. An attempt is being made to keep this document more or
less current with the way the mix format actually works.


1. Mix mailbox naming

Mailbox names correspond to directory names; thus mix format mailboxes
are "dual-use" (lack both \NoInferiors and \NoSelect). This will
satisfy some long-standing requests.


2. Mailbox files

A mix format mailbox is a directory with regular files with filenames
of:
    .mixmeta        mailbox metadata file
    .mixindex       message index file (message static data)
    .mixstatus      message status file (message dynamic data)
    .mix########    (where ######## is a <hex8>) secondary message
                    data files.
    .mix            primary message data file (used in experimental
                    versions, supported for compatibility only)

2.1 Metadata, index, and status files

The mailbox metadata, index, and status files contain a sequence of
CRLF-terminated lines. These files have an update sequence, which is
a strictly-ascending sequence value.
Any time the file is changed,
the update sequence is increased; this allows easy detection of
whether the file has been changed by another process. For now, this
update sequence is a modseq (see below).

2.1.1 Metadata file

The mailbox metadata file is called ".mixmeta". It contains a series
of CRLF-terminated lines. The first character of the line is a key that
identifies the payload of the line, and the remainder of the line is the
payload.
    Key    Payload
    ---    -------
     S     <hex8>                  ;; update sequence
     V     <hex8>                  ;; UIDVALIDITY
     L     <hex8>                  ;; UIDLAST
     N     <hex8>                  ;; current new message file
     K     [atom 0*(SP atom)]      ;; keyword list

All other keys are reserved for future assignment and must be ignored
(and may be discarded) by software which does not recognize them. The
mailbox metadata file is rewritten as part of new mail delivery (so
APPENDUID/COPYUID can work) and when new keywords are added.

2.1.2 Message static index file

The mailbox message static index file is called ".mixindex". It contains
a series of CRLF-terminated lines. The first character of the line is a
key that identifies the payload of the line, and the remainder of the line
is the payload.
    Key    Payload
    ---    -------
     S     <hex8>                  ;; update sequence
     :     <uid>:<date>:<size>:<file>:<pos>:<isiz>:<hsiz>
                                   ;; per-message record

The per-message records contain the following data:
    <uid>  = <hex8>                  ;; message UID
    <date> = <yyyymmddhhmmss+zzzz>   ;; internal date
    <size> = <hex8>                  ;; rfc822.size
    <file> = <hex8>                  ;; message data file (0 = .mix file)
    <pos>  = <hex8>                  ;; message position in file
    <isiz> = <hex8>                  ;; message internal data size
    <hsiz> = <hex8>                  ;; header size (offset to body)

All other keys, and subsequent fields in per-message records, are
reserved for future assignment and must be ignored (and may be
discarded) by software which does not recognize them. The mailbox
index file is appended by new mail delivery and rewritten by
expunge "burping", and otherwise is not altered.

2.1.3 Message dynamic status file

The mailbox message dynamic status file is called ".mixstatus". It contains
a series of CRLF-terminated lines. The first character of the line is a
key that identifies the payload of the line, and the remainder of the line
is the payload.
    Key    Payload
    ---    -------
     S     <hex8>                  ;; update sequence
     :     <uid>:<uf>:<sf>:<mod>:  ;; per-message record

The per-message records contain the following data:
    <uid>  = <hex8>                  ;; message UID
    <uf>   = <hex8>                  ;; keyword flags
    <sf>   = <hex4>                  ;; system flags
    <mod>  = <hex8>                  ;; date/time last modified (modseq)

All other keys, and subsequent fields in per-message records, are
reserved for future assignment and must be ignored (and may be
discarded) by software which does not recognize them.
The mailbox
dynamic status file is rewritten by flag changes (or any future change
that alters dynamic data) and is re-read when a session sees that the
mtime has changed (atime and ctime are not used).

The modseq is an unsigned 32-bit date/time, along with a guarantee
that this value cannot go backwards. It currently corresponds to the
time from time(); however, since it is unsigned, it won't run out until
the year 2106. In the future, this may be used as a basis for implementing
the IMAP CONDSTORE extension.

2.2 Message data files

A mix message file is a regular file with filename starting with
".mix" followed by a <hex8> suffix which indicates the file number. It
contains a series of CRLF-terminated lines. By special dispensation, the
filename ".mix" is used for file number 0, which was used in experimental
versions of mix as a "primary" file (this concept no longer exists).

A file number is set to the current modseq when the file is created. If a
copy or append causes the file to exceed the compiled-in file size limit, a
new file is started and the metadata is updated accordingly.

Preceding each message is a per-message record with the following format:
    Key    Payload
    ---    -------
                                   ;; per-message record
     :     :<code>:<uid>:<date>:<size>:

The per-message records contain the following data:
    <code> = "msg"                   ;; fixed code
    <uid>  = <hex8>                  ;; message UID
    <date> = <yyyymmddhhmmss+zzzz>   ;; internal date
    <size> = <hex8>                  ;; rfc822.size
The message data begins on the next line.

Subsequent fields are reserved for future assignment and must be ignored.


3. New mail delivery

To deliver a new message, it is necessary to share lock the destination
metadata file, then get an exclusive lock on the destination index and
status files. Once this is done, the new message data is appended to the
new message file. The metadata (UIDLAST value), index, and status
files are all updated to add the new message.

Then all the destination mailbox files are closed.


4. Mailbox pinging

The index and status files are share locked. Initially, sequences are
remembered as zero, so at open time they are always "altered".

The sequence from the index file is checked; if it is altered the index
file is read and processed as follows:
 . If expunge is permitted, then any messages that are not in the index
   are reported as having been expunged via mm_expunged().
 . New messages are announced via mm_exists()/mm_recent().

Next, the sequence from the status file is checked. If it is altered,
the status file is read and the status updated for any message which is
new or has an altered modseq in the status file. Altered modseq messages
are announced via mm_flags().

Then the index and status files are closed.


4. Flag alteration

The status file is exclusive locked.

The sequence from the status file is checked. If it is altered, the
status file is read and the status updated for any message which is
new or has an altered modseq in the status file. Altered modseq
messages are announced via mm_flags().

The alterations are then applied for all requested messages, updating
the modseq for each requested message which changes flags as a result
of the alteration (alterations which do not result in a change do not
alter the modseq).
Then the status file is rewritten with a new
sequence, but only if the flags of at least one message were changed.

Then the status file is closed.


5. Checkpoint and expunge

Checkpoint is identical to expunge, except that it skips the step of
expunging deleted messages.

The index and status files are locked exclusive. If expunging, all
deleted messages are expunged from the index and announced via
mm_expunged(). The message data is not removed at this time.

If a checkpoint was requested, or if any messages were expunged, or if
it is remembered that a "burp" was needed, then:
 . the metadata file is locked exclusive. If this fails, remember that
   a burp is needed. Otherwise perform a burp:
 . calculate the file byte ranges occupied by expunged messages
 . for each file needing "burping", open and slide down subsequent file
   data on top of the expunged messages
 . update the index and status files

Then the index and status files are closed.

5.1 More details on expunging and "burping"

Shared expunge presents a problem due to the requirements of the IMAP
protocol. You can't "burp" away a message until you are certain that
no sharers have a pointer to it any longer. Consequently, for the nonce,
"burping" out expunged data will be deferred to an exclusive expunge, as
in mbx format.

If shared burping is ever implemented, then care will be needed not to
burp data that a session still relies upon. It's easy enough to burp
the index files; just create new index files, deleting the old, and
require that you look for a new one appearing at mailbox ping time
(when it's safe). The data files are a problem, since we
intentionally don't want to keep them open and do want to avoid quota
problems by overwriting in place.
Also, when you burp you have to
change the pointers in the index file.

Bottom line: shared burping is too hairy right now, so the first
version will do exclusive-only burping and not worry about it. If
shared burping is really needed, then that routine will need to be
rewritten.

Shared burping has been a problem for every other IMAP server. Most
get it wrong, and cause terrible confusion to clients (including
client crashes).


6. Message data file roll out strategy

The current new message file is finalized, and a new one started, when
an append or copy is done that would cause the file to grow to larger
than a preconfigured size (MIXDATAROLL). A multi-message copy or
append is written in its entirety to a single new message file. In
the case of multi-copy, the new message file is switched when the sum
of the sizes of all messages to be copied would cause the current new
message file to exceed MIXDATAROLL. In the case of multi-append, only
the first message is considered; this is due to technical limitations.

7. Error detection

Mix detects bad data in the metadata, index, and status files, and
declares the stream dead. It does not unilaterally reassign
UIDVALIDITY the way that the flat file formats do.

When mix reads a header from the message file, it also reads the
per-message record and verifies that there is a per-message record there.
This is a simple test for message file corruption. It doesn't declare
the stream dead; it simply issues an error message and returns a
zero-length string for the message header. This makes it possible for
the user to fix the mailbox simply by deleting and expunging any messages
that are in this state.


8. Reconstruct tool

[None of this is implemented yet.]

The layout of these files is designed to make the reconstruct tool
as simple as possible. Much of the need for the reconstruct tool is
eliminated since the mix format has a much more limited scope of
writing than the flat file formats; thus there is "less collateral
damage."

If the metadata file is lost or corrupted, then all keywords are lost;
if the mailbox has any keywords used in the .mixstatus file, it'll be
necessary to create some placeholder names. Otherwise, a new
UIDVALIDITY can be assigned, and a good UIDLAST value calculated by
the reconstruct tool. Since this file is very small, it's not likely
to be damaged.

If the index file is lost or corrupted, it is possible to reconstruct
it with no loss by reading all the data files. However, this could
cause expunged but not yet burped messages to reappear.

If the status file is lost or corrupted, then flags are lost and
will revert to a default state of no flags set. Just deleting the
corrupted file is good enough.

The reconstruct tool can use the per-message record in the message
file to locate messages if the recorded sizes and/or messages are
corrupt. If that happens, it will need to rebuild the index file
(with associated changes to the metadata file to change the
UIDVALIDITY). That should probably be a manual operation and not be
part of the default operation or auto-reconstruct.


9. Locking strategy

The mix format does not use the traditional c-client /tmp file locking.

The metadata file is open and locked whenever the mailbox is open.
Normally this is a shared lock, but it will be upgraded to exclusive
if the mailbox is expunged.
As a guard (since there is no true
lock-upgrade/downgrade on UNIX), the index exclusive lock must be
acquired before upgrading the metadata lock to exclusive.

The index file is shared locked when reading the index, and exclusive
locked (and read) when appending new messages to the index or when
expunging (note that expunging also requires an exclusive lock on
metadata). Normally, the index file is not open or locked.

The status file is shared locked when reading status, and exclusive
locked (and read) when updating status. Normally, the status file is
not open or locked.

It isn't necessary to lock any of the data files as long as we only
have exclusive burping.


10. Memory usage

The mix format returns a file stringstruct, which is the modern
c-client behavior. This prevents imapd from growing to enormous sizes
due to a godzillagram (how it affects other programs depends upon what
they do with the returned stringstruct).


11. Future extensions

Cached ENVELOPE, BODYSTRUCTURE. Cyrus does, and this will eliminate
most of the reason to access the data files. Possibly cached overviews,
ala NNTP, instead?


Support for ANNOTATION.


12. RENAME issues

Mix currently makes no attempt to address the IMAP RENAME problem.
This occurs when a mailbox is deleted and another mailbox is renamed
to that name in its place; no attempt is made to reassign UIDVALIDITY
for this mailbox and all the inferior mailboxes. This potentially can
cause problems for a disconnected-use client that has cached status
for the old mailbox which had that name.

The RENAME problem is a well known flaw in the IMAP protocol.
Few
servers correctly handle it (among other things, not only do all the
UIDVALIDITY values have to be changed but this has to be done
atomically!). It was a mistake to add RENAME into IMAP, but it's much
too late to remove it now.
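

Illustrative sketch (not part of the original design document)

The per-message index record of section 2.1.2 and the data file naming
of section 2.2 can be illustrated with a small Python sketch. This is
not c-client code. The names IndexRecord, parse_index_line, and
data_file_name are hypothetical, and the use of lowercase hexadecimal
in data file names is an assumption. The internal date is kept as an
opaque string rather than decoded.

```python
from typing import NamedTuple, Optional


class IndexRecord(NamedTuple):
    """One ':' record from .mixindex (hypothetical helper type)."""
    uid: int    # message UID
    date: str   # internal date, kept as the raw yyyymmddhhmmss+zzzz text
    size: int   # rfc822.size
    file: int   # message data file number (0 = the legacy ".mix" file)
    pos: int    # message position in the data file
    isiz: int   # message internal data size
    hsiz: int   # header size (offset to body)


def parse_index_line(line: str) -> Optional[IndexRecord]:
    """Parse one CRLF-terminated index line.

    Returns None for any key other than ':' (for example the 'S'
    update sequence line), since this sketch only handles
    per-message records.
    """
    line = line.rstrip("\r\n")
    if not line.startswith(":"):
        return None
    fields = line[1:].split(":")
    if len(fields) < 7:
        raise ValueError("per-message record has too few fields")
    # Fields after the seventh are reserved and must be ignored.
    uid, date, size, file, pos, isiz, hsiz = fields[:7]
    return IndexRecord(int(uid, 16), date, int(size, 16), int(file, 16),
                       int(pos, 16), int(isiz, 16), int(hsiz, 16))


def data_file_name(file_number: int) -> str:
    """Map a file number to a data file name.

    File number 0 is the legacy ".mix" file; lowercase hex digits
    are an assumption of this sketch.
    """
    return ".mix" if file_number == 0 else ".mix%08x" % file_number
```

A reader built this way would pass each line of .mixindex to
parse_index_line(), skip the None results, and open the file named by
data_file_name(record.file) at offset record.pos to reach the
per-message record that precedes the message data.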