imapext-2007

diff docs/mixfmt.txt @ 0:ada5e610ab86

imap-2007e
author yuuji@gentei.org
date Mon, 14 Sep 2009 15:17:45 +0900

/* ========================================================================
 * Copyright 1988-2006 University of Washington
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * 
 * ========================================================================
 */

Last update: 18 December 2006

INTRODUCTION

This file is the descendant of a design document used to specify the
mix format.  An attempt is being made to keep this document more or
less current with the way the mix format actually works.


1. Mix mailbox naming

Mailbox names correspond to directory names; thus mix format mailboxes
are "dual-use": they have neither the \NoInferiors nor the \NoSelect
attribute, so a mix mailbox can contain both messages and inferior
mailboxes.  This satisfies some long-standing requests.


2. Mailbox files

A mix format mailbox is a directory containing regular files with the
following names:
	.mixmeta	mailbox metadata file
	.mixindex	message index file (message static data)
	.mixstatus	message status file (message dynamic data)
	.mix########	(where ######## is a <hex8>) secondary message
			 data files.
	.mix		primary message data file (used in experimental
			 versions, supported for compatibility only)

2.1 Metadata, index, and status files

The mailbox metadata, index, and status files contain a sequence of
CRLF-terminated lines.  These files have an update sequence, which is
a strictly-ascending sequence value.  Any time the file is changed,
the update sequence is increased; this allows easy detection of
whether the file has been changed by another process.  For now, this
update sequence is a modseq (see below).

2.1.1 Metadata file

The mailbox metadata file is called ".mixmeta".  It contains a series
of CRLF-terminated lines.  The first character of the line is a key that
identifies the payload of the line, and the remainder of the line is the
payload.
	Key	Payload
	---	-------
	 S	<hex8>			;; update sequence
	 V	<hex8>			;; UIDVALIDITY
	 L	<hex8>			;; UIDLAST
	 N	<hex8>			;; current new message file
	 K	[atom 0*(SP atom)]	;; keyword list

All other keys are reserved for future assignment and must be ignored
(and may be discarded) by software which does not recognize them.  The
mailbox metadata file is rewritten as part of new mail delivery (so
APPENDUID/COPYUID can work) and when new keywords are added.
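
As an illustration, here is a minimal Python sketch of reading .mixmeta.
It assumes the whole file has already been read into memory.  The function
name parse_mixmeta and the dictionary keys it returns are hypothetical and
are not part of c-client.  Unrecognized keys are skipped, as required above.

```python
def parse_mixmeta(data: bytes) -> dict:
    """Parse the CRLF-terminated lines of a .mixmeta file (sketch)."""
    hex_keys = {"S": "seq", "V": "uidvalidity", "L": "uidlast", "N": "newfile"}
    meta = {"keywords": []}
    for raw in data.split(b"\r\n"):
        if not raw:
            continue                      # e.g. the empty piece after the last CRLF
        line = raw.decode("ascii")
        key, payload = line[0], line[1:]  # first character is the key
        if key in hex_keys:
            meta[hex_keys[key]] = int(payload, 16)       # <hex8> payload
        elif key == "K":
            meta["keywords"] = payload.split(" ") if payload else []
        # Any other key is reserved: ignore it.
    return meta
```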

2.1.2 Message static index file

The mailbox message static index file is called ".mixindex".  It contains
a series of CRLF-terminated lines.  The first character of the line is a
key that identifies the payload of the line, and the remainder of the line
is the payload.
	Key	Payload
	---	-------
	 S	<hex8>			;; update sequence
	 :	<uid>:<date>:<size>:<file>:<pos>:<isiz>:<hsiz>
					;; per-message record

The per-message records contain the following data:
	<uid>  = <hex8>			;; message UID
	<date> = <yyyymmddhhmmss+zzzz>	;; internal date
	<size> = <hex8>			;; rfc822.size
	<file> = <hex8>			;; message data file (0 = .mix file)
	<pos>  = <hex8>			;; message position in file
	<isiz> = <hex8>			;; message internal data size
	<hsiz> = <hex8>			;; header size (offset to body)

All other keys, and subsequent fields in per-message records, are
reserved for future assignment and must be ignored (and may be
discarded) by software which does not recognize them.  The mailbox
index file is appended to by new mail delivery and rewritten by
expunge "burping", and is otherwise not altered.
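
A per-message index record can be split on colons.  The following Python
sketch uses the hypothetical names IndexRecord and parse_index_line.  It
assumes the CRLF has already been stripped, accepts either + or - as the
zone sign in <date>, and ignores any fields after <hsiz>.

```python
from datetime import datetime
from typing import NamedTuple


class IndexRecord(NamedTuple):
    uid: int            # message UID
    date: datetime      # internal date, including its zone offset
    size: int           # rfc822.size
    file: int           # data file number (0 = ".mix")
    pos: int            # message position in that file
    isiz: int           # message internal data size
    hsiz: int           # header size (offset to body)


def parse_index_line(line: str) -> IndexRecord:
    """Parse one ':'-keyed .mixindex line, CRLF already stripped (sketch)."""
    if not line.startswith(":"):
        raise ValueError("not a per-message record")
    fields = line[1:].split(":")
    uid, date, size, file, pos, isiz, hsiz = fields[:7]   # later fields ignored
    when = datetime.strptime(date, "%Y%m%d%H%M%S%z")
    return IndexRecord(int(uid, 16), when, int(size, 16), int(file, 16),
                       int(pos, 16), int(isiz, 16), int(hsiz, 16))
```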

2.1.3 Message dynamic status file

The mailbox message dynamic status file is called ".mixstatus".  It contains
a series of CRLF-terminated lines.  The first character of the line is a
key that identifies the payload of the line, and the remainder of the line
is the payload.
	Key	Payload
	---	-------
	 S	<hex8>			;; update sequence
	 :	<uid>:<uf>:<sf>:<mod>:	;; per-message record

The per-message records contain the following data:
	<uid>  = <hex8>			;; message UID
	<uf>   = <hex8>			;; keyword flags
	<sf>   = <hex4>			;; system flags
	<mod>  = <hex8>			;; date/time last modified (modseq)

All other keys, and subsequent fields in per-message records, are
reserved for future assignment and must be ignored (and may be
discarded) by software which does not recognize them.  The mailbox
status file is rewritten by flag changes (or any future change
that alters dynamic data) and is re-read when a session sees that the
mtime has changed (atime and ctime are not used).
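
Here is a similar hypothetical Python sketch for .mixstatus records.  The
helper keyword_names assumes that bit n of <uf> corresponds to the n-th atom
in the metadata K list.  This document does not state that mapping; it is an
assumption of the sketch.

```python
from typing import List, NamedTuple


class StatusRecord(NamedTuple):
    uid: int   # message UID
    uf: int    # keyword flags (<hex8>)
    sf: int    # system flags (<hex4>)
    mod: int   # modseq of the last change (<hex8>)


def parse_status_line(line: str) -> StatusRecord:
    """Parse one .mixstatus per-message record, CRLF already stripped."""
    if not line.startswith(":"):
        raise ValueError("not a per-message record")
    uid, uf, sf, mod = line[1:].split(":")[:4]   # later fields are ignored
    return StatusRecord(int(uid, 16), int(uf, 16), int(sf, 16), int(mod, 16))


def keyword_names(rec: StatusRecord, keywords: List[str]) -> List[str]:
    """Names of the keywords set on a message.

    Assumption: bit n of <uf> stands for the n-th atom of the K list.
    """
    return [name for n, name in enumerate(keywords) if rec.uf & (1 << n)]
```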

The modseq is an unsigned 32-bit date/time, along with a guarantee
that this value cannot go backwards.  It currently corresponds to the
value returned by time(); however, since it is unsigned, it won't run
out until the year 2106.  In the future, this may be used as a basis
for implementing the IMAP CONDSTORE extension.
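
A short Python sketch can check both the "never goes backwards" rule and the
2106 limit.  next_modseq is a hypothetical helper.  When the clock has not
advanced, it bumps to last + 1; that choice is an assumption made so that
update sequences stay strictly ascending, as section 2.1 requires.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_modseq(last: int, now: Optional[float] = None) -> int:
    """Return a modseq based on time() that never goes backwards (sketch).

    If the clock has not moved past `last` (same second, or the clock was
    set back), use last + 1 instead so the sequence stays strictly ascending.
    """
    t = int(time.time() if now is None else now)
    return t if t > last else last + 1


# An unsigned 32-bit count of seconds since the UNIX epoch (the value time()
# returns) tops out at 0xFFFFFFFF; that instant falls in the year 2106.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
LAST_MODSEQ_TIME = EPOCH + timedelta(seconds=0xFFFFFFFF)
```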

2.2 Message data files

A mix message data file is a regular file whose filename starts with
".mix" followed by a <hex8> suffix that indicates the file number.  It
contains a series of CRLF-terminated lines.  By special dispensation, the
filename ".mix" is used for file number 0, which was used in experimental
versions of mix as a "primary" file (this concept no longer exists).

When a message data file is created, its file number is set to the
current modseq.  If a copy or append would cause the file to exceed the
compiled-in file size limit, a new file is started and the metadata (N
value) is updated accordingly.
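
Here is a hypothetical Python helper that maps a file number to its data
file name.  Lowercase hex digits are an assumption; the text above only
says <hex8>.

```python
import os


def data_file_name(file_number: int) -> str:
    """Data file name for a file number (sketch; lowercase hex assumed)."""
    if file_number == 0:
        return ".mix"               # legacy "primary" file
    return ".mix%08x" % file_number


def data_file_path(mailbox_dir: str, file_number: int) -> str:
    """Full path of a data file inside the mailbox directory."""
    return os.path.join(mailbox_dir, data_file_name(file_number))
```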

Preceding each message is a per-message record with the following format:
	Key	Payload
	---	-------
	 :	<code>:<uid>:<date>:<size>:
					;; per-message record

The per-message records contain the following data:
	<code> = "msg"			;; fixed code
	<uid>  = <hex8>			;; message UID
	<date> = <yyyymmddhhmmss+zzzz>	;; internal date
	<size> = <hex8>			;; rfc822.size
Each record line therefore begins with ":msg:".  The message data begins
on the next line.

Subsequent fields are reserved for future assignment and must be ignored.
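
Here is a Python sketch that writes and recognizes the per-message record
line.  The names format_msg_record and parse_msg_record are hypothetical.
The date is passed through unchanged as the literal yyyymmddhhmmss+zzzz
string.

```python
MSG_CODE = b"msg"


def format_msg_record(uid: int, date: str, size: int) -> bytes:
    """Record line that precedes each message; date is yyyymmddhhmmss+zzzz."""
    return b":%s:%08x:%s:%08x:\r\n" % (MSG_CODE, uid, date.encode("ascii"), size)


def parse_msg_record(line: bytes):
    """Return (uid, date, size), or None if line is not a per-message record."""
    fields = line.rstrip(b"\r\n").split(b":")
    # Expected shape: ["", "msg", uid, date, size, ...]; extra fields ignored.
    if len(fields) < 5 or fields[0] != b"" or fields[1] != MSG_CODE:
        return None
    try:
        return int(fields[2], 16), fields[3].decode("ascii"), int(fields[4], 16)
    except ValueError:
        return None
```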


3. New mail delivery

To deliver a new message, it is necessary to share-lock the destination
metadata file, then get an exclusive lock on the destination index and
status files.  Once this is done, the new message data is appended to the
current new message file (the one named by the metadata N value).  The
metadata (UIDLAST value), index, and status files are all updated to add
the new message.

Then all the destination mailbox files are closed.
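
The lock ordering for delivery can be sketched in Python with flock(2),
using the standard fcntl module (Unix only).  Using flock() here is an
assumption for illustration; c-client does not necessarily lock these files
this way.  append_message is a hypothetical callback that performs the
append and the metadata, index, and status updates.

```python
import fcntl
import os
from contextlib import ExitStack


def deliver_locked(mailbox_dir, append_message):
    """Take the delivery locks in the documented order, then hand off.

    append_message(meta, index, status) is a hypothetical callback that
    appends the message data and updates UIDLAST, the index, and status.
    """
    with ExitStack() as stack:
        def open_locked(name, operation):
            f = stack.enter_context(open(os.path.join(mailbox_dir, name), "r+b"))
            fcntl.flock(f.fileno(), operation)   # blocks until granted
            return f

        meta = open_locked(".mixmeta", fcntl.LOCK_SH)      # share-lock metadata
        index = open_locked(".mixindex", fcntl.LOCK_EX)    # then exclusive index
        status = open_locked(".mixstatus", fcntl.LOCK_EX)  # and exclusive status
        append_message(meta, index, status)
    # Leaving the with-block closes all three files, which releases the locks.
```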


4. Mailbox pinging

The index and status files are share-locked.  Initially, sequences are
remembered as zero, so at open time they are always "altered".

The sequence from the index file is checked; if it is altered, the index
file is read and processed as follows:
 . If expunge is permitted, then any messages that are not in the index
   are reported as having been expunged via mm_expunged().
 . New messages are announced via mm_exists()/mm_recent().

Next, the sequence from the status file is checked.  If it is altered,
the status file is read and the status updated for any message which is
new or has an altered modseq in the status file.  Altered modseq messages
are announced via mm_flags().

Then the index and status files are closed.
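
The ping logic can be modeled in memory.  In this Python sketch, Session and
ping are hypothetical names.  The caller supplies the sequences and records
it has already read from the locked index and status files.  The returned
tuples stand in for the mm_expunged(), mm_exists(), and mm_flags()
callbacks; mm_recent() is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Session:
    index_seq: int = 0     # sequences start at zero, so the first ping
    status_seq: int = 0    # always finds both files "altered"
    uids: List[int] = field(default_factory=list)         # message number - 1 -> UID
    modseq: Dict[int, int] = field(default_factory=dict)  # UID -> last seen <mod>


def ping(s: Session, index_seq: int, index_uids: List[int],
         status_seq: int, status_mods: Dict[int, int],
         expunge_ok: bool = True) -> List[Tuple[str, int]]:
    """One ping pass over already-read index and status data (sketch)."""
    events = []
    if index_seq != s.index_seq:
        s.index_seq = index_seq
        present = set(index_uids)
        if expunge_ok:
            # Highest message number first, so lower numbers stay valid.
            for msgno in range(len(s.uids), 0, -1):
                if s.uids[msgno - 1] not in present:
                    events.append(("mm_expunged", msgno))
                    s.modseq.pop(s.uids.pop(msgno - 1), None)
        known = set(s.uids)
        new = [uid for uid in index_uids if uid not in known]
        if new:
            s.uids.extend(new)
            events.append(("mm_exists", len(s.uids)))
    if status_seq != s.status_seq:
        s.status_seq = status_seq
        for msgno, uid in enumerate(s.uids, 1):
            mod = status_mods.get(uid)
            if mod is not None and mod != s.modseq.get(uid):
                if uid in s.modseq:            # seen before: flags changed
                    events.append(("mm_flags", msgno))
                s.modseq[uid] = mod            # new message: just record it
    return events
```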


5. Flag alteration

The status file is exclusive-locked.

The sequence from the status file is checked.  If it is altered, the
status file is read and the status updated for any message which is
new or has an altered modseq in the status file.  Altered modseq
messages are announced via mm_flags().

The alterations are then applied to all requested messages, updating
the modseq for each requested message whose flags change as a result
of the alteration (alterations which do not result in a change do not
alter the modseq).  Then the status file is rewritten with a new
sequence, but only if the flags of at least one message were changed.

Then the status file is closed.
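
Here is a Python sketch of the flag-alteration rule.  alter_flags and the
record layout are hypothetical, and so are the flag bit values used in any
example.  The sketch shows both properties described above.  The modseq
changes only for messages whose flags actually change, and the caller learns
whether the status file needs rewriting at all.

```python
from typing import Dict, Iterable


def alter_flags(records: Dict[int, dict], uids: Iterable[int],
                set_mask: int, clear_mask: int, modseq: int) -> bool:
    """Apply a system-flag alteration to the requested UIDs (sketch).

    records maps uid -> {"sf": flags, "mod": modseq}.  Returns True if any
    message changed, i.e. if the status file must be rewritten with a new
    sequence.
    """
    changed = False
    for uid in uids:
        rec = records[uid]
        new_sf = (rec["sf"] | set_mask) & ~clear_mask
        if new_sf != rec["sf"]:        # a no-op leaves the modseq alone
            rec["sf"] = new_sf
            rec["mod"] = modseq
            changed = True
    return changed
```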


6. Checkpoint and expunge

Checkpoint is identical to expunge, except that it skips the step of
expunging deleted messages.

The index and status files are locked exclusive.  If expunging, all
deleted messages are expunged from the index and announced via
mm_expunged().  The message data is not removed at this time.

If a checkpoint was requested, or if any messages were expunged, or if
it was remembered that a "burp" was needed, then:
 . the metadata file is locked exclusive.  If this fails, remember that
   a burp is needed.  Otherwise perform a burp:
   . calculate the file byte ranges occupied by expunged messages
   . for each file needing "burping", open it and slide the subsequent
     file data down on top of the expunged messages
 . update the index and status files

Then the index and status files are closed.
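
The burp step for a single data file can be sketched in Python as a forward,
in-place copy followed by a truncate.  burp_file is hypothetical.  It assumes
the caller already holds the exclusive metadata lock and knows the byte range
(position and length) of every message that survives in this file.  Any
range not listed is treated as expunged data.  The returned map gives each
surviving message's new position, for rewriting the index.

```python
from typing import BinaryIO, Dict, Iterable, Tuple

BLOCK = 64 * 1024   # copy granularity; keeps memory use bounded


def burp_file(f: BinaryIO, kept: Iterable[Tuple[int, int, int]]) -> Dict[int, int]:
    """Compact one data file in place (sketch).

    kept holds (uid, pos, length) for every message to keep in this file.
    Bytes outside those ranges are treated as expunged and squeezed out.
    Returns uid -> new position, for rewriting the index.
    """
    dest = 0
    new_pos = {}
    for uid, pos, length in sorted(kept, key=lambda r: r[1]):
        new_pos[uid] = dest
        if pos != dest:                     # dest < pos, so a forward copy is safe
            done = 0
            while done < length:
                f.seek(pos + done)
                chunk = f.read(min(BLOCK, length - done))
                if not chunk:
                    raise ValueError("data file shorter than the index claims")
                f.seek(dest + done)
                f.write(chunk)
                done += len(chunk)
        dest += length
    f.truncate(dest)                        # drop the now-unused tail
    return new_pos
```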

6.1 More details on expunging and "burping"

Shared expunge presents a problem due to the requirements of the IMAP
protocol.  You can't "burp" away a message until you are certain that
no sharer still holds a pointer to it.  Consequently, for now,
"burping" out expunged data is deferred to an exclusive expunge, as in
mbx format.

If shared burping is ever implemented, then care will be needed not to
burp data that a session still relies upon.  It's easy enough to burp
the index files: just create new index files, deleting the old ones, and
require that sessions look for a new one appearing at mailbox ping time
(when it's safe).  The data files are a problem, since we
intentionally don't want to keep them open and do want to avoid quota
problems by overwriting in place.  Also, when you burp you have to
change the pointers in the index file.

Bottom line: shared burping is too hairy right now, so the first
version will do exclusive-only burping and not worry about it.  If
shared burping is really needed, then that routine will need to be
rewritten.

Shared burping has been a problem for every other IMAP server.  Most
get it wrong, and cause terrible confusion to clients (including
client crashes).


7. Message data file roll-out strategy

The current new message file is finalized, and a new one started, when
an append or copy is done that would cause the file to grow larger
than a preconfigured size (MIXDATAROLL).  A multi-message copy or
append is written in its entirety to a single new message file.  In
the case of multi-copy, the new message file is switched when the sum
of the sizes of all messages to be copied would cause the current new
message file to exceed MIXDATAROLL.  In the case of multi-append, only
the first message is considered; this is due to technical limitations.
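
The roll-out decision reduces to a size comparison.  In this Python sketch,
needs_new_file is hypothetical and the MIXDATAROLL value is only a
placeholder, because the real limit is compiled in.

```python
from typing import List

MIXDATAROLL = 8 * 1024 * 1024   # placeholder value, not the real compiled-in limit


def needs_new_file(current_size: int, message_sizes: List[int], is_copy: bool,
                   limit: int = MIXDATAROLL) -> bool:
    """Should a multi-message copy/append start a new data file? (sketch)

    A copy checks the sum of all message sizes; an append can only look
    at the first message, as described above.
    """
    if not message_sizes:
        return False
    incoming = sum(message_sizes) if is_copy else message_sizes[0]
    return current_size + incoming > limit
```

Either way the whole batch then goes into one file, so a multi-append can
carry a data file past MIXDATAROLL.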


8. Error detection

Mix detects bad data in the metadata, index, and status files and
declares the stream dead.  It does not unilaterally reassign
UIDVALIDITY the way that the flat file formats do.

When mix reads a header from a message data file, it also reads the
per-message record and verifies that a per-message record is present.
This is a simple test for message file corruption.  It doesn't declare
the stream dead; it simply issues an error message and returns a
zero-length string for the message header.  This makes it possible for
the user to fix the mailbox simply by deleting and expunging any messages
that are in this state.
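
Here is a Python sketch of the header-read check.  read_header is
hypothetical.  It assumes a layout in which <pos> points at the per-message
record line and the header immediately follows that line.  If the record is
missing, the sketch logs to stderr (standing in for c-client's error
reporting) and returns an empty header.

```python
import sys
from typing import BinaryIO

MSG_TOKEN = b":msg:"


def read_header(f: BinaryIO, pos: int, hsiz: int, uid: int) -> bytes:
    """Read a message header after checking its per-message record (sketch).

    Layout assumption: pos is the offset of the record line and the header
    starts on the following line.  A missing record is reported and an
    empty header returned; the session itself is not declared dead.
    """
    f.seek(pos)
    if not f.readline().startswith(MSG_TOKEN):
        print("mix: message UID %d has no per-message record" % uid, file=sys.stderr)
        return b""
    return f.read(hsiz)
```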


9. Reconstruct tool

[None of this is implemented yet.]

The layout of these files is designed to make the reconstruct tool as
simple as possible.  Much of the need for a reconstruct tool is
eliminated since the mix format has a much more limited scope of
writing than the flat file formats; thus there is "less collateral
damage."

If the metadata file is lost or corrupted, then all keywords are lost;
if any keywords are used in the .mixstatus file, it'll be necessary to
create some placeholder names.  Otherwise, a new UIDVALIDITY can be
assigned, and a good UIDLAST value calculated by the reconstruct tool.
Since this file is very small, it's not likely to be damaged.

If the index file is lost or corrupted, it is possible to reconstruct
it with no loss by reading all the data files.  However, this could
cause expunged but not yet burped messages to reappear.

If the status file is lost or corrupted, then flags are lost and
will revert to a default state of no flags set.  Just deleting the
corrupted file is good enough.

The reconstruct tool can use the per-message record in the message
file to locate messages if the recorded sizes and/or messages are
corrupt.  If that happens, it will need to rebuild the index file
(with associated changes to the metadata file to change the
UIDVALIDITY).  That should probably be a manual operation and not be
part of the default operation or auto-reconstruct.
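
No reconstruct tool exists yet, so the following is only a Python sketch of
how one might find messages by their per-message records.  scan_data_file is
hypothetical.  It assumes <size> counts the bytes of message data that follow
the record line, and it uses that size to skip over message bodies so that a
body line starting with ":msg:" is not taken for a record.  If the expected
record is missing, the sketch resynchronizes on the next line that looks like
one.

```python
import re
from typing import List, Tuple

# ":msg:<uid>:<date>:<size>:"; further fields may follow on the same line.
RECORD = re.compile(rb":msg:([0-9A-Fa-f]{8}):(\d{14}[+-]\d{4}):([0-9A-Fa-f]{8}):")


def scan_data_file(data: bytes) -> List[Tuple[int, str, int, int]]:
    """Return (uid, date, size, record_pos) for each record found (sketch)."""
    found = []
    pos = 0
    while pos < len(data):
        m = RECORD.match(data, pos)
        if m is None:
            # Out of step: resynchronize on the next line that looks like a record.
            nxt = data.find(b"\r\n:msg:", pos)
            if nxt < 0:
                break
            pos = nxt + 2
            continue
        eol = data.find(b"\r\n", m.end())
        if eol < 0:
            break                                   # truncated record line
        uid, size = int(m.group(1), 16), int(m.group(3), 16)
        found.append((uid, m.group(2).decode("ascii"), size, m.start()))
        pos = eol + 2 + size                        # skip over the message data
    return found
```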


10. Locking strategy

The mix format does not use the traditional c-client /tmp file locking.

The metadata file is open and locked whenever the mailbox is open.
Normally this is a shared lock, but it will be upgraded to exclusive
if the mailbox is expunged.  As a guard (since there is no true atomic
lock upgrade/downgrade on UNIX), the index exclusive lock must be
acquired before the metadata lock is upgraded to exclusive.

The index file is share-locked when reading the index, and
exclusive-locked (and read) when appending new messages to the index or
when expunging (note that expunging also requires an exclusive lock on
the metadata).  Normally, the index file is not open or locked.

The status file is share-locked when reading status, and
exclusive-locked (and read) when updating status.  Normally, the status
file is not open or locked.

It isn't necessary to lock any of the data files as long as we only
have exclusive burping.
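
The upgrade guard can be sketched in Python with flock(2) through the fcntl
module.  The function name and the use of flock() are assumptions for
illustration.  flock() converts a shared lock to an exclusive one by
releasing and re-acquiring it, so the sketch takes the index exclusive lock
first.  If the non-blocking upgrade fails, it falls back to a shared lock and
returns False, and the caller can then remember that a burp is needed.

```python
import fcntl


def upgrade_metadata_lock(meta_fd: int, index_fd: int) -> bool:
    """Upgrade a shared metadata lock to exclusive, guarded by the index lock.

    Returns False if another session still holds the metadata shared; the
    caller keeps its (re-acquired) shared lock and should remember that a
    burp is needed.  The index exclusive lock is left held for the caller.
    """
    fcntl.flock(index_fd, fcntl.LOCK_EX)                  # guard first
    try:
        fcntl.flock(meta_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        # A failed conversion may already have dropped the shared lock.
        fcntl.flock(meta_fd, fcntl.LOCK_SH)
        return False
```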


11. Memory usage

The mix format returns a file stringstruct, which is the modern
c-client behavior.  This prevents imapd from growing to enormous sizes
when handling a huge message (a "godzillagram"); how it affects other
programs depends upon what they do with the returned stringstruct.


12. Future extensions

Cache ENVELOPE and BODYSTRUCTURE, as Cyrus does; this would eliminate
most of the reasons to access the data files.  Possibly cache overviews,
à la NNTP, instead?


Support for ANNOTATION.


13. RENAME issues

Mix currently makes no attempt to address the IMAP RENAME problem.
This problem occurs when a mailbox is deleted and another mailbox is
renamed into its place: no attempt is made to reassign UIDVALIDITY
for the renamed mailbox and all its inferior mailboxes.  This can
cause problems for a disconnected-use client that has cached status
for the old mailbox of that name.

The RENAME problem is a well-known flaw in the IMAP protocol.  Few
servers correctly handle it (among other things, not only do all the
UIDVALIDITY values have to be changed but this has to be done
atomically!).  It was a mistake to add RENAME into IMAP, but it's much
too late to remove it now.

UW-IMAP'd extensions by yuuji