imapext-2007

annotate docs/formats.txt @ 0:ada5e610ab86

imap-2007e
author yuuji@gentei.org
date Mon, 14 Sep 2009 15:17:45 +0900
rev   line source
yuuji@0 1 /* ========================================================================
yuuji@0 2 * Copyright 1988-2006 University of Washington
yuuji@0 3 *
yuuji@0 4 * Licensed under the Apache License, Version 2.0 (the "License");
yuuji@0 5 * you may not use this file except in compliance with the License.
yuuji@0 6 * You may obtain a copy of the License at
yuuji@0 7 *
yuuji@0 8 * http://www.apache.org/licenses/LICENSE-2.0
yuuji@0 9 *
yuuji@0 10 *
yuuji@0 11 * ========================================================================
yuuji@0 12 */
yuuji@0 13
yuuji@0 14 Mailbox Format Characteristics
yuuji@0 15 Mark Crispin
yuuji@0 16 11 December 2006
yuuji@0 17
yuuji@0 18
yuuji@0 19 When a mailbox storage technology uses local files and
yuuji@0 20 directories directly, the file(s) and directories are laid out in a
yuuji@0 21 mailbox format.
yuuji@0 22
yuuji@0 23 I. Flat-File Formats
yuuji@0 24
yuuji@0 25 In these formats, a mailbox and all the messages inside are a
yuuji@0 26 single file on the filesystem. The mailbox name is the name of the
yuuji@0 27 file in the filesystem, relative to the user's "mail home directory."
yuuji@0 28
yuuji@0 29 A flat-file format mailbox is always a file, never a directory.
yuuji@0 30 This means that it is impossible to have a flat-file format mailbox
yuuji@0 31 that has inferior mailbox names under it (so-called "dual-usage"
yuuji@0 32 mailboxes). For some inexplicable reason, some people want this.
yuuji@0 33
yuuji@0 34 The mail home directory is usually the same as the user login
yuuji@0 35 home directory if that concept is meaningful; otherwise, it is some
yuuji@0 36 other default directory (e.g. "C:\My Documents" on Windows 98). This
yuuji@0 37 can be redefined by modifying the c-client source code or in an
yuuji@0 38 application via the SET_HOMEDIR mail_parameters() call.
yuuji@0 39
yuuji@0 40 For example, a mailbox named "project" is likely to be found in
yuuji@0 41 the file "project" in the user's home directory. Similarly, a mailbox
yuuji@0 42 named "test/trial1" (assuming a UNIX system) is likely to be found in
yuuji@0 43 the file "trial1" in the subdirectory "test" in the user's home
yuuji@0 44 directory.
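
The name-to-file mapping described above can be sketched as follows. This is a simplified illustration, not c-client's actual resolution logic: the function name mailbox_path is hypothetical, a UNIX-style "/" hierarchy delimiter is assumed, and INBOX is rejected rather than resolved because its rules live in naming.txt.

```python
import posixpath

def mailbox_path(mail_home, name):
    """Map a flat-file mailbox name to a file path under the mail home directory.

    Hypothetical helper for illustration; assumes a UNIX-style "/" delimiter.
    """
    # INBOX has special semantics (see naming.txt) and is not handled here.
    if name.upper() == "INBOX":
        raise ValueError("INBOX is resolved by separate rules")
    # "test/trial1" becomes <mail_home>/test/trial1.
    return posixpath.join(mail_home, *name.split("/"))
```

With a mail home of "/home/fred", the name "project" maps to "/home/fred/project" and "test/trial1" maps to "/home/fred/test/trial1".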
yuuji@0 45
yuuji@0 46 Note that the name "INBOX" has special semantics and rules, as
yuuji@0 47 described in the file naming.txt.
yuuji@0 48
yuuji@0 49 The following flat-file formats are supported by c-client as of
yuuji@0 50 the time of this writing:
yuuji@0 51
yuuji@0 52 . unix This is the traditional UNIX mailbox format, in use for nearly
yuuji@0 53 30 years. It uses a line starting with "From " to indicate
yuuji@0 54 start of message, and stores the message status inside the
yuuji@0 55 RFC822 message header.
yuuji@0 56
yuuji@0 57 unix is not particularly efficient; the entire mailbox file
yuuji@0 58 must be read when the mailbox is opened, and when reading message
yuuji@0 59 texts it is necessary to convert the newline convention to
yuuji@0 60 Internet standard CR LF form. unix preserves UIDs, and allows
yuuji@0 61 the creation of keywords.
yuuji@0 62
yuuji@0 63 Only one process may have a unix-format mailbox open
yuuji@0 64 read/write at a time.
yuuji@0 65
yuuji@0 66 . mmdf This is the format used by the MMDF mailer. It uses a line
yuuji@0 67 consisting of 4 <CTRL/A> (0x01) characters to indicate start
yuuji@0 68 and end of message. Optionally, there may also be a unix
yuuji@0 69 format "From " line. It otherwise has the same
yuuji@0 70 characteristics as unix format.
yuuji@0 71
yuuji@0 72 . mbx This is the current preferred mailbox format. It can be
yuuji@0 73 handled quite efficiently by c-client, without the problems
yuuji@0 74 that exist with unix and mmdf formats. Messages are stored
yuuji@0 75 in Internet standard CR LF format.
yuuji@0 76
yuuji@0 77 mbx permits shared access, including shared expunge. It
yuuji@0 78 preserves UIDs, and allows the creation of keywords.
yuuji@0 79
yuuji@0 80 . mtx This is supported for compatibility with the past. This is
yuuji@0 81 the old Tenex/TOPS-20 mail.txt format. It can be handled
yuuji@0 82 quite efficiently by c-client, and has most of the
yuuji@0 83 characteristics of mbx format.
yuuji@0 84
yuuji@0 85 mtx is deficient in that it does not support shared expunge;
yuuji@0 86 it has no means to store UIDs; and it has no way to define
yuuji@0 87 keywords except through an external configuration file.
yuuji@0 88
yuuji@0 89 . tenex This is supported for compatibility with the past. This is
yuuji@0 90 the old Columbia MM format. This is similar to mtx format,
yuuji@0 91 only it uses UNIX-style bare-LF newlines instead of CR LF
yuuji@0 92 newlines, thus incurring a performance penalty for newline
yuuji@0 93 conversion.
yuuji@0 94
yuuji@0 95 . phile This is not strictly a format. Any file which is not in a
yuuji@0 96 recognized format is in phile format, which treats the entire
yuuji@0 97 contents of the file as a single message.
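
As a rough illustration of the unix and mmdf delimiters described in the list above, the sketch below splits raw mailbox bytes into messages and converts bare-LF text to CR LF. The function names are hypothetical and the parsing is deliberately naive: it assumes bare-LF input, treats any line beginning with "From " as a separator, and drops the separator line itself. A real parser validates the "From " line far more strictly.

```python
MMDF_DELIM = b"\x01\x01\x01\x01\n"  # a line of four CTRL/A characters

def split_unix(data):
    """Split unix-format mailbox bytes at lines starting with b"From "."""
    messages, current = [], None
    for line in data.splitlines(keepends=True):
        if line.startswith(b"From "):
            if current is not None:
                messages.append(b"".join(current))
            current = []  # the separator line is not part of the message
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append(b"".join(current))
    return messages

def split_mmdf(data):
    """Split mmdf-format bytes; each message sits between two delimiter lines."""
    messages, current = [], None
    for line in data.splitlines(keepends=True):
        if line == MMDF_DELIM:
            if current is None:
                current = []  # delimiter opening a message
            else:
                messages.append(b"".join(current))  # delimiter closing it
                current = None
        elif current is not None:
            current.append(line)
    return messages

def to_crlf(message):
    """Convert bare-LF newlines to Internet standard CR LF."""
    return message.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
```

Both splitters must walk the whole file before the message count is known, which is the cost the unix description refers to. The to_crlf step is the per-read conversion that mbx avoids by storing CR LF on disk.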
yuuji@0 98
yuuji@0 99
yuuji@0 100 II. File/Message Formats
yuuji@0 101
yuuji@0 102 In these formats, a mailbox is a directory, and each of the messages
yuuji@0 103 inside is a separate file inside the directory. The file names of
yuuji@0 104 these files are generally the text form of a number, which also
yuuji@0 105 matches the UID of the message.
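
A minimal sketch of that layout, assuming every message file is named with the decimal form of its UID and that anything else in the directory (an index file, for instance) is not a message. The function name list_uids is hypothetical.

```python
import os

def list_uids(mailbox_dir):
    """Return the sorted UIDs implied by the numeric file names in a directory."""
    uids = []
    for entry in os.listdir(mailbox_dir):
        # Only plain ASCII decimal names count as messages in this sketch.
        if entry.isascii() and entry.isdigit():
            uids.append(int(entry))
    return sorted(uids)
```

Note that even this trivial listing has to read the whole directory, which is the first of the costs discussed for the individual formats.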
yuuji@0 106
yuuji@0 107 In the case of mx, the mailbox name is the name of the directory
yuuji@0 108 in the filesystem, relative to the user's "mail home directory." In
yuuji@0 109 the case of news and mh, the mailbox name is in a separate namespace
yuuji@0 110 as described in the file naming.txt.
yuuji@0 111
yuuji@0 112 A file/message format mailbox is always a directory. This means
yuuji@0 113 that it is possible to have a file/message format mailbox that has
yuuji@0 114 inferior mailbox names under it (so-called "dual-usage" mailboxes).
yuuji@0 115 For some inexplicable reason, some people want this.
yuuji@0 116
yuuji@0 117 Note that the name "INBOX" has special semantics and rules, as
yuuji@0 118 described in the file naming.txt.
yuuji@0 119
yuuji@0 120 The following file/message formats are supported by c-client as of
yuuji@0 121 the time of this writing:
yuuji@0 122
yuuji@0 123 . mx This is an experimental format, and may be removed in a future
yuuji@0 124 release. An mx format mailbox has a .mxindex file which holds
yuuji@0 125 the message status and unique identifiers. Messages are
yuuji@0 126 stored in Internet standard CR LF form, so the file size of
yuuji@0 127 the message file equals the size of the message.
yuuji@0 128
yuuji@0 129 mx is somewhat inefficient; the entire directory must be read
yuuji@0 130 and each file stat()'d. We found it intolerable for a
yuuji@0 131 moderate sized mailbox (2000 messages) and have more or less
yuuji@0 132 abandoned it.
yuuji@0 133
yuuji@0 134 . mh This is supported for compatibility with the past. This is
yuuji@0 135 the format used by the old mh program.
yuuji@0 136
yuuji@0 137 mh is very inefficient; the entire directory must be read
yuuji@0 138 and each file stat()'d, and in order to determine the size
yuuji@0 139 of a message, the entire file must be read and newline
yuuji@0 140 conversion performed.
yuuji@0 141
yuuji@0 142 mh is deficient in that it does not support any permanent
yuuji@0 143 flags or keywords; and has no means to store UIDs (because
yuuji@0 144 the mh "compress" command renames all the files, that's
yuuji@0 145 why).
yuuji@0 146
yuuji@0 147 . news This is an export of the local filesystem's news spool, e.g.
yuuji@0 148 /var/spool/news. Access to mailboxes in news format is read
yuuji@0 149 only; however, message "deleted" status is preserved in a
yuuji@0 150 .newsrc file in the user's home directory. There is no other
yuuji@0 151 status or keywords.
yuuji@0 152
yuuji@0 153 news is very inefficient; the entire directory must be
yuuji@0 154 read and each file stat()'d, and in order to determine the
yuuji@0 155 size of a message, the entire file must be read and newline
yuuji@0 156 conversion performed.
yuuji@0 157
yuuji@0 158 news is deficient in that it does not support permanent flags
yuuji@0 159 other than deleted; does not support keywords; and has no
yuuji@0 160 expunge.
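
The difference in the cost of sizing a message can be sketched as below. Both function names are hypothetical. The second function assumes the file holds only bare-LF newlines (no CR already present), as is typical for mh and news spool files.

```python
import os

def size_crlf_file(path):
    """Message stored with CR LF (as in mx): the file size is the message size."""
    return os.stat(path).st_size

def size_lf_file(path):
    """Message stored with bare LF: read the whole file and count newlines,
    since each LF becomes two bytes (CR LF) in Internet standard form."""
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            size += len(chunk) + chunk.count(b"\n")
    return size
```

The first function needs only a stat(); the second has to read every byte of every message, which is why sizing a large mh or news mailbox is so expensive.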
yuuji@0 161
yuuji@0 162
yuuji@0 163 Soapbox on File/Message Formats
yuuji@0 164
yuuji@0 165 If it sounds from the above descriptions that we're not putting
yuuji@0 166 too much effort into file/message formats, you are correct.
yuuji@0 167
yuuji@0 168 There's a general reason why file/message formats are a bad idea.
yuuji@0 169 Just about every filesystem in existence serializes file creation and
yuuji@0 170 deletion because these manipulate the free space map. This turns out
yuuji@0 171 to be an enormous problem when you start creating/deleting more than a
yuuji@0 172 few messages per second; you spend all your time thrashing in the
yuuji@0 173 filesystem.
yuuji@0 174
yuuji@0 175 It is also extremely slow to do a text search through a
yuuji@0 176 file/message format mailbox. All of those open()s and close()s really
yuuji@0 177 add up to major filesystem thrashing.
yuuji@0 178
yuuji@0 179
yuuji@0 180 What about Cyrus and Maildir?
yuuji@0 181
yuuji@0 182 Both formats are vulnerable to the filesystem thrashing outlined
yuuji@0 183 above.
yuuji@0 184
yuuji@0 185 The Cyrus format used by CMU's Cyrus server (and Esys' server)
yuuji@0 186 has a special associated flat file in each directory that contains
yuuji@0 187 extensive data (including pre-parsed ENVELOPEs and BODYSTRUCTUREs)
yuuji@0 188 about the messages. Put another way, it's a (considerably) more
yuuji@0 189 featureful form of mx. It also uses certain operating system
yuuji@0 190 facilities (e.g. file/memory mapping) which are not available on older
yuuji@0 191 systems, at a cost of much more limited portability than c-client.
yuuji@0 192 These considerably ameliorate the fundamental problems with
yuuji@0 193 file/message formats; in fact, Cyrus is halfway to being a database.
yuuji@0 194 Rather than support Cyrus format in c-client, you should run Cyrus or
yuuji@0 195 Esys if you want that format.
yuuji@0 196
yuuji@0 197 The Maildir format used by qmail has all of the performance
yuuji@0 198 disadvantages of mh noted above, with the additional problem that the
yuuji@0 199 files are renamed in order to change their status so you end up having
yuuji@0 200 to rescan the directory frequently to locate the current names
yuuji@0 201 (particularly in a shared mailbox scenario). It doesn't scale, and it
yuuji@0 202 represents a support nightmare; it is therefore not supported in the
yuuji@0 203 official distribution. Maildir support code for c-client is available
yuuji@0 204 from third parties; but, if you use it, it is entirely at your own
yuuji@0 205 risk (read: don't complain about how poorly it performs or about its bugs).
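
To make the renaming problem concrete, here is a sketch of marking a Maildir message as seen. It relies on the usual Maildir convention, which this document does not spell out, that status flags are letters appended to the file name after ":2,". The function name set_seen is hypothetical.

```python
import os

def set_seen(path):
    """Mark a Maildir message as seen by renaming it; returns the new path."""
    directory, name = os.path.split(path)
    base, _, flags = name.partition(":2,")
    if "S" in flags:
        return path  # already marked seen, nothing to rename
    # Flags are kept in ASCII order; adding "S" changes the file name.
    new_flags = "".join(sorted(set(flags) | {"S"}))
    new_path = os.path.join(directory, base + ":2," + new_flags)
    os.rename(path, new_path)
    return new_path
```

After the rename, any other process still holding the old name gets a "no such file" error and has to rescan the directory to find the message again.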
yuuji@0 206
yuuji@0 207
yuuji@0 208 So what does this all mean?
yuuji@0 209
yuuji@0 210 A database (such as used by Exchange) is really a much better
yuuji@0 211 approach if you want to move away from flat files. mx and especially
yuuji@0 212 Cyrus take a tentative step in that direction; mx failed mostly because
yuuji@0 213 it didn't go anywhere near far enough. Cyrus goes much further, and
yuuji@0 214 scores remarkable benefits from doing so.
yuuji@0 215
yuuji@0 216 However, a well-designed pure database without the overhead of
yuuji@0 217 separate files would do even better.
