/* ========================================================================
 * Copyright 1988-2006 University of Washington
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 *
 * ========================================================================
 */

Mailbox Format Characteristics
Mark Crispin
11 December 2006


When a mailbox storage technology uses local files and
directories directly, the file(s) and directories are laid out in a
mailbox format.

I. Flat-File Formats

In these formats, a mailbox and all the messages inside are a
single file on the filesystem.  The mailbox name is the name of the
file in the filesystem, relative to the user's "mail home directory."

A flat-file format mailbox is always a file, never a directory.
This means that it is impossible to have a flat-file format mailbox
that has inferior mailbox names under it (so-called "dual-usage"
mailboxes).  For some inexplicable reason, some people want this.

The mail home directory is usually the same as the user login
home directory if that concept is meaningful; otherwise, it is some
other default directory (e.g. "C:\My Documents" on Windows 98).  This
can be redefined by modifying the c-client source code or in an
application via the SET_HOMEDIR mail_parameters() call.

For example, a mailbox named "project" is likely to be found in
the file "project" in the user's home directory.  Similarly, a mailbox
named "test/trial1" (assuming a UNIX system) is likely to be found in
the file "trial1" in the subdirectory "test" in the user's home
directory.

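As a rough illustration of the mapping just described, the following
Python sketch joins a mailbox name onto a mail home directory.  It is
not c-client code: the function name flat_file_path and the directory
"/home/fred" are made up for the example, and the special handling of
INBOX and of absolute names is deliberately left out.

```python
from pathlib import PurePosixPath


def flat_file_path(mail_home: str, mailbox: str) -> PurePosixPath:
    """Map a relative flat-file mailbox name to a path under the mail home.

    Hypothetical helper for illustration only; it assumes a UNIX-style
    "/" hierarchy delimiter and does not model c-client's real rules.
    """
    if mailbox.upper() == "INBOX":
        # INBOX has special semantics and rules that this sketch does not model.
        raise ValueError("INBOX is not handled by this sketch")
    if mailbox.startswith("/"):
        # Absolute names are outside the scope of this sketch.
        raise ValueError("only names relative to the mail home are handled")
    return PurePosixPath(mail_home) / mailbox


print(flat_file_path("/home/fred", "project"))
print(flat_file_path("/home/fred", "test/trial1"))
```

The second call shows why "test" has to be a directory: a flat-file
mailbox named "test" could not also hold "trial1" underneath it.
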
Note that the name "INBOX" has special semantics and rules, as
described in the file naming.txt.

The following flat-file formats are supported by c-client as of
the time of this writing:

. unix    This is the traditional UNIX mailbox format, in use for nearly
          30 years.  It uses a line starting with "From " to indicate
          start of message, and stores the message status inside the
          RFC822 message header.

          unix is not particularly efficient; the entire mailbox file
          must be read when the mailbox is opened, and when reading
          message texts it is necessary to convert the newline
          convention to Internet standard CR LF form.  unix preserves
          UIDs, and allows the creation of keywords.

          Only one process may have a unix-format mailbox open
          read/write at a time.

. mmdf    This is the format used by the MMDF mailer.  It uses a line
          consisting of 4 <CTRL/A> (0x01) characters to indicate start
          and end of message.  Optionally, there may also be a unix
          format "From " line.  It otherwise has the same
          characteristics as unix format.

. mbx     This is the current preferred mailbox format.  It can be
          handled quite efficiently by c-client, without the problems
          that exist with unix and mmdf formats.  Messages are stored
          in Internet standard CR LF format.

          mbx permits shared access, including shared expunge.  It
          preserves UIDs, and allows the creation of keywords.

. mtx     This is supported for compatibility with the past.  This is
          the old Tenex/TOPS-20 mail.txt format.  It can be handled
          quite efficiently by c-client, and has most of the
          characteristics of mbx format.

          mtx is deficient in that it does not support shared expunge;
          it has no means to store UIDs; and it has no way to define
          keywords except through an external configuration file.

. tenex   This is supported for compatibility with the past.  This is
          the old Columbia MM format.  This is similar to mtx format,
          only it uses UNIX-style bare-LF newlines instead of CR LF
          newlines, thus incurring a performance penalty for newline
          conversion.

. phile   This is not strictly a format.  Any file which is not in a
          recognized format is in phile format, which treats the entire
          contents of the file as a single message.


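To make the unix and mmdf delimiters concrete, here is a simplified
Python sketch that splits mailbox data on "From " lines or on lines of
four <CTRL/A> characters, and converts bare-LF newlines to CR LF.  It
is an illustration under stated assumptions, not how c-client parses
these formats: the function names are hypothetical, any line starting
with "From " is taken as a separator without further validation, the
optional "From " line inside an mmdf message is not treated specially,
and message status headers are ignored.

```python
MMDF_DELIMITER = b"\x01\x01\x01\x01\n"  # a line of four CTRL/A characters


def split_unix(data: bytes) -> list:
    """Split unix-format data into messages; the "From " lines are dropped."""
    messages, current = [], None
    for line in data.splitlines(keepends=True):
        if line.startswith(b"From "):
            if current is not None:
                messages.append(b"".join(current))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append(b"".join(current))
    return messages


def split_mmdf(data: bytes) -> list:
    """Split mmdf-format data; a delimiter line opens and closes each message."""
    messages, current = [], None
    for line in data.splitlines(keepends=True):
        if line == MMDF_DELIMITER:
            if current is None:
                current = []            # start of message
            else:
                messages.append(b"".join(current))
                current = None          # end of message
        elif current is not None:
            current.append(line)
    return messages


def to_crlf(message: bytes) -> bytes:
    """Convert newlines to Internet standard CR LF form."""
    return message.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


unix_box = (b"From alice Mon Dec 11 10:00:00 2006\nSubject: one\n\nhello\n\n"
            b"From bob Mon Dec 11 10:05:00 2006\nSubject: two\n\nbye\n")
mmdf_box = (MMDF_DELIMITER + b"Subject: one\n\nhello\n" + MMDF_DELIMITER +
            MMDF_DELIMITER + b"Subject: two\n\nbye\n" + MMDF_DELIMITER)

unix_messages = split_unix(unix_box)
mmdf_messages = split_mmdf(mmdf_box)
for raw in unix_messages:
    # The CR LF size is only known after every byte has been examined.
    print(len(raw), len(to_crlf(raw)))
```

Note that neither splitter can find the second message without reading
through the first, which is why these formats require the whole file to
be read when the mailbox is opened.
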
II. File/Message Formats

In these formats, a mailbox is a directory, and each of the
messages inside is a separate file inside the directory.  The file
names of these files are generally the text form of a number, which
also matches the UID of the message.

In the case of mx, the mailbox name is the name of the directory
in the filesystem, relative to the user's "mail home directory."  In
the case of news and mh, the mailbox name is in a separate namespace
as described in the file naming.txt.

A file/message format mailbox is always a directory.  This means
that it is possible to have a file/message format mailbox that has
inferior mailbox names under it (so-called "dual-usage" mailboxes).
For some inexplicable reason, some people want this.

Note that the name "INBOX" has special semantics and rules, as
described in the file naming.txt.

The following file/message formats are supported by c-client as of
the time of this writing:

. mx      This is an experimental format, and may be removed in a future
          release.  An mx format mailbox has a .mxindex file which holds
          the message status and unique identifiers.  Messages are
          stored in Internet standard CR LF form, so the file size of
          the message file equals the size of the message.

          mx is somewhat inefficient; the entire directory must be read
          and each file stat()'d.  We found it intolerable for a
          moderate sized mailbox (2000 messages) and have more or less
          abandoned it.

. mh      This is supported for compatibility with the past.  This is
          the format used by the old mh program.

          mh is very inefficient; the entire directory must be read
          and each file stat()'d, and in order to determine the size
          of a message, the entire file must be read and newline
          conversion performed.

          mh is deficient in that it does not support any permanent
          flags or keywords, and it has no means to store UIDs (because
          the mh "compress" command renames all the files).

. news    This is an export of the local filesystem's news spool, e.g.
          /var/spool/news.  Access to mailboxes in news format is read
          only; however, message "deleted" status is preserved in a
          .newsrc file in the user's home directory.  There is no other
          status or keywords.

          news is very inefficient; the entire directory must be
          read and each file stat()'d, and in order to determine the
          size of a message, the entire file must be read and newline
          conversion performed.

          news is deficient in that it does not support permanent flags
          other than deleted; does not support keywords; and has no
          expunge.


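The cost described in the entries above can be sketched in Python.
This is an illustration only, not c-client's implementation: the helper
names are hypothetical, and the sample mailbox is a throwaway temporary
directory.  The scan reads the directory, stat()s each numerically
named file, and then reads each file in full to learn its size in CR LF
form.

```python
import os
import tempfile


def crlf_size(data: bytes) -> int:
    """Size of the data once every newline is in CR LF form."""
    normalized = data.replace(b"\r\n", b"\n")
    return len(normalized) + normalized.count(b"\n")


def scan_mailbox_directory(path: str) -> list:
    """Return sorted (number, on-disk size, CR LF size) for each message file."""
    found = []
    with os.scandir(path) as entries:          # read the entire directory
        for entry in entries:
            name = entry.name
            if not (name.isascii() and name.isdigit() and entry.is_file()):
                continue                       # skips index files such as .mxindex
            on_disk = entry.stat().st_size     # one stat() per message
            with open(entry.path, "rb") as f:  # one full read per message
                wire = crlf_size(f.read())
            found.append((int(name), on_disk, wire))
    return sorted(found)


with tempfile.TemporaryDirectory() as mailbox:
    samples = {
        "1": b"Subject: a\n\nhi\n",            # bare-LF newlines, as in mh
        "2": b"Subject: b\r\n\r\nhi\r\n",      # CR LF newlines, as in mx
        ".mxindex": b"index data",
    }
    for name, data in samples.items():
        with open(os.path.join(mailbox, name), "wb") as f:
            f.write(data)
    result = scan_mailbox_directory(mailbox)

for number, on_disk, wire in result:
    print(number, on_disk, wire)
```

For the CR LF file the two sizes agree, so the stat() alone would have
been enough; for the bare-LF file they differ, and only the full read
reveals the message size.
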
Soapbox on File/Message Formats

If it sounds from the above descriptions that we're not putting
too much effort into file/message formats, you are correct.

There's a general reason why file/message formats are a bad idea.
Just about every filesystem in existence serializes file creation and
deletions because these manipulate the free space map.  This turns out
to be an enormous problem when you start creating/deleting more than a
few messages per second; you spend all your time thrashing in the
filesystem.

It is also extremely slow to do a text search through a
file/message format mailbox.  All of those open()s and close()s really
add up to major filesystem thrashing.


What about Cyrus and Maildir?

Both formats are vulnerable to the filesystem thrashing outlined
above.

The Cyrus format used by CMU's Cyrus server (and Esys' server)
has a special associated flat file in each directory that contains
extensive data (including pre-parsed ENVELOPEs and BODYSTRUCTUREs)
about the messages.  Put another way, it's a (considerably) more
featureful form of mx.  It also uses certain operating system
facilities (e.g. file/memory mapping) which are not available on older
systems, at a cost of much more limited portability than c-client.
These considerably ameliorate the fundamental problems with
file/message formats; in fact, Cyrus is halfway to being a database.
Rather than expecting c-client to support Cyrus format, you should run
Cyrus or Esys if you want that format.

The Maildir format used by qmail has all of the performance
disadvantages of mh noted above, with the additional problem that the
files are renamed in order to change their status, so you end up having
to rescan the directory frequently to locate the current names
(particularly in a shared mailbox scenario).  It doesn't scale, and it
represents a support nightmare; it is therefore not supported in the
official distribution.  Maildir support code for c-client is available
from third parties; but, if you use it, it is entirely at your own
risk (read: don't complain about how poorly it performs or about bugs).


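The renaming problem can be seen in a few lines of Python.  This sketch
assumes the commonly documented Maildir convention in which a message
file in the "cur" subdirectory is named "<unique>:2,<flags>", with "S"
meaning seen.  The file name and the helper set_maildir_flags are
invented for the example, and it needs a filesystem that allows ":" in
file names.

```python
import os
import tempfile


def set_maildir_flags(path: str, flags: str) -> str:
    """Change a message's status by renaming its file; return the new path."""
    directory, name = os.path.split(path)
    unique = name.split(":2,", 1)[0]
    new_name = unique + ":2," + "".join(sorted(set(flags)))
    new_path = os.path.join(directory, new_name)
    os.rename(path, new_path)
    return new_path


with tempfile.TemporaryDirectory() as maildir:
    cur = os.path.join(maildir, "cur")
    os.mkdir(cur)

    # One session learns the message's file name by scanning the directory.
    cached = os.path.join(cur, "1165832400.12345.example:2,")
    with open(cached, "wb") as f:
        f.write(b"Subject: test\n\nhello\n")

    # Another session marks the message as seen, which renames the file.
    set_maildir_flags(cached, "S")

    # The first session's cached name is now stale; only a rescan finds it.
    cached_still_valid = os.path.exists(cached)
    names_after_rescan = os.listdir(cur)

print(cached_still_valid)
print(names_after_rescan)
```

Every status change by one session invalidates names held by the
others, which is what forces the frequent directory rescans.
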
So what does this all mean?

A database (such as used by Exchange) is really a much better
approach if you want to move away from flat files.  mx and especially
Cyrus take a tentative step in that direction; mx failed mostly because
it didn't go anywhere near far enough.  Cyrus goes much further, and
scores remarkable benefits from doing so.

However, a well-designed pure database without the overhead of
separate files would do even better.