/* ========================================================================
 * Copyright 1988-2006 University of Washington
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 *
 * ========================================================================
 */

                     Mailbox Format Characteristics
                              Mark Crispin
                            11 December 2006


     When a mailbox storage technology uses local files and
directories directly, the file(s) and directories are laid out in a
mailbox format.

I. Flat-File Formats

     In these formats, a mailbox and all the messages inside are a
single file on the filesystem. The mailbox name is the name of the
file in the filesystem, relative to the user's "mail home directory."

     A flat-file format mailbox is always a file, never a directory.
This means that it is impossible to have a flat-file format mailbox
that has inferior mailbox names under it (so-called "dual-usage"
mailboxes). For some inexplicable reason, some people want this.

     The mail home directory is usually the same as the user login
home directory if that concept is meaningful; otherwise, it is some
other default directory (e.g. "C:\My Documents" on Windows 98). This
can be redefined by modifying the c-client source code or in an
application via the SET_HOMEDIR mail_parameters() call.

     For example, a mailbox named "project" is likely to be found in
the file "project" in the user's home directory.
Similarly, a mailbox named "test/trial1" (assuming a UNIX system) is
likely to be found in the file "trial1" in the subdirectory "test" in
the user's home directory.

     Note that the name "INBOX" has special semantics and rules, as
described in the file naming.txt.

     The following flat-file formats are supported by c-client as of
the time of this writing:

  . unix    This is the traditional UNIX mailbox format, in use for nearly
            30 years. It uses a line starting with "From " to indicate
            start of message, and stores the message status inside the
            RFC822 message header.

            unix is not particularly efficient; the entire mailbox file
            must be read when the mailbox is opened, and when reading
            message texts it is necessary to convert the newline
            convention to Internet standard CR LF form. unix preserves
            UIDs, and allows the creation of keywords.

            Only one process may have a unix-format mailbox open
            read/write at a time.

  . mmdf    This is the format used by the MMDF mailer. It uses a line
            consisting of 4 CTRL/A (0x01) characters to indicate start
            and end of message. Optionally, there may also be a unix
            format "From " line. It otherwise has the same
            characteristics as unix format.

  . mbx     This is the current preferred mailbox format. It can be
            handled quite efficiently by c-client, without the problems
            that exist with unix and mmdf formats. Messages are stored
            in Internet standard CR LF format.

            mbx permits shared access, including shared expunge. It
            preserves UIDs, and allows the creation of keywords.

  . mtx     This is supported for compatibility with the past. This is
            the old Tenex/TOPS-20 mail.txt format.
            It can be handled quite efficiently by c-client, and has most
            of the characteristics of mbx format.

            mtx is deficient in that it does not support shared expunge;
            it has no means to store UIDs; and it has no way to define
            keywords except through an external configuration file.

  . tenex   This is supported for compatibility with the past. This is
            the old Columbia MM format. This is similar to mtx format,
            only it uses UNIX-style bare-LF newlines instead of CR LF
            newlines, thus incurring a performance penalty for newline
            conversion.

  . phile   This is not strictly a format. Any file which is not in a
            recognized format is in phile format, which treats the entire
            contents of the file as a single message.


II. File/Message Formats

     In these formats, a mailbox is a directory, and each of the
messages inside is a separate file inside the directory. The file
names of these files are generally the text form of a number, which
also matches the UID of the message.

     In the case of mx, the mailbox name is the name of the directory
in the filesystem, relative to the user's "mail home directory." In
the case of news and mh, the mailbox name is in a separate namespace
as described in the file naming.txt.

     A file/message format mailbox is always a directory. This means
that it is possible to have a file/message format mailbox that has
inferior mailbox names under it (so-called "dual-usage" mailboxes).
For some inexplicable reason, some people want this.

     Note that the name "INBOX" has special semantics and rules, as
described in the file naming.txt.
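
     The directory layout described above can be sketched in a few
lines. The following Python example is an illustration only, not
c-client code: the function names, the demo message contents, and the
choice of Python are all assumptions made for this sketch. It builds a
throwaway directory whose file names are message numbers, then scans
it the way a file/message driver has to, by reading the whole
directory and examining every file in it.

```python
import os
import tempfile


def scan_message_directory(path):
    """Return sorted (number, size_in_bytes) pairs for numerically named files."""
    messages = []
    with os.scandir(path) as entries:
        for entry in entries:
            name = entry.name
            # Skip anything not named by a plain decimal number, such as an
            # index file kept alongside the messages.
            if not (name.isascii() and name.isdigit() and entry.is_file()):
                continue
            # Asking for the size means examining each file individually;
            # on UNIX this is typically one stat() call per message.
            messages.append((int(name), entry.stat().st_size))
    # Sort by numeric value, not by the text form of the file name.
    return sorted(messages)


def build_demo_mailbox():
    """Create a temporary directory laid out like a file/message mailbox."""
    path = tempfile.mkdtemp(prefix="demo-mailbox-")
    # Hypothetical message bodies, stored with CR LF newlines.
    demo = [(1, b"first\r\n"), (2, b"second\r\n"), (10, b"tenth\r\n")]
    for number, body in demo:
        with open(os.path.join(path, str(number)), "wb") as handle:
            handle.write(body)
    # A non-numeric file standing in for an index such as .mxindex.
    with open(os.path.join(path, ".mxindex"), "wb") as handle:
        handle.write(b"")
    return path
```

     Calling scan_message_directory(build_demo_mailbox()) returns the
pairs in numeric order, so message 10 follows message 2 even though
"10" sorts before "2" as text. The work done inside the loop grows
with the number of files in the directory, whether or not the caller
is interested in most of them.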

     The following file/message formats are supported by c-client as of
the time of this writing:

  . mx      This is an experimental format, and may be removed in a future
            release. An mx format mailbox has a .mxindex file which holds
            the message status and unique identifiers. Messages are
            stored in Internet standard CR LF form, so the file size of
            the message file equals the size of the message.

            mx is somewhat inefficient; the entire directory must be read
            and each file stat()'d. We found it intolerable for a
            moderate sized mailbox (2000 messages) and have more or less
            abandoned it.

  . mh      This is supported for compatibility with the past. This is
            the format used by the old mh program.

            mh is very inefficient; the entire directory must be read
            and each file stat()'d, and in order to determine the size
            of a message, the entire file must be read and newline
            conversion performed.

            mh is deficient in that it does not support any permanent
            flags or keywords; and has no means to store UIDs (because
            the mh "compress" command renames all the files, that's
            why).

  . news    This is an export of the local filesystem's news spool, e.g.
            /var/spool/news. Access to mailboxes in news format is read
            only; however, message "deleted" status is preserved in a
            .newsrc file in the user's home directory. There is no other
            status or keywords.

            news is very inefficient; the entire directory must be
            read and each file stat()'d, and in order to determine the
            size of a message, the entire file must be read and newline
            conversion performed.

            news is deficient in that it does not support permanent flags
            other than deleted; does not support keywords; and has no
            expunge.


                    Soapbox on File/Message Formats

     If it sounds from the above descriptions as though we're not
putting too much effort into file/message formats, you are correct.

     There's a general reason why file/message formats are a bad idea.
Just about every filesystem in existence serializes file creations and
deletions because these manipulate the free space map. This turns out
to be an enormous problem when you start creating/deleting more than a
few messages per second; you spend all your time thrashing in the
filesystem.

     It is also extremely slow to do a text search through a
file/message format mailbox. All of those open()s and close()s really
add up to major filesystem thrashing.


                     What about Cyrus and Maildir?

     Both formats are vulnerable to the filesystem thrashing outlined
above.

     The Cyrus format used by CMU's Cyrus server (and Esys' server)
has a special associated flat file in each directory that contains
extensive data (including pre-parsed ENVELOPEs and BODYSTRUCTUREs)
about the messages. Put another way, it's a (considerably) more
featureful form of mx. It also uses certain operating system
facilities (e.g. file/memory mapping) which are not available on older
systems, at a cost of much more limited portability than c-client.
These considerably ameliorate the fundamental problems with
file/message formats; in fact, Cyrus is halfway to being a database.
Rather than support Cyrus format in c-client, you should run Cyrus or
Esys if you want that format.

     The Maildir format used by qmail has all of the performance
disadvantages of mh noted above, with the additional problem that the
files are renamed in order to change their status, so you end up having
to rescan the directory frequently to locate the current names
(particularly in a shared mailbox scenario). It doesn't scale, and it
represents a support nightmare; it is therefore not supported in the
official distribution. Maildir support code for c-client is available
from third parties; but, if you use it, it is entirely at your own
risk (read: don't complain about how poorly it performs or about bugs).


                      So what does this all mean?

     A database (such as used by Exchange) is really a much better
approach if you want to move away from flat files. mx and especially
Cyrus take a tentative step in that direction; mx failed mostly because
it didn't go anywhere near far enough. Cyrus goes much further, and
scores remarkable benefits from doing so.

     However, a well-designed pure database without the overhead of
separate files would do even better.