imapext-2007

diff docs/draft/sort.txt @ 0:ada5e610ab86

imap-2007e
author yuuji@gentei.org
date Mon, 14 Sep 2009 15:17:45 +0900
parents
children
line diff
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/docs/draft/sort.txt	Mon Sep 14 15:17:45 2009 +0900
     1.3 @@ -0,0 +1,885 @@
     1.4 +IMAP Extensions Working Group                                 M. Crispin
     1.5 +Internet-Draft                                              K. Murchison
     1.6 +Intended status: Proposed Standard                        March 10, 2008
     1.7 +Expires: September 10, 2008
     1.8 +Document: internet-drafts/draft-ietf-imapext-sort-20.txt
     1.9 +
    1.10 +     INTERNET MESSAGE ACCESS PROTOCOL - SORT AND THREAD EXTENSIONS
    1.11 +
    1.12 +Status of this Memo
    1.13 +
    1.14 +   By submitting this Internet-Draft, each author represents that
    1.15 +   any applicable patent or other IPR claims of which he or she is
    1.16 +   aware have been or will be disclosed, and any of which he or she
    1.17 +   becomes aware will be disclosed, in accordance with Section 6 of
    1.18 +   BCP 79.
    1.19 +
    1.20 +   Internet-Drafts are working documents of the Internet Engineering
    1.21 +   Task Force (IETF), its areas, and its working groups.  Note that
    1.22 +   other groups may also distribute working documents as
    1.23 +   Internet-Drafts.
    1.24 +
    1.25 +   Internet-Drafts are draft documents valid for a maximum of six months
    1.26 +   and may be updated, replaced, or obsoleted by other documents at any
    1.27 +   time.  It is inappropriate to use Internet-Drafts as reference
    1.28 +   material or to cite them other than as "work in progress."
    1.29 +
    1.30 +   The list of current Internet-Drafts can be accessed at
    1.31 +   http://www.ietf.org/ietf/1id-abstracts.txt
    1.32 +
    1.33 +   The list of Internet-Draft Shadow Directories can be accessed at
    1.34 +   http://www.ietf.org/shadow.html.
    1.35 +
    1.36 +   A revised version of this draft document will be submitted to the RFC
    1.37 +   editor as a Proposed Standard for the Internet Community.  Discussion
    1.38 +   and suggestions for improvement are requested, and should be sent to
    1.39 +   ietf-imapext@IMC.ORG.
    1.40 +
    1.41 +   Distribution of this memo is unlimited.
    1.42 +
    1.43 +Abstract
    1.44 +
    1.45 +   This document describes the base-level server-based sorting and
    1.46 +   threading extensions to the [IMAP] protocol.  These extensions
    1.47 +   provide substantial performance improvements for IMAP clients which
    1.48 +   offer sorted and threaded views.
    1.49 +
    1.50 +1. Introduction
    1.51 +
    1.52 +   The SORT and THREAD extensions to the [IMAP] protocol provide a means
    1.53 +   of server-based sorting and threading of messages, without requiring
    1.54 +   that the client download the necessary data to do so itself.  This is
    1.55 +   particularly useful for online clients as described in [IMAP-MODELS].
    1.56 +
    1.57 +   A server which supports the base-level SORT extension indicates this
    1.58 +   with a capability name which starts with "SORT".  Future,
    1.59 +   upwards-compatible extensions to the SORT extension will all start
    1.60 +   with "SORT", indicating support for this base level.
    1.61 +
    1.62 +   A server which supports the THREAD extension indicates this with one
    1.63 +   or more capability names consisting of "THREAD=" followed by a
    1.64 +   supported threading algorithm name as described in this document.
    1.65 +   This provides for future upwards-compatible extensions.
    1.66 +
    1.67 +   A server which implements the SORT and/or THREAD extensions MUST
    1.68 +   collate strings in accordance with the requirements of I18NLEVEL=1,
    1.69 +   as described in [IMAP-I18N], and SHOULD implement and advertise the
    1.70 +   I18NLEVEL=1 extension.  Alternatively, a server MAY implement
    1.71 +   I18NLEVEL=2 (or higher) and comply with the rules of that level.
    1.72 +
    1.73 +      Discussion: the SORT and THREAD extensions predate [IMAP-I18N] by
    1.74 +      several years.  At the time of this writing, all known server
    1.75 +      implementations of SORT and THREAD comply with the rules of
    1.76 +      I18NLEVEL=1, but do not necessarily advertise it.  As discussed
    1.77 +      in [IMAP-I18N] section 4.5, all server implementations should
    1.78 +      eventually be updated to comply with the I18NLEVEL=2 extension.
    1.79 +
    1.80 +   Historical note: the REFERENCES threading algorithm is based on the
    1.81 +   [THREADING] algorithm written used in "Netscape Mail and News"
    1.82 +   versions 2.0 through 3.0.  
    1.83 +
    1.84 +2. Terminology
    1.85 +
    1.86 +   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    1.87 +   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
    1.88 +   document are to be interpreted as described in [KEYWORDS].
    1.89 +
    1.90 +   The word "can" (not "may") is used to refer to a possible
    1.91 +   circumstance or situation, as opposed to an optional facility of the
    1.92 +   protocol.
    1.93 +
    1.94 +   "User" is used to refer to a human user, whereas "client" refers to
    1.95 +   the software being run by the user.
    1.96 +
    1.97 +   In examples, "C:" and "S:" indicate lines sent by the client and
    1.98 +   server respectively.
    1.99 +
   1.100 +2.1 Base Subject
   1.101 +
   1.102 +   Subject sorting and threading use the "base subject," which has
   1.103 +   specific subject artifacts removed.  Due to the complexity of these
   1.104 +   artifacts, the formal syntax for the subject extraction rules is
   1.105 +   ambiguous.  The following procedure is followed to determine the
   1.106 +   "base subject", using the [ABNF] formal syntax rules described in
   1.107 +   section 5:
   1.108 +
   1.109 +        (1) Convert any RFC 2047 encoded-words in the subject to
   1.110 +        UTF-8 as described in "internationalization
   1.111 +        considerations."  Convert all tabs and continuations to
   1.112 +        space.  Convert all multiple spaces to a single space.
   1.113 +
   1.114 +        (2) Remove all trailing text of the subject that matches
   1.115 +        the subj-trailer ABNF, repeat until no more matches are
   1.116 +        possible.
   1.117 +
   1.118 +        (3) Remove all prefix text of the subject that matches the
   1.119 +        subj-leader ABNF.
   1.120 +
   1.121 +        (4) If there is prefix text of the subject that matches the
   1.122 +        subj-blob ABNF, and removing that prefix leaves a non-empty
   1.123 +        subj-base, then remove the prefix text.
   1.124 +
   1.125 +        (5) Repeat (3) and (4) until no matches remain.
   1.126 +
   1.127 +   Note: it is possible to defer step (2) until step (6), but this
   1.128 +   requires checking for subj-trailer in step (4).
   1.129 +
   1.130 +        (6) If the resulting text begins with the subj-fwd-hdr ABNF
   1.131 +        and ends with the subj-fwd-trl ABNF, remove the
   1.132 +        subj-fwd-hdr and subj-fwd-trl and repeat from step (2).
   1.133 +
   1.134 +        (7) The resulting text is the "base subject" used in the
   1.135 +        SORT.
   1.136 +
   1.137 +   All servers and disconnected (as described in [IMAP-MODELS]) clients
   1.138 +   MUST use exactly this algorithm to determine the "base subject".
   1.139 +   Otherwise there is potential for a user to get inconsistent results
   1.140 +   based on whether they are running in connected or disconnected mode.
   1.141 +
   1.142 +2.2 Sent Date
   1.143 +
   1.144 +   As used in this document, the term "sent date" refers to the date and
   1.145 +   time from the Date: header, adjusted by time zone to normalize to
   1.146 +   UTC.  For example, "31 Dec 2000 16:01:33 -0800" is equivalent to the
   1.147 +   UTC date and time of "1 Jan 2001 00:01:33 +0000".
   1.148 +
   1.149 +   If the time zone is invalid, the date and time SHOULD be treated as
   1.150 +   UTC.  If the time is also invalid, the time SHOULD be treated as
   1.151 +   00:00:00.  If there is no valid date or time, the date and time
   1.152 +   SHOULD be treated as 00:00:00 on the earliest possible date.
   1.153 +
   1.154 +   This differs from the date-related criteria in the SEARCH command
   1.155 +   (described in [IMAP] section 6.4.4), which use just the date and not
   1.156 +   the time, and are not adjusted by time zone.
   1.157 +
   1.158 +   If the sent date can not be determined (a Date: header is missing or
   1.159 +   can not be parsed), the INTERNALDATE for that message is used as the
   1.160 +   sent date.
   1.161 +
   1.162 +   When comparing two sent dates that match exactly, the order in which
   1.163 +   the two messages appear in the mailbox (that is, by sequence number)
   1.164 +   is used as a tie-breaker to determine the order.
   1.165 +
   1.166 +3. Additional Commands
   1.167 +
   1.168 +   These commands are extension to the [IMAP] base protocol.
   1.169 +
   1.170 +   The section headings are intended to correspond with where they would
   1.171 +   be located in the main document if they were part of the base
   1.172 +   specification.
   1.173 +
   1.174 +BASE.6.4.SORT. SORT Command
   1.175 +
   1.176 +   Arguments:  sort program
   1.177 +               charset specification
   1.178 +               searching criteria (one or more)
   1.179 +
   1.180 +   Data:       untagged responses: SORT
   1.181 +
   1.182 +   Result:     OK - sort completed
   1.183 +               NO - sort error: can't sort that charset or
   1.184 +                    criteria
   1.185 +               BAD - command unknown or arguments invalid
   1.186 +
   1.187 +      The SORT command is a variant of SEARCH with sorting semantics for
   1.188 +      the results.  Sort has two arguments before the searching criteria
   1.189 +      argument; a parenthesized list of sort criteria, and the searching
   1.190 +      charset.
   1.191 +
   1.192 +      The charset argument is mandatory (unlike SEARCH) and indicates
   1.193 +      the [CHARSET] of the strings that appear in the searching
   1.194 +      criteria.  The US-ASCII and UTF-8 charsets MUST be implemented.
   1.195 +      All other charsets are optional.
   1.196 +
   1.197 +      There is also a UID SORT command which returns unique identifiers
   1.198 +      instead of message sequence numbers.  Note that there are separate
   1.199 +      searching criteria for message sequence numbers and UIDs; thus the
   1.200 +      arguments to UID SORT are interpreted the same as in SORT.  This
   1.201 +      is analogous to the behavior of UID SEARCH, as opposed to UID
   1.202 +      COPY, UID FETCH, or UID STORE.
   1.203 +
   1.204 +      The SORT command first searches the mailbox for messages that
   1.205 +      match the given searching criteria using the charset argument for
   1.206 +      the interpretation of strings in the searching criteria.  It then
   1.207 +      returns the matching messages in an untagged SORT response, sorted
   1.208 +      according to one or more sort criteria.
   1.209 +
   1.210 +      Sorting is in ascending order.  Earlier dates sort before later
   1.211 +      dates; smaller sizes sort before larger sizes; and strings are
   1.212 +      sorted according to ascending values established by their
   1.213 +      collation algorithm (see under "Internationalization
   1.214 +      Considerations").
   1.215 +
   1.216 +      If two or more messages exactly match according to the sorting
   1.217 +      criteria, these messages are sorted according to the order in
   1.218 +      which they appear in the mailbox.  In other words, there is an
   1.219 +      implicit sort criterion of "sequence number".
   1.220 +
   1.221 +      When multiple sort criteria are specified, the result is sorted in
   1.222 +      the priority order that the criteria appear.  For example,
   1.223 +      (SUBJECT DATE) will sort messages in order by their base subject
   1.224 +      text; and for messages with the same base subject text will sort
   1.225 +      by their sent date.
   1.226 +
   1.227 +      Untagged EXPUNGE responses are not permitted while the server is
   1.228 +      responding to a SORT command, but are permitted during a UID SORT
   1.229 +      command.
   1.230 +
   1.231 +      The defined sort criteria are as follows.  Refer to the Formal
   1.232 +      Syntax section for the precise syntactic definitions of the
   1.233 +      arguments.  If the associated RFC-822 header for a particular
   1.234 +      criterion is absent, it is treated as the empty string.  The empty
   1.235 +      string always collates before non-empty strings.
   1.236 +
   1.237 +      ARRIVAL
   1.238 +         Internal date and time of the message.  This differs from the
   1.239 +         ON criteria in SEARCH, which uses just the internal date.
   1.240 +
   1.241 +      CC
   1.242 +         [IMAP] addr-mailbox of the first "cc" address.
   1.243 +
   1.244 +      DATE
   1.245 +         Sent date and time, as described in section 2.2.
   1.246 +
   1.247 +      FROM
   1.248 +         [IMAP] addr-mailbox of the first "From" address.
   1.249 +
   1.250 +      REVERSE
   1.251 +         Followed by another sort criterion, has the effect of that
   1.252 +         criterion but in reverse (descending) order.
   1.253 +            Note: REVERSE only reverses a single criterion, and does not
   1.254 +            affect the implicit "sequence number" sort criterion if all
   1.255 +            other criteria are identicial.  Consequently, a sort of
   1.256 +            REVERSE SUBJECT is not the same as a reverse ordering of a
   1.257 +            SUBJECT sort.  This can be avoided by use of additional
   1.258 +            criteria, e.g. SUBJECT DATE vs. REVERSE SUBJECT REVERSE
   1.259 +            DATE.  In general, however, it's better (and faster, if the
   1.260 +            client has a "reverse current ordering" command) to reverse
   1.261 +            the results in the client instead of issuing a new SORT.
   1.262 +
   1.263 +      SIZE
   1.264 +         Size of the message in octets.
   1.265 +
   1.266 +      SUBJECT
   1.267 +         Base subject text.
   1.268 +
   1.269 +      TO
   1.270 +         [IMAP] addr-mailbox of the first "To" address.
   1.271 +
   1.272 +   Example:    C: A282 SORT (SUBJECT) UTF-8 SINCE 1-Feb-1994
   1.273 +               S: * SORT 2 84 882
   1.274 +               S: A282 OK SORT completed
   1.275 +               C: A283 SORT (SUBJECT REVERSE DATE) UTF-8 ALL
   1.276 +               S: * SORT 5 3 4 1 2
   1.277 +               S: A283 OK SORT completed
   1.278 +               C: A284 SORT (SUBJECT) US-ASCII TEXT "not in mailbox"
   1.279 +               S: * SORT
   1.280 +               S: A284 OK SORT completed
   1.281 +
   1.282 +BASE.6.4.THREAD. THREAD Command
   1.283 +
   1.284 +Arguments:  threading algorithm
   1.285 +            charset specification
   1.286 +            searching criteria (one or more)
   1.287 +
   1.288 +Data:       untagged responses: THREAD
   1.289 +
   1.290 +Result:     OK - thread completed
   1.291 +            NO - thread error: can't thread that charset or
   1.292 +                 criteria
   1.293 +            BAD - command unknown or arguments invalid
   1.294 +
   1.295 +      The THREAD command is a variant of SEARCH with threading semantics
   1.296 +      for the results.  Thread has two arguments before the searching
   1.297 +      criteria argument; a threading algorithm, and the searching
   1.298 +      charset.
   1.299 +
   1.300 +      The charset argument is mandatory (unlike SEARCH) and indicates
   1.301 +      the [CHARSET] of the strings that appear in the searching
   1.302 +      criteria.  The US-ASCII and UTF-8 charsets MUST be implemented.
   1.303 +      All other charsets are optional.
   1.304 +
   1.305 +      There is also a UID THREAD command which returns unique
   1.306 +      identifiers instead of message sequence numbers.  Note that there
   1.307 +      are separate searching criteria for message sequence numbers and
   1.308 +      UIDs; thus the arguments to UID THREAD are interpreted the same as
   1.309 +      in THREAD.  This is analogous to the behavior of UID SEARCH, as
   1.310 +      opposed to UID COPY, UID FETCH, or UID STORE.
   1.311 +
   1.312 +      The THREAD command first searches the mailbox for messages that
   1.313 +      match the given searching criteria using the charset argument for
   1.314 +      the interpretation of strings in the searching criteria.  It then
   1.315 +      returns the matching messages in an untagged THREAD response,
   1.316 +      threaded according to the specified threading algorithm.
   1.317 +
   1.318 +      All collation is in ascending order.  Earlier dates collate before
   1.319 +      later dates and strings are collated according to ascending values
   1.320 +      established by their collation algorithm (see under
   1.321 +      "Internationalization Considerations").
   1.322 +
   1.323 +      Untagged EXPUNGE responses are not permitted while the server is
   1.324 +      responding to a THREAD command, but are permitted during a UID
   1.325 +      THREAD command.
   1.326 +
   1.327 +      The defined threading algorithms are as follows:
   1.328 +
   1.329 +      ORDEREDSUBJECT
   1.330 +
   1.331 +         The ORDEREDSUBJECT threading algorithm is also referred to as
   1.332 +         "poor man's threading."  The searched messages are sorted by
   1.333 +         base subject and then by the sent date.  The messages are then
   1.334 +         split into separate threads, with each thread containing
   1.335 +         messages with the same base subject text.  Finally, the threads
   1.336 +         are sorted by the sent date of the first message in the thread.
   1.337 +
   1.338 +         The first message of each thread are siblings of each other
   1.339 +         (the "root").  The second message of a thread is the child of
   1.340 +         the first message, and subsequent messages of the thread are
   1.341 +         siblings of the second message and hence children of the
   1.342 +         message at the root.  Hence, there are no grandchildren in
   1.343 +         ORDEREDSUBJECT threading.
   1.344 +
   1.345 +         Children in ORDEREDSUBJECT threading do not have descendents.
   1.346 +         Client implementations SHOULD treat descendents of a child in
   1.347 +         a server response as being siblings of that child.
   1.348 +
   1.349 +      REFERENCES
   1.350 +
   1.351 +         The REFERENCES threading algorithm threads the searched
   1.352 +         messages by grouping them together in parent/child
   1.353 +         relationships based on which messages are replies to others.
   1.354 +         The parent/child relationships are built using two methods:
   1.355 +         reconstructing a message's ancestry using the references
   1.356 +         contained within it; and checking the original (not base)
   1.357 +         subject of a message to see if it is a reply to (or forward of)
   1.358 +         another message.
   1.359 +
   1.360 +            Note: "Message ID" in the following description refers to a
   1.361 +            normalized form of the msg-id in [RFC-2822].  The actual
   1.362 +            text in an RFC 2822 may use quoting, resulting in multiple
   1.363 +            ways of expressing the same Message ID.  Implementations of
   1.364 +            the REFERENCES threading algorithm MUST normalize any msg-id
   1.365 +            in order to avoid false non-matches due to differences in
   1.366 +            quoting.
   1.367 +
   1.368 +            For example, the msg-id
   1.369 +               <"01KF8JCEOCBS0045PS"@xxx.yyy.com>
   1.370 +            and the msg-id
   1.371 +               <01KF8JCEOCBS0045PS@xxx.yyy.com>
   1.372 +            MUST be interpreted as being the same Message ID.
   1.373 +
   1.374 +         The references used for reconstructing a message's ancestry are
   1.375 +         found using the following rules:
   1.376 +
   1.377 +            If a message contains a References header line, then use the
   1.378 +            Message IDs in the References header line as the references.
   1.379 +
   1.380 +            If a message does not contain a References header line, or
   1.381 +            the References header line does not contain any valid
   1.382 +            Message IDs, then use the first (if any) valid Message ID
   1.383 +            found in the In-Reply-To header line as the only reference
   1.384 +            (parent) for this message.
   1.385 +
   1.386 +               Note: Although [RFC-2822] permits multiple Message IDs in
   1.387 +               the In-Reply-To header, in actual practice this
   1.388 +               discipline has not been followed.  For example,
   1.389 +               In-Reply-To headers have been observed with message
   1.390 +               addresses after the Message ID, and there are no good
   1.391 +               heuristics for software to determine the difference.
   1.392 +               This is not a problem with the References header however.
   1.393 +
   1.394 +            If a message does not contain an In-Reply-To header line, or
   1.395 +            the In-Reply-To header line does not contain a valid Message
   1.396 +            ID, then the message does not have any references (NIL).
   1.397 +
   1.398 +         A message is considered to be a reply or forward if the base
   1.399 +         subject extraction rules, applied to the original subject,
   1.400 +         remove any of the following: a subj-refwd, a "(fwd)"
   1.401 +         subj-trailer, or a subj-fwd-hdr and subj-fwd-trl.
   1.402 +
   1.403 +         The REFERENCES algorithm is significantly more complex than
   1.404 +         ORDEREDSUBJECT and consists of six main steps.  These steps are
   1.405 +         outlined in detail below.
   1.406 +
   1.407 +         (1) For each searched message:
   1.408 +
   1.409 +            (A) Using the Message IDs in the message's references, link
   1.410 +            the corresponding messages (those whose Message-ID header
   1.411 +            line contains the given reference Message ID) together as
   1.412 +            parent/child.  Make the first reference the parent of the
   1.413 +            second (and the second a child of the first), the second the
   1.414 +            parent of the third (and the third a child of the second),
   1.415 +            etc.  The following rules govern the creation of these
   1.416 +            links:
   1.417 +
   1.418 +               If a message does not contain a Message-ID header line,
   1.419 +               or the Message-ID header line does not contain a valid
   1.420 +               Message ID, then assign a unique Message ID to this
   1.421 +               message.
   1.422 +
   1.423 +               If two or more messages have the same Message ID, then
   1.424 +               only use that Message ID in the first (lowest sequence
   1.425 +               number) message, and assign a unique Message ID to each
   1.426 +               of the subsequent messages with a duplicate of that
   1.427 +               Message ID.
   1.428 +
   1.429 +               If no message can be found with a given Message ID,
   1.430 +               create a dummy message with this ID.  Use this dummy
   1.431 +               message for all subsequent references to this ID.
   1.432 +
   1.433 +               If a message already has a parent, don't change the
   1.434 +               existing link.  This is done because the References
   1.435 +               header line may have been truncated by a MUA.  As a
   1.436 +               result, there is no guarantee that the messages
   1.437 +               corresponding to adjacent Message IDs in the References
   1.438 +               header line are parent and child.
   1.439 +
   1.440 +               Do not create a parent/child link if creating that link
   1.441 +               would introduce a loop.  For example, before making
   1.442 +               message A the parent of B, make sure that A is not a
   1.443 +               descendent of B.
   1.444 +
   1.445 +                  Note: Message ID comparisons are case-sensitive.
   1.446 +
   1.447 +            (B) Create a parent/child link between the last reference
   1.448 +            (or NIL if there are no references) and the current message.
   1.449 +            If the current message already has a parent, it is probably
   1.450 +            the result of a truncated References header line, so break
   1.451 +            the current parent/child link before creating the new
   1.452 +            correct one.  As in step 1.A, do not create the parent/child
   1.453 +            link if creating that link would introduce a loop.  Note
   1.454 +            that if this message has no references, that it will now
   1.455 +            have no parent.
   1.456 +
   1.457 +               Note: The parent/child links created in steps 1.A and 1.B
   1.458 +               MUST be kept consistent with one another at ALL times.
   1.459 +
   1.460 +         (2) Gather together all of the messages that have no parents
   1.461 +         and make them all children (siblings of one another) of a dummy
   1.462 +         parent (the "root").  These messages constitute the first
   1.463 +         (head) message of the threads created thus far.
   1.464 +
   1.465 +         (3) Prune dummy messages from the thread tree.  Traverse each
   1.466 +         thread under the root, and for each message:
   1.467 +
   1.468 +            If it is a dummy message with NO children, delete it.
   1.469 +
   1.470 +            If it is a dummy message with children, delete it, but
   1.471 +            promote its children to the current level.  In other words,
   1.472 +            splice them in with the dummy's siblings.
   1.473 +
   1.474 +            Do not promote the children if doing so would make them
   1.475 +            children of the root, unless there is only one child.
   1.476 +
   1.477 +         (4) Sort the messages under the root (top-level siblings only)
   1.478 +         by sent date as described in section 2.2.  In the case of a
   1.479 +         dummy message, sort its children by sent date and then use the
   1.480 +         first child for the top-level sort.
   1.481 +
   1.482 +         (5) Gather together messages under the root that have the same
   1.483 +         base subject text.
   1.484 +
   1.485 +            (A) Create a table for associating base subjects with
   1.486 +            messages, called the subject table.
   1.487 +
   1.488 +            (B) Populate the subject table with one message per each
   1.489 +            base subject.  For each child of the root:
   1.490 +
   1.491 +               (i) Find the subject of this thread, by using the base
   1.492 +               subject from either the current message or its first
   1.493 +               child if the current message is a dummy.  This is the
   1.494 +               thread subject.
   1.495 +
   1.496 +               (ii) If the thread subject is empty, skip this message.
   1.497 +
   1.498 +               (iii) Look up the message associated with the thread
   1.499 +               subject in the subject table.
   1.500 +
   1.501 +               (iv) If there is no message in the subject table with the
   1.502 +               thread subject, add the current message and the thread
   1.503 +               subject to the subject table.
   1.504 +
   1.505 +               Otherwise, if the message in the subject table is not a
   1.506 +               dummy, AND either of the following criteria are true:
   1.507 +
   1.508 +                  The current message is a dummy, OR
   1.509 +
   1.510 +                  The message in the subject table is a reply or forward
   1.511 +                  and the current message is not.
   1.512 +
   1.513 +            then replace the message in the subject table with the
   1.514 +            current message.
   1.515 +
   1.516 +            (C) Merge threads with the same thread subject.  For each
   1.517 +            child of the root:
   1.518 +
   1.519 +               (i) Find the message's thread subject as in step 5.B.i
   1.520 +               above.
   1.521 +
   1.522 +               (ii) If the thread subject is empty, skip this message.
   1.523 +
   1.524 +               (iii) Lookup the message associated with this thread
   1.525 +               subject in the subject table.
   1.526 +
   1.527 +               (iv) If the message in the subject table is the current
   1.528 +               message, skip this message.
   1.529 +
   1.530 +               Otherwise, merge the current message with the one in the
   1.531 +               subject table using the following rules:
   1.532 +
   1.533 +                  If both messages are dummies, append the current
   1.534 +                  message's children to the children of the message in
   1.535 +                  the subject table (the children of both messages
   1.536 +                  become siblings), and then delete the current message.
   1.537 +
   1.538 +                  If the message in the subject table is a dummy and the
   1.539 +                  current message is not, make the current message a
   1.540 +                  child of the message in the subject table (a sibling
   1.541 +                  of its children).
   1.542 +
   1.543 +                  If the current message is a reply or forward and the
   1.544 +                  message in the subject table is not, make the current
   1.545 +                  message a child of the message in the subject table (a
   1.546 +                  sibling of its children).
   1.547 +
   1.548 +                  Otherwise, create a new dummy message and make both
   1.549 +                  the current message and the message in the subject
   1.550 +                  table children of the dummy.  Then replace the message
   1.551 +                  in the subject table with the dummy message.
   1.552 +
   1.553 +                     Note: Subject comparisons are case-insensitive, as
   1.554 +                     described under "Internationalization
   1.555 +                     Considerations."
   1.556 +
   1.557 +         (6) Traverse the messages under the root and sort each set of
   1.558 +         siblings by sent date as described in section 2.2.  Traverse
   1.559 +         the messages in such a way that the "youngest" set of siblings
   1.560 +         are sorted first, and the "oldest" set of siblings are sorted
   1.561 +         last (grandchildren are sorted before children, etc).  In the
   1.562 +         case of a dummy message (which can only occur with top-level
   1.563 +         siblings), use its first child for sorting.
   1.564 +
   1.565 +   Example:    C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000
   1.566 +               S: * THREAD (166)(167)(168)(169)(172)(170)(171)
   1.567 +                  (173)(174 (175)(176)(178)(181)(180))(179)(177
   1.568 +                  (183)(182)(188)(184)(185)(186)(187)(189))(190)
   1.569 +                  (191)(192)(193)(194 195)(196 (197)(198))(199)
   1.570 +                  (200 202)(201)(203)(204)(205)(206 207)(208)
   1.571 +               S: A283 OK THREAD completed
   1.572 +               C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp"
   1.573 +               S: * THREAD
   1.574 +               S: A284 OK THREAD completed
   1.575 +               C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000
   1.576 +               S: * THREAD (166)(167)(168)(169)(172)((170)(179))
   1.577 +                  (171)(173)((174)(175)(176)(178)(181)(180))
   1.578 +                  ((177)(183)(182)(188 (184)(189))(185 186)(187))
   1.579 +                  (190)(191)(192)(193)((194)(195 196))(197 198)
   1.580 +                  (199)(200 202)(201)(203)(204)(205 206 207)(208)
   1.581 +               S: A285 OK THREAD completed
   1.582 +
   1.583 +        Note: The line breaks in the first and third server
   1.584 +        responses are for editorial clarity and do not appear in
   1.585 +        real THREAD responses.
   1.586 +
   1.587 +4. Additional Responses
   1.588 +
   1.589 +   These responses are extensions to the [IMAP] base protocol.
   1.590 +
   1.591 +   The section headings of these responses are intended to correspond
   1.592 +   with where they would be located in the main document.
   1.593 +
   1.594 +BASE.7.2.SORT. SORT Response
   1.595 +
   1.596 +   Data:       zero or more numbers
   1.597 +
   1.598 +      The SORT response occurs as a result of a SORT or UID SORT
   1.599 +      command.  The number(s) refer to those messages that match the
   1.600 +      search criteria.  For SORT, these are message sequence numbers;
   1.601 +      for UID SORT, these are unique identifiers.  Each number is
   1.602 +      delimited by a space.
   1.603 +
   1.604 +   Example:    S: * SORT 2 3 6
   1.605 +
   1.606 +BASE.7.2.THREAD. THREAD Response
   1.607 +
   1.608 +   Data:       zero or more threads
   1.609 +
   1.610 +      The THREAD response occurs as a result of a THREAD or UID THREAD
   1.611 +      command.  It contains zero or more threads.  A thread consists of
   1.612 +      a parenthesized list of thread members.
   1.613 +
   1.614 +      Thread members consist of zero or more message numbers, delimited
   1.615 +      by spaces, indicating successive parent and child.  This continues
   1.616 +      until the thread splits into multiple sub-threads, at which point
   1.617 +      the thread nests into multiple sub-threads with the first member
   1.618 +      of each subthread being siblings at this level.  There is no limit
   1.619 +      to the nesting of threads.
   1.620 +
   1.621 +      The messages numbers refer to those messages that match the search
   1.622 +      criteria.  For THREAD, these are message sequence numbers; for UID
   1.623 +      THREAD, these are unique identifiers.
   1.624 +
   1.625 +   Example:    S: * THREAD (2)(3 6 (4 23)(44 7 96))
   1.626 +
   1.627 +      The first thread consists only of message 2.  The second thread
   1.628 +      consists of the messages 3 (parent) and 6 (child), after which it
   1.629 +      splits into two subthreads; the first of which contains messages 4
   1.630 +      (child of 6, sibling of 44) and 23 (child of 4), and the second of
   1.631 +      which contains messages 44 (child of 6, sibling of 4), 7 (child of
   1.632 +      44), and 96 (child of 7).  Since some later messages are parents
   1.633 +      of earlier messages, the messages were probably moved from some
   1.634 +      other mailbox at different times.
   1.635 +
   1.636 +      -- 2
   1.637 +
   1.638 +      -- 3
   1.639 +         \-- 6
   1.640 +             |-- 4
   1.641 +             |   \-- 23
   1.642 +             |
   1.643 +             \-- 44
   1.644 +                  \-- 7
   1.645 +                      \-- 96
   1.646 +
   1.647 +   Example:    S: * THREAD ((3)(5))
   1.648 +
   1.649 +      In this example, 3 and 5 are siblings of a parent which does not
   1.650 +      match the search criteria (and/or does not exist in the mailbox);
   1.651 +      however they are members of the same thread.
   1.652 +
   1.653 +5. Formal Syntax of SORT and THREAD Commands and Responses
   1.654 +
   1.655 +   The following syntax specification uses the Augmented Backus-Naur
   1.656 +   Form (ABNF) notation as specified in [ABNF].  It also uses [ABNF]
   1.657 +   rules defined in [IMAP].
   1.658 +
   1.659 +sort            = ["UID" SP] "SORT" SP sort-criteria SP search-criteria
   1.660 +
   1.661 +sort-criteria   = "(" sort-criterion *(SP sort-criterion) ")"
   1.662 +
   1.663 +sort-criterion  = ["REVERSE" SP] sort-key
   1.664 +
   1.665 +sort-key        = "ARRIVAL" / "CC" / "DATE" / "FROM" / "SIZE" /
   1.666 +                  "SUBJECT" / "TO"
   1.667 +
   1.668 +thread          = ["UID" SP] "THREAD" SP thread-alg SP search-criteria
   1.669 +
   1.670 +thread-alg      = "ORDEREDSUBJECT" / "REFERENCES" / thread-alg-ext
   1.671 +
   1.672 +thread-alg-ext  = atom
   1.673 +                    ; New algorithms MUST be registered with IANA
   1.674 +
   1.675 +search-criteria = charset 1*(SP search-key)
   1.676 +
   1.677 +charset         = atom / quoted
   1.678 +                    ; CHARSET values MUST be registered with IANA
   1.679 +
   1.680 +sort-data       = "SORT" *(SP nz-number)
   1.681 +
   1.682 +thread-data     = "THREAD" [SP 1*thread-list]
   1.683 +
   1.684 +thread-list     = "(" (thread-members / thread-nested) ")"
   1.685 +
   1.686 +thread-members  = nz-number *(SP nz-number) [SP thread-nested]
   1.687 +
   1.688 +thread-nested   = 2*thread-list
   1.689 +
   1.690 +   The following syntax describes base subject extraction rules (2)-(6):
   1.691 +
   1.692 +subject         = *subj-leader [subj-middle] *subj-trailer
   1.693 +
   1.694 +subj-refwd      = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":"
   1.695 +
   1.696 +subj-blob       = "[" *BLOBCHAR "]" *WSP
   1.697 +
   1.698 +subj-fwd        = subj-fwd-hdr subject subj-fwd-trl
   1.699 +
   1.700 +subj-fwd-hdr    = "[fwd:"
   1.701 +
   1.702 +subj-fwd-trl    = "]"
   1.703 +
   1.704 +subj-leader     = (*subj-blob subj-refwd) / WSP
   1.705 +
   1.706 +subj-middle     = *subj-blob (subj-base / subj-fwd)
   1.707 +                    ; last subj-blob is subj-base if subj-base would
   1.708 +                    ; otherwise be empty
   1.709 +
   1.710 +subj-trailer    = "(fwd)" / WSP
   1.711 +
   1.712 +subj-base       = NONWSP *(*WSP NONWSP)
   1.713 +                    ; can be a subj-blob
   1.714 +
   1.715 +BLOBCHAR        = %x01-5a / %x5c / %x5e-ff
   1.716 +                    ; any CHAR8 except '[' and ']'
   1.717 +
   1.718 +NONWSP          = %x01-08 / %x0a-1f / %x21-ff
   1.719 +                    ; any CHAR8 other than WSP
   1.720 +
   1.721 +6. Security Considerations
   1.722 +
   1.723 +   The SORT and THREAD extensions do not raise any security
   1.724 +   considerations that are not present in the base [IMAP] protocol, and
   1.725 +   these issues are discussed in [IMAP].  Nevertheless, it is important
   1.726 +   to remember that [IMAP] protocol transactions, including message
   1.727 +   data, are sent in the clear over the network unless protection from
   1.728 +   snooping is negotiated, either by the use of STARTTLS, privacy
   1.729 +   protection is negotiated in the AUTHENTICATE command, or some other
   1.730 +   protection mechanism.
   1.731 +
   1.732 +   Although not a security consideration, it is important to recognize
   1.733 +   that sorting by REFERENCES can lead to misleading threading trees.
   1.734 +   For example, a message with false References: header data will cause
   1.735 +   a thread to be incorporated into another thread.
   1.736 +
   1.737 +   The process of extracting the base subject may lead to incorrect
   1.738 +   collation if the extracted data was significant text as opposed to
   1.739 +   a subject artifact.
   1.740 +
   1.741 +7. Internationalization Considerations
   1.742 +
   1.743 +   As stated in the introduction, the rules of I18NLEVEL=1 as described
   1.744 +   in [IMAP-I18N] MUST be followed; that is, the SORT and THREAD
   1.745 +   extensions MUST collate strings according to the i;unicode-casemap
   1.746 +   collation described in [UNICASEMAP].  Servers SHOULD also advertise
   1.747 +   the I18NLEVEL=1 extension.  Alternatively, a server MAY implement
   1.748 +   I18NLEVEL=2 (or higher) and comply with the rules of that level.
   1.749 +
   1.750 +   As discussed in [IMAP-I18N] section 4.5, all server implementations
   1.751 +   should eventually be updated to support the [IMAP-I18N] I18NLEVEL=2
   1.752 +   extension.
   1.753 +
   1.754 +   Translations of the "re" or "fw"/"fwd" tokens are not specified for
   1.755 +   removal in the base subject extraction process.  An attempt to add
   1.756 +   such translated tokens would result in a geometrically complex, and
   1.757 +   ultimately unimplementable, task.
   1.758 +
   1.759 +   Instead, note that [RFC-2822] section 3.6.5 recommends that "re:"
   1.760 +   (from the Latin "res", in the matter of) be used to identify a reply.
   1.761 +   Although it is evident that, from the multiple forms of token to
   1.762 +   identify a forwarded message, there is considerable variation found
   1.763 +   in the wild, the variations are (still) manageable.  Consequently, it
   1.764 +   is suggested that "re:" and one of the variations of the tokens for
   1.765 +   forward supported by the base subject extraction rules be adopted for
   1.766 +   Internet mail messages, since doing so makes it a simple display time
   1.767 +   task to localize the token language for the user.
   1.768 +
   1.769 +8. IANA Considerations
   1.770 +
   1.771 +   [IMAP] capabilities are registered by publishing a standards track or
   1.772 +   IESG approved experimental RFC.  This document constitutes
   1.773 +   registration of the SORT and THREAD capabilities in the [IMAP]
   1.774 +   capabilities registry.
   1.775 +
   1.776 +   This document creates a new [IMAP] threading algorithms registry,
   1.777 +   which registers threading algorithms by publishing a standards track
   1.778 +   or IESG approved experimental RFC.  This document constitutes
   1.779 +   registration of the ORDEREDSUBJECT and REFERENCES algorithms in that
   1.780 +   registry.
   1.781 +
   1.782 +9. Normative References
   1.783 +
   1.784 +   The following documents are normative to this document:
   1.785 +
   1.786 +   [ABNF]                Crocker, D. and Overell, P. "Augmented BNF
   1.787 +                         for Syntax Specifications: ABNF", RFC 5234
   1.788 +                         January 2008
   1.789 +
   1.790 +   [CHARSET]             Freed, N. and Postel, J. "IANA Character Set
   1.791 +                         Registration Procedures", RFC 2978, October
   1.792 +                         2000.
   1.793 +
   1.794 +   [IMAP]                Crispin, M. "Internet Message Access Protocol -
   1.795 +                         Version 4rev1", RFC 3501, March 2003.
   1.796 +
   1.797 +   [IMAP-I18N]           Newman, C. and Gulbrandsen, A. "Internet
   1.798 +                         Message Access Protocol Internationalization",
   1.799 +                         Work in Progress.
   1.800 +
   1.801 +   [KEYWORDS]            Bradner, S. "Key words for use in RFCs to
   1.802 +                         Indicate Requirement Levels", BCP 14, RFC 2119,
   1.803 +                         March 1997.
   1.804 +
   1.805 +   [RFC-2822]            Resnick, P. "Internet Message Format", RFC
   1.806 +                         2822, April 2001.
   1.807 +
   1.808 +   [UNICASEMAP]          Crispin, M. "i;unicode-casemap - Simple Unicode
   1.809 +                         Collation Algorithm", RFC 5051.
   1.810 +
   1.811 +10. Informative References
   1.812 +
   1.813 +   The following documents are informative to this document:
   1.814 +
   1.815 +   [IMAP-MODELS]         Crispin, M. "Distributed Electronic Mail Models
   1.816 +                         in IMAP4", RFC 1733, December 1994.
   1.817 +
   1.818 +   [THREADING]           Zawinski, J. "Message Threading",
   1.819 +                         http://www.jwz.org/doc/threading.html,
   1.820 +                         1997-2002.
   1.821 +
   1.822 +Appendices
   1.823 +
   1.824 +Author's Address
   1.825 +
   1.826 +   Mark R. Crispin
   1.827 +   Networks and Distributed Computing
   1.828 +   University of Washington
   1.829 +   4545 15th Avenue NE
   1.830 +   Seattle, WA  98105-4527
   1.831 +
   1.832 +   Phone: +1 (206) 543-5762
   1.833 +
   1.834 +   EMail: MRC@CAC.Washington.EDU
   1.835 +
   1.836 +   Kenneth Murchison
   1.837 +   Carnegie Mellon University
   1.838 +   5000 Forbes Avenue
   1.839 +   Cyert Hall 285
   1.840 +   Pittsburgh, PA  15213
   1.841 +
   1.842 +   Phone: +1 (412) 268-2638
   1.843 +   Email: murch@andrew.cmu.edu
   1.844 +
   1.845 +Full Copyright Statement
   1.846 +
   1.847 +   Copyright (C) The IETF Trust (2008).
   1.848 +
   1.849 +   This document is subject to the rights, licenses and restrictions
   1.850 +   contained in BCP 78, and except as set forth therein, the authors
   1.851 +   retain all their rights.
   1.852 +
   1.853 +   This document and the information contained herein are provided on an
   1.854 +   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   1.855 +   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   1.856 +   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   1.857 +   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   1.858 +   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   1.859 +   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
   1.860 +
   1.861 +Intellectual Property
   1.862 +
   1.863 +   The IETF takes no position regarding the validity or scope of any
   1.864 +   Intellectual Property Rights or other rights that might be claimed to
   1.865 +   pertain to the implementation or use of the technology described in
   1.866 +   this document or the extent to which any license under such rights
   1.867 +   might or might not be available; nor does it represent that it has
   1.868 +   made any independent effort to identify any such rights.  Information
   1.869 +   on the procedures with respect to rights in RFC documents can be
   1.870 +   found in BCP 78 and BCP 79.
   1.871 +
   1.872 +   Copies of IPR disclosures made to the IETF Secretariat and any
   1.873 +   assurances of licenses to be made available, or the result of an
   1.874 +   attempt made to obtain a general license or permission for the use of
   1.875 +   such proprietary rights by implementers or users of this
   1.876 +   specification can be obtained from the IETF on-line IPR repository at
   1.877 +   http://www.ietf.org/ipr.
   1.878 +
   1.879 +   The IETF invites any interested party to bring to its attention any
   1.880 +   copyrights, patents or patent applications, or other proprietary
   1.881 +   rights that may cover technology that may be required to implement
   1.882 +   this standard.  Please address the information to the IETF at ietf-
   1.883 +   ipr@ietf.org.
   1.884 +
   1.885 +Acknowledgement
   1.886 +
   1.887 +   Funding for the RFC Editor function is currently provided by the
   1.888 +   Internet Society.

UW-IMAP'd extensions by yuuji