imapext-2007

annotate docs/draft/sort.txt @ 0:ada5e610ab86

imap-2007e
author yuuji@gentei.org
date Mon, 14 Sep 2009 15:17:45 +0900
parents
children
rev   line source
yuuji@0 1 IMAP Extensions Working Group M. Crispin
yuuji@0 2 Internet-Draft K. Murchison
yuuji@0 3 Intended status: Proposed Standard March 10, 2008
yuuji@0 4 Expires: September 10, 2008
yuuji@0 5 Document: internet-drafts/draft-ietf-imapext-sort-20.txt
yuuji@0 6
yuuji@0 7 INTERNET MESSAGE ACCESS PROTOCOL - SORT AND THREAD EXTENSIONS
yuuji@0 8
yuuji@0 9 Status of this Memo
yuuji@0 10
yuuji@0 11 By submitting this Internet-Draft, each author represents that
yuuji@0 12 any applicable patent or other IPR claims of which he or she is
yuuji@0 13 aware have been or will be disclosed, and any of which he or she
yuuji@0 14 becomes aware will be disclosed, in accordance with Section 6 of
yuuji@0 15 BCP 79.
yuuji@0 16
yuuji@0 17 Internet-Drafts are working documents of the Internet Engineering
yuuji@0 18 Task Force (IETF), its areas, and its working groups. Note that
yuuji@0 19 other groups may also distribute working documents as
yuuji@0 20 Internet-Drafts.
yuuji@0 21
yuuji@0 22 Internet-Drafts are draft documents valid for a maximum of six months
yuuji@0 23 and may be updated, replaced, or obsoleted by other documents at any
yuuji@0 24 time. It is inappropriate to use Internet-Drafts as reference
yuuji@0 25 material or to cite them other than as "work in progress."
yuuji@0 26
yuuji@0 27 The list of current Internet-Drafts can be accessed at
yuuji@0 28 http://www.ietf.org/ietf/1id-abstracts.txt
yuuji@0 29
yuuji@0 30 The list of Internet-Draft Shadow Directories can be accessed at
yuuji@0 31 http://www.ietf.org/shadow.html.
yuuji@0 32
yuuji@0 33 A revised version of this draft document will be submitted to the RFC
yuuji@0 34 editor as a Proposed Standard for the Internet Community. Discussion
yuuji@0 35 and suggestions for improvement are requested, and should be sent to
yuuji@0 36 ietf-imapext@IMC.ORG.
yuuji@0 37
yuuji@0 38 Distribution of this memo is unlimited.
yuuji@0 39
yuuji@0 40 Abstract
yuuji@0 41
yuuji@0 42 This document describes the base-level server-based sorting and
yuuji@0 43 threading extensions to the [IMAP] protocol. These extensions
yuuji@0 44 provide substantial performance improvements for IMAP clients which
yuuji@0 45 offer sorted and threaded views.
yuuji@0 46
yuuji@0 47 1. Introduction
yuuji@0 48
yuuji@0 49 The SORT and THREAD extensions to the [IMAP] protocol provide a means
yuuji@0 50 of server-based sorting and threading of messages, without requiring
yuuji@0 51 that the client download the necessary data to do so itself. This is
yuuji@0 52 particularly useful for online clients as described in [IMAP-MODELS].
yuuji@0 53
yuuji@0 54 A server which supports the base-level SORT extension indicates this
yuuji@0 55 with a capability name which starts with "SORT". Future,
yuuji@0 56 upwards-compatible extensions to the SORT extension will all start
yuuji@0 57 with "SORT", indicating support for this base level.
yuuji@0 58
yuuji@0 59 A server which supports the THREAD extension indicates this with one
yuuji@0 60 or more capability names consisting of "THREAD=" followed by a
yuuji@0 61 supported threading algorithm name as described in this document.
yuuji@0 62 This provides for future upwards-compatible extensions.
yuuji@0 63
yuuji@0 64 A server which implements the SORT and/or THREAD extensions MUST
yuuji@0 65 collate strings in accordance with the requirements of I18NLEVEL=1,
yuuji@0 66 as described in [IMAP-I18N], and SHOULD implement and advertise the
yuuji@0 67 I18NLEVEL=1 extension. Alternatively, a server MAY implement
yuuji@0 68 I18NLEVEL=2 (or higher) and comply with the rules of that level.
yuuji@0 69
yuuji@0 70 Discussion: the SORT and THREAD extensions predate [IMAP-I18N] by
yuuji@0 71 several years. At the time of this writing, all known server
yuuji@0 72 implementations of SORT and THREAD comply with the rules of
yuuji@0 73 I18NLEVEL=1, but do not necessarily advertise it. As discussed
yuuji@0 74 in [IMAP-I18N] section 4.5, all server implementations should
yuuji@0 75 eventually be updated to comply with the I18NLEVEL=2 extension.
yuuji@0 76
yuuji@0 77 Historical note: the REFERENCES threading algorithm is based on the
yuuji@0 78 [THREADING] algorithm written used in "Netscape Mail and News"
yuuji@0 79 versions 2.0 through 3.0.
yuuji@0 80
yuuji@0 81 2. Terminology
yuuji@0 82
yuuji@0 83 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
yuuji@0 84 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
yuuji@0 85 document are to be interpreted as described in [KEYWORDS].
yuuji@0 86
yuuji@0 87 The word "can" (not "may") is used to refer to a possible
yuuji@0 88 circumstance or situation, as opposed to an optional facility of the
yuuji@0 89 protocol.
yuuji@0 90
yuuji@0 91 "User" is used to refer to a human user, whereas "client" refers to
yuuji@0 92 the software being run by the user.
yuuji@0 93
yuuji@0 94 In examples, "C:" and "S:" indicate lines sent by the client and
yuuji@0 95 server respectively.
yuuji@0 96
yuuji@0 97 2.1 Base Subject
yuuji@0 98
yuuji@0 99 Subject sorting and threading use the "base subject," which has
yuuji@0 100 specific subject artifacts removed. Due to the complexity of these
yuuji@0 101 artifacts, the formal syntax for the subject extraction rules is
yuuji@0 102 ambiguous. The following procedure is followed to determine the
yuuji@0 103 "base subject", using the [ABNF] formal syntax rules described in
yuuji@0 104 section 5:
yuuji@0 105
yuuji@0 106 (1) Convert any RFC 2047 encoded-words in the subject to
yuuji@0 107 UTF-8 as described in "internationalization
yuuji@0 108 considerations." Convert all tabs and continuations to
yuuji@0 109 space. Convert all multiple spaces to a single space.
yuuji@0 110
yuuji@0 111 (2) Remove all trailing text of the subject that matches
yuuji@0 112 the subj-trailer ABNF, repeat until no more matches are
yuuji@0 113 possible.
yuuji@0 114
yuuji@0 115 (3) Remove all prefix text of the subject that matches the
yuuji@0 116 subj-leader ABNF.
yuuji@0 117
yuuji@0 118 (4) If there is prefix text of the subject that matches the
yuuji@0 119 subj-blob ABNF, and removing that prefix leaves a non-empty
yuuji@0 120 subj-base, then remove the prefix text.
yuuji@0 121
yuuji@0 122 (5) Repeat (3) and (4) until no matches remain.
yuuji@0 123
yuuji@0 124 Note: it is possible to defer step (2) until step (6), but this
yuuji@0 125 requires checking for subj-trailer in step (4).
yuuji@0 126
yuuji@0 127 (6) If the resulting text begins with the subj-fwd-hdr ABNF
yuuji@0 128 and ends with the subj-fwd-trl ABNF, remove the
yuuji@0 129 subj-fwd-hdr and subj-fwd-trl and repeat from step (2).
yuuji@0 130
yuuji@0 131 (7) The resulting text is the "base subject" used in the
yuuji@0 132 SORT.
yuuji@0 133
yuuji@0 134 All servers and disconnected (as described in [IMAP-MODELS]) clients
yuuji@0 135 MUST use exactly this algorithm to determine the "base subject".
yuuji@0 136 Otherwise there is potential for a user to get inconsistent results
yuuji@0 137 based on whether they are running in connected or disconnected mode.
yuuji@0 138
yuuji@0 139 2.2 Sent Date
yuuji@0 140
yuuji@0 141 As used in this document, the term "sent date" refers to the date and
yuuji@0 142 time from the Date: header, adjusted by time zone to normalize to
yuuji@0 143 UTC. For example, "31 Dec 2000 16:01:33 -0800" is equivalent to the
yuuji@0 144 UTC date and time of "1 Jan 2001 00:01:33 +0000".
yuuji@0 145
yuuji@0 146 If the time zone is invalid, the date and time SHOULD be treated as
yuuji@0 147 UTC. If the time is also invalid, the time SHOULD be treated as
yuuji@0 148 00:00:00. If there is no valid date or time, the date and time
yuuji@0 149 SHOULD be treated as 00:00:00 on the earliest possible date.
yuuji@0 150
yuuji@0 151 This differs from the date-related criteria in the SEARCH command
yuuji@0 152 (described in [IMAP] section 6.4.4), which use just the date and not
yuuji@0 153 the time, and are not adjusted by time zone.
yuuji@0 154
yuuji@0 155 If the sent date can not be determined (a Date: header is missing or
yuuji@0 156 can not be parsed), the INTERNALDATE for that message is used as the
yuuji@0 157 sent date.
yuuji@0 158
yuuji@0 159 When comparing two sent dates that match exactly, the order in which
yuuji@0 160 the two messages appear in the mailbox (that is, by sequence number)
yuuji@0 161 is used as a tie-breaker to determine the order.
yuuji@0 162
yuuji@0 163 3. Additional Commands
yuuji@0 164
yuuji@0 165 These commands are extension to the [IMAP] base protocol.
yuuji@0 166
yuuji@0 167 The section headings are intended to correspond with where they would
yuuji@0 168 be located in the main document if they were part of the base
yuuji@0 169 specification.
yuuji@0 170
yuuji@0 171 BASE.6.4.SORT. SORT Command
yuuji@0 172
yuuji@0 173 Arguments: sort program
yuuji@0 174 charset specification
yuuji@0 175 searching criteria (one or more)
yuuji@0 176
yuuji@0 177 Data: untagged responses: SORT
yuuji@0 178
yuuji@0 179 Result: OK - sort completed
yuuji@0 180 NO - sort error: can't sort that charset or
yuuji@0 181 criteria
yuuji@0 182 BAD - command unknown or arguments invalid
yuuji@0 183
yuuji@0 184 The SORT command is a variant of SEARCH with sorting semantics for
yuuji@0 185 the results. Sort has two arguments before the searching criteria
yuuji@0 186 argument; a parenthesized list of sort criteria, and the searching
yuuji@0 187 charset.
yuuji@0 188
yuuji@0 189 The charset argument is mandatory (unlike SEARCH) and indicates
yuuji@0 190 the [CHARSET] of the strings that appear in the searching
yuuji@0 191 criteria. The US-ASCII and UTF-8 charsets MUST be implemented.
yuuji@0 192 All other charsets are optional.
yuuji@0 193
yuuji@0 194 There is also a UID SORT command which returns unique identifiers
yuuji@0 195 instead of message sequence numbers. Note that there are separate
yuuji@0 196 searching criteria for message sequence numbers and UIDs; thus the
yuuji@0 197 arguments to UID SORT are interpreted the same as in SORT. This
yuuji@0 198 is analogous to the behavior of UID SEARCH, as opposed to UID
yuuji@0 199 COPY, UID FETCH, or UID STORE.
yuuji@0 200
yuuji@0 201 The SORT command first searches the mailbox for messages that
yuuji@0 202 match the given searching criteria using the charset argument for
yuuji@0 203 the interpretation of strings in the searching criteria. It then
yuuji@0 204 returns the matching messages in an untagged SORT response, sorted
yuuji@0 205 according to one or more sort criteria.
yuuji@0 206
yuuji@0 207 Sorting is in ascending order. Earlier dates sort before later
yuuji@0 208 dates; smaller sizes sort before larger sizes; and strings are
yuuji@0 209 sorted according to ascending values established by their
yuuji@0 210 collation algorithm (see under "Internationalization
yuuji@0 211 Considerations").
yuuji@0 212
yuuji@0 213 If two or more messages exactly match according to the sorting
yuuji@0 214 criteria, these messages are sorted according to the order in
yuuji@0 215 which they appear in the mailbox. In other words, there is an
yuuji@0 216 implicit sort criterion of "sequence number".
yuuji@0 217
yuuji@0 218 When multiple sort criteria are specified, the result is sorted in
yuuji@0 219 the priority order that the criteria appear. For example,
yuuji@0 220 (SUBJECT DATE) will sort messages in order by their base subject
yuuji@0 221 text; and for messages with the same base subject text will sort
yuuji@0 222 by their sent date.
yuuji@0 223
yuuji@0 224 Untagged EXPUNGE responses are not permitted while the server is
yuuji@0 225 responding to a SORT command, but are permitted during a UID SORT
yuuji@0 226 command.
yuuji@0 227
yuuji@0 228 The defined sort criteria are as follows. Refer to the Formal
yuuji@0 229 Syntax section for the precise syntactic definitions of the
yuuji@0 230 arguments. If the associated RFC-822 header for a particular
yuuji@0 231 criterion is absent, it is treated as the empty string. The empty
yuuji@0 232 string always collates before non-empty strings.
yuuji@0 233
yuuji@0 234 ARRIVAL
yuuji@0 235 Internal date and time of the message. This differs from the
yuuji@0 236 ON criteria in SEARCH, which uses just the internal date.
yuuji@0 237
yuuji@0 238 CC
yuuji@0 239 [IMAP] addr-mailbox of the first "cc" address.
yuuji@0 240
yuuji@0 241 DATE
yuuji@0 242 Sent date and time, as described in section 2.2.
yuuji@0 243
yuuji@0 244 FROM
yuuji@0 245 [IMAP] addr-mailbox of the first "From" address.
yuuji@0 246
yuuji@0 247 REVERSE
yuuji@0 248 Followed by another sort criterion, has the effect of that
yuuji@0 249 criterion but in reverse (descending) order.
yuuji@0 250 Note: REVERSE only reverses a single criterion, and does not
yuuji@0 251 affect the implicit "sequence number" sort criterion if all
yuuji@0 252 other criteria are identicial. Consequently, a sort of
yuuji@0 253 REVERSE SUBJECT is not the same as a reverse ordering of a
yuuji@0 254 SUBJECT sort. This can be avoided by use of additional
yuuji@0 255 criteria, e.g. SUBJECT DATE vs. REVERSE SUBJECT REVERSE
yuuji@0 256 DATE. In general, however, it's better (and faster, if the
yuuji@0 257 client has a "reverse current ordering" command) to reverse
yuuji@0 258 the results in the client instead of issuing a new SORT.
yuuji@0 259
yuuji@0 260 SIZE
yuuji@0 261 Size of the message in octets.
yuuji@0 262
yuuji@0 263 SUBJECT
yuuji@0 264 Base subject text.
yuuji@0 265
yuuji@0 266 TO
yuuji@0 267 [IMAP] addr-mailbox of the first "To" address.
yuuji@0 268
yuuji@0 269 Example: C: A282 SORT (SUBJECT) UTF-8 SINCE 1-Feb-1994
yuuji@0 270 S: * SORT 2 84 882
yuuji@0 271 S: A282 OK SORT completed
yuuji@0 272 C: A283 SORT (SUBJECT REVERSE DATE) UTF-8 ALL
yuuji@0 273 S: * SORT 5 3 4 1 2
yuuji@0 274 S: A283 OK SORT completed
yuuji@0 275 C: A284 SORT (SUBJECT) US-ASCII TEXT "not in mailbox"
yuuji@0 276 S: * SORT
yuuji@0 277 S: A284 OK SORT completed
yuuji@0 278
yuuji@0 279 BASE.6.4.THREAD. THREAD Command
yuuji@0 280
yuuji@0 281 Arguments: threading algorithm
yuuji@0 282 charset specification
yuuji@0 283 searching criteria (one or more)
yuuji@0 284
yuuji@0 285 Data: untagged responses: THREAD
yuuji@0 286
yuuji@0 287 Result: OK - thread completed
yuuji@0 288 NO - thread error: can't thread that charset or
yuuji@0 289 criteria
yuuji@0 290 BAD - command unknown or arguments invalid
yuuji@0 291
yuuji@0 292 The THREAD command is a variant of SEARCH with threading semantics
yuuji@0 293 for the results. Thread has two arguments before the searching
yuuji@0 294 criteria argument; a threading algorithm, and the searching
yuuji@0 295 charset.
yuuji@0 296
yuuji@0 297 The charset argument is mandatory (unlike SEARCH) and indicates
yuuji@0 298 the [CHARSET] of the strings that appear in the searching
yuuji@0 299 criteria. The US-ASCII and UTF-8 charsets MUST be implemented.
yuuji@0 300 All other charsets are optional.
yuuji@0 301
yuuji@0 302 There is also a UID THREAD command which returns unique
yuuji@0 303 identifiers instead of message sequence numbers. Note that there
yuuji@0 304 are separate searching criteria for message sequence numbers and
yuuji@0 305 UIDs; thus the arguments to UID THREAD are interpreted the same as
yuuji@0 306 in THREAD. This is analogous to the behavior of UID SEARCH, as
yuuji@0 307 opposed to UID COPY, UID FETCH, or UID STORE.
yuuji@0 308
yuuji@0 309 The THREAD command first searches the mailbox for messages that
yuuji@0 310 match the given searching criteria using the charset argument for
yuuji@0 311 the interpretation of strings in the searching criteria. It then
yuuji@0 312 returns the matching messages in an untagged THREAD response,
yuuji@0 313 threaded according to the specified threading algorithm.
yuuji@0 314
yuuji@0 315 All collation is in ascending order. Earlier dates collate before
yuuji@0 316 later dates and strings are collated according to ascending values
yuuji@0 317 established by their collation algorithm (see under
yuuji@0 318 "Internationalization Considerations").
yuuji@0 319
yuuji@0 320 Untagged EXPUNGE responses are not permitted while the server is
yuuji@0 321 responding to a THREAD command, but are permitted during a UID
yuuji@0 322 THREAD command.
yuuji@0 323
yuuji@0 324 The defined threading algorithms are as follows:
yuuji@0 325
yuuji@0 326 ORDEREDSUBJECT
yuuji@0 327
yuuji@0 328 The ORDEREDSUBJECT threading algorithm is also referred to as
yuuji@0 329 "poor man's threading." The searched messages are sorted by
yuuji@0 330 base subject and then by the sent date. The messages are then
yuuji@0 331 split into separate threads, with each thread containing
yuuji@0 332 messages with the same base subject text. Finally, the threads
yuuji@0 333 are sorted by the sent date of the first message in the thread.
yuuji@0 334
yuuji@0 335 The first message of each thread are siblings of each other
yuuji@0 336 (the "root"). The second message of a thread is the child of
yuuji@0 337 the first message, and subsequent messages of the thread are
yuuji@0 338 siblings of the second message and hence children of the
yuuji@0 339 message at the root. Hence, there are no grandchildren in
yuuji@0 340 ORDEREDSUBJECT threading.
yuuji@0 341
yuuji@0 342 Children in ORDEREDSUBJECT threading do not have descendents.
yuuji@0 343 Client implementations SHOULD treat descendents of a child in
yuuji@0 344 a server response as being siblings of that child.
yuuji@0 345
yuuji@0 346 REFERENCES
yuuji@0 347
yuuji@0 348 The REFERENCES threading algorithm threads the searched
yuuji@0 349 messages by grouping them together in parent/child
yuuji@0 350 relationships based on which messages are replies to others.
yuuji@0 351 The parent/child relationships are built using two methods:
yuuji@0 352 reconstructing a message's ancestry using the references
yuuji@0 353 contained within it; and checking the original (not base)
yuuji@0 354 subject of a message to see if it is a reply to (or forward of)
yuuji@0 355 another message.
yuuji@0 356
yuuji@0 357 Note: "Message ID" in the following description refers to a
yuuji@0 358 normalized form of the msg-id in [RFC-2822]. The actual
yuuji@0 359 text in an RFC 2822 may use quoting, resulting in multiple
yuuji@0 360 ways of expressing the same Message ID. Implementations of
yuuji@0 361 the REFERENCES threading algorithm MUST normalize any msg-id
yuuji@0 362 in order to avoid false non-matches due to differences in
yuuji@0 363 quoting.
yuuji@0 364
yuuji@0 365 For example, the msg-id
yuuji@0 366 <"01KF8JCEOCBS0045PS"@xxx.yyy.com>
yuuji@0 367 and the msg-id
yuuji@0 368 <01KF8JCEOCBS0045PS@xxx.yyy.com>
yuuji@0 369 MUST be interpreted as being the same Message ID.
yuuji@0 370
yuuji@0 371 The references used for reconstructing a message's ancestry are
yuuji@0 372 found using the following rules:
yuuji@0 373
yuuji@0 374 If a message contains a References header line, then use the
yuuji@0 375 Message IDs in the References header line as the references.
yuuji@0 376
yuuji@0 377 If a message does not contain a References header line, or
yuuji@0 378 the References header line does not contain any valid
yuuji@0 379 Message IDs, then use the first (if any) valid Message ID
yuuji@0 380 found in the In-Reply-To header line as the only reference
yuuji@0 381 (parent) for this message.
yuuji@0 382
yuuji@0 383 Note: Although [RFC-2822] permits multiple Message IDs in
yuuji@0 384 the In-Reply-To header, in actual practice this
yuuji@0 385 discipline has not been followed. For example,
yuuji@0 386 In-Reply-To headers have been observed with message
yuuji@0 387 addresses after the Message ID, and there are no good
yuuji@0 388 heuristics for software to determine the difference.
yuuji@0 389 This is not a problem with the References header however.
yuuji@0 390
yuuji@0 391 If a message does not contain an In-Reply-To header line, or
yuuji@0 392 the In-Reply-To header line does not contain a valid Message
yuuji@0 393 ID, then the message does not have any references (NIL).
yuuji@0 394
yuuji@0 395 A message is considered to be a reply or forward if the base
yuuji@0 396 subject extraction rules, applied to the original subject,
yuuji@0 397 remove any of the following: a subj-refwd, a "(fwd)"
yuuji@0 398 subj-trailer, or a subj-fwd-hdr and subj-fwd-trl.
yuuji@0 399
yuuji@0 400 The REFERENCES algorithm is significantly more complex than
yuuji@0 401 ORDEREDSUBJECT and consists of six main steps. These steps are
yuuji@0 402 outlined in detail below.
yuuji@0 403
yuuji@0 404 (1) For each searched message:
yuuji@0 405
yuuji@0 406 (A) Using the Message IDs in the message's references, link
yuuji@0 407 the corresponding messages (those whose Message-ID header
yuuji@0 408 line contains the given reference Message ID) together as
yuuji@0 409 parent/child. Make the first reference the parent of the
yuuji@0 410 second (and the second a child of the first), the second the
yuuji@0 411 parent of the third (and the third a child of the second),
yuuji@0 412 etc. The following rules govern the creation of these
yuuji@0 413 links:
yuuji@0 414
yuuji@0 415 If a message does not contain a Message-ID header line,
yuuji@0 416 or the Message-ID header line does not contain a valid
yuuji@0 417 Message ID, then assign a unique Message ID to this
yuuji@0 418 message.
yuuji@0 419
yuuji@0 420 If two or more messages have the same Message ID, then
yuuji@0 421 only use that Message ID in the first (lowest sequence
yuuji@0 422 number) message, and assign a unique Message ID to each
yuuji@0 423 of the subsequent messages with a duplicate of that
yuuji@0 424 Message ID.
yuuji@0 425
yuuji@0 426 If no message can be found with a given Message ID,
yuuji@0 427 create a dummy message with this ID. Use this dummy
yuuji@0 428 message for all subsequent references to this ID.
yuuji@0 429
yuuji@0 430 If a message already has a parent, don't change the
yuuji@0 431 existing link. This is done because the References
yuuji@0 432 header line may have been truncated by a MUA. As a
yuuji@0 433 result, there is no guarantee that the messages
yuuji@0 434 corresponding to adjacent Message IDs in the References
yuuji@0 435 header line are parent and child.
yuuji@0 436
yuuji@0 437 Do not create a parent/child link if creating that link
yuuji@0 438 would introduce a loop. For example, before making
yuuji@0 439 message A the parent of B, make sure that A is not a
yuuji@0 440 descendent of B.
yuuji@0 441
yuuji@0 442 Note: Message ID comparisons are case-sensitive.
yuuji@0 443
yuuji@0 444 (B) Create a parent/child link between the last reference
yuuji@0 445 (or NIL if there are no references) and the current message.
yuuji@0 446 If the current message already has a parent, it is probably
yuuji@0 447 the result of a truncated References header line, so break
yuuji@0 448 the current parent/child link before creating the new
yuuji@0 449 correct one. As in step 1.A, do not create the parent/child
yuuji@0 450 link if creating that link would introduce a loop. Note
yuuji@0 451 that if this message has no references, that it will now
yuuji@0 452 have no parent.
yuuji@0 453
yuuji@0 454 Note: The parent/child links created in steps 1.A and 1.B
yuuji@0 455 MUST be kept consistent with one another at ALL times.
yuuji@0 456
yuuji@0 457 (2) Gather together all of the messages that have no parents
yuuji@0 458 and make them all children (siblings of one another) of a dummy
yuuji@0 459 parent (the "root"). These messages constitute the first
yuuji@0 460 (head) message of the threads created thus far.
yuuji@0 461
yuuji@0 462 (3) Prune dummy messages from the thread tree. Traverse each
yuuji@0 463 thread under the root, and for each message:
yuuji@0 464
yuuji@0 465 If it is a dummy message with NO children, delete it.
yuuji@0 466
yuuji@0 467 If it is a dummy message with children, delete it, but
yuuji@0 468 promote its children to the current level. In other words,
yuuji@0 469 splice them in with the dummy's siblings.
yuuji@0 470
yuuji@0 471 Do not promote the children if doing so would make them
yuuji@0 472 children of the root, unless there is only one child.
yuuji@0 473
yuuji@0 474 (4) Sort the messages under the root (top-level siblings only)
yuuji@0 475 by sent date as described in section 2.2. In the case of a
yuuji@0 476 dummy message, sort its children by sent date and then use the
yuuji@0 477 first child for the top-level sort.
yuuji@0 478
yuuji@0 479 (5) Gather together messages under the root that have the same
yuuji@0 480 base subject text.
yuuji@0 481
yuuji@0 482 (A) Create a table for associating base subjects with
yuuji@0 483 messages, called the subject table.
yuuji@0 484
yuuji@0 485 (B) Populate the subject table with one message per each
yuuji@0 486 base subject. For each child of the root:
yuuji@0 487
yuuji@0 488 (i) Find the subject of this thread, by using the base
yuuji@0 489 subject from either the current message or its first
yuuji@0 490 child if the current message is a dummy. This is the
yuuji@0 491 thread subject.
yuuji@0 492
yuuji@0 493 (ii) If the thread subject is empty, skip this message.
yuuji@0 494
yuuji@0 495 (iii) Look up the message associated with the thread
yuuji@0 496 subject in the subject table.
yuuji@0 497
yuuji@0 498 (iv) If there is no message in the subject table with the
yuuji@0 499 thread subject, add the current message and the thread
yuuji@0 500 subject to the subject table.
yuuji@0 501
yuuji@0 502 Otherwise, if the message in the subject table is not a
yuuji@0 503 dummy, AND either of the following criteria are true:
yuuji@0 504
yuuji@0 505 The current message is a dummy, OR
yuuji@0 506
yuuji@0 507 The message in the subject table is a reply or forward
yuuji@0 508 and the current message is not.
yuuji@0 509
yuuji@0 510 then replace the message in the subject table with the
yuuji@0 511 current message.
yuuji@0 512
yuuji@0 513 (C) Merge threads with the same thread subject. For each
yuuji@0 514 child of the root:
yuuji@0 515
yuuji@0 516 (i) Find the message's thread subject as in step 5.B.i
yuuji@0 517 above.
yuuji@0 518
yuuji@0 519 (ii) If the thread subject is empty, skip this message.
yuuji@0 520
yuuji@0 521 (iii) Lookup the message associated with this thread
yuuji@0 522 subject in the subject table.
yuuji@0 523
yuuji@0 524 (iv) If the message in the subject table is the current
yuuji@0 525 message, skip this message.
yuuji@0 526
yuuji@0 527 Otherwise, merge the current message with the one in the
yuuji@0 528 subject table using the following rules:
yuuji@0 529
yuuji@0 530 If both messages are dummies, append the current
yuuji@0 531 message's children to the children of the message in
yuuji@0 532 the subject table (the children of both messages
yuuji@0 533 become siblings), and then delete the current message.
yuuji@0 534
yuuji@0 535 If the message in the subject table is a dummy and the
yuuji@0 536 current message is not, make the current message a
yuuji@0 537 child of the message in the subject table (a sibling
yuuji@0 538 of its children).
yuuji@0 539
yuuji@0 540 If the current message is a reply or forward and the
yuuji@0 541 message in the subject table is not, make the current
yuuji@0 542 message a child of the message in the subject table (a
yuuji@0 543 sibling of its children).
yuuji@0 544
yuuji@0 545 Otherwise, create a new dummy message and make both
yuuji@0 546 the current message and the message in the subject
yuuji@0 547 table children of the dummy. Then replace the message
yuuji@0 548 in the subject table with the dummy message.
yuuji@0 549
yuuji@0 550 Note: Subject comparisons are case-insensitive, as
yuuji@0 551 described under "Internationalization
yuuji@0 552 Considerations."
yuuji@0 553
yuuji@0 554 (6) Traverse the messages under the root and sort each set of
yuuji@0 555 siblings by sent date as described in section 2.2. Traverse
yuuji@0 556 the messages in such a way that the "youngest" set of siblings
yuuji@0 557 are sorted first, and the "oldest" set of siblings are sorted
yuuji@0 558 last (grandchildren are sorted before children, etc). In the
yuuji@0 559 case of a dummy message (which can only occur with top-level
yuuji@0 560 siblings), use its first child for sorting.
yuuji@0 561
yuuji@0 562 Example: C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000
yuuji@0 563 S: * THREAD (166)(167)(168)(169)(172)(170)(171)
yuuji@0 564 (173)(174 (175)(176)(178)(181)(180))(179)(177
yuuji@0 565 (183)(182)(188)(184)(185)(186)(187)(189))(190)
yuuji@0 566 (191)(192)(193)(194 195)(196 (197)(198))(199)
yuuji@0 567 (200 202)(201)(203)(204)(205)(206 207)(208)
yuuji@0 568 S: A283 OK THREAD completed
yuuji@0 569 C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp"
yuuji@0 570 S: * THREAD
yuuji@0 571 S: A284 OK THREAD completed
yuuji@0 572 C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000
yuuji@0 573 S: * THREAD (166)(167)(168)(169)(172)((170)(179))
yuuji@0 574 (171)(173)((174)(175)(176)(178)(181)(180))
yuuji@0 575 ((177)(183)(182)(188 (184)(189))(185 186)(187))
yuuji@0 576 (190)(191)(192)(193)((194)(195 196))(197 198)
yuuji@0 577 (199)(200 202)(201)(203)(204)(205 206 207)(208)
yuuji@0 578 S: A285 OK THREAD completed
yuuji@0 579
yuuji@0 580 Note: The line breaks in the first and third server
yuuji@0 581 responses are for editorial clarity and do not appear in
yuuji@0 582 real THREAD responses.
yuuji@0 583
yuuji@0 584 4. Additional Responses
yuuji@0 585
yuuji@0 586 These responses are extensions to the [IMAP] base protocol.
yuuji@0 587
yuuji@0 588 The section headings of these responses are intended to correspond
yuuji@0 589 with where they would be located in the main document.
yuuji@0 590
yuuji@0 591 BASE.7.2.SORT. SORT Response
yuuji@0 592
yuuji@0 593 Data: zero or more numbers
yuuji@0 594
yuuji@0 595 The SORT response occurs as a result of a SORT or UID SORT
yuuji@0 596 command. The number(s) refer to those messages that match the
yuuji@0 597 search criteria. For SORT, these are message sequence numbers;
yuuji@0 598 for UID SORT, these are unique identifiers. Each number is
yuuji@0 599 delimited by a space.
yuuji@0 600
yuuji@0 601 Example: S: * SORT 2 3 6
yuuji@0 602
yuuji@0 603 BASE.7.2.THREAD. THREAD Response
yuuji@0 604
yuuji@0 605 Data: zero or more threads
yuuji@0 606
yuuji@0 607 The THREAD response occurs as a result of a THREAD or UID THREAD
yuuji@0 608 command. It contains zero or more threads. A thread consists of
yuuji@0 609 a parenthesized list of thread members.
yuuji@0 610
yuuji@0 611 Thread members consist of zero or more message numbers, delimited
yuuji@0 612 by spaces, indicating successive parent and child. This continues
yuuji@0 613 until the thread splits into multiple sub-threads, at which point
yuuji@0 614 the thread nests into multiple sub-threads with the first member
yuuji@0 615 of each subthread being siblings at this level. There is no limit
yuuji@0 616 to the nesting of threads.
yuuji@0 617
yuuji@0 618 The messages numbers refer to those messages that match the search
yuuji@0 619 criteria. For THREAD, these are message sequence numbers; for UID
yuuji@0 620 THREAD, these are unique identifiers.
yuuji@0 621
yuuji@0 622 Example: S: * THREAD (2)(3 6 (4 23)(44 7 96))
yuuji@0 623
yuuji@0 624 The first thread consists only of message 2. The second thread
yuuji@0 625 consists of the messages 3 (parent) and 6 (child), after which it
yuuji@0 626 splits into two subthreads; the first of which contains messages 4
yuuji@0 627 (child of 6, sibling of 44) and 23 (child of 4), and the second of
yuuji@0 628 which contains messages 44 (child of 6, sibling of 4), 7 (child of
yuuji@0 629 44), and 96 (child of 7). Since some later messages are parents
yuuji@0 630 of earlier messages, the messages were probably moved from some
yuuji@0 631 other mailbox at different times.
yuuji@0 632
yuuji@0 633 -- 2
yuuji@0 634
yuuji@0 635 -- 3
yuuji@0 636 \-- 6
yuuji@0 637 |-- 4
yuuji@0 638 | \-- 23
yuuji@0 639 |
yuuji@0 640 \-- 44
yuuji@0 641 \-- 7
yuuji@0 642 \-- 96
yuuji@0 643
yuuji@0 644 Example: S: * THREAD ((3)(5))
yuuji@0 645
yuuji@0 646 In this example, 3 and 5 are siblings of a parent which does not
yuuji@0 647 match the search criteria (and/or does not exist in the mailbox);
yuuji@0 648 however they are members of the same thread.
yuuji@0 649
yuuji@0 650 5. Formal Syntax of SORT and THREAD Commands and Responses
yuuji@0 651
yuuji@0 652 The following syntax specification uses the Augmented Backus-Naur
yuuji@0 653 Form (ABNF) notation as specified in [ABNF]. It also uses [ABNF]
yuuji@0 654 rules defined in [IMAP].
yuuji@0 655
yuuji@0 656 sort = ["UID" SP] "SORT" SP sort-criteria SP search-criteria
yuuji@0 657
yuuji@0 658 sort-criteria = "(" sort-criterion *(SP sort-criterion) ")"
yuuji@0 659
yuuji@0 660 sort-criterion = ["REVERSE" SP] sort-key
yuuji@0 661
yuuji@0 662 sort-key = "ARRIVAL" / "CC" / "DATE" / "FROM" / "SIZE" /
yuuji@0 663 "SUBJECT" / "TO"
yuuji@0 664
yuuji@0 665 thread = ["UID" SP] "THREAD" SP thread-alg SP search-criteria
yuuji@0 666
yuuji@0 667 thread-alg = "ORDEREDSUBJECT" / "REFERENCES" / thread-alg-ext
yuuji@0 668
yuuji@0 669 thread-alg-ext = atom
yuuji@0 670 ; New algorithms MUST be registered with IANA
yuuji@0 671
yuuji@0 672 search-criteria = charset 1*(SP search-key)
yuuji@0 673
yuuji@0 674 charset = atom / quoted
yuuji@0 675 ; CHARSET values MUST be registered with IANA
yuuji@0 676
yuuji@0 677 sort-data = "SORT" *(SP nz-number)
yuuji@0 678
yuuji@0 679 thread-data = "THREAD" [SP 1*thread-list]
yuuji@0 680
yuuji@0 681 thread-list = "(" (thread-members / thread-nested) ")"
yuuji@0 682
yuuji@0 683 thread-members = nz-number *(SP nz-number) [SP thread-nested]
yuuji@0 684
yuuji@0 685 thread-nested = 2*thread-list
yuuji@0 686
yuuji@0 687 The following syntax describes base subject extraction rules (2)-(6):
yuuji@0 688
yuuji@0 689 subject = *subj-leader [subj-middle] *subj-trailer
yuuji@0 690
yuuji@0 691 subj-refwd = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":"
yuuji@0 692
yuuji@0 693 subj-blob = "[" *BLOBCHAR "]" *WSP
yuuji@0 694
yuuji@0 695 subj-fwd = subj-fwd-hdr subject subj-fwd-trl
yuuji@0 696
yuuji@0 697 subj-fwd-hdr = "[fwd:"
yuuji@0 698
yuuji@0 699 subj-fwd-trl = "]"
yuuji@0 700
yuuji@0 701 subj-leader = (*subj-blob subj-refwd) / WSP
yuuji@0 702
yuuji@0 703 subj-middle = *subj-blob (subj-base / subj-fwd)
yuuji@0 704 ; last subj-blob is subj-base if subj-base would
yuuji@0 705 ; otherwise be empty
yuuji@0 706
yuuji@0 707 subj-trailer = "(fwd)" / WSP
yuuji@0 708
yuuji@0 709 subj-base = NONWSP *(*WSP NONWSP)
yuuji@0 710 ; can be a subj-blob
yuuji@0 711
yuuji@0 712 BLOBCHAR = %x01-5a / %x5c / %x5e-ff
yuuji@0 713 ; any CHAR8 except '[' and ']'
yuuji@0 714
yuuji@0 715 NONWSP = %x01-08 / %x0a-1f / %x21-ff
yuuji@0 716 ; any CHAR8 other than WSP
yuuji@0 717
yuuji@0 718 6. Security Considerations
yuuji@0 719
yuuji@0 720 The SORT and THREAD extensions do not raise any security
yuuji@0 721 considerations that are not present in the base [IMAP] protocol, and
yuuji@0 722 these issues are discussed in [IMAP]. Nevertheless, it is important
yuuji@0 723 to remember that [IMAP] protocol transactions, including message
yuuji@0 724 data, are sent in the clear over the network unless protection from
yuuji@0 725 snooping is negotiated, either by the use of STARTTLS, privacy
yuuji@0 726 protection is negotiated in the AUTHENTICATE command, or some other
yuuji@0 727 protection mechanism.
yuuji@0 728
yuuji@0 729 Although not a security consideration, it is important to recognize
yuuji@0 730 that sorting by REFERENCES can lead to misleading threading trees.
yuuji@0 731 For example, a message with false References: header data will cause
yuuji@0 732 a thread to be incorporated into another thread.
yuuji@0 733
yuuji@0 734 The process of extracting the base subject may lead to incorrect
yuuji@0 735 collation if the extracted data was significant text as opposed to
yuuji@0 736 a subject artifact.
yuuji@0 737
yuuji@0 738 7. Internationalization Considerations
yuuji@0 739
yuuji@0 740 As stated in the introduction, the rules of I18NLEVEL=1 as described
yuuji@0 741 in [IMAP-I18N] MUST be followed; that is, the SORT and THREAD
yuuji@0 742 extensions MUST collate strings according to the i;unicode-casemap
yuuji@0 743 collation described in [UNICASEMAP]. Servers SHOULD also advertise
yuuji@0 744 the I18NLEVEL=1 extension. Alternatively, a server MAY implement
yuuji@0 745 I18NLEVEL=2 (or higher) and comply with the rules of that level.
yuuji@0 746
yuuji@0 747 As discussed in [IMAP-I18N] section 4.5, all server implementations
yuuji@0 748 should eventually be updated to support the [IMAP-I18N] I18NLEVEL=2
yuuji@0 749 extension.
yuuji@0 750
yuuji@0 751 Translations of the "re" or "fw"/"fwd" tokens are not specified for
yuuji@0 752 removal in the base subject extraction process. An attempt to add
yuuji@0 753 such translated tokens would result in a geometrically complex, and
yuuji@0 754 ultimately unimplementable, task.
yuuji@0 755
yuuji@0 756 Instead, note that [RFC-2822] section 3.6.5 recommends that "re:"
yuuji@0 757 (from the Latin "res", in the matter of) be used to identify a reply.
yuuji@0 758 Although it is evident that, from the multiple forms of token to
yuuji@0 759 identify a forwarded message, there is considerable variation found
yuuji@0 760 in the wild, the variations are (still) manageable. Consequently, it
yuuji@0 761 is suggested that "re:" and one of the variations of the tokens for
yuuji@0 762 forward supported by the base subject extraction rules be adopted for
yuuji@0 763 Internet mail messages, since doing so makes it a simple display time
yuuji@0 764 task to localize the token language for the user.
yuuji@0 765
yuuji@0 766 8. IANA Considerations
yuuji@0 767
yuuji@0 768 [IMAP] capabilities are registered by publishing a standards track or
yuuji@0 769 IESG approved experimental RFC. This document constitutes
yuuji@0 770 registration of the SORT and THREAD capabilities in the [IMAP]
yuuji@0 771 capabilities registry.
yuuji@0 772
yuuji@0 773 This document creates a new [IMAP] threading algorithms registry,
yuuji@0 774 which registers threading algorithms by publishing a standards track
yuuji@0 775 or IESG approved experimental RFC. This document constitutes
yuuji@0 776 registration of the ORDEREDSUBJECT and REFERENCES algorithms in that
yuuji@0 777 registry.
yuuji@0 778
yuuji@0 779 9. Normative References
yuuji@0 780
yuuji@0 781 The following documents are normative to this document:
yuuji@0 782
yuuji@0 783 [ABNF] Crocker, D. and Overell, P. "Augmented BNF
yuuji@0 784 for Syntax Specifications: ABNF", RFC 5234
yuuji@0 785 January 2008
yuuji@0 786
yuuji@0 787 [CHARSET] Freed, N. and Postel, J. "IANA Character Set
yuuji@0 788 Registration Procedures", RFC 2978, October
yuuji@0 789 2000.
yuuji@0 790
yuuji@0 791 [IMAP] Crispin, M. "Internet Message Access Protocol -
yuuji@0 792 Version 4rev1", RFC 3501, March 2003.
yuuji@0 793
yuuji@0 794 [IMAP-I18N] Newman, C. and Gulbrandsen, A. "Internet
yuuji@0 795 Message Access Protocol Internationalization",
yuuji@0 796 Work in Progress.
yuuji@0 797
yuuji@0 798 [KEYWORDS] Bradner, S. "Key words for use in RFCs to
yuuji@0 799 Indicate Requirement Levels", BCP 14, RFC 2119,
yuuji@0 800 March 1997.
yuuji@0 801
yuuji@0 802 [RFC-2822] Resnick, P. "Internet Message Format", RFC
yuuji@0 803 2822, April 2001.
yuuji@0 804
yuuji@0 805 [UNICASEMAP] Crispin, M. "i;unicode-casemap - Simple Unicode
yuuji@0 806 Collation Algorithm", RFC 5051.
yuuji@0 807
yuuji@0 808 10. Informative References
yuuji@0 809
yuuji@0 810 The following documents are informative to this document:
yuuji@0 811
yuuji@0 812 [IMAP-MODELS] Crispin, M. "Distributed Electronic Mail Models
yuuji@0 813 in IMAP4", RFC 1733, December 1994.
yuuji@0 814
yuuji@0 815 [THREADING] Zawinski, J. "Message Threading",
yuuji@0 816 http://www.jwz.org/doc/threading.html,
yuuji@0 817 1997-2002.
yuuji@0 818
yuuji@0 819 Appendices
yuuji@0 820
yuuji@0 821 Author's Address
yuuji@0 822
yuuji@0 823 Mark R. Crispin
yuuji@0 824 Networks and Distributed Computing
yuuji@0 825 University of Washington
yuuji@0 826 4545 15th Avenue NE
yuuji@0 827 Seattle, WA 98105-4527
yuuji@0 828
yuuji@0 829 Phone: +1 (206) 543-5762
yuuji@0 830
yuuji@0 831 EMail: MRC@CAC.Washington.EDU
yuuji@0 832
yuuji@0 833 Kenneth Murchison
yuuji@0 834 Carnegie Mellon University
yuuji@0 835 5000 Forbes Avenue
yuuji@0 836 Cyert Hall 285
yuuji@0 837 Pittsburgh, PA 15213
yuuji@0 838
yuuji@0 839 Phone: +1 (412) 268-2638
yuuji@0 840 Email: murch@andrew.cmu.edu
yuuji@0 841
yuuji@0 842 Full Copyright Statement
yuuji@0 843
yuuji@0 844 Copyright (C) The IETF Trust (2008).
yuuji@0 845
yuuji@0 846 This document is subject to the rights, licenses and restrictions
yuuji@0 847 contained in BCP 78, and except as set forth therein, the authors
yuuji@0 848 retain all their rights.
yuuji@0 849
yuuji@0 850 This document and the information contained herein are provided on an
yuuji@0 851 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
yuuji@0 852 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
yuuji@0 853 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
yuuji@0 854 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
yuuji@0 855 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
yuuji@0 856 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
yuuji@0 857
yuuji@0 858 Intellectual Property
yuuji@0 859
yuuji@0 860 The IETF takes no position regarding the validity or scope of any
yuuji@0 861 Intellectual Property Rights or other rights that might be claimed to
yuuji@0 862 pertain to the implementation or use of the technology described in
yuuji@0 863 this document or the extent to which any license under such rights
yuuji@0 864 might or might not be available; nor does it represent that it has
yuuji@0 865 made any independent effort to identify any such rights. Information
yuuji@0 866 on the procedures with respect to rights in RFC documents can be
yuuji@0 867 found in BCP 78 and BCP 79.
yuuji@0 868
yuuji@0 869 Copies of IPR disclosures made to the IETF Secretariat and any
yuuji@0 870 assurances of licenses to be made available, or the result of an
yuuji@0 871 attempt made to obtain a general license or permission for the use of
yuuji@0 872 such proprietary rights by implementers or users of this
yuuji@0 873 specification can be obtained from the IETF on-line IPR repository at
yuuji@0 874 http://www.ietf.org/ipr.
yuuji@0 875
yuuji@0 876 The IETF invites any interested party to bring to its attention any
yuuji@0 877 copyrights, patents or patent applications, or other proprietary
yuuji@0 878 rights that may cover technology that may be required to implement
yuuji@0 879 this standard. Please address the information to the IETF at ietf-
yuuji@0 880 ipr@ietf.org.
yuuji@0 881
yuuji@0 882 Acknowledgement
yuuji@0 883
yuuji@0 884 Funding for the RFC Editor function is currently provided by the
yuuji@0 885 Internet Society.

UW-IMAP'd extensions by yuuji