imapext-2007
diff docs/draft/sort.txt @ 0:ada5e610ab86
imap-2007e
author | yuuji@gentei.org |
---|---|
date | Mon, 14 Sep 2009 15:17:45 +0900 |
parents | |
children |
line diff
1.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 1.2 +++ b/docs/draft/sort.txt Mon Sep 14 15:17:45 2009 +0900 1.3 @@ -0,0 +1,885 @@ 1.4 +IMAP Extensions Working Group M. Crispin 1.5 +Internet-Draft K. Murchison 1.6 +Intended status: Proposed Standard March 10, 2008 1.7 +Expires: September 10, 2008 1.8 +Document: internet-drafts/draft-ietf-imapext-sort-20.txt 1.9 + 1.10 + INTERNET MESSAGE ACCESS PROTOCOL - SORT AND THREAD EXTENSIONS 1.11 + 1.12 +Status of this Memo 1.13 + 1.14 + By submitting this Internet-Draft, each author represents that 1.15 + any applicable patent or other IPR claims of which he or she is 1.16 + aware have been or will be disclosed, and any of which he or she 1.17 + becomes aware will be disclosed, in accordance with Section 6 of 1.18 + BCP 79. 1.19 + 1.20 + Internet-Drafts are working documents of the Internet Engineering 1.21 + Task Force (IETF), its areas, and its working groups. Note that 1.22 + other groups may also distribute working documents as 1.23 + Internet-Drafts. 1.24 + 1.25 + Internet-Drafts are draft documents valid for a maximum of six months 1.26 + and may be updated, replaced, or obsoleted by other documents at any 1.27 + time. It is inappropriate to use Internet-Drafts as reference 1.28 + material or to cite them other than as "work in progress." 1.29 + 1.30 + The list of current Internet-Drafts can be accessed at 1.31 + http://www.ietf.org/ietf/1id-abstracts.txt 1.32 + 1.33 + The list of Internet-Draft Shadow Directories can be accessed at 1.34 + http://www.ietf.org/shadow.html. 1.35 + 1.36 + A revised version of this draft document will be submitted to the RFC 1.37 + editor as a Proposed Standard for the Internet Community. Discussion 1.38 + and suggestions for improvement are requested, and should be sent to 1.39 + ietf-imapext@IMC.ORG. 1.40 + 1.41 + Distribution of this memo is unlimited. 1.42 + 1.43 +Abstract 1.44 + 1.45 + This document describes the base-level server-based sorting and 1.46 + threading extensions to the [IMAP] protocol. These extensions 1.47 + provide substantial performance improvements for IMAP clients which 1.48 + offer sorted and threaded views. 1.49 + 1.50 +1. Introduction 1.51 + 1.52 + The SORT and THREAD extensions to the [IMAP] protocol provide a means 1.53 + of server-based sorting and threading of messages, without requiring 1.54 + that the client download the necessary data to do so itself. This is 1.55 + particularly useful for online clients as described in [IMAP-MODELS]. 1.56 + 1.57 + A server which supports the base-level SORT extension indicates this 1.58 + with a capability name which starts with "SORT". Future, 1.59 + upwards-compatible extensions to the SORT extension will all start 1.60 + with "SORT", indicating support for this base level. 1.61 + 1.62 + A server which supports the THREAD extension indicates this with one 1.63 + or more capability names consisting of "THREAD=" followed by a 1.64 + supported threading algorithm name as described in this document. 1.65 + This provides for future upwards-compatible extensions. 1.66 + 1.67 + A server which implements the SORT and/or THREAD extensions MUST 1.68 + collate strings in accordance with the requirements of I18NLEVEL=1, 1.69 + as described in [IMAP-I18N], and SHOULD implement and advertise the 1.70 + I18NLEVEL=1 extension. Alternatively, a server MAY implement 1.71 + I18NLEVEL=2 (or higher) and comply with the rules of that level. 1.72 + 1.73 + Discussion: the SORT and THREAD extensions predate [IMAP-I18N] by 1.74 + several years. At the time of this writing, all known server 1.75 + implementations of SORT and THREAD comply with the rules of 1.76 + I18NLEVEL=1, but do not necessarily advertise it. As discussed 1.77 + in [IMAP-I18N] section 4.5, all server implementations should 1.78 + eventually be updated to comply with the I18NLEVEL=2 extension. 1.79 + 1.80 + Historical note: the REFERENCES threading algorithm is based on the 1.81 + [THREADING] algorithm written used in "Netscape Mail and News" 1.82 + versions 2.0 through 3.0. 1.83 + 1.84 +2. Terminology 1.85 + 1.86 + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 1.87 + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 1.88 + document are to be interpreted as described in [KEYWORDS]. 1.89 + 1.90 + The word "can" (not "may") is used to refer to a possible 1.91 + circumstance or situation, as opposed to an optional facility of the 1.92 + protocol. 1.93 + 1.94 + "User" is used to refer to a human user, whereas "client" refers to 1.95 + the software being run by the user. 1.96 + 1.97 + In examples, "C:" and "S:" indicate lines sent by the client and 1.98 + server respectively. 1.99 + 1.100 +2.1 Base Subject 1.101 + 1.102 + Subject sorting and threading use the "base subject," which has 1.103 + specific subject artifacts removed. Due to the complexity of these 1.104 + artifacts, the formal syntax for the subject extraction rules is 1.105 + ambiguous. The following procedure is followed to determine the 1.106 + "base subject", using the [ABNF] formal syntax rules described in 1.107 + section 5: 1.108 + 1.109 + (1) Convert any RFC 2047 encoded-words in the subject to 1.110 + UTF-8 as described in "internationalization 1.111 + considerations." Convert all tabs and continuations to 1.112 + space. Convert all multiple spaces to a single space. 1.113 + 1.114 + (2) Remove all trailing text of the subject that matches 1.115 + the subj-trailer ABNF, repeat until no more matches are 1.116 + possible. 1.117 + 1.118 + (3) Remove all prefix text of the subject that matches the 1.119 + subj-leader ABNF. 1.120 + 1.121 + (4) If there is prefix text of the subject that matches the 1.122 + subj-blob ABNF, and removing that prefix leaves a non-empty 1.123 + subj-base, then remove the prefix text. 1.124 + 1.125 + (5) Repeat (3) and (4) until no matches remain. 1.126 + 1.127 + Note: it is possible to defer step (2) until step (6), but this 1.128 + requires checking for subj-trailer in step (4). 1.129 + 1.130 + (6) If the resulting text begins with the subj-fwd-hdr ABNF 1.131 + and ends with the subj-fwd-trl ABNF, remove the 1.132 + subj-fwd-hdr and subj-fwd-trl and repeat from step (2). 1.133 + 1.134 + (7) The resulting text is the "base subject" used in the 1.135 + SORT. 1.136 + 1.137 + All servers and disconnected (as described in [IMAP-MODELS]) clients 1.138 + MUST use exactly this algorithm to determine the "base subject". 1.139 + Otherwise there is potential for a user to get inconsistent results 1.140 + based on whether they are running in connected or disconnected mode. 1.141 + 1.142 +2.2 Sent Date 1.143 + 1.144 + As used in this document, the term "sent date" refers to the date and 1.145 + time from the Date: header, adjusted by time zone to normalize to 1.146 + UTC. For example, "31 Dec 2000 16:01:33 -0800" is equivalent to the 1.147 + UTC date and time of "1 Jan 2001 00:01:33 +0000". 1.148 + 1.149 + If the time zone is invalid, the date and time SHOULD be treated as 1.150 + UTC. If the time is also invalid, the time SHOULD be treated as 1.151 + 00:00:00. If there is no valid date or time, the date and time 1.152 + SHOULD be treated as 00:00:00 on the earliest possible date. 1.153 + 1.154 + This differs from the date-related criteria in the SEARCH command 1.155 + (described in [IMAP] section 6.4.4), which use just the date and not 1.156 + the time, and are not adjusted by time zone. 1.157 + 1.158 + If the sent date can not be determined (a Date: header is missing or 1.159 + can not be parsed), the INTERNALDATE for that message is used as the 1.160 + sent date. 1.161 + 1.162 + When comparing two sent dates that match exactly, the order in which 1.163 + the two messages appear in the mailbox (that is, by sequence number) 1.164 + is used as a tie-breaker to determine the order. 1.165 + 1.166 +3. Additional Commands 1.167 + 1.168 + These commands are extension to the [IMAP] base protocol. 1.169 + 1.170 + The section headings are intended to correspond with where they would 1.171 + be located in the main document if they were part of the base 1.172 + specification. 1.173 + 1.174 +BASE.6.4.SORT. SORT Command 1.175 + 1.176 + Arguments: sort program 1.177 + charset specification 1.178 + searching criteria (one or more) 1.179 + 1.180 + Data: untagged responses: SORT 1.181 + 1.182 + Result: OK - sort completed 1.183 + NO - sort error: can't sort that charset or 1.184 + criteria 1.185 + BAD - command unknown or arguments invalid 1.186 + 1.187 + The SORT command is a variant of SEARCH with sorting semantics for 1.188 + the results. Sort has two arguments before the searching criteria 1.189 + argument; a parenthesized list of sort criteria, and the searching 1.190 + charset. 1.191 + 1.192 + The charset argument is mandatory (unlike SEARCH) and indicates 1.193 + the [CHARSET] of the strings that appear in the searching 1.194 + criteria. The US-ASCII and UTF-8 charsets MUST be implemented. 1.195 + All other charsets are optional. 1.196 + 1.197 + There is also a UID SORT command which returns unique identifiers 1.198 + instead of message sequence numbers. Note that there are separate 1.199 + searching criteria for message sequence numbers and UIDs; thus the 1.200 + arguments to UID SORT are interpreted the same as in SORT. This 1.201 + is analogous to the behavior of UID SEARCH, as opposed to UID 1.202 + COPY, UID FETCH, or UID STORE. 1.203 + 1.204 + The SORT command first searches the mailbox for messages that 1.205 + match the given searching criteria using the charset argument for 1.206 + the interpretation of strings in the searching criteria. It then 1.207 + returns the matching messages in an untagged SORT response, sorted 1.208 + according to one or more sort criteria. 1.209 + 1.210 + Sorting is in ascending order. Earlier dates sort before later 1.211 + dates; smaller sizes sort before larger sizes; and strings are 1.212 + sorted according to ascending values established by their 1.213 + collation algorithm (see under "Internationalization 1.214 + Considerations"). 1.215 + 1.216 + If two or more messages exactly match according to the sorting 1.217 + criteria, these messages are sorted according to the order in 1.218 + which they appear in the mailbox. In other words, there is an 1.219 + implicit sort criterion of "sequence number". 1.220 + 1.221 + When multiple sort criteria are specified, the result is sorted in 1.222 + the priority order that the criteria appear. For example, 1.223 + (SUBJECT DATE) will sort messages in order by their base subject 1.224 + text; and for messages with the same base subject text will sort 1.225 + by their sent date. 1.226 + 1.227 + Untagged EXPUNGE responses are not permitted while the server is 1.228 + responding to a SORT command, but are permitted during a UID SORT 1.229 + command. 1.230 + 1.231 + The defined sort criteria are as follows. Refer to the Formal 1.232 + Syntax section for the precise syntactic definitions of the 1.233 + arguments. If the associated RFC-822 header for a particular 1.234 + criterion is absent, it is treated as the empty string. The empty 1.235 + string always collates before non-empty strings. 1.236 + 1.237 + ARRIVAL 1.238 + Internal date and time of the message. This differs from the 1.239 + ON criteria in SEARCH, which uses just the internal date. 1.240 + 1.241 + CC 1.242 + [IMAP] addr-mailbox of the first "cc" address. 1.243 + 1.244 + DATE 1.245 + Sent date and time, as described in section 2.2. 1.246 + 1.247 + FROM 1.248 + [IMAP] addr-mailbox of the first "From" address. 1.249 + 1.250 + REVERSE 1.251 + Followed by another sort criterion, has the effect of that 1.252 + criterion but in reverse (descending) order. 1.253 + Note: REVERSE only reverses a single criterion, and does not 1.254 + affect the implicit "sequence number" sort criterion if all 1.255 + other criteria are identicial. Consequently, a sort of 1.256 + REVERSE SUBJECT is not the same as a reverse ordering of a 1.257 + SUBJECT sort. This can be avoided by use of additional 1.258 + criteria, e.g. SUBJECT DATE vs. REVERSE SUBJECT REVERSE 1.259 + DATE. In general, however, it's better (and faster, if the 1.260 + client has a "reverse current ordering" command) to reverse 1.261 + the results in the client instead of issuing a new SORT. 1.262 + 1.263 + SIZE 1.264 + Size of the message in octets. 1.265 + 1.266 + SUBJECT 1.267 + Base subject text. 1.268 + 1.269 + TO 1.270 + [IMAP] addr-mailbox of the first "To" address. 1.271 + 1.272 + Example: C: A282 SORT (SUBJECT) UTF-8 SINCE 1-Feb-1994 1.273 + S: * SORT 2 84 882 1.274 + S: A282 OK SORT completed 1.275 + C: A283 SORT (SUBJECT REVERSE DATE) UTF-8 ALL 1.276 + S: * SORT 5 3 4 1 2 1.277 + S: A283 OK SORT completed 1.278 + C: A284 SORT (SUBJECT) US-ASCII TEXT "not in mailbox" 1.279 + S: * SORT 1.280 + S: A284 OK SORT completed 1.281 + 1.282 +BASE.6.4.THREAD. THREAD Command 1.283 + 1.284 +Arguments: threading algorithm 1.285 + charset specification 1.286 + searching criteria (one or more) 1.287 + 1.288 +Data: untagged responses: THREAD 1.289 + 1.290 +Result: OK - thread completed 1.291 + NO - thread error: can't thread that charset or 1.292 + criteria 1.293 + BAD - command unknown or arguments invalid 1.294 + 1.295 + The THREAD command is a variant of SEARCH with threading semantics 1.296 + for the results. Thread has two arguments before the searching 1.297 + criteria argument; a threading algorithm, and the searching 1.298 + charset. 1.299 + 1.300 + The charset argument is mandatory (unlike SEARCH) and indicates 1.301 + the [CHARSET] of the strings that appear in the searching 1.302 + criteria. The US-ASCII and UTF-8 charsets MUST be implemented. 1.303 + All other charsets are optional. 1.304 + 1.305 + There is also a UID THREAD command which returns unique 1.306 + identifiers instead of message sequence numbers. Note that there 1.307 + are separate searching criteria for message sequence numbers and 1.308 + UIDs; thus the arguments to UID THREAD are interpreted the same as 1.309 + in THREAD. This is analogous to the behavior of UID SEARCH, as 1.310 + opposed to UID COPY, UID FETCH, or UID STORE. 1.311 + 1.312 + The THREAD command first searches the mailbox for messages that 1.313 + match the given searching criteria using the charset argument for 1.314 + the interpretation of strings in the searching criteria. It then 1.315 + returns the matching messages in an untagged THREAD response, 1.316 + threaded according to the specified threading algorithm. 1.317 + 1.318 + All collation is in ascending order. Earlier dates collate before 1.319 + later dates and strings are collated according to ascending values 1.320 + established by their collation algorithm (see under 1.321 + "Internationalization Considerations"). 1.322 + 1.323 + Untagged EXPUNGE responses are not permitted while the server is 1.324 + responding to a THREAD command, but are permitted during a UID 1.325 + THREAD command. 1.326 + 1.327 + The defined threading algorithms are as follows: 1.328 + 1.329 + ORDEREDSUBJECT 1.330 + 1.331 + The ORDEREDSUBJECT threading algorithm is also referred to as 1.332 + "poor man's threading." The searched messages are sorted by 1.333 + base subject and then by the sent date. The messages are then 1.334 + split into separate threads, with each thread containing 1.335 + messages with the same base subject text. Finally, the threads 1.336 + are sorted by the sent date of the first message in the thread. 1.337 + 1.338 + The first message of each thread are siblings of each other 1.339 + (the "root"). The second message of a thread is the child of 1.340 + the first message, and subsequent messages of the thread are 1.341 + siblings of the second message and hence children of the 1.342 + message at the root. Hence, there are no grandchildren in 1.343 + ORDEREDSUBJECT threading. 1.344 + 1.345 + Children in ORDEREDSUBJECT threading do not have descendents. 1.346 + Client implementations SHOULD treat descendents of a child in 1.347 + a server response as being siblings of that child. 1.348 + 1.349 + REFERENCES 1.350 + 1.351 + The REFERENCES threading algorithm threads the searched 1.352 + messages by grouping them together in parent/child 1.353 + relationships based on which messages are replies to others. 1.354 + The parent/child relationships are built using two methods: 1.355 + reconstructing a message's ancestry using the references 1.356 + contained within it; and checking the original (not base) 1.357 + subject of a message to see if it is a reply to (or forward of) 1.358 + another message. 1.359 + 1.360 + Note: "Message ID" in the following description refers to a 1.361 + normalized form of the msg-id in [RFC-2822]. The actual 1.362 + text in an RFC 2822 may use quoting, resulting in multiple 1.363 + ways of expressing the same Message ID. Implementations of 1.364 + the REFERENCES threading algorithm MUST normalize any msg-id 1.365 + in order to avoid false non-matches due to differences in 1.366 + quoting. 1.367 + 1.368 + For example, the msg-id 1.369 + <"01KF8JCEOCBS0045PS"@xxx.yyy.com> 1.370 + and the msg-id 1.371 + <01KF8JCEOCBS0045PS@xxx.yyy.com> 1.372 + MUST be interpreted as being the same Message ID. 1.373 + 1.374 + The references used for reconstructing a message's ancestry are 1.375 + found using the following rules: 1.376 + 1.377 + If a message contains a References header line, then use the 1.378 + Message IDs in the References header line as the references. 1.379 + 1.380 + If a message does not contain a References header line, or 1.381 + the References header line does not contain any valid 1.382 + Message IDs, then use the first (if any) valid Message ID 1.383 + found in the In-Reply-To header line as the only reference 1.384 + (parent) for this message. 1.385 + 1.386 + Note: Although [RFC-2822] permits multiple Message IDs in 1.387 + the In-Reply-To header, in actual practice this 1.388 + discipline has not been followed. For example, 1.389 + In-Reply-To headers have been observed with message 1.390 + addresses after the Message ID, and there are no good 1.391 + heuristics for software to determine the difference. 1.392 + This is not a problem with the References header however. 1.393 + 1.394 + If a message does not contain an In-Reply-To header line, or 1.395 + the In-Reply-To header line does not contain a valid Message 1.396 + ID, then the message does not have any references (NIL). 1.397 + 1.398 + A message is considered to be a reply or forward if the base 1.399 + subject extraction rules, applied to the original subject, 1.400 + remove any of the following: a subj-refwd, a "(fwd)" 1.401 + subj-trailer, or a subj-fwd-hdr and subj-fwd-trl. 1.402 + 1.403 + The REFERENCES algorithm is significantly more complex than 1.404 + ORDEREDSUBJECT and consists of six main steps. These steps are 1.405 + outlined in detail below. 1.406 + 1.407 + (1) For each searched message: 1.408 + 1.409 + (A) Using the Message IDs in the message's references, link 1.410 + the corresponding messages (those whose Message-ID header 1.411 + line contains the given reference Message ID) together as 1.412 + parent/child. Make the first reference the parent of the 1.413 + second (and the second a child of the first), the second the 1.414 + parent of the third (and the third a child of the second), 1.415 + etc. The following rules govern the creation of these 1.416 + links: 1.417 + 1.418 + If a message does not contain a Message-ID header line, 1.419 + or the Message-ID header line does not contain a valid 1.420 + Message ID, then assign a unique Message ID to this 1.421 + message. 1.422 + 1.423 + If two or more messages have the same Message ID, then 1.424 + only use that Message ID in the first (lowest sequence 1.425 + number) message, and assign a unique Message ID to each 1.426 + of the subsequent messages with a duplicate of that 1.427 + Message ID. 1.428 + 1.429 + If no message can be found with a given Message ID, 1.430 + create a dummy message with this ID. Use this dummy 1.431 + message for all subsequent references to this ID. 1.432 + 1.433 + If a message already has a parent, don't change the 1.434 + existing link. This is done because the References 1.435 + header line may have been truncated by a MUA. As a 1.436 + result, there is no guarantee that the messages 1.437 + corresponding to adjacent Message IDs in the References 1.438 + header line are parent and child. 1.439 + 1.440 + Do not create a parent/child link if creating that link 1.441 + would introduce a loop. For example, before making 1.442 + message A the parent of B, make sure that A is not a 1.443 + descendent of B. 1.444 + 1.445 + Note: Message ID comparisons are case-sensitive. 1.446 + 1.447 + (B) Create a parent/child link between the last reference 1.448 + (or NIL if there are no references) and the current message. 1.449 + If the current message already has a parent, it is probably 1.450 + the result of a truncated References header line, so break 1.451 + the current parent/child link before creating the new 1.452 + correct one. As in step 1.A, do not create the parent/child 1.453 + link if creating that link would introduce a loop. Note 1.454 + that if this message has no references, that it will now 1.455 + have no parent. 1.456 + 1.457 + Note: The parent/child links created in steps 1.A and 1.B 1.458 + MUST be kept consistent with one another at ALL times. 1.459 + 1.460 + (2) Gather together all of the messages that have no parents 1.461 + and make them all children (siblings of one another) of a dummy 1.462 + parent (the "root"). These messages constitute the first 1.463 + (head) message of the threads created thus far. 1.464 + 1.465 + (3) Prune dummy messages from the thread tree. Traverse each 1.466 + thread under the root, and for each message: 1.467 + 1.468 + If it is a dummy message with NO children, delete it. 1.469 + 1.470 + If it is a dummy message with children, delete it, but 1.471 + promote its children to the current level. In other words, 1.472 + splice them in with the dummy's siblings. 1.473 + 1.474 + Do not promote the children if doing so would make them 1.475 + children of the root, unless there is only one child. 1.476 + 1.477 + (4) Sort the messages under the root (top-level siblings only) 1.478 + by sent date as described in section 2.2. In the case of a 1.479 + dummy message, sort its children by sent date and then use the 1.480 + first child for the top-level sort. 1.481 + 1.482 + (5) Gather together messages under the root that have the same 1.483 + base subject text. 1.484 + 1.485 + (A) Create a table for associating base subjects with 1.486 + messages, called the subject table. 1.487 + 1.488 + (B) Populate the subject table with one message per each 1.489 + base subject. For each child of the root: 1.490 + 1.491 + (i) Find the subject of this thread, by using the base 1.492 + subject from either the current message or its first 1.493 + child if the current message is a dummy. This is the 1.494 + thread subject. 1.495 + 1.496 + (ii) If the thread subject is empty, skip this message. 1.497 + 1.498 + (iii) Look up the message associated with the thread 1.499 + subject in the subject table. 1.500 + 1.501 + (iv) If there is no message in the subject table with the 1.502 + thread subject, add the current message and the thread 1.503 + subject to the subject table. 1.504 + 1.505 + Otherwise, if the message in the subject table is not a 1.506 + dummy, AND either of the following criteria are true: 1.507 + 1.508 + The current message is a dummy, OR 1.509 + 1.510 + The message in the subject table is a reply or forward 1.511 + and the current message is not. 1.512 + 1.513 + then replace the message in the subject table with the 1.514 + current message. 1.515 + 1.516 + (C) Merge threads with the same thread subject. For each 1.517 + child of the root: 1.518 + 1.519 + (i) Find the message's thread subject as in step 5.B.i 1.520 + above. 1.521 + 1.522 + (ii) If the thread subject is empty, skip this message. 1.523 + 1.524 + (iii) Lookup the message associated with this thread 1.525 + subject in the subject table. 1.526 + 1.527 + (iv) If the message in the subject table is the current 1.528 + message, skip this message. 1.529 + 1.530 + Otherwise, merge the current message with the one in the 1.531 + subject table using the following rules: 1.532 + 1.533 + If both messages are dummies, append the current 1.534 + message's children to the children of the message in 1.535 + the subject table (the children of both messages 1.536 + become siblings), and then delete the current message. 1.537 + 1.538 + If the message in the subject table is a dummy and the 1.539 + current message is not, make the current message a 1.540 + child of the message in the subject table (a sibling 1.541 + of its children). 1.542 + 1.543 + If the current message is a reply or forward and the 1.544 + message in the subject table is not, make the current 1.545 + message a child of the message in the subject table (a 1.546 + sibling of its children). 1.547 + 1.548 + Otherwise, create a new dummy message and make both 1.549 + the current message and the message in the subject 1.550 + table children of the dummy. Then replace the message 1.551 + in the subject table with the dummy message. 1.552 + 1.553 + Note: Subject comparisons are case-insensitive, as 1.554 + described under "Internationalization 1.555 + Considerations." 1.556 + 1.557 + (6) Traverse the messages under the root and sort each set of 1.558 + siblings by sent date as described in section 2.2. Traverse 1.559 + the messages in such a way that the "youngest" set of siblings 1.560 + are sorted first, and the "oldest" set of siblings are sorted 1.561 + last (grandchildren are sorted before children, etc). In the 1.562 + case of a dummy message (which can only occur with top-level 1.563 + siblings), use its first child for sorting. 1.564 + 1.565 + Example: C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000 1.566 + S: * THREAD (166)(167)(168)(169)(172)(170)(171) 1.567 + (173)(174 (175)(176)(178)(181)(180))(179)(177 1.568 + (183)(182)(188)(184)(185)(186)(187)(189))(190) 1.569 + (191)(192)(193)(194 195)(196 (197)(198))(199) 1.570 + (200 202)(201)(203)(204)(205)(206 207)(208) 1.571 + S: A283 OK THREAD completed 1.572 + C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp" 1.573 + S: * THREAD 1.574 + S: A284 OK THREAD completed 1.575 + C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000 1.576 + S: * THREAD (166)(167)(168)(169)(172)((170)(179)) 1.577 + (171)(173)((174)(175)(176)(178)(181)(180)) 1.578 + ((177)(183)(182)(188 (184)(189))(185 186)(187)) 1.579 + (190)(191)(192)(193)((194)(195 196))(197 198) 1.580 + (199)(200 202)(201)(203)(204)(205 206 207)(208) 1.581 + S: A285 OK THREAD completed 1.582 + 1.583 + Note: The line breaks in the first and third server 1.584 + responses are for editorial clarity and do not appear in 1.585 + real THREAD responses. 1.586 + 1.587 +4. Additional Responses 1.588 + 1.589 + These responses are extensions to the [IMAP] base protocol. 1.590 + 1.591 + The section headings of these responses are intended to correspond 1.592 + with where they would be located in the main document. 1.593 + 1.594 +BASE.7.2.SORT. SORT Response 1.595 + 1.596 + Data: zero or more numbers 1.597 + 1.598 + The SORT response occurs as a result of a SORT or UID SORT 1.599 + command. The number(s) refer to those messages that match the 1.600 + search criteria. For SORT, these are message sequence numbers; 1.601 + for UID SORT, these are unique identifiers. Each number is 1.602 + delimited by a space. 1.603 + 1.604 + Example: S: * SORT 2 3 6 1.605 + 1.606 +BASE.7.2.THREAD. THREAD Response 1.607 + 1.608 + Data: zero or more threads 1.609 + 1.610 + The THREAD response occurs as a result of a THREAD or UID THREAD 1.611 + command. It contains zero or more threads. A thread consists of 1.612 + a parenthesized list of thread members. 1.613 + 1.614 + Thread members consist of zero or more message numbers, delimited 1.615 + by spaces, indicating successive parent and child. This continues 1.616 + until the thread splits into multiple sub-threads, at which point 1.617 + the thread nests into multiple sub-threads with the first member 1.618 + of each subthread being siblings at this level. There is no limit 1.619 + to the nesting of threads. 1.620 + 1.621 + The messages numbers refer to those messages that match the search 1.622 + criteria. For THREAD, these are message sequence numbers; for UID 1.623 + THREAD, these are unique identifiers. 1.624 + 1.625 + Example: S: * THREAD (2)(3 6 (4 23)(44 7 96)) 1.626 + 1.627 + The first thread consists only of message 2. The second thread 1.628 + consists of the messages 3 (parent) and 6 (child), after which it 1.629 + splits into two subthreads; the first of which contains messages 4 1.630 + (child of 6, sibling of 44) and 23 (child of 4), and the second of 1.631 + which contains messages 44 (child of 6, sibling of 4), 7 (child of 1.632 + 44), and 96 (child of 7). Since some later messages are parents 1.633 + of earlier messages, the messages were probably moved from some 1.634 + other mailbox at different times. 1.635 + 1.636 + -- 2 1.637 + 1.638 + -- 3 1.639 + \-- 6 1.640 + |-- 4 1.641 + | \-- 23 1.642 + | 1.643 + \-- 44 1.644 + \-- 7 1.645 + \-- 96 1.646 + 1.647 + Example: S: * THREAD ((3)(5)) 1.648 + 1.649 + In this example, 3 and 5 are siblings of a parent which does not 1.650 + match the search criteria (and/or does not exist in the mailbox); 1.651 + however they are members of the same thread. 1.652 + 1.653 +5. Formal Syntax of SORT and THREAD Commands and Responses 1.654 + 1.655 + The following syntax specification uses the Augmented Backus-Naur 1.656 + Form (ABNF) notation as specified in [ABNF]. It also uses [ABNF] 1.657 + rules defined in [IMAP]. 1.658 + 1.659 +sort = ["UID" SP] "SORT" SP sort-criteria SP search-criteria 1.660 + 1.661 +sort-criteria = "(" sort-criterion *(SP sort-criterion) ")" 1.662 + 1.663 +sort-criterion = ["REVERSE" SP] sort-key 1.664 + 1.665 +sort-key = "ARRIVAL" / "CC" / "DATE" / "FROM" / "SIZE" / 1.666 + "SUBJECT" / "TO" 1.667 + 1.668 +thread = ["UID" SP] "THREAD" SP thread-alg SP search-criteria 1.669 + 1.670 +thread-alg = "ORDEREDSUBJECT" / "REFERENCES" / thread-alg-ext 1.671 + 1.672 +thread-alg-ext = atom 1.673 + ; New algorithms MUST be registered with IANA 1.674 + 1.675 +search-criteria = charset 1*(SP search-key) 1.676 + 1.677 +charset = atom / quoted 1.678 + ; CHARSET values MUST be registered with IANA 1.679 + 1.680 +sort-data = "SORT" *(SP nz-number) 1.681 + 1.682 +thread-data = "THREAD" [SP 1*thread-list] 1.683 + 1.684 +thread-list = "(" (thread-members / thread-nested) ")" 1.685 + 1.686 +thread-members = nz-number *(SP nz-number) [SP thread-nested] 1.687 + 1.688 +thread-nested = 2*thread-list 1.689 + 1.690 + The following syntax describes base subject extraction rules (2)-(6): 1.691 + 1.692 +subject = *subj-leader [subj-middle] *subj-trailer 1.693 + 1.694 +subj-refwd = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":" 1.695 + 1.696 +subj-blob = "[" *BLOBCHAR "]" *WSP 1.697 + 1.698 +subj-fwd = subj-fwd-hdr subject subj-fwd-trl 1.699 + 1.700 +subj-fwd-hdr = "[fwd:" 1.701 + 1.702 +subj-fwd-trl = "]" 1.703 + 1.704 +subj-leader = (*subj-blob subj-refwd) / WSP 1.705 + 1.706 +subj-middle = *subj-blob (subj-base / subj-fwd) 1.707 + ; last subj-blob is subj-base if subj-base would 1.708 + ; otherwise be empty 1.709 + 1.710 +subj-trailer = "(fwd)" / WSP 1.711 + 1.712 +subj-base = NONWSP *(*WSP NONWSP) 1.713 + ; can be a subj-blob 1.714 + 1.715 +BLOBCHAR = %x01-5a / %x5c / %x5e-ff 1.716 + ; any CHAR8 except '[' and ']' 1.717 + 1.718 +NONWSP = %x01-08 / %x0a-1f / %x21-ff 1.719 + ; any CHAR8 other than WSP 1.720 + 1.721 +6. Security Considerations 1.722 + 1.723 + The SORT and THREAD extensions do not raise any security 1.724 + considerations that are not present in the base [IMAP] protocol, and 1.725 + these issues are discussed in [IMAP]. Nevertheless, it is important 1.726 + to remember that [IMAP] protocol transactions, including message 1.727 + data, are sent in the clear over the network unless protection from 1.728 + snooping is negotiated, either by the use of STARTTLS, privacy 1.729 + protection is negotiated in the AUTHENTICATE command, or some other 1.730 + protection mechanism. 1.731 + 1.732 + Although not a security consideration, it is important to recognize 1.733 + that sorting by REFERENCES can lead to misleading threading trees. 1.734 + For example, a message with false References: header data will cause 1.735 + a thread to be incorporated into another thread. 1.736 + 1.737 + The process of extracting the base subject may lead to incorrect 1.738 + collation if the extracted data was significant text as opposed to 1.739 + a subject artifact. 1.740 + 1.741 +7. Internationalization Considerations 1.742 + 1.743 + As stated in the introduction, the rules of I18NLEVEL=1 as described 1.744 + in [IMAP-I18N] MUST be followed; that is, the SORT and THREAD 1.745 + extensions MUST collate strings according to the i;unicode-casemap 1.746 + collation described in [UNICASEMAP]. Servers SHOULD also advertise 1.747 + the I18NLEVEL=1 extension. Alternatively, a server MAY implement 1.748 + I18NLEVEL=2 (or higher) and comply with the rules of that level. 1.749 + 1.750 + As discussed in [IMAP-I18N] section 4.5, all server implementations 1.751 + should eventually be updated to support the [IMAP-I18N] I18NLEVEL=2 1.752 + extension. 1.753 + 1.754 + Translations of the "re" or "fw"/"fwd" tokens are not specified for 1.755 + removal in the base subject extraction process. An attempt to add 1.756 + such translated tokens would result in a geometrically complex, and 1.757 + ultimately unimplementable, task. 1.758 + 1.759 + Instead, note that [RFC-2822] section 3.6.5 recommends that "re:" 1.760 + (from the Latin "res", in the matter of) be used to identify a reply. 1.761 + Although it is evident that, from the multiple forms of token to 1.762 + identify a forwarded message, there is considerable variation found 1.763 + in the wild, the variations are (still) manageable. Consequently, it 1.764 + is suggested that "re:" and one of the variations of the tokens for 1.765 + forward supported by the base subject extraction rules be adopted for 1.766 + Internet mail messages, since doing so makes it a simple display time 1.767 + task to localize the token language for the user. 1.768 + 1.769 +8. IANA Considerations 1.770 + 1.771 + [IMAP] capabilities are registered by publishing a standards track or 1.772 + IESG approved experimental RFC. This document constitutes 1.773 + registration of the SORT and THREAD capabilities in the [IMAP] 1.774 + capabilities registry. 1.775 + 1.776 + This document creates a new [IMAP] threading algorithms registry, 1.777 + which registers threading algorithms by publishing a standards track 1.778 + or IESG approved experimental RFC. This document constitutes 1.779 + registration of the ORDEREDSUBJECT and REFERENCES algorithms in that 1.780 + registry. 1.781 + 1.782 +9. Normative References 1.783 + 1.784 + The following documents are normative to this document: 1.785 + 1.786 + [ABNF] Crocker, D. and Overell, P. "Augmented BNF 1.787 + for Syntax Specifications: ABNF", RFC 5234 1.788 + January 2008 1.789 + 1.790 + [CHARSET] Freed, N. and Postel, J. "IANA Character Set 1.791 + Registration Procedures", RFC 2978, October 1.792 + 2000. 1.793 + 1.794 + [IMAP] Crispin, M. "Internet Message Access Protocol - 1.795 + Version 4rev1", RFC 3501, March 2003. 1.796 + 1.797 + [IMAP-I18N] Newman, C. and Gulbrandsen, A. "Internet 1.798 + Message Access Protocol Internationalization", 1.799 + Work in Progress. 1.800 + 1.801 + [KEYWORDS] Bradner, S. "Key words for use in RFCs to 1.802 + Indicate Requirement Levels", BCP 14, RFC 2119, 1.803 + March 1997. 1.804 + 1.805 + [RFC-2822] Resnick, P. "Internet Message Format", RFC 1.806 + 2822, April 2001. 1.807 + 1.808 + [UNICASEMAP] Crispin, M. "i;unicode-casemap - Simple Unicode 1.809 + Collation Algorithm", RFC 5051. 1.810 + 1.811 +10. Informative References 1.812 + 1.813 + The following documents are informative to this document: 1.814 + 1.815 + [IMAP-MODELS] Crispin, M. "Distributed Electronic Mail Models 1.816 + in IMAP4", RFC 1733, December 1994. 1.817 + 1.818 + [THREADING] Zawinski, J. "Message Threading", 1.819 + http://www.jwz.org/doc/threading.html, 1.820 + 1997-2002. 1.821 + 1.822 +Appendices 1.823 + 1.824 +Author's Address 1.825 + 1.826 + Mark R. Crispin 1.827 + Networks and Distributed Computing 1.828 + University of Washington 1.829 + 4545 15th Avenue NE 1.830 + Seattle, WA 98105-4527 1.831 + 1.832 + Phone: +1 (206) 543-5762 1.833 + 1.834 + EMail: MRC@CAC.Washington.EDU 1.835 + 1.836 + Kenneth Murchison 1.837 + Carnegie Mellon University 1.838 + 5000 Forbes Avenue 1.839 + Cyert Hall 285 1.840 + Pittsburgh, PA 15213 1.841 + 1.842 + Phone: +1 (412) 268-2638 1.843 + Email: murch@andrew.cmu.edu 1.844 + 1.845 +Full Copyright Statement 1.846 + 1.847 + Copyright (C) The IETF Trust (2008). 1.848 + 1.849 + This document is subject to the rights, licenses and restrictions 1.850 + contained in BCP 78, and except as set forth therein, the authors 1.851 + retain all their rights. 1.852 + 1.853 + This document and the information contained herein are provided on an 1.854 + "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 1.855 + OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 1.856 + THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 1.857 + OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 1.858 + THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 1.859 + WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 1.860 + 1.861 +Intellectual Property 1.862 + 1.863 + The IETF takes no position regarding the validity or scope of any 1.864 + Intellectual Property Rights or other rights that might be claimed to 1.865 + pertain to the implementation or use of the technology described in 1.866 + this document or the extent to which any license under such rights 1.867 + might or might not be available; nor does it represent that it has 1.868 + made any independent effort to identify any such rights. Information 1.869 + on the procedures with respect to rights in RFC documents can be 1.870 + found in BCP 78 and BCP 79. 1.871 + 1.872 + Copies of IPR disclosures made to the IETF Secretariat and any 1.873 + assurances of licenses to be made available, or the result of an 1.874 + attempt made to obtain a general license or permission for the use of 1.875 + such proprietary rights by implementers or users of this 1.876 + specification can be obtained from the IETF on-line IPR repository at 1.877 + http://www.ietf.org/ipr. 1.878 + 1.879 + The IETF invites any interested party to bring to its attention any 1.880 + copyrights, patents or patent applications, or other proprietary 1.881 + rights that may cover technology that may be required to implement 1.882 + this standard. Please address the information to the IETF at ietf- 1.883 + ipr@ietf.org. 1.884 + 1.885 +Acknowledgement 1.886 + 1.887 + Funding for the RFC Editor function is currently provided by the 1.888 + Internet Society.