imapext-2007

view docs/draft/sort.txt @ 0:ada5e610ab86

imap-2007e
author yuuji@gentei.org
date Mon, 14 Sep 2009 15:17:45 +0900
parents
children
line source
1 IMAP Extensions Working Group M. Crispin
2 Internet-Draft K. Murchison
3 Intended status: Proposed Standard March 10, 2008
4 Expires: September 10, 2008
5 Document: internet-drafts/draft-ietf-imapext-sort-20.txt
7 INTERNET MESSAGE ACCESS PROTOCOL - SORT AND THREAD EXTENSIONS
9 Status of this Memo
11 By submitting this Internet-Draft, each author represents that
12 any applicable patent or other IPR claims of which he or she is
13 aware have been or will be disclosed, and any of which he or she
14 becomes aware will be disclosed, in accordance with Section 6 of
15 BCP 79.
17 Internet-Drafts are working documents of the Internet Engineering
18 Task Force (IETF), its areas, and its working groups. Note that
19 other groups may also distribute working documents as
20 Internet-Drafts.
22 Internet-Drafts are draft documents valid for a maximum of six months
23 and may be updated, replaced, or obsoleted by other documents at any
24 time. It is inappropriate to use Internet-Drafts as reference
25 material or to cite them other than as "work in progress."
27 The list of current Internet-Drafts can be accessed at
28 http://www.ietf.org/ietf/1id-abstracts.txt
30 The list of Internet-Draft Shadow Directories can be accessed at
31 http://www.ietf.org/shadow.html.
33 A revised version of this draft document will be submitted to the RFC
34 editor as a Proposed Standard for the Internet Community. Discussion
35 and suggestions for improvement are requested, and should be sent to
36 ietf-imapext@IMC.ORG.
38 Distribution of this memo is unlimited.
40 Abstract
42 This document describes the base-level server-based sorting and
43 threading extensions to the [IMAP] protocol. These extensions
44 provide substantial performance improvements for IMAP clients which
45 offer sorted and threaded views.
47 1. Introduction
49 The SORT and THREAD extensions to the [IMAP] protocol provide a means
50 of server-based sorting and threading of messages, without requiring
51 that the client download the necessary data to do so itself. This is
52 particularly useful for online clients as described in [IMAP-MODELS].
54 A server which supports the base-level SORT extension indicates this
55 with a capability name which starts with "SORT". Future,
56 upwards-compatible extensions to the SORT extension will all start
57 with "SORT", indicating support for this base level.
59 A server which supports the THREAD extension indicates this with one
60 or more capability names consisting of "THREAD=" followed by a
61 supported threading algorithm name as described in this document.
62 This provides for future upwards-compatible extensions.
64 A server which implements the SORT and/or THREAD extensions MUST
65 collate strings in accordance with the requirements of I18NLEVEL=1,
66 as described in [IMAP-I18N], and SHOULD implement and advertise the
67 I18NLEVEL=1 extension. Alternatively, a server MAY implement
68 I18NLEVEL=2 (or higher) and comply with the rules of that level.
70 Discussion: the SORT and THREAD extensions predate [IMAP-I18N] by
71 several years. At the time of this writing, all known server
72 implementations of SORT and THREAD comply with the rules of
73 I18NLEVEL=1, but do not necessarily advertise it. As discussed
74 in [IMAP-I18N] section 4.5, all server implementations should
75 eventually be updated to comply with the I18NLEVEL=2 extension.
77 Historical note: the REFERENCES threading algorithm is based on the
78 [THREADING] algorithm written used in "Netscape Mail and News"
79 versions 2.0 through 3.0.
81 2. Terminology
83 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
84 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
85 document are to be interpreted as described in [KEYWORDS].
87 The word "can" (not "may") is used to refer to a possible
88 circumstance or situation, as opposed to an optional facility of the
89 protocol.
91 "User" is used to refer to a human user, whereas "client" refers to
92 the software being run by the user.
94 In examples, "C:" and "S:" indicate lines sent by the client and
95 server respectively.
97 2.1 Base Subject
99 Subject sorting and threading use the "base subject," which has
100 specific subject artifacts removed. Due to the complexity of these
101 artifacts, the formal syntax for the subject extraction rules is
102 ambiguous. The following procedure is followed to determine the
103 "base subject", using the [ABNF] formal syntax rules described in
104 section 5:
106 (1) Convert any RFC 2047 encoded-words in the subject to
107 UTF-8 as described in "internationalization
108 considerations." Convert all tabs and continuations to
109 space. Convert all multiple spaces to a single space.
111 (2) Remove all trailing text of the subject that matches
112 the subj-trailer ABNF, repeat until no more matches are
113 possible.
115 (3) Remove all prefix text of the subject that matches the
116 subj-leader ABNF.
118 (4) If there is prefix text of the subject that matches the
119 subj-blob ABNF, and removing that prefix leaves a non-empty
120 subj-base, then remove the prefix text.
122 (5) Repeat (3) and (4) until no matches remain.
124 Note: it is possible to defer step (2) until step (6), but this
125 requires checking for subj-trailer in step (4).
127 (6) If the resulting text begins with the subj-fwd-hdr ABNF
128 and ends with the subj-fwd-trl ABNF, remove the
129 subj-fwd-hdr and subj-fwd-trl and repeat from step (2).
131 (7) The resulting text is the "base subject" used in the
132 SORT.
134 All servers and disconnected (as described in [IMAP-MODELS]) clients
135 MUST use exactly this algorithm to determine the "base subject".
136 Otherwise there is potential for a user to get inconsistent results
137 based on whether they are running in connected or disconnected mode.
139 2.2 Sent Date
141 As used in this document, the term "sent date" refers to the date and
142 time from the Date: header, adjusted by time zone to normalize to
143 UTC. For example, "31 Dec 2000 16:01:33 -0800" is equivalent to the
144 UTC date and time of "1 Jan 2001 00:01:33 +0000".
146 If the time zone is invalid, the date and time SHOULD be treated as
147 UTC. If the time is also invalid, the time SHOULD be treated as
148 00:00:00. If there is no valid date or time, the date and time
149 SHOULD be treated as 00:00:00 on the earliest possible date.
151 This differs from the date-related criteria in the SEARCH command
152 (described in [IMAP] section 6.4.4), which use just the date and not
153 the time, and are not adjusted by time zone.
155 If the sent date can not be determined (a Date: header is missing or
156 can not be parsed), the INTERNALDATE for that message is used as the
157 sent date.
159 When comparing two sent dates that match exactly, the order in which
160 the two messages appear in the mailbox (that is, by sequence number)
161 is used as a tie-breaker to determine the order.
163 3. Additional Commands
165 These commands are extension to the [IMAP] base protocol.
167 The section headings are intended to correspond with where they would
168 be located in the main document if they were part of the base
169 specification.
171 BASE.6.4.SORT. SORT Command
173 Arguments: sort program
174 charset specification
175 searching criteria (one or more)
177 Data: untagged responses: SORT
179 Result: OK - sort completed
180 NO - sort error: can't sort that charset or
181 criteria
182 BAD - command unknown or arguments invalid
184 The SORT command is a variant of SEARCH with sorting semantics for
185 the results. Sort has two arguments before the searching criteria
186 argument; a parenthesized list of sort criteria, and the searching
187 charset.
189 The charset argument is mandatory (unlike SEARCH) and indicates
190 the [CHARSET] of the strings that appear in the searching
191 criteria. The US-ASCII and UTF-8 charsets MUST be implemented.
192 All other charsets are optional.
194 There is also a UID SORT command which returns unique identifiers
195 instead of message sequence numbers. Note that there are separate
196 searching criteria for message sequence numbers and UIDs; thus the
197 arguments to UID SORT are interpreted the same as in SORT. This
198 is analogous to the behavior of UID SEARCH, as opposed to UID
199 COPY, UID FETCH, or UID STORE.
201 The SORT command first searches the mailbox for messages that
202 match the given searching criteria using the charset argument for
203 the interpretation of strings in the searching criteria. It then
204 returns the matching messages in an untagged SORT response, sorted
205 according to one or more sort criteria.
207 Sorting is in ascending order. Earlier dates sort before later
208 dates; smaller sizes sort before larger sizes; and strings are
209 sorted according to ascending values established by their
210 collation algorithm (see under "Internationalization
211 Considerations").
213 If two or more messages exactly match according to the sorting
214 criteria, these messages are sorted according to the order in
215 which they appear in the mailbox. In other words, there is an
216 implicit sort criterion of "sequence number".
218 When multiple sort criteria are specified, the result is sorted in
219 the priority order that the criteria appear. For example,
220 (SUBJECT DATE) will sort messages in order by their base subject
221 text; and for messages with the same base subject text will sort
222 by their sent date.
224 Untagged EXPUNGE responses are not permitted while the server is
225 responding to a SORT command, but are permitted during a UID SORT
226 command.
228 The defined sort criteria are as follows. Refer to the Formal
229 Syntax section for the precise syntactic definitions of the
230 arguments. If the associated RFC-822 header for a particular
231 criterion is absent, it is treated as the empty string. The empty
232 string always collates before non-empty strings.
234 ARRIVAL
235 Internal date and time of the message. This differs from the
236 ON criteria in SEARCH, which uses just the internal date.
238 CC
239 [IMAP] addr-mailbox of the first "cc" address.
241 DATE
242 Sent date and time, as described in section 2.2.
244 FROM
245 [IMAP] addr-mailbox of the first "From" address.
247 REVERSE
248 Followed by another sort criterion, has the effect of that
249 criterion but in reverse (descending) order.
250 Note: REVERSE only reverses a single criterion, and does not
251 affect the implicit "sequence number" sort criterion if all
252 other criteria are identicial. Consequently, a sort of
253 REVERSE SUBJECT is not the same as a reverse ordering of a
254 SUBJECT sort. This can be avoided by use of additional
255 criteria, e.g. SUBJECT DATE vs. REVERSE SUBJECT REVERSE
256 DATE. In general, however, it's better (and faster, if the
257 client has a "reverse current ordering" command) to reverse
258 the results in the client instead of issuing a new SORT.
260 SIZE
261 Size of the message in octets.
263 SUBJECT
264 Base subject text.
266 TO
267 [IMAP] addr-mailbox of the first "To" address.
269 Example: C: A282 SORT (SUBJECT) UTF-8 SINCE 1-Feb-1994
270 S: * SORT 2 84 882
271 S: A282 OK SORT completed
272 C: A283 SORT (SUBJECT REVERSE DATE) UTF-8 ALL
273 S: * SORT 5 3 4 1 2
274 S: A283 OK SORT completed
275 C: A284 SORT (SUBJECT) US-ASCII TEXT "not in mailbox"
276 S: * SORT
277 S: A284 OK SORT completed
279 BASE.6.4.THREAD. THREAD Command
281 Arguments: threading algorithm
282 charset specification
283 searching criteria (one or more)
285 Data: untagged responses: THREAD
287 Result: OK - thread completed
288 NO - thread error: can't thread that charset or
289 criteria
290 BAD - command unknown or arguments invalid
292 The THREAD command is a variant of SEARCH with threading semantics
293 for the results. Thread has two arguments before the searching
294 criteria argument; a threading algorithm, and the searching
295 charset.
297 The charset argument is mandatory (unlike SEARCH) and indicates
298 the [CHARSET] of the strings that appear in the searching
299 criteria. The US-ASCII and UTF-8 charsets MUST be implemented.
300 All other charsets are optional.
302 There is also a UID THREAD command which returns unique
303 identifiers instead of message sequence numbers. Note that there
304 are separate searching criteria for message sequence numbers and
305 UIDs; thus the arguments to UID THREAD are interpreted the same as
306 in THREAD. This is analogous to the behavior of UID SEARCH, as
307 opposed to UID COPY, UID FETCH, or UID STORE.
309 The THREAD command first searches the mailbox for messages that
310 match the given searching criteria using the charset argument for
311 the interpretation of strings in the searching criteria. It then
312 returns the matching messages in an untagged THREAD response,
313 threaded according to the specified threading algorithm.
315 All collation is in ascending order. Earlier dates collate before
316 later dates and strings are collated according to ascending values
317 established by their collation algorithm (see under
318 "Internationalization Considerations").
320 Untagged EXPUNGE responses are not permitted while the server is
321 responding to a THREAD command, but are permitted during a UID
322 THREAD command.
324 The defined threading algorithms are as follows:
326 ORDEREDSUBJECT
328 The ORDEREDSUBJECT threading algorithm is also referred to as
329 "poor man's threading." The searched messages are sorted by
330 base subject and then by the sent date. The messages are then
331 split into separate threads, with each thread containing
332 messages with the same base subject text. Finally, the threads
333 are sorted by the sent date of the first message in the thread.
335 The first message of each thread are siblings of each other
336 (the "root"). The second message of a thread is the child of
337 the first message, and subsequent messages of the thread are
338 siblings of the second message and hence children of the
339 message at the root. Hence, there are no grandchildren in
340 ORDEREDSUBJECT threading.
342 Children in ORDEREDSUBJECT threading do not have descendents.
343 Client implementations SHOULD treat descendents of a child in
344 a server response as being siblings of that child.
346 REFERENCES
348 The REFERENCES threading algorithm threads the searched
349 messages by grouping them together in parent/child
350 relationships based on which messages are replies to others.
351 The parent/child relationships are built using two methods:
352 reconstructing a message's ancestry using the references
353 contained within it; and checking the original (not base)
354 subject of a message to see if it is a reply to (or forward of)
355 another message.
357 Note: "Message ID" in the following description refers to a
358 normalized form of the msg-id in [RFC-2822]. The actual
359 text in an RFC 2822 may use quoting, resulting in multiple
360 ways of expressing the same Message ID. Implementations of
361 the REFERENCES threading algorithm MUST normalize any msg-id
362 in order to avoid false non-matches due to differences in
363 quoting.
365 For example, the msg-id
366 <"01KF8JCEOCBS0045PS"@xxx.yyy.com>
367 and the msg-id
368 <01KF8JCEOCBS0045PS@xxx.yyy.com>
369 MUST be interpreted as being the same Message ID.
371 The references used for reconstructing a message's ancestry are
372 found using the following rules:
374 If a message contains a References header line, then use the
375 Message IDs in the References header line as the references.
377 If a message does not contain a References header line, or
378 the References header line does not contain any valid
379 Message IDs, then use the first (if any) valid Message ID
380 found in the In-Reply-To header line as the only reference
381 (parent) for this message.
383 Note: Although [RFC-2822] permits multiple Message IDs in
384 the In-Reply-To header, in actual practice this
385 discipline has not been followed. For example,
386 In-Reply-To headers have been observed with message
387 addresses after the Message ID, and there are no good
388 heuristics for software to determine the difference.
389 This is not a problem with the References header however.
391 If a message does not contain an In-Reply-To header line, or
392 the In-Reply-To header line does not contain a valid Message
393 ID, then the message does not have any references (NIL).
395 A message is considered to be a reply or forward if the base
396 subject extraction rules, applied to the original subject,
397 remove any of the following: a subj-refwd, a "(fwd)"
398 subj-trailer, or a subj-fwd-hdr and subj-fwd-trl.
400 The REFERENCES algorithm is significantly more complex than
401 ORDEREDSUBJECT and consists of six main steps. These steps are
402 outlined in detail below.
404 (1) For each searched message:
406 (A) Using the Message IDs in the message's references, link
407 the corresponding messages (those whose Message-ID header
408 line contains the given reference Message ID) together as
409 parent/child. Make the first reference the parent of the
410 second (and the second a child of the first), the second the
411 parent of the third (and the third a child of the second),
412 etc. The following rules govern the creation of these
413 links:
415 If a message does not contain a Message-ID header line,
416 or the Message-ID header line does not contain a valid
417 Message ID, then assign a unique Message ID to this
418 message.
420 If two or more messages have the same Message ID, then
421 only use that Message ID in the first (lowest sequence
422 number) message, and assign a unique Message ID to each
423 of the subsequent messages with a duplicate of that
424 Message ID.
426 If no message can be found with a given Message ID,
427 create a dummy message with this ID. Use this dummy
428 message for all subsequent references to this ID.
430 If a message already has a parent, don't change the
431 existing link. This is done because the References
432 header line may have been truncated by a MUA. As a
433 result, there is no guarantee that the messages
434 corresponding to adjacent Message IDs in the References
435 header line are parent and child.
437 Do not create a parent/child link if creating that link
438 would introduce a loop. For example, before making
439 message A the parent of B, make sure that A is not a
440 descendent of B.
442 Note: Message ID comparisons are case-sensitive.
444 (B) Create a parent/child link between the last reference
445 (or NIL if there are no references) and the current message.
446 If the current message already has a parent, it is probably
447 the result of a truncated References header line, so break
448 the current parent/child link before creating the new
449 correct one. As in step 1.A, do not create the parent/child
450 link if creating that link would introduce a loop. Note
451 that if this message has no references, that it will now
452 have no parent.
454 Note: The parent/child links created in steps 1.A and 1.B
455 MUST be kept consistent with one another at ALL times.
457 (2) Gather together all of the messages that have no parents
458 and make them all children (siblings of one another) of a dummy
459 parent (the "root"). These messages constitute the first
460 (head) message of the threads created thus far.
462 (3) Prune dummy messages from the thread tree. Traverse each
463 thread under the root, and for each message:
465 If it is a dummy message with NO children, delete it.
467 If it is a dummy message with children, delete it, but
468 promote its children to the current level. In other words,
469 splice them in with the dummy's siblings.
471 Do not promote the children if doing so would make them
472 children of the root, unless there is only one child.
474 (4) Sort the messages under the root (top-level siblings only)
475 by sent date as described in section 2.2. In the case of a
476 dummy message, sort its children by sent date and then use the
477 first child for the top-level sort.
479 (5) Gather together messages under the root that have the same
480 base subject text.
482 (A) Create a table for associating base subjects with
483 messages, called the subject table.
485 (B) Populate the subject table with one message per each
486 base subject. For each child of the root:
488 (i) Find the subject of this thread, by using the base
489 subject from either the current message or its first
490 child if the current message is a dummy. This is the
491 thread subject.
493 (ii) If the thread subject is empty, skip this message.
495 (iii) Look up the message associated with the thread
496 subject in the subject table.
498 (iv) If there is no message in the subject table with the
499 thread subject, add the current message and the thread
500 subject to the subject table.
502 Otherwise, if the message in the subject table is not a
503 dummy, AND either of the following criteria are true:
505 The current message is a dummy, OR
507 The message in the subject table is a reply or forward
508 and the current message is not.
510 then replace the message in the subject table with the
511 current message.
513 (C) Merge threads with the same thread subject. For each
514 child of the root:
516 (i) Find the message's thread subject as in step 5.B.i
517 above.
519 (ii) If the thread subject is empty, skip this message.
521 (iii) Lookup the message associated with this thread
522 subject in the subject table.
524 (iv) If the message in the subject table is the current
525 message, skip this message.
527 Otherwise, merge the current message with the one in the
528 subject table using the following rules:
530 If both messages are dummies, append the current
531 message's children to the children of the message in
532 the subject table (the children of both messages
533 become siblings), and then delete the current message.
535 If the message in the subject table is a dummy and the
536 current message is not, make the current message a
537 child of the message in the subject table (a sibling
538 of its children).
540 If the current message is a reply or forward and the
541 message in the subject table is not, make the current
542 message a child of the message in the subject table (a
543 sibling of its children).
545 Otherwise, create a new dummy message and make both
546 the current message and the message in the subject
547 table children of the dummy. Then replace the message
548 in the subject table with the dummy message.
550 Note: Subject comparisons are case-insensitive, as
551 described under "Internationalization
552 Considerations."
554 (6) Traverse the messages under the root and sort each set of
555 siblings by sent date as described in section 2.2. Traverse
556 the messages in such a way that the "youngest" set of siblings
557 are sorted first, and the "oldest" set of siblings are sorted
558 last (grandchildren are sorted before children, etc). In the
559 case of a dummy message (which can only occur with top-level
560 siblings), use its first child for sorting.
562 Example: C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000
563 S: * THREAD (166)(167)(168)(169)(172)(170)(171)
564 (173)(174 (175)(176)(178)(181)(180))(179)(177
565 (183)(182)(188)(184)(185)(186)(187)(189))(190)
566 (191)(192)(193)(194 195)(196 (197)(198))(199)
567 (200 202)(201)(203)(204)(205)(206 207)(208)
568 S: A283 OK THREAD completed
569 C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp"
570 S: * THREAD
571 S: A284 OK THREAD completed
572 C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000
573 S: * THREAD (166)(167)(168)(169)(172)((170)(179))
574 (171)(173)((174)(175)(176)(178)(181)(180))
575 ((177)(183)(182)(188 (184)(189))(185 186)(187))
576 (190)(191)(192)(193)((194)(195 196))(197 198)
577 (199)(200 202)(201)(203)(204)(205 206 207)(208)
578 S: A285 OK THREAD completed
580 Note: The line breaks in the first and third server
581 responses are for editorial clarity and do not appear in
582 real THREAD responses.
584 4. Additional Responses
586 These responses are extensions to the [IMAP] base protocol.
588 The section headings of these responses are intended to correspond
589 with where they would be located in the main document.
591 BASE.7.2.SORT. SORT Response
593 Data: zero or more numbers
595 The SORT response occurs as a result of a SORT or UID SORT
596 command. The number(s) refer to those messages that match the
597 search criteria. For SORT, these are message sequence numbers;
598 for UID SORT, these are unique identifiers. Each number is
599 delimited by a space.
601 Example: S: * SORT 2 3 6
603 BASE.7.2.THREAD. THREAD Response
605 Data: zero or more threads
607 The THREAD response occurs as a result of a THREAD or UID THREAD
608 command. It contains zero or more threads. A thread consists of
609 a parenthesized list of thread members.
611 Thread members consist of zero or more message numbers, delimited
612 by spaces, indicating successive parent and child. This continues
613 until the thread splits into multiple sub-threads, at which point
614 the thread nests into multiple sub-threads with the first member
615 of each subthread being siblings at this level. There is no limit
616 to the nesting of threads.
618 The messages numbers refer to those messages that match the search
619 criteria. For THREAD, these are message sequence numbers; for UID
620 THREAD, these are unique identifiers.
622 Example: S: * THREAD (2)(3 6 (4 23)(44 7 96))
624 The first thread consists only of message 2. The second thread
625 consists of the messages 3 (parent) and 6 (child), after which it
626 splits into two subthreads; the first of which contains messages 4
627 (child of 6, sibling of 44) and 23 (child of 4), and the second of
628 which contains messages 44 (child of 6, sibling of 4), 7 (child of
629 44), and 96 (child of 7). Since some later messages are parents
630 of earlier messages, the messages were probably moved from some
631 other mailbox at different times.
633 -- 2
635 -- 3
636 \-- 6
637 |-- 4
638 | \-- 23
639 |
640 \-- 44
641 \-- 7
642 \-- 96
644 Example: S: * THREAD ((3)(5))
646 In this example, 3 and 5 are siblings of a parent which does not
647 match the search criteria (and/or does not exist in the mailbox);
648 however they are members of the same thread.
650 5. Formal Syntax of SORT and THREAD Commands and Responses
652 The following syntax specification uses the Augmented Backus-Naur
653 Form (ABNF) notation as specified in [ABNF]. It also uses [ABNF]
654 rules defined in [IMAP].
656 sort = ["UID" SP] "SORT" SP sort-criteria SP search-criteria
658 sort-criteria = "(" sort-criterion *(SP sort-criterion) ")"
660 sort-criterion = ["REVERSE" SP] sort-key
662 sort-key = "ARRIVAL" / "CC" / "DATE" / "FROM" / "SIZE" /
663 "SUBJECT" / "TO"
665 thread = ["UID" SP] "THREAD" SP thread-alg SP search-criteria
667 thread-alg = "ORDEREDSUBJECT" / "REFERENCES" / thread-alg-ext
669 thread-alg-ext = atom
670 ; New algorithms MUST be registered with IANA
672 search-criteria = charset 1*(SP search-key)
674 charset = atom / quoted
675 ; CHARSET values MUST be registered with IANA
677 sort-data = "SORT" *(SP nz-number)
679 thread-data = "THREAD" [SP 1*thread-list]
681 thread-list = "(" (thread-members / thread-nested) ")"
683 thread-members = nz-number *(SP nz-number) [SP thread-nested]
685 thread-nested = 2*thread-list
687 The following syntax describes base subject extraction rules (2)-(6):
689 subject = *subj-leader [subj-middle] *subj-trailer
691 subj-refwd = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":"
693 subj-blob = "[" *BLOBCHAR "]" *WSP
695 subj-fwd = subj-fwd-hdr subject subj-fwd-trl
697 subj-fwd-hdr = "[fwd:"
699 subj-fwd-trl = "]"
701 subj-leader = (*subj-blob subj-refwd) / WSP
703 subj-middle = *subj-blob (subj-base / subj-fwd)
704 ; last subj-blob is subj-base if subj-base would
705 ; otherwise be empty
707 subj-trailer = "(fwd)" / WSP
709 subj-base = NONWSP *(*WSP NONWSP)
710 ; can be a subj-blob
712 BLOBCHAR = %x01-5a / %x5c / %x5e-ff
713 ; any CHAR8 except '[' and ']'
715 NONWSP = %x01-08 / %x0a-1f / %x21-ff
716 ; any CHAR8 other than WSP
718 6. Security Considerations
720 The SORT and THREAD extensions do not raise any security
721 considerations that are not present in the base [IMAP] protocol, and
722 these issues are discussed in [IMAP]. Nevertheless, it is important
723 to remember that [IMAP] protocol transactions, including message
724 data, are sent in the clear over the network unless protection from
725 snooping is negotiated, either by the use of STARTTLS, privacy
726 protection is negotiated in the AUTHENTICATE command, or some other
727 protection mechanism.
729 Although not a security consideration, it is important to recognize
730 that sorting by REFERENCES can lead to misleading threading trees.
731 For example, a message with false References: header data will cause
732 a thread to be incorporated into another thread.
734 The process of extracting the base subject may lead to incorrect
735 collation if the extracted data was significant text as opposed to
736 a subject artifact.
738 7. Internationalization Considerations
740 As stated in the introduction, the rules of I18NLEVEL=1 as described
741 in [IMAP-I18N] MUST be followed; that is, the SORT and THREAD
742 extensions MUST collate strings according to the i;unicode-casemap
743 collation described in [UNICASEMAP]. Servers SHOULD also advertise
744 the I18NLEVEL=1 extension. Alternatively, a server MAY implement
745 I18NLEVEL=2 (or higher) and comply with the rules of that level.
747 As discussed in [IMAP-I18N] section 4.5, all server implementations
748 should eventually be updated to support the [IMAP-I18N] I18NLEVEL=2
749 extension.
751 Translations of the "re" or "fw"/"fwd" tokens are not specified for
752 removal in the base subject extraction process. An attempt to add
753 such translated tokens would result in a geometrically complex, and
754 ultimately unimplementable, task.
756 Instead, note that [RFC-2822] section 3.6.5 recommends that "re:"
757 (from the Latin "res", in the matter of) be used to identify a reply.
758 Although it is evident that, from the multiple forms of token to
759 identify a forwarded message, there is considerable variation found
760 in the wild, the variations are (still) manageable. Consequently, it
761 is suggested that "re:" and one of the variations of the tokens for
762 forward supported by the base subject extraction rules be adopted for
763 Internet mail messages, since doing so makes it a simple display time
764 task to localize the token language for the user.
766 8. IANA Considerations
768 [IMAP] capabilities are registered by publishing a standards track or
769 IESG approved experimental RFC. This document constitutes
770 registration of the SORT and THREAD capabilities in the [IMAP]
771 capabilities registry.
773 This document creates a new [IMAP] threading algorithms registry,
774 which registers threading algorithms by publishing a standards track
775 or IESG approved experimental RFC. This document constitutes
776 registration of the ORDEREDSUBJECT and REFERENCES algorithms in that
777 registry.
779 9. Normative References
781 The following documents are normative to this document:
783 [ABNF] Crocker, D. and Overell, P. "Augmented BNF
784 for Syntax Specifications: ABNF", RFC 5234
785 January 2008
787 [CHARSET] Freed, N. and Postel, J. "IANA Character Set
788 Registration Procedures", RFC 2978, October
789 2000.
791 [IMAP] Crispin, M. "Internet Message Access Protocol -
792 Version 4rev1", RFC 3501, March 2003.
794 [IMAP-I18N] Newman, C. and Gulbrandsen, A. "Internet
795 Message Access Protocol Internationalization",
796 Work in Progress.
798 [KEYWORDS] Bradner, S. "Key words for use in RFCs to
799 Indicate Requirement Levels", BCP 14, RFC 2119,
800 March 1997.
802 [RFC-2822] Resnick, P. "Internet Message Format", RFC
803 2822, April 2001.
805 [UNICASEMAP] Crispin, M. "i;unicode-casemap - Simple Unicode
806 Collation Algorithm", RFC 5051.
808 10. Informative References
810 The following documents are informative to this document:
812 [IMAP-MODELS] Crispin, M. "Distributed Electronic Mail Models
813 in IMAP4", RFC 1733, December 1994.
815 [THREADING] Zawinski, J. "Message Threading",
816 http://www.jwz.org/doc/threading.html,
817 1997-2002.
819 Appendices
821 Author's Address
823 Mark R. Crispin
824 Networks and Distributed Computing
825 University of Washington
826 4545 15th Avenue NE
827 Seattle, WA 98105-4527
829 Phone: +1 (206) 543-5762
831 EMail: MRC@CAC.Washington.EDU
833 Kenneth Murchison
834 Carnegie Mellon University
835 5000 Forbes Avenue
836 Cyert Hall 285
837 Pittsburgh, PA 15213
839 Phone: +1 (412) 268-2638
840 Email: murch@andrew.cmu.edu
842 Full Copyright Statement
844 Copyright (C) The IETF Trust (2008).
846 This document is subject to the rights, licenses and restrictions
847 contained in BCP 78, and except as set forth therein, the authors
848 retain all their rights.
850 This document and the information contained herein are provided on an
851 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
852 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
853 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
854 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
855 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
856 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
858 Intellectual Property
860 The IETF takes no position regarding the validity or scope of any
861 Intellectual Property Rights or other rights that might be claimed to
862 pertain to the implementation or use of the technology described in
863 this document or the extent to which any license under such rights
864 might or might not be available; nor does it represent that it has
865 made any independent effort to identify any such rights. Information
866 on the procedures with respect to rights in RFC documents can be
867 found in BCP 78 and BCP 79.
869 Copies of IPR disclosures made to the IETF Secretariat and any
870 assurances of licenses to be made available, or the result of an
871 attempt made to obtain a general license or permission for the use of
872 such proprietary rights by implementers or users of this
873 specification can be obtained from the IETF on-line IPR repository at
874 http://www.ietf.org/ipr.
876 The IETF invites any interested party to bring to its attention any
877 copyrights, patents or patent applications, or other proprietary
878 rights that may cover technology that may be required to implement
879 this standard. Please address the information to the IETF at ietf-
880 ipr@ietf.org.
882 Acknowledgement
884 Funding for the RFC Editor function is currently provided by the
885 Internet Society.

UW-IMAP'd extensions by yuuji