imapext-2007: ada5e610ab86 docs/draft/sort.txt

imap-2007e

author	yuuji@gentei.org
date	Mon, 14 Sep 2009 15:17:45 +0900

1 IMAP Extensions Working Group M. Crispin

2 Internet-Draft K. Murchison

3 Intended status: Proposed Standard March 10, 2008

4 Expires: September 10, 2008

5 Document: internet-drafts/draft-ietf-imapext-sort-20.txt

7 INTERNET MESSAGE ACCESS PROTOCOL - SORT AND THREAD EXTENSIONS

9 Status of this Memo

11 By submitting this Internet-Draft, each author represents that

12 any applicable patent or other IPR claims of which he or she is

13 aware have been or will be disclosed, and any of which he or she

14 becomes aware will be disclosed, in accordance with Section 6 of

15 BCP 79.

17 Internet-Drafts are working documents of the Internet Engineering

18 Task Force (IETF), its areas, and its working groups. Note that

19 other groups may also distribute working documents as

20 Internet-Drafts.

22 Internet-Drafts are draft documents valid for a maximum of six months

23 and may be updated, replaced, or obsoleted by other documents at any

24 time. It is inappropriate to use Internet-Drafts as reference

25 material or to cite them other than as "work in progress."

27 The list of current Internet-Drafts can be accessed at

28 http://www.ietf.org/ietf/1id-abstracts.txt

30 The list of Internet-Draft Shadow Directories can be accessed at

31 http://www.ietf.org/shadow.html.

33 A revised version of this draft document will be submitted to the RFC

34 editor as a Proposed Standard for the Internet Community. Discussion

35 and suggestions for improvement are requested, and should be sent to

36 ietf-imapext@IMC.ORG.

38 Distribution of this memo is unlimited.

40 Abstract

42 This document describes the base-level server-based sorting and

43 threading extensions to the [IMAP] protocol. These extensions

44 provide substantial performance improvements for IMAP clients which

45 offer sorted and threaded views.

47 1. Introduction

49 The SORT and THREAD extensions to the [IMAP] protocol provide a means

50 of server-based sorting and threading of messages, without requiring

51 that the client download the necessary data to do so itself. This is

52 particularly useful for online clients as described in [IMAP-MODELS].

54 A server which supports the base-level SORT extension indicates this

55 with a capability name which starts with "SORT". Future,

56 upwards-compatible extensions to the SORT extension will all start

57 with "SORT", indicating support for this base level.

59 A server which supports the THREAD extension indicates this with one

60 or more capability names consisting of "THREAD=" followed by a

61 supported threading algorithm name as described in this document.

62 This provides for future upwards-compatible extensions.

64 A server which implements the SORT and/or THREAD extensions MUST

65 collate strings in accordance with the requirements of I18NLEVEL=1,

66 as described in [IMAP-I18N], and SHOULD implement and advertise the

67 I18NLEVEL=1 extension. Alternatively, a server MAY implement

68 I18NLEVEL=2 (or higher) and comply with the rules of that level.

70 Discussion: the SORT and THREAD extensions predate [IMAP-I18N] by

71 several years. At the time of this writing, all known server

72 implementations of SORT and THREAD comply with the rules of

73 I18NLEVEL=1, but do not necessarily advertise it. As discussed

74 in [IMAP-I18N] section 4.5, all server implementations should

75 eventually be updated to comply with the I18NLEVEL=2 extension.

77 Historical note: the REFERENCES threading algorithm is based on the

78 [THREADING] algorithm used in "Netscape Mail and News"

79 versions 2.0 through 3.0.

81 2. Terminology

83 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",

84 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this

85 document are to be interpreted as described in [KEYWORDS].

87 The word "can" (not "may") is used to refer to a possible

88 circumstance or situation, as opposed to an optional facility of the

89 protocol.

91 "User" is used to refer to a human user, whereas "client" refers to

92 the software being run by the user.

94 In examples, "C:" and "S:" indicate lines sent by the client and

95 server respectively.

97 2.1 Base Subject

99 Subject sorting and threading use the "base subject," which has

100 specific subject artifacts removed. Due to the complexity of these

101 artifacts, the formal syntax for the subject extraction rules is

102 ambiguous. The following procedure is followed to determine the

103 "base subject", using the [ABNF] formal syntax rules described in

104 section 5:

105

106 (1) Convert any RFC 2047 encoded-words in the subject to

107 UTF-8 as described in "internationalization

108 considerations." Convert all tabs and continuations to

109 space. Convert all multiple spaces to a single space.

110

111 (2) Remove all trailing text of the subject that matches

112 the subj-trailer ABNF, repeat until no more matches are

113 possible.

114

115 (3) Remove all prefix text of the subject that matches the

116 subj-leader ABNF.

117

118 (4) If there is prefix text of the subject that matches the

119 subj-blob ABNF, and removing that prefix leaves a non-empty

120 subj-base, then remove the prefix text.

121

122 (5) Repeat (3) and (4) until no matches remain.

123

124 Note: it is possible to defer step (2) until step (6), but this

125 requires checking for subj-trailer in step (4).

126

127 (6) If the resulting text begins with the subj-fwd-hdr ABNF

128 and ends with the subj-fwd-trl ABNF, remove the

129 subj-fwd-hdr and subj-fwd-trl and repeat from step (2).

130

131 (7) The resulting text is the "base subject" used in the

132 SORT.

133

134 All servers and disconnected (as described in [IMAP-MODELS]) clients

135 MUST use exactly this algorithm to determine the "base subject".

136 Otherwise there is potential for a user to get inconsistent results

137 based on whether they are running in connected or disconnected mode.
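
The procedure above can be sketched in Python as follows. This is only
an illustrative sketch, not a reference implementation: the regular
expressions are assumed translations of the subj-blob, subj-leader, and
subj-trailer rules from section 5, the function name base_subject is
hypothetical, and RFC 2047 decoding in step (1) is assumed to have been
done by the caller.

```python
import re

# Assumed regex translations of the section 5 ABNF (matched case-insensitively).
_BLOB = r"\[[^\[\]]*\][ \t]*"                         # subj-blob
_LEADER = re.compile(                                 # subj-leader
    r"(?:(?:%s)*(?:re|fwd?)[ \t]*(?:%s)?:|[ \t])" % (_BLOB, _BLOB), re.I)
_BLOB_RE = re.compile(_BLOB)
_TRAILER = re.compile(r"(?:\(fwd\)|[ \t])\Z", re.I)   # subj-trailer at end of text


def base_subject(subject):
    """Return the base subject of an already RFC 2047-decoded subject."""
    # Step (1): unfold continuations, then turn tabs and space runs into one space.
    s = re.sub(r"\r?\n(?=[ \t])", "", subject)
    s = re.sub(r"[ \t]+", " ", s)
    while True:
        # Step (2): strip trailers until none match.
        m = _TRAILER.search(s)
        while m:
            s = s[:m.start()]
            m = _TRAILER.search(s)
        # Steps (3)-(5): strip leaders and blobs until nothing changes.
        while True:
            before = s
            m = _LEADER.match(s)
            while m:                          # step (3)
                s = s[m.end():]
                m = _LEADER.match(s)
            m = _BLOB_RE.match(s)             # step (4)
            if m and m.end() < len(s):        # only if a non-empty base remains
                s = s[m.end():]
            if s == before:                   # step (5)
                break
        # Step (6): unwrap "[fwd: ... ]" and start again from step (2).
        if s[:5].lower() == "[fwd:" and s.endswith("]"):
            s = s[5:-1]
            continue
        return s                              # step (7)
```

With this sketch, a subject such as "Re: [list] Re: Hello (fwd)" reduces
to "Hello", while a subject consisting only of a blob, such as "[PATCH]",
is kept unchanged because removing the blob would leave an empty base.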

138

139 2.2 Sent Date

140

141 As used in this document, the term "sent date" refers to the date and

142 time from the Date: header, adjusted by time zone to normalize to

143 UTC. For example, "31 Dec 2000 16:01:33 -0800" is equivalent to the

144 UTC date and time of "1 Jan 2001 00:01:33 +0000".

145

146 If the time zone is invalid, the date and time SHOULD be treated as

147 UTC. If the time is also invalid, the time SHOULD be treated as

148 00:00:00. If there is no valid date or time, the date and time

149 SHOULD be treated as 00:00:00 on the earliest possible date.

150

151 This differs from the date-related criteria in the SEARCH command

152 (described in [IMAP] section 6.4.4), which use just the date and not

153 the time, and are not adjusted by time zone.

154

155 If the sent date can not be determined (a Date: header is missing or

156 can not be parsed), the INTERNALDATE for that message is used as the

157 sent date.

158

159 When comparing two sent dates that match exactly, the order in which

160 the two messages appear in the mailbox (that is, by sequence number)

161 is used as a tie-breaker to determine the order.
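
As an illustration, the sent date rules might be approximated in Python
with the standard email.utils module. This is a simplified sketch under
stated assumptions: the function name sent_date and its parameters are
hypothetical, internal_date is assumed to be a timezone-aware datetime,
and the partial fallbacks (invalid zone only, invalid time only) are not
modelled separately; any unparsable header falls back to INTERNALDATE.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def sent_date(date_header, internal_date):
    """Return the sent date as a timezone-aware datetime normalized to UTC."""
    if date_header:
        try:
            parsed = parsedate_to_datetime(date_header)
        except (TypeError, ValueError):
            parsed = None                      # header could not be parsed
        if parsed is not None:
            if parsed.tzinfo is None:
                # No usable zone information: treat the time as UTC.
                parsed = parsed.replace(tzinfo=timezone.utc)
            return parsed.astimezone(timezone.utc)
    # Missing or unparsable Date: header: use INTERNALDATE instead.
    return internal_date.astimezone(timezone.utc)


def sent_date_sort_key(seq, date_header, internal_date):
    """Sort key with the sequence number as the tie-breaker."""
    return (sent_date(date_header, internal_date), seq)
```

Sorting on the (sent date, sequence number) tuple gives the tie-breaking
behavior described above without a separate comparison step.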

162

163 3. Additional Commands

164

165 These commands are extensions to the [IMAP] base protocol.

166

167 The section headings are intended to correspond with where they would

168 be located in the main document if they were part of the base

169 specification.

170

171 BASE.6.4.SORT. SORT Command

172

173 Arguments: sort program

174 charset specification

175 searching criteria (one or more)

176

177 Data: untagged responses: SORT

178

179 Result: OK - sort completed

180 NO - sort error: can't sort that charset or

181 criteria

182 BAD - command unknown or arguments invalid

183

184 The SORT command is a variant of SEARCH with sorting semantics for

185 the results. Sort has two arguments before the searching criteria

186 argument: a parenthesized list of sort criteria, and the searching

187 charset.

188

189 The charset argument is mandatory (unlike SEARCH) and indicates

190 the [CHARSET] of the strings that appear in the searching

191 criteria. The US-ASCII and UTF-8 charsets MUST be implemented.

192 All other charsets are optional.

193

194 There is also a UID SORT command which returns unique identifiers

195 instead of message sequence numbers. Note that there are separate

196 searching criteria for message sequence numbers and UIDs; thus the

197 arguments to UID SORT are interpreted the same as in SORT. This

198 is analogous to the behavior of UID SEARCH, as opposed to UID

199 COPY, UID FETCH, or UID STORE.

200

201 The SORT command first searches the mailbox for messages that

202 match the given searching criteria using the charset argument for

203 the interpretation of strings in the searching criteria. It then

204 returns the matching messages in an untagged SORT response, sorted

205 according to one or more sort criteria.

206

207 Sorting is in ascending order. Earlier dates sort before later

208 dates; smaller sizes sort before larger sizes; and strings are

209 sorted according to ascending values established by their

210 collation algorithm (see under "Internationalization

211 Considerations").

212

213 If two or more messages exactly match according to the sorting

214 criteria, these messages are sorted according to the order in

215 which they appear in the mailbox. In other words, there is an

216 implicit sort criterion of "sequence number".

217

218 When multiple sort criteria are specified, the result is sorted in

219 the priority order in which the criteria appear. For example,

220 (SUBJECT DATE) will sort messages in order by their base subject

221 text; and for messages with the same base subject text will sort

222 by their sent date.

223

224 Untagged EXPUNGE responses are not permitted while the server is

225 responding to a SORT command, but are permitted during a UID SORT

226 command.

227

228 The defined sort criteria are as follows. Refer to the Formal

229 Syntax section for the precise syntactic definitions of the

230 arguments. If the associated RFC-822 header for a particular

231 criterion is absent, it is treated as the empty string. The empty

232 string always collates before non-empty strings.

233

234 ARRIVAL

235 Internal date and time of the message. This differs from the

236 ON criterion in SEARCH, which uses just the internal date.

237

238 CC

239 [IMAP] addr-mailbox of the first "cc" address.

240

241 DATE

242 Sent date and time, as described in section 2.2.

243

244 FROM

245 [IMAP] addr-mailbox of the first "From" address.

246

247 REVERSE

248 Followed by another sort criterion, has the effect of that

249 criterion but in reverse (descending) order.

250 Note: REVERSE only reverses a single criterion, and does not

251 affect the implicit "sequence number" sort criterion if all

252 other criteria are identical. Consequently, a sort of

253 REVERSE SUBJECT is not the same as a reverse ordering of a

254 SUBJECT sort. This can be avoided by use of additional

255 criteria, e.g. SUBJECT DATE vs. REVERSE SUBJECT REVERSE

256 DATE. In general, however, it's better (and faster, if the

257 client has a "reverse current ordering" command) to reverse

258 the results in the client instead of issuing a new SORT.

259

260 SIZE

261 Size of the message in octets.

262

263 SUBJECT

264 Base subject text.

265

266 TO

267 [IMAP] addr-mailbox of the first "To" address.

268

269 Example: C: A282 SORT (SUBJECT) UTF-8 SINCE 1-Feb-1994

270 S: * SORT 2 84 882

271 S: A282 OK SORT completed

272 C: A283 SORT (SUBJECT REVERSE DATE) UTF-8 ALL

273 S: * SORT 5 3 4 1 2

274 S: A283 OK SORT completed

275 C: A284 SORT (SUBJECT) US-ASCII TEXT "not in mailbox"

276 S: * SORT

277 S: A284 OK SORT completed
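
The ordering rules (priority order of criteria, REVERSE applying to a
single criterion, and the implicit "sequence number" criterion) can be
sketched with a chain of stable sorts. This is an illustrative sketch
only: the message dictionaries, the key_funcs table, and the function
names are hypothetical, and string keys are assumed to have been passed
through the required collation already.

```python
def parse_criteria(tokens):
    """Turn ["SUBJECT", "REVERSE", "DATE"] into [("SUBJECT", False), ("DATE", True)]."""
    criteria, reverse = [], False
    for token in tokens:
        if token.upper() == "REVERSE":
            reverse = True                 # applies to the next criterion only
            continue
        criteria.append((token.upper(), reverse))
        reverse = False
    return criteria


def sort_messages(messages, tokens, key_funcs):
    """Return sequence numbers ordered by the given sort criteria."""
    # Start from mailbox order: the implicit "sequence number" criterion.
    ordered = sorted(messages, key=lambda m: m["seq"])
    # Apply stable sorts from the least to the most significant criterion.
    # Python keeps equal elements in their existing order even with
    # reverse=True, so REVERSE never flips the sequence-number tie-break.
    for name, reverse in reversed(parse_criteria(tokens)):
        ordered.sort(key=key_funcs[name], reverse=reverse)
    return [m["seq"] for m in ordered]
```

For four messages where message 1 has subject "b" and messages 2, 3, 4
share subject "a", a SUBJECT sort yields 2 3 4 1, while REVERSE SUBJECT
yields 1 2 3 4 rather than 1 4 3 2, which is the effect the REVERSE note
describes.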

278

279 BASE.6.4.THREAD. THREAD Command

280

281 Arguments: threading algorithm

282 charset specification

283 searching criteria (one or more)

284

285 Data: untagged responses: THREAD

286

287 Result: OK - thread completed

288 NO - thread error: can't thread that charset or

289 criteria

290 BAD - command unknown or arguments invalid

291

292 The THREAD command is a variant of SEARCH with threading semantics

293 for the results. Thread has two arguments before the searching

294 criteria argument: a threading algorithm, and the searching

295 charset.

296

297 The charset argument is mandatory (unlike SEARCH) and indicates

298 the [CHARSET] of the strings that appear in the searching

299 criteria. The US-ASCII and UTF-8 charsets MUST be implemented.

300 All other charsets are optional.

301

302 There is also a UID THREAD command which returns unique

303 identifiers instead of message sequence numbers. Note that there

304 are separate searching criteria for message sequence numbers and

305 UIDs; thus the arguments to UID THREAD are interpreted the same as

306 in THREAD. This is analogous to the behavior of UID SEARCH, as

307 opposed to UID COPY, UID FETCH, or UID STORE.

308

309 The THREAD command first searches the mailbox for messages that

310 match the given searching criteria using the charset argument for

311 the interpretation of strings in the searching criteria. It then

312 returns the matching messages in an untagged THREAD response,

313 threaded according to the specified threading algorithm.

314

315 All collation is in ascending order. Earlier dates collate before

316 later dates and strings are collated according to ascending values

317 established by their collation algorithm (see under

318 "Internationalization Considerations").

319

320 Untagged EXPUNGE responses are not permitted while the server is

321 responding to a THREAD command, but are permitted during a UID

322 THREAD command.

323

324 The defined threading algorithms are as follows:

325

326 ORDEREDSUBJECT

327

328 The ORDEREDSUBJECT threading algorithm is also referred to as

329 "poor man's threading." The searched messages are sorted by

330 base subject and then by the sent date. The messages are then

331 split into separate threads, with each thread containing

332 messages with the same base subject text. Finally, the threads

333 are sorted by the sent date of the first message in the thread.

334

335 The first messages of the threads are siblings of each other

336 (children of the "root"). The second message of a thread is the child of

337 the first message, and subsequent messages of the thread are

338 siblings of the second message and hence children of the

339 message at the root. Hence, there are no grandchildren in

340 ORDEREDSUBJECT threading.

341

342 Children in ORDEREDSUBJECT threading do not have descendants.

343 Client implementations SHOULD treat descendants of a child in

344 a server response as being siblings of that child.
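
A compact sketch of ORDEREDSUBJECT threading follows. It assumes each
message is given as a hypothetical (sequence number, base subject, sent
date) tuple, that base subjects have already been extracted and
normalized for collation, and that sent dates are any comparable values;
none of these names come from the draft.

```python
from itertools import groupby


def ordered_subject_threads(messages):
    """Group (seq, base_subject, sent_date) tuples into ORDEREDSUBJECT threads."""
    # Sort by base subject, then sent date, with sequence number as tie-breaker.
    ordered = sorted(messages, key=lambda m: (m[1], m[2], m[0]))
    # Split into threads of messages sharing the same base subject.
    threads = [list(group) for _, group in groupby(ordered, key=lambda m: m[1])]
    # Order the threads by the sent date of their first message.
    threads.sort(key=lambda t: (t[0][2], t[0][0]))
    return threads


def format_thread(thread):
    """Render one thread: the first message is the parent of all the others."""
    numbers = [str(m[0]) for m in thread]
    if len(numbers) <= 2:
        return "(" + " ".join(numbers) + ")"
    children = "".join("(%s)" % n for n in numbers[1:])
    return "(%s %s)" % (numbers[0], children)
```

Joining format_thread() over the returned threads produces the body of
an untagged THREAD response in which no thread has grandchildren.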

345

346 REFERENCES

347

348 The REFERENCES threading algorithm threads the searched

349 messages by grouping them together in parent/child

350 relationships based on which messages are replies to others.

351 The parent/child relationships are built using two methods:

352 reconstructing a message's ancestry using the references

353 contained within it; and checking the original (not base)

354 subject of a message to see if it is a reply to (or forward of)

355 another message.

356

357 Note: "Message ID" in the following description refers to a

358 normalized form of the msg-id in [RFC-2822]. The actual

359 text in an RFC 2822 message may use quoting, resulting in multiple

360 ways of expressing the same Message ID. Implementations of

361 the REFERENCES threading algorithm MUST normalize any msg-id

362 in order to avoid false non-matches due to differences in

363 quoting.

364

365 For example, the msg-id

366 <"01KF8JCEOCBS0045PS"@xxx.yyy.com>

367 and the msg-id

368 <01KF8JCEOCBS0045PS@xxx.yyy.com>

369 MUST be interpreted as being the same Message ID.
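
One possible normalization is sketched below. It is an assumption, not
the required method: the regular expression accepts only a simplified
subset of the [RFC-2822] msg-id syntax, and the function name
normalize_msg_id is hypothetical. The normalized form drops the angle
brackets and any quoting of the left-hand side, and preserves case.

```python
import re

# Simplified msg-id pattern: quoted or unquoted left side, "@", right side.
_MSG_ID = re.compile(r'<\s*("(?:[^"\\]|\\.)*"|[^@\s<>]+)@([^<>\s]+)\s*>')


def normalize_msg_id(raw):
    """Return a normalized Message ID, or None if no valid msg-id is found."""
    match = _MSG_ID.search(raw)
    if match is None:
        return None
    local, domain = match.group(1), match.group(2)
    if local.startswith('"'):
        # Remove the surrounding quotes and undo backslash escapes.
        local = re.sub(r"\\(.)", r"\1", local[1:-1])
    return local + "@" + domain
```

Both msg-id forms in the example above normalize to the same string, so
a plain string comparison then treats them as the same Message ID.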

370

371 The references used for reconstructing a message's ancestry are

372 found using the following rules:

373

374 If a message contains a References header line, then use the

375 Message IDs in the References header line as the references.

376

377 If a message does not contain a References header line, or

378 the References header line does not contain any valid

379 Message IDs, then use the first (if any) valid Message ID

380 found in the In-Reply-To header line as the only reference

381 (parent) for this message.

382

383 Note: Although [RFC-2822] permits multiple Message IDs in

384 the In-Reply-To header, in actual practice this

385 discipline has not been followed. For example,

386 In-Reply-To headers have been observed with message

387 addresses after the Message ID, and there are no good

388 heuristics for software to determine the difference.

389 This is not a problem with the References header, however.

390

391 If a message does not contain an In-Reply-To header line, or

392 the In-Reply-To header line does not contain a valid Message

393 ID, then the message does not have any references (NIL).
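
The three rules above can be sketched as a single lookup. The headers
argument is assumed to be a plain dictionary of header values, the
msg-id pattern is a simplified assumption rather than the full
[RFC-2822] grammar, and both function names are hypothetical.

```python
import re

# Simplified msg-id pattern (an assumption, not the full RFC 2822 grammar).
_MSG_ID = re.compile(r'<\s*("(?:[^"\\]|\\.)*"|[^@\s<>"]+)@([^<>\s]+)\s*>')


def valid_message_ids(value):
    """Return the normalized Message IDs found in a header value, in order."""
    ids = []
    for local, domain in _MSG_ID.findall(value or ""):
        if local.startswith('"'):
            local = re.sub(r"\\(.)", r"\1", local[1:-1])   # drop quoting
        ids.append(local + "@" + domain)
    return ids


def references_of(headers):
    """Return the references of a message; an empty list stands for NIL."""
    refs = valid_message_ids(headers.get("References"))
    if refs:
        return refs
    # No usable References: only the first valid In-Reply-To ID counts.
    return valid_message_ids(headers.get("In-Reply-To"))[:1]
```

Taking only the first In-Reply-To Message ID sidesteps the stray
addresses and extra IDs mentioned in the note above.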

394

395 A message is considered to be a reply or forward if the base

396 subject extraction rules, applied to the original subject,

397 remove any of the following: a subj-refwd, a "(fwd)"

398 subj-trailer, or a subj-fwd-hdr and subj-fwd-trl.

399

400 The REFERENCES algorithm is significantly more complex than

401 ORDEREDSUBJECT and consists of six main steps. These steps are

402 outlined in detail below.

403

404 (1) For each searched message:

405

406 (A) Using the Message IDs in the message's references, link

407 the corresponding messages (those whose Message-ID header

408 line contains the given reference Message ID) together as

409 parent/child. Make the first reference the parent of the

410 second (and the second a child of the first), the second the

411 parent of the third (and the third a child of the second),

412 etc. The following rules govern the creation of these

413 links:

414

415 If a message does not contain a Message-ID header line,

416 or the Message-ID header line does not contain a valid

417 Message ID, then assign a unique Message ID to this

418 message.

419

420 If two or more messages have the same Message ID, then

421 only use that Message ID in the first (lowest sequence

422 number) message, and assign a unique Message ID to each

423 of the subsequent messages with a duplicate of that

424 Message ID.

425

426 If no message can be found with a given Message ID,

427 create a dummy message with this ID. Use this dummy

428 message for all subsequent references to this ID.

429

430 If a message already has a parent, don't change the

431 existing link. This is done because the References

432 header line may have been truncated by a MUA. As a

433 result, there is no guarantee that the messages

434 corresponding to adjacent Message IDs in the References

435 header line are parent and child.

436

437 Do not create a parent/child link if creating that link

438 would introduce a loop. For example, before making

439 message A the parent of B, make sure that A is not a

440 descendant of B.

441

442 Note: Message ID comparisons are case-sensitive.

443

444 (B) Create a parent/child link between the last reference

445 (or NIL if there are no references) and the current message.

446 If the current message already has a parent, it is probably

447 the result of a truncated References header line, so break

448 the current parent/child link before creating the new

449 correct one. As in step 1.A, do not create the parent/child

450 link if creating that link would introduce a loop. Note

451 that if this message has no references, it will now

452 have no parent.

453

454 Note: The parent/child links created in steps 1.A and 1.B

455 MUST be kept consistent with one another at ALL times.
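
Step (1) might be sketched as follows. The Node class and all function
names are hypothetical. The sketch assumes that Message IDs are already
normalized and unique (missing and duplicate IDs replaced beforehand),
and it makes one interpretive choice that the text leaves open: in step
1.B, an existing parent link is kept when the new link would introduce a
loop.

```python
class Node:
    """One entry in the ID table; message is None for a dummy."""

    def __init__(self, msg_id):
        self.msg_id = msg_id
        self.message = None
        self.parent = None
        self.children = []


def is_same_or_below(node, ancestor):
    """True if node is ancestor itself or one of its descendants."""
    while node is not None:
        if node is ancestor:
            return True
        node = node.parent
    return False


def link(parent, child):
    """Make child a child of parent unless that would introduce a loop."""
    if is_same_or_below(parent, child):
        return False
    if child.parent is not None:
        child.parent.children.remove(child)     # break the old link
    child.parent = parent
    parent.children.append(child)
    return True


def node_for(table, msg_id):
    if msg_id not in table:
        table[msg_id] = Node(msg_id)             # dummy until the message is seen
    return table[msg_id]


def build_links(messages):
    """messages: (msg_id, references) pairs in sequence-number order."""
    table = {}
    for msg_id, refs in messages:
        node = node_for(table, msg_id)
        node.message = (msg_id, refs)
        previous = None
        for ref in refs:                         # step 1.A
            current = node_for(table, ref)
            if previous is not None and current.parent is None:
                link(previous, current)          # never replaces an existing parent
            previous = current
        if previous is None:                     # step 1.B, no references
            if node.parent is not None:
                node.parent.children.remove(node)
                node.parent = None
        else:                                    # step 1.B, last reference
            link(previous, node)
    return table
```

Because link() updates the parent pointer and the children list
together, the links from steps 1.A and 1.B stay consistent with one
another throughout.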

456

457 (2) Gather together all of the messages that have no parents

458 and make them all children (siblings of one another) of a dummy

459 parent (the "root"). These messages constitute the first

460 (head) message of the threads created thus far.

461

462 (3) Prune dummy messages from the thread tree. Traverse each

463 thread under the root, and for each message:

464

465 If it is a dummy message with NO children, delete it.

466

467 If it is a dummy message with children, delete it, but

468 promote its children to the current level. In other words,

469 splice them in with the dummy's siblings.

470

471 Do not promote the children if doing so would make them

472 children of the root, unless there is only one child.
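
The pruning step can be sketched as a recursive walk. The Node class
here is a hypothetical minimal container in which message is None for a
dummy; parent pointers are omitted for brevity, so this is a sketch of
the rule rather than a drop-in continuation of any particular
implementation.

```python
class Node:
    def __init__(self, message=None, children=None):
        self.message = message            # None marks a dummy
        self.children = children or []


def prune(parent, at_root=True):
    """Remove dummies below parent, promoting their children where allowed."""
    kept = []
    for child in parent.children:
        prune(child, at_root=False)       # clean up the subtree first
        if child.message is not None:
            kept.append(child)            # real message: keep as is
        elif not child.children:
            continue                      # dummy without children: delete
        elif at_root and len(child.children) > 1:
            kept.append(child)            # keep the dummy so siblings stay grouped
        else:
            kept.extend(child.children)   # delete dummy, promote its children
    parent.children = kept
```

After prune(root), dummies remain only directly under the root, and only
where they hold two or more children together as one thread.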

473

474 (4) Sort the messages under the root (top-level siblings only)

475 by sent date as described in section 2.2. In the case of a

476 dummy message, sort its children by sent date and then use the

477 first child for the top-level sort.

478

479 (5) Gather together messages under the root that have the same

480 base subject text.

481

482 (A) Create a table for associating base subjects with

483 messages, called the subject table.

484

485 (B) Populate the subject table with one message per

486 base subject. For each child of the root:

487

488 (i) Find the subject of this thread, by using the base

489 subject from either the current message or its first

490 child if the current message is a dummy. This is the

491 thread subject.

492

493 (ii) If the thread subject is empty, skip this message.

494

495 (iii) Look up the message associated with the thread

496 subject in the subject table.

497

498 (iv) If there is no message in the subject table with the

499 thread subject, add the current message and the thread

500 subject to the subject table.

501

502 Otherwise, if the message in the subject table is not a

503 dummy, AND either of the following criteria is true:

504

505 The current message is a dummy, OR

506

507 The message in the subject table is a reply or forward

508 and the current message is not.

509

510 then replace the message in the subject table with the

511 current message.

512

513 (C) Merge threads with the same thread subject. For each

514 child of the root:

515

516 (i) Find the message's thread subject as in step 5.B.i

517 above.

518

519 (ii) If the thread subject is empty, skip this message.

520

521 (iii) Look up the message associated with this thread

522 subject in the subject table.

523

524 (iv) If the message in the subject table is the current

525 message, skip this message.

526

527 Otherwise, merge the current message with the one in the

528 subject table using the following rules:

529

530 If both messages are dummies, append the current

531 message's children to the children of the message in

532 the subject table (the children of both messages

533 become siblings), and then delete the current message.

534

535 If the message in the subject table is a dummy and the

536 current message is not, make the current message a

537 child of the message in the subject table (a sibling

538 of its children).

539

540 If the current message is a reply or forward and the

541 message in the subject table is not, make the current

542 message a child of the message in the subject table (a

543 sibling of its children).

544

545 Otherwise, create a new dummy message and make both

546 the current message and the message in the subject

547 table children of the dummy. Then replace the message

548 in the subject table with the dummy message.

549

550 Note: Subject comparisons are case-insensitive, as

551 described under "Internationalization

552 Considerations."

553

554 (6) Traverse the messages under the root and sort each set of

555 siblings by sent date as described in section 2.2. Traverse

556 the messages in such a way that the "youngest" set of siblings

557 are sorted first, and the "oldest" set of siblings are sorted

558 last (grandchildren are sorted before children, etc). In the

559 case of a dummy message (which can only occur with top-level

560 siblings), use its first child for sorting.
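
Step (6) amounts to a post-order traversal, sketched below. The Node
class and its fields are hypothetical; seq is None for a dummy, sent can
be any comparable sent-date value, and every dummy is assumed to have at
least one child, as is the case after steps (3) and (5).

```python
class Node:
    def __init__(self, seq=None, sent=None, children=None):
        self.seq = seq                    # None marks a dummy
        self.sent = sent
        self.children = children or []


def sort_key(node):
    """Sent date with sequence number as tie-breaker (section 2.2)."""
    if node.seq is None:
        node = node.children[0]           # a dummy sorts by its first child
    return (node.sent, node.seq)


def sort_thread_tree(node):
    """Sort every set of siblings, deepest levels first."""
    for child in node.children:
        sort_thread_tree(child)           # grandchildren before children
    node.children.sort(key=sort_key)
```

Sorting the deepest siblings first means that, by the time a dummy is
compared with its own siblings, its first child is already its earliest
one.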

561

562 Example: C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000

563 S: * THREAD (166)(167)(168)(169)(172)(170)(171)

564 (173)(174 (175)(176)(178)(181)(180))(179)(177

565 (183)(182)(188)(184)(185)(186)(187)(189))(190)

566 (191)(192)(193)(194 195)(196 (197)(198))(199)

567 (200 202)(201)(203)(204)(205)(206 207)(208)

568 S: A283 OK THREAD completed

569 C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp"

570 S: * THREAD

571 S: A284 OK THREAD completed

572 C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000

573 S: * THREAD (166)(167)(168)(169)(172)((170)(179))

574 (171)(173)((174)(175)(176)(178)(181)(180))

575 ((177)(183)(182)(188 (184)(189))(185 186)(187))

576 (190)(191)(192)(193)((194)(195 196))(197 198)

577 (199)(200 202)(201)(203)(204)(205 206 207)(208)

578 S: A285 OK THREAD completed

579

580 Note: The line breaks in the first and third server

581 responses are for editorial clarity and do not appear in

582 real THREAD responses.

583

584 4. Additional Responses

585

586 These responses are extensions to the [IMAP] base protocol.

587

588 The section headings of these responses are intended to correspond

589 with where they would be located in the main document.

590

591 BASE.7.2.SORT. SORT Response

592

593 Data: zero or more numbers

594

595 The SORT response occurs as a result of a SORT or UID SORT

596 command. The number(s) refer to those messages that match the

597 search criteria. For SORT, these are message sequence numbers;

598 for UID SORT, these are unique identifiers. Each number is

599 delimited by a space.

600

601 Example: S: * SORT 2 3 6

602

603 BASE.7.2.THREAD. THREAD Response

604

605 Data: zero or more threads

606

607 The THREAD response occurs as a result of a THREAD or UID THREAD

608 command. It contains zero or more threads. A thread consists of

609 a parenthesized list of thread members.

610

611 Thread members consist of zero or more message numbers, delimited

612 by spaces, indicating successive parent and child. This continues

613 until the thread splits into multiple sub-threads, at which point

614 the thread nests into multiple sub-threads with the first members

615 of the sub-threads being siblings at this level. There is no limit

616 to the nesting of threads.

617

618 The message numbers refer to those messages that match the search

619 criteria. For THREAD, these are message sequence numbers; for UID

620 THREAD, these are unique identifiers.

621

622 Example: S: * THREAD (2)(3 6 (4 23)(44 7 96))

623

624 The first thread consists only of message 2. The second thread

625 consists of the messages 3 (parent) and 6 (child), after which it

626 splits into two subthreads; the first of which contains messages 4

627 (child of 6, sibling of 44) and 23 (child of 4), and the second of

628 which contains messages 44 (child of 6, sibling of 4), 7 (child of

629 44), and 96 (child of 7). Since some later messages are parents

630 of earlier messages, the messages were probably moved from some

631 other mailbox at different times.

632

633 -- 2

634

635 -- 3

636 \-- 6

637 |-- 4

638 | \-- 23

639 |

640 \-- 44

641 \-- 7

642 \-- 96

643

644 Example: S: * THREAD ((3)(5))

645

646 In this example, 3 and 5 are siblings of a parent which does not

647 match the search criteria (and/or does not exist in the mailbox);

648 however they are members of the same thread.
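
On the client side, the nesting rules might be decoded with a small
recursive parser such as the sketch below. The function name and the
(number, children) tuple representation are assumptions made for
illustration; a thread whose parent is missing is represented with None
in place of the number. Malformed input is not handled.

```python
import re


def parse_threads(text):
    """Parse the data of a THREAD response into (number, children) tuples."""
    tokens = re.findall(r"\d+|[()]", text)
    pos = 0

    def parse_list():
        nonlocal pos
        pos += 1                              # consume "("
        head = tail = None
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                if tail is None:              # nested lists first: parent is missing
                    head = tail = (None, [])
                tail[1].append(parse_list())  # sub-thread under the last message
            else:
                node = (int(tokens[pos]), [])
                pos += 1
                if tail is None:
                    head = node
                else:
                    tail[1].append(node)      # successive parent and child
                tail = node
        pos += 1                              # consume ")"
        return head

    threads = []
    while pos < len(tokens):
        threads.append(parse_list())
    return threads
```

Applied to the two examples above, the parser yields message 6 with the
two sub-threads headed by 4 and 44 as its children, and a single thread
with a None parent holding 3 and 5 as siblings.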

649

650 5. Formal Syntax of SORT and THREAD Commands and Responses

651

652 The following syntax specification uses the Augmented Backus-Naur

653 Form (ABNF) notation as specified in [ABNF]. It also uses [ABNF]

654 rules defined in [IMAP].

655

656 sort = ["UID" SP] "SORT" SP sort-criteria SP search-criteria

657

658 sort-criteria = "(" sort-criterion *(SP sort-criterion) ")"

659

660 sort-criterion = ["REVERSE" SP] sort-key

661

662 sort-key = "ARRIVAL" / "CC" / "DATE" / "FROM" / "SIZE" /

663 "SUBJECT" / "TO"

664

665 thread = ["UID" SP] "THREAD" SP thread-alg SP search-criteria

666

667 thread-alg = "ORDEREDSUBJECT" / "REFERENCES" / thread-alg-ext

668

669 thread-alg-ext = atom

670 ; New algorithms MUST be registered with IANA

671

672 search-criteria = charset 1*(SP search-key)

673

674 charset = atom / quoted

675 ; CHARSET values MUST be registered with IANA

676

677 sort-data = "SORT" *(SP nz-number)

678

679 thread-data = "THREAD" [SP 1*thread-list]

680

681 thread-list = "(" (thread-members / thread-nested) ")"

682

683 thread-members = nz-number *(SP nz-number) [SP thread-nested]

684

685 thread-nested = 2*thread-list

686

687 The following syntax describes base subject extraction rules (2)-(6):

688

689 subject = *subj-leader [subj-middle] *subj-trailer

690

691 subj-refwd = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":"

692

693 subj-blob = "[" *BLOBCHAR "]" *WSP

694

695 subj-fwd = subj-fwd-hdr subject subj-fwd-trl

696

697 subj-fwd-hdr = "[fwd:"

698

699 subj-fwd-trl = "]"

700

701 subj-leader = (*subj-blob subj-refwd) / WSP

702

703 subj-middle = *subj-blob (subj-base / subj-fwd)

704 ; last subj-blob is subj-base if subj-base would

705 ; otherwise be empty

706

707 subj-trailer = "(fwd)" / WSP

708

709 subj-base = NONWSP *(*WSP NONWSP)

710 ; can be a subj-blob

711

712 BLOBCHAR = %x01-5a / %x5c / %x5e-ff

713 ; any CHAR8 except '[' and ']'

714

715 NONWSP = %x01-08 / %x0a-1f / %x21-ff

716 ; any CHAR8 other than WSP

717

718 6. Security Considerations

719

720 The SORT and THREAD extensions do not raise any security

721 considerations that are not present in the base [IMAP] protocol, and

722 these issues are discussed in [IMAP]. Nevertheless, it is important

723 to remember that [IMAP] protocol transactions, including message

724 data, are sent in the clear over the network unless protection from

725 snooping is negotiated, either by the use of STARTTLS, by privacy

726 protection negotiated in the AUTHENTICATE command, or by some other

727 protection mechanism.

728

729 Although not a security consideration, it is important to recognize

730 that sorting by REFERENCES can lead to misleading threading trees.

731 For example, a message with false References: header data will cause

732 a thread to be incorporated into another thread.

733

734 The process of extracting the base subject may lead to incorrect

735 collation if the extracted data was significant text as opposed to

736 a subject artifact.

737

738 7. Internationalization Considerations

739

740 As stated in the introduction, the rules of I18NLEVEL=1 as described

741 in [IMAP-I18N] MUST be followed; that is, the SORT and THREAD

742 extensions MUST collate strings according to the i;unicode-casemap

743 collation described in [UNICASEMAP]. Servers SHOULD also advertise

744 the I18NLEVEL=1 extension. Alternatively, a server MAY implement

745 I18NLEVEL=2 (or higher) and comply with the rules of that level.

746

747 As discussed in [IMAP-I18N] section 4.5, all server implementations

748 should eventually be updated to support the [IMAP-I18N] I18NLEVEL=2

749 extension.

750

751 Translations of the "re" or "fw"/"fwd" tokens are not specified for

752 removal in the base subject extraction process. An attempt to add

753 such translated tokens would result in a geometrically complex, and

754 ultimately unimplementable, task.

755

756 Instead, note that [RFC-2822] section 3.6.5 recommends that "re:"

757 (from the Latin "res", in the matter of) be used to identify a reply.

758 Although it is evident that, from the multiple forms of token to

759 identify a forwarded message, there is considerable variation found

760 in the wild, the variations are (still) manageable. Consequently, it

761 is suggested that "re:" and one of the variations of the tokens for

762 forward supported by the base subject extraction rules be adopted for

763 Internet mail messages, since doing so makes it a simple display-time

764 task to localize the token language for the user.

765
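A client-side sketch of such display-time localization, in Python, might look like the following. The label table, the function name display_subject, and the choice of German labels are assumptions made for illustration; the stored message keeps its canonical token and only the rendered string changes.

```python
import re

# Hypothetical label table; the translations are illustrative only.
LOCAL_LABELS = {
    "en": {"re": "Re", "fwd": "Fwd"},
    "de": {"re": "AW", "fwd": "WG"},
}

# Matches one leading canonical token: "re:", "fw:" or "fwd:".
_TOKEN = re.compile(r"^(re|fwd?):\s*", re.IGNORECASE)


def display_subject(subject, lang):
    """Swap a leading canonical token for the user's localized label."""
    labels = LOCAL_LABELS.get(lang, LOCAL_LABELS["en"])
    match = _TOKEN.match(subject)
    if not match:
        return subject  # nothing to localize
    kind = "re" if match.group(1).lower() == "re" else "fwd"
    return f"{labels[kind]}: {subject[match.end():]}"


reply_shown = display_subject("Re: budget", "de")
forward_shown = display_subject("fw: notes", "de")
plain_shown = display_subject("budget", "de")
```

Since the token on the wire stays "re:" or a supported forward token, server-side base subject extraction is unaffected by the user's display language.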

766 8. IANA Considerations

767

768 [IMAP] capabilities are registered by publishing a standards track or

769 IESG approved experimental RFC. This document constitutes

770 registration of the SORT and THREAD capabilities in the [IMAP]

771 capabilities registry.

772

773 This document creates a new [IMAP] threading algorithms registry,

774 which registers threading algorithms by publishing a standards track

775 or IESG approved experimental RFC. This document constitutes

776 registration of the ORDEREDSUBJECT and REFERENCES algorithms in that

777 registry.

778

779 9. Normative References

780

781 The following documents are normative to this document:

782

783 [ABNF] Crocker, D. and Overell, P. "Augmented BNF

784 for Syntax Specifications: ABNF", RFC 5234,

785 January 2008.

786

787 [CHARSET] Freed, N. and Postel, J. "IANA Character Set

788 Registration Procedures", RFC 2978, October

789 2000.

790

791 [IMAP] Crispin, M. "Internet Message Access Protocol -

792 Version 4rev1", RFC 3501, March 2003.

793

794 [IMAP-I18N] Newman, C. and Gulbrandsen, A. "Internet

795 Message Access Protocol Internationalization",

796 Work in Progress.

797

798 [KEYWORDS] Bradner, S. "Key words for use in RFCs to

799 Indicate Requirement Levels", BCP 14, RFC 2119,

800 March 1997.

801

802 [RFC-2822] Resnick, P. "Internet Message Format", RFC

803 2822, April 2001.

804

805 [UNICASEMAP] Crispin, M. "i;unicode-casemap - Simple Unicode

806 Collation Algorithm", RFC 5051, October 2007.

807

808 10. Informative References

809

810 The following documents are informative to this document:

811

812 [IMAP-MODELS] Crispin, M. "Distributed Electronic Mail Models

813 in IMAP4", RFC 1733, December 1994.

814

815 [THREADING] Zawinski, J. "Message Threading",

816 http://www.jwz.org/doc/threading.html,

817 1997-2002.

818

819 Appendices

820

821 Authors' Addresses

822

823 Mark R. Crispin

824 Networks and Distributed Computing

825 University of Washington

826 4545 15th Avenue NE

827 Seattle, WA 98105-4527

828

829 Phone: +1 (206) 543-5762

830

831 EMail: MRC@CAC.Washington.EDU

832

833 Kenneth Murchison

834 Carnegie Mellon University

835 5000 Forbes Avenue

836 Cyert Hall 285

837 Pittsburgh, PA 15213

838

839 Phone: +1 (412) 268-2638

840 EMail: murch@andrew.cmu.edu

841

842 Full Copyright Statement

843

844 Copyright (C) The IETF Trust (2008).

845

846 This document is subject to the rights, licenses and restrictions

847 contained in BCP 78, and except as set forth therein, the authors

848 retain all their rights.

849

850 This document and the information contained herein are provided on an

851 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS

852 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND

853 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS

854 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF

855 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED

856 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

857

858 Intellectual Property

859

860 The IETF takes no position regarding the validity or scope of any

861 Intellectual Property Rights or other rights that might be claimed to

862 pertain to the implementation or use of the technology described in

863 this document or the extent to which any license under such rights

864 might or might not be available; nor does it represent that it has

865 made any independent effort to identify any such rights. Information

866 on the procedures with respect to rights in RFC documents can be

867 found in BCP 78 and BCP 79.

868

869 Copies of IPR disclosures made to the IETF Secretariat and any

870 assurances of licenses to be made available, or the result of an

871 attempt made to obtain a general license or permission for the use of

872 such proprietary rights by implementers or users of this

873 specification can be obtained from the IETF on-line IPR repository at

874 http://www.ietf.org/ipr.

875

876 The IETF invites any interested party to bring to its attention any

877 copyrights, patents or patent applications, or other proprietary

878 rights that may cover technology that may be required to implement

879 this standard. Please address the information to the IETF at

880 ietf-ipr@ietf.org.

881

882 Acknowledgement

883

884 Funding for the RFC Editor function is currently provided by the

885 Internet Society.