imapext-2007
diff docs/rfc/rfc4790.txt @ 0:ada5e610ab86
imap-2007e
author | yuuji@gentei.org |
---|---|
date | Mon, 14 Sep 2009 15:17:45 +0900 |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/docs/rfc/rfc4790.txt Mon Sep 14 15:17:45 2009 +0900
@@ -0,0 +1,1459 @@

Network Working Group                                          C. Newman
Request for Comments: 4790                              Sun Microsystems
Category: Standards Track                                      M. Duerst
                                                Aoyama Gakuin University
                                                          A. Gulbrandsen
                                                                    Oryx
                                                              March 2007


            Internet Application Protocol Collation Registry

Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   Many Internet application protocols include string-based lookup,
   searching, or sorting operations. However, the problem space for
   searching and sorting international strings is large, not fully
   explored, and is outside the area of expertise for the Internet
   Engineering Task Force (IETF). Rather than attempt to solve such a
   large problem, this specification creates an abstraction framework so
   that application protocols can precisely identify a comparison
   function, and the repertoire of comparison functions can be extended
   in the future.

Newman, et al.              Standards Track                     [Page 1]

RFC 4790                   Collation Registry                 March 2007


Table of Contents

   1.  Introduction
     1.1.  Conventions Used in This Document
   2.  Collation Definition and Purpose
     2.1.  Definition
     2.2.  Purpose
     2.3.  Some Other Terms Used in this Document
     2.4.  Sort Keys
   3.  Collation Identifier Syntax
     3.1.  Basic Syntax
     3.2.  Wildcards
     3.3.  Ordering Direction
     3.4.  URIs
     3.5.  Naming Guidelines
   4.  Collation Specification Requirements
     4.1.  Collation/Server Interface
     4.2.  Operations Supported
       4.2.1.  Validity
       4.2.2.  Equality
       4.2.3.  Substring
       4.2.4.  Ordering
     4.3.  Sort Keys
     4.4.  Use of Lookup Tables
   5.  Application Protocol Requirements
     5.1.  Character Encoding
     5.2.  Operations
     5.3.  Wildcards
     5.4.  String Comparison
     5.5.  Disconnected Clients
     5.6.  Error Codes
     5.7.  Octet Collation
   6.  Use by Existing Protocols
   7.  Collation Registration
     7.1.  Collation Registration Procedure
     7.2.  Collation Registration Format
       7.2.1.  Registration Template
       7.2.2.  The Collation Element
       7.2.3.  The Identifier Element
       7.2.4.  The Title Element
       7.2.5.  The Operations Element
       7.2.6.  The Specification Element
       7.2.7.  The Submitter Element
       7.2.8.  The Owner Element
       7.2.9.  The Version Element
       7.2.10. The Variable Element
     7.3.  Structure of Collation Registry
     7.4.  Example Initial Registry Summary
   8.  Guidelines for Expert Reviewer
   9.  Initial Collations
     9.1.  ASCII Numeric Collation
       9.1.1.  ASCII Numeric Collation Description
       9.1.2.  ASCII Numeric Collation Registration
     9.2.  ASCII Casemap Collation
       9.2.1.  ASCII Casemap Collation Description
       9.2.2.  ASCII Casemap Collation Registration
     9.3.  Octet Collation
       9.3.1.  Octet Collation Description
       9.3.2.  Octet Collation Registration
   10. IANA Considerations
   11. Security Considerations
   12. Acknowledgements
   13. References
     13.1.  Normative References
     13.2.  Informative References

1.  Introduction

   The Application Configuration Access Protocol (ACAP) [11]
   specification introduced the concept of a comparator (which we call
   collation in this document), but failed to create an IANA registry.
   With the introduction of stringprep [6] and the Unicode Collation
   Algorithm [7], it is now time to create that registry and populate it
   with some initial values appropriate for an international community.
   This specification replaces and generalizes the definition of a
   comparator in ACAP, and creates a collation registry.

1.1.  Conventions Used in This Document

   The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
   in this document are to be interpreted as defined in "Key words for
   use in RFCs to Indicate Requirement Levels" [1].

   The attribute syntax specifications use the Augmented Backus-Naur
   Form (ABNF) [2] notation, including the core rules defined in
   Appendix A. The ABNF production "Language-tag" is imported from
   Language Tags [5] and "reg-name" from URI: Generic Syntax [4].

2.  Collation Definition and Purpose

2.1.  Definition

   A collation is a named function which takes two arbitrary length
   strings as input and can be used to perform one or more of three
   basic comparison operations: equality test, substring match, and
   ordering test.

2.2.  Purpose

   Collations are an abstraction for comparison functions so that these
   comparison functions can be used in multiple protocols. The details
   of a particular comparison operation can be specified by someone with
   appropriate expertise, independent of the application protocols that
   use that collation. This is similar to the way a charset [13]
   separates the details of octet to character mapping from a protocol
   specification, such as MIME [9], or the way SASL [10] separates the
   details of an authentication mechanism from a protocol specification,
   such as ACAP [11].

   Here is a small diagram to help illustrate the value of this
   abstraction:

       +-------------------+                         +-----------------+
       | IMAP i18n SEARCH  |--+                      | Basic           |
       +-------------------+  |                   +--| Collation Spec  |
                              |                   |  +-----------------+
       +-------------------+  |  +-------------+  |  +-----------------+
       | ACAP i18n SEARCH  |--+--| Collation   |--+--| A stringprep    |
       +-------------------+  |  | Registry    |  |  | Collation Spec  |
                              |  +-------------+  |  +-----------------+
       +-------------------+  |                   |  +-----------------+
       | ...other protocol |--+                   |  | locale-specific |
       +-------------------+                      +--| Collation Spec  |
                                                     +-----------------+

   Thus IMAP, ACAP, and future application protocols with international
   search capability simply specify how to interface to the collation
   registry instead of each protocol specification having to specify all
   the collations it supports.

2.3.  Some Other Terms Used in this Document

   The terms client, server, and protocol are used in somewhat unusual
   senses.

   Client means a user, or a program acting directly on behalf of a
   user. This may be a mail reader acting as an IMAP client, or it may
   be an interactive shell, where the user can type protocol
   commands/requests directly, or it may be a script or program written
   by the user.

   Server means a program that performs services requested by the
   client. This may be a traditional server such as an HTTP server, or
   it may be a Sieve [14] interpreter running a Sieve script written by
   a user. A server needs to use the operations provided by collations
   in order to fulfill the client's requests.

   The protocol describes how the client tells the server what it wants
   done, and (if applicable) how the server tells the client about the
   results. IMAP is a protocol by this definition, and so is the Sieve
   language.

2.4.  Sort Keys

   One component of a collation is a transformation, which turns a
   string into a sort key, which is then used while sorting.

   The transformation can range from an identity mapping (e.g., the
   i;octet collation, Section 9.3) to a mapping that makes the string
   unreadable to a human.

   This is an implementation detail of collations or servers. A
   protocol SHOULD NOT expose it to clients, since some collations leave
   the sort key's format up to the implementation, and current
   conformant implementations are known to use different formats.

3.  Collation Identifier Syntax

3.1.  Basic Syntax

   The collation identifier itself is a single US-ASCII string. The
   identifier MUST NOT be longer than 254 characters, and obeys the
   following grammar:

      collation-char = ALPHA / DIGIT / "-" / ";" / "=" / "."

      collation-id = collation-prefix ";" collation-core-name
                     *collation-arg

      collation-scope = Language-tag / "vnd-" reg-name

      collation-core-name = ALPHA *( ALPHA / DIGIT / "-" )

      collation-arg = ";" ALPHA *( ALPHA / DIGIT ) "="
                      1*( ALPHA / DIGIT / "." )

   Note: the ABNF production "Language-tag" is imported from Language
   Tags [5] and "reg-name" from URI: Generic Syntax [4].

   There is a special identifier called "default". For protocols that
   have a default collation, "default" refers to that collation. For
   other protocols, the identifier "default" MUST match no collations,
   and servers SHOULD treat it in the same way as they treat nonexistent
   collations.

3.2.  Wildcards

   The string a client uses to select a collation MAY contain one or
   more wildcard ("*") characters that match zero or more
   collation-chars. Wildcard characters MUST NOT be adjacent. If the
   wildcard string matches multiple collations, the server SHOULD
   attempt to select a widely useful collation in preference to a
   narrowly useful one.

      collation-wild = ("*" / (ALPHA ["*"])) *(collation-char ["*"])
                       ; MUST NOT exceed 254 characters total

3.3.  Ordering Direction

   When used as a protocol element for ordering, the collation
   identifier MAY be prefixed by either "+" or "-" to explicitly specify
   an ordering direction. "+" has no effect on the ordering operation,
   while "-" inverts the result of the ordering operation. In general,
   collation-order is used when a client requests a collation, and
   collation-selected is used when the server informs the client of the
   selected collation.

      collation-selected = ["+" / "-"] collation-id

      collation-order = ["+" / "-"] collation-wild

3.4.  URIs

   Some protocols are designed to use URIs [4] to refer to collations
   rather than simple tokens. A special section of the IANA URL space
   is reserved for such usage. The "collation-uri" form is used to
   refer to a specific named collation (the collation registration may
   not actually be present). The "collation-auri" form is an abstract
   name for an ordering, a collation pattern, or a vendor private
   collator.

      collation-uri = "http://www.iana.org/assignments/collation/"
                      collation-id ".xml"

      collation-auri = ( "http://www.iana.org/assignments/collation/"
                       collation-order ".xml" ) / other-uri

      other-uri = <absoluteURI>
                ; excluding the IANA collation namespace.

3.5.  Naming Guidelines

   While this specification makes no absolute requirements on the
   structure of collation identifiers, naming consistency is important,
   so the following initial guidelines are provided.

   Collation identifiers with an international audience typically begin
   with "i;". Collation identifiers intended for a particular language
   or locale typically begin with a language tag [5] followed by a ";".
   After the first ";" is normally the name of the general collation
   algorithm, followed by a series of algorithm modifications separated
   by the ";" delimiter. Parameterized modifications will use "=" to
   delimit the parameter from the value. The version numbers of any
   lookup tables used by the algorithm SHOULD be present as
   parameterized modifications.

   Collation identifiers of the form *;vnd-hostname;* are reserved for
   vendor-specific collations created by the owner of the hostname
   following the "vnd-" prefix (e.g., vnd-example.com for the vendor
   example.com). Registration of such collations (or the name space as
   a whole), with intended use of the "Vendor", is encouraged when a
   public specification or open-source implementation is available, but
   is not required.

4.  Collation Specification Requirements

4.1.  Collation/Server Interface

   The collation itself defines what it operates on. Most collations
   are expected to operate on character strings. The i;octet
   (Section 9.3) collation operates on octet strings. The
   i;ascii-numeric (Section 9.1) collation operates on numbers.

   This specification defines the collation interface in terms of octet
   strings. However, implementations may choose to use character
   strings instead. Such implementations may not be able to implement,
   e.g., i;octet. Since i;octet is not currently mandatory to implement
   for any protocol, this should not be a problem.

4.2.  Operations Supported

   A collation specification MUST state which of the three basic
   operations are supported (equality, substring, ordering) and how to
   perform each of the supported operations on any two input character
   strings, including empty strings. Collations must be deterministic,
   i.e., given a collation with a specific identifier, and any two fixed
   input strings, the result MUST be the same for the same operation.

   In general, collation operations should behave as their names
   suggest. While a collation may be new, the operations are not, so
   the new collation's operations should be similar to those of older
   collations. For example, a date/time collation should not provide a
   "substring" operation that would morph IMAP substring SEARCH into,
   e.g., a date-range search.

   A non-obvious consequence of the rules for each collation operation
   is that, for any single collation, either none or all of the
   operations can return "undefined". For example, it is not possible
   to have an equality operation that never returns "undefined", and a
   substring operation that occasionally does.
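   To make the "none or all" rule concrete, here is a minimal Python
   sketch of a collation object. The collation is hypothetical. It is
   not i;ascii-numeric or any other registered collation. It accepts a
   non-empty run of ASCII digits as valid input and compares by numeric
   value. The class, method, and enum names are illustrative
   assumptions. Every operation runs the same validity check first, so
   invalid input produces "undefined" from all operations, never from
   only one of them.

```python
from enum import Enum
from typing import Optional


class Match(Enum):
    """Results of the equality test, named as in this document."""
    MATCH = "match"
    NO_MATCH = "no-match"
    UNDEFINED = "undefined"


class DigitsCollation:
    """Hypothetical collation (illustration only): valid input is a
    non-empty string of ASCII digits; equality and ordering compare the
    numeric value.

    The operations are pure functions of their inputs, so the collation
    is deterministic, and all of them share one validity check.
    """

    # Operations this collation would declare in its specification.
    operations = ("equality", "order")

    @staticmethod
    def _value(s: bytes) -> Optional[int]:
        # bytes.isdigit() is True only for non-empty, all-ASCII-digit input.
        return int(s) if s.isdigit() else None

    def is_valid(self, s: bytes) -> bool:
        return self._value(s) is not None

    def equal(self, a: bytes, b: bytes) -> Match:
        x, y = self._value(a), self._value(b)
        if x is None or y is None:
            return Match.UNDEFINED
        return Match.MATCH if x == y else Match.NO_MATCH

    def order(self, a: bytes, b: bytes) -> str:
        x, y = self._value(a), self._value(b)
        if x is None or y is None:
            return "undefined"
        if x < y:
            return "less"
        return "greater" if x > y else "equal"
```

   For example, DigitsCollation().order(b"9", b"10") returns "less"
   because the comparison is numeric rather than by octet. Any
   comparison involving b"x" returns "undefined".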

4.2.1.  Validity

   The validity test takes one string as argument. It returns valid if
   its input string is a valid input to the collation's other
   operations, and invalid if not. (In other words, a string is valid
   if it is equal to itself according to the collation's equality
   operation.)

   The validity test is provided by all collations. It MUST NOT be
   listed separately in the collation registration.

4.2.2.  Equality

   The equality test always returns "match" or "no-match" when it is
   supplied valid input, and MAY return "undefined" if one or both input
   strings are not valid.

   The equality test MUST be reflexive and symmetric. For valid input,
   it MUST be transitive.

   If a collation provides either a substring or an ordering test, it
   MUST also provide an equality test. The substring and/or ordering
   tests MUST be consistent with the equality test.

   The return values of the equality test are called "match",
   "no-match", and "undefined" in this document.

4.2.3.  Substring

   The substring matching operation determines if the first string is a
   substring of the second string, i.e., if one or more substrings of
   the second string are equal to the first, as defined by the
   collation's equality operation.

   A collation that supports substring matching will automatically
   support two special cases of substring matching: prefix and suffix
   matching, if those special cases are supported by the application
   protocol. It returns "match" or "no-match" when it is supplied valid
   input and returns "undefined" when supplied invalid input.
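   The definition above says that a substring test needs nothing except
   the collation's equality operation. The Python below is a minimal
   sketch of that idea. The function names are hypothetical.
   ascii_fold_equal is an assumed equality test that folds US-ASCII
   case. It is loosely in the spirit of i;ascii-casemap, but it is not
   that collation's reference definition. Validity is derived from
   self-equality, as in Section 4.2.1. Prefix and suffix matching are
   the same search, restricted to slices that start or end at the edge
   of the string.

```python
from enum import Enum
from typing import Callable


class Match(Enum):
    MATCH = "match"
    NO_MATCH = "no-match"
    UNDEFINED = "undefined"


Equality = Callable[[bytes, bytes], Match]


def ascii_fold_equal(a: bytes, b: bytes) -> Match:
    """Hypothetical equality test: US-ASCII letters compare case-insensitively.

    bytes.upper() changes only ASCII a-z, so other octets compare exactly.
    Every input is valid, so "undefined" never occurs here.
    """
    return Match.MATCH if a.upper() == b.upper() else Match.NO_MATCH


def _valid(s: bytes, equal: Equality) -> bool:
    # Section 4.2.1: a string is valid iff it is equal to itself.
    return equal(s, s) is Match.MATCH


def substring(needle: bytes, haystack: bytes, equal: Equality) -> Match:
    """Is `needle` equal to some substring of `haystack` under `equal`?

    Every slice haystack[i:j] is tried, including the empty slice, so the
    test still works if the collation equates strings of unequal length.
    This makes O(n^2) equality calls: a sketch, not an efficient matcher.
    """
    if not (_valid(needle, equal) and _valid(haystack, equal)):
        return Match.UNDEFINED
    n = len(haystack)
    for i in range(n + 1):
        for j in range(i, n + 1):
            if equal(needle, haystack[i:j]) is Match.MATCH:
                return Match.MATCH
    return Match.NO_MATCH


def prefix(needle: bytes, haystack: bytes, equal: Equality) -> Match:
    """Special case: only slices that start at offset 0."""
    if not (_valid(needle, equal) and _valid(haystack, equal)):
        return Match.UNDEFINED
    for j in range(len(haystack) + 1):
        if equal(needle, haystack[:j]) is Match.MATCH:
            return Match.MATCH
    return Match.NO_MATCH


def suffix(needle: bytes, haystack: bytes, equal: Equality) -> Match:
    """Special case: only slices that end at the end of `haystack`."""
    if not (_valid(needle, equal) and _valid(haystack, equal)):
        return Match.UNDEFINED
    for i in range(len(haystack) + 1):
        if equal(needle, haystack[i:]) is Match.MATCH:
            return Match.MATCH
    return Match.NO_MATCH
```

   The search tries every slice, including the empty one, so it
   satisfies two rules automatically: a string is a substring of
   itself, and the empty string is a substring of every string.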

   Application protocols MAY return position information for substring
   matches. If this is done, the position information SHOULD include
   both the starting offset and the ending offset for each match. This
   is important because more sophisticated collations can match strings
   of unequal length (for example, a pre-composed accented character can
   match a decomposed accented character). In general, overlapping
   matches SHOULD be reported (as when "ana" occurs twice within
   "banana"), although there are cases where a collation may decide not
   to. For example, in a collation which treats all whitespace
   sequences as identical, the substring operation could be defined such
   that " 1 " (SP "1" SP) is reported just once within "  1  " (SP SP
   "1" SP SP), not four times (SP SP "1" SP, SP "1" SP, SP "1" SP SP and
   SP SP "1" SP SP), since the four matches are, in a sense, the same
   match.

   A string is a substring of itself. The empty string is a substring
   of all strings.

   Note that the substring operation of some collations can match
   strings of unequal length. For example, a pre-composed accented
   character can match a decomposed accented character. The Unicode
   Collation Algorithm [7] discusses this in more detail.

   The return values of the substring operation are called "match",
   "no-match", and "undefined" in this document.

4.2.4.  Ordering

   The ordering operation determines how two strings are ordered. It
   MUST be reflexive. For valid input, it MUST be transitive and
   trichotomous.
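   These properties can be spot-checked mechanically. The following
   Python sketch tests a candidate ordering function against a finite
   sample of strings. The function names are illustrative assumptions.
   The string results "less", "equal", "greater", and "undefined" follow
   the names this document uses. Results of "undefined" are skipped,
   because transitivity and trichotomy are required only for valid
   input. A clean result on a sample is evidence, not proof.

```python
from itertools import product
from typing import Callable, Iterable, List

LESS, EQUAL, GREATER, UNDEFINED = "less", "equal", "greater", "undefined"
_MIRROR = {LESS: GREATER, EQUAL: EQUAL, GREATER: LESS}

Ordering = Callable[[bytes, bytes], str]


def octet_order(a: bytes, b: bytes) -> str:
    """Octet-by-octet ordering (Python compares bytes objects this way)."""
    if a < b:
        return LESS
    return GREATER if a > b else EQUAL


def ordering_violations(order: Ordering, samples: Iterable[bytes]) -> List[str]:
    """Return a description of each property violation found in the sample."""
    xs = list(samples)
    problems = []
    for a in xs:
        # Reflexive: a string is "equal" to itself (invalid ones are skipped).
        if order(a, a) not in (EQUAL, UNDEFINED):
            problems.append(f"not reflexive: {a!r}")
    for a, b in product(xs, repeat=2):
        r, s = order(a, b), order(b, a)
        if UNDEFINED in (r, s):
            continue
        # Trichotomy, seen from both sides: a < b exactly when b > a.
        if s != _MIRROR.get(r):
            problems.append(f"inconsistent: {a!r} vs {b!r} gives {r}/{s}")
    for a, b, c in product(xs, repeat=3):
        if order(a, b) == LESS and order(b, c) == LESS and order(a, c) != LESS:
            problems.append(f"not transitive: {a!r} < {b!r} < {c!r}")
    return problems
```

   For example, octet_order passes on any sample. A broken function
   that always answers "less" fails the reflexivity check as soon as it
   compares a string with itself.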

   Ordering returns "less" if the first string is listed before the
   second string, according to the collation; "greater", if the second
   string is listed before the first string; and "equal", if the two
   strings are equal, as defined by the collation's equality operation.
   If one or both strings are invalid, the result of ordering is
   "undefined".

   When the collation is used with a "+" prefix, the behavior is the
   same as when used with no prefix. When the collation is used with a
   "-" prefix, the result of the ordering operation of the collation
   MUST be reversed.

   The return values of the ordering operation are called "less",
   "equal", "greater", and "undefined" in this document.

4.3.  Sort Keys

   A collation specification SHOULD describe the internal transformation
   algorithm to generate sort keys. This algorithm can be applied to
   individual strings, and the result can be stored to potentially
   optimize future comparison operations. A collation MAY specify that
   the sort key is generated by the identity function. The sort key may
   have no meaning to a human. The sort key may not be valid input to
   the collation.

4.4.  Use of Lookup Tables

   Some collations use customizable lookup tables, e.g., because the
   tables depend on locale, and may be modified after shipping the
   software. Collations that use more than one customizable lookup
   table in a documented format MUST assign numbers to the tables they
   use. This permits an application protocol command to access the
   tables used by a server collation, so that clients and servers use
   the same tables.

5.  Application Protocol Requirements

   This section describes the requirements and issues that an
   application protocol needs to consider if it offers searching,
   substring matching and/or sorting, and permits the use of characters
   outside the US-ASCII charset.

5.1.  Character Encoding

   The protocol specification has to make sure that it is clear on which
   characters (rather than just octets) the collations are used. This
   can be done by specifying the protocol itself in terms of characters
   (e.g., in the case of a query language), by specifying a single
   character encoding for the protocol (e.g., UTF-8 [3]), or by
   carefully describing the relevant issues of character encoding
   labeling and conversion. In the latter case, details to consider
   include how to handle unknown charsets, any charsets that are
   mandatory-to-implement, any issues with byte-order that might apply,
   and any transfer encodings that need to be supported.

5.2.  Operations

   The protocol must specify which of the operations defined in this
   specification (equality matching, substring matching, and ordering)
   can be invoked in the protocol, and how they are invoked. There may
   be more than one way to invoke an operation.

   The protocol MUST provide a mechanism for the client to select the
   collation to use with equality matching, substring matching, and
   ordering.

   If a protocol needs a total ordering and the collation chosen does
   not provide it because the ordering operation returns "undefined" at
   least once, the recommended fallback is to sort all invalid strings
   after the valid ones, and use i;octet to order the invalid strings.
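   The recommended fallback can be sketched in a few lines of Python.
   The sketch makes these assumptions:

   o  The ordering interface returns -1, 0, or 1, or None for
      "undefined".

   o  digits_order is a hypothetical toy collation that accepts only
      ASCII-digit strings.

   o  Python's built-in ordering of bytes objects (octet by octet, with
      a shorter prefix first) is taken to match i;octet ordering.

```python
from functools import cmp_to_key
from typing import Callable, List, Optional

# Hypothetical ordering interface: -1 (less), 0 (equal), 1 (greater),
# or None for "undefined".
Ordering = Callable[[bytes, bytes], Optional[int]]


def digits_order(a: bytes, b: bytes) -> Optional[int]:
    """Toy collation: only non-empty ASCII-digit strings are valid."""
    if not (a.isdigit() and b.isdigit()):
        return None
    x, y = int(a), int(b)
    return (x > y) - (x < y)


def total_sort(strings: List[bytes], order: Ordering) -> List[bytes]:
    """Valid strings first, in collation order; invalid strings after
    them, in octet order."""
    # A string is valid iff it compares "equal" (not "undefined") to itself.
    valid = [s for s in strings if order(s, s) is not None]
    invalid = [s for s in strings if order(s, s) is None]
    # Only valid strings reach the comparator, so it never returns None here.
    valid.sort(key=cmp_to_key(order))
    # Plain bytes sort octet by octet, assumed here to match i;octet.
    invalid.sort()
    return valid + invalid
```

   For example, total_sort([b"10", b"b", b"9", b"a", b"010"],
   digits_order) yields [b"9", b"10", b"010", b"a", b"b"]. list.sort is
   stable, so b"10" and b"010" keep their input order, because the
   collation considers them equal.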

   Although the collation's substring function provides a list of
   matches, a protocol need not provide all that to the client. It may
   provide only the first matching substring, or even just the
   information that the substring search matched. In this way,
   collations can be used with protocols that are defined such that "x
   is a substring of y" returns true-false.

   If the protocol provides positional information for the results of a
   substring match, that positional information SHOULD fully specify the
   substring(s) in the result that match, independent of the length of
   the search string. For example, returning both the starting and
   ending offset of the match would suffice, as would the starting
   offset and a length. Returning just the starting offset is not
   acceptable. This rule is necessary because advanced collations can
   treat strings of different lengths as equal (for example,
   pre-composed and decomposed accented characters).

5.3.  Wildcards

   The protocol MUST specify whether it allows the use of wildcards in
   collation identifiers. If the protocol allows wildcards, then:

      The protocol MUST specify how comparisons behave in the absence of
      explicit collation negotiation, or when a collation of "default"
      is requested. The protocol MAY specify that the default collation
      used in such circumstances is sensitive to server configuration.

      The protocol SHOULD provide a way to list available collations
      matching a given wildcard pattern, or patterns.

5.4.  String Comparison

   If a protocol compares strings in any nontrivial way, using a
   collation may be appropriate. As an example, many protocols use
   case-independent strings. In many cases, a simple ASCII mapping to
   upper/lower case works well. In other cases, it may be better to use
   a specifiable collation; for example, so that a server can treat "i"
   and "I" as equivalent in Italy, and different in Turkey (Turkish also
   has a dotted upper-case "I" and a dotless lower-case "i").

   Protocol designers should consider, in each case, whether to use a
   specifiable collation. Keywords often have other needs than user
   variables, and search arguments may be different again.

5.5.  Disconnected Clients

   If the protocol supports disconnected clients, and a collation is
   used that can use configurable tables (e.g., to support
   locale-specific extensions), then the client may not be able to
   reproduce the server's collation operations while offline.

   A mechanism to download such tables has been discussed. Such a
   mechanism is not included in the present specification, since the
   problem is not yet well understood.

5.6.  Error Codes

   The protocol specification should consider assigning protocol error
   codes for the following circumstances:

   o  The client requests the use of a collation by identifier or
      pattern, but no implemented collation matches that pattern.

   o  The client attempts to use a collation for an operation that is
      not supported by that collation -- for example, attempting to use
      the "i;ascii-numeric" collation for substring matching.

   o  The client uses an equality or substring matching collation, and
      the result is an error. It may be appropriate to distinguish
      between the two input strings, particularly when one is supplied
      by the client and the other is stored by the server. It might
      also be appropriate to distinguish the specific case of an invalid
      UTF-8 string.

5.7.  Octet Collation

   The i;octet (Section 9.3) collation is only usable with protocols
   based on octet-strings. Clients and servers MUST NOT use i;octet
   with other protocols.

   If the protocol permits the use of collations with data structures
   other than strings, the protocol MUST describe the default behavior
   for a collation with those data structures.

6.  Use by Existing Protocols

   This section is informative.

   Both ACAP [11] and Sieve [14] are standards track specifications that
   used collations prior to the creation of this specification and
   registry. Those standards do not meet all the application protocol
   requirements described in Section 5.

   These protocols allow the use of the i;octet (Section 9.3) collation
   working directly on UTF-8 data, as used in these protocols.

   In Sieve, all matches are either true or false. Accordingly, Sieve
   servers must treat "undefined" and "no-match" results of the equality
   and substring operations as false, and only "match" as true.

   In ACAP and Sieve, there are no invalid strings. In this document's
   terms, invalid strings sort after valid strings.

   IMAP [15] also collates, although that is explicit only when the
   COMPARATOR [17] extension is used. The built-in IMAP substring
   operation and the ordering provided by the SORT [16] extension may
   not meet the requirements made in this document.

   Other protocols may be in a similar position.

   In IMAP, the default collation is i;ascii-casemap, because its
   operations are understood to match IMAP's built-in operations.

7.  Collation Registration

7.1.  Collation Registration Procedure

   The IETF will create a mailing list, collation@ietf.org, which can be
   used for public discussion of collation proposals prior to
   registration. Use of the mailing list is strongly encouraged. The
   IESG will appoint a designated expert who will monitor the
   collation@ietf.org mailing list and review registrations.

   The registration procedure begins when a completed registration
   template is sent to iana@iana.org and collation@ietf.org. The
   designated expert is expected to tell IANA and the submitter of the
   registration within two weeks whether the registration is approved,
   approved with minor changes, or rejected with cause. When a
   registration is rejected with cause, it can be re-submitted if the
   concerns listed in the cause are addressed. Decisions made by the
   designated expert can be appealed to the IESG Applications Area
   Director, then to the IESG. They follow the normal appeals procedure
   for IESG decisions.

   Collation registrations in a standards track, BCP, or IESG-approved
   experimental RFC are owned by the IETF, and changes to the
   registration follow normal procedures for updating such documents.
   Collation registrations in other RFCs are owned by the RFC author(s).
1.780 + Other collation registrations are owned by the individual(s) listed 1.781 + in the contact field of the registration, and IANA will preserve this 1.782 + information. 1.783 + 1.784 + If the registration is a change of an existing collation, it MUST be 1.785 + approved by the owner. In the event the owner cannot be contacted 1.786 + 1.787 + 1.788 + 1.789 +Newman, et al. Standards Track [Page 14] 1.790 + 1.791 +RFC 4790 Collation Registry March 2007 1.792 + 1.793 + 1.794 + for a period of one month, and the designated expert deems the change 1.795 + necessary, the IESG MAY re-assign ownership to an appropriate party. 1.796 + 1.797 +7.2. Collation Registration Format 1.798 + 1.799 + Registration of a collation is done by sending a well-formed XML 1.800 + document to collation@ietf.org and iana@iana.org. 1.801 + 1.802 +7.2.1. Registration Template 1.803 + 1.804 + Here is a template for the registration: 1.805 + 1.806 + <?xml version='1.0'?> 1.807 + <!DOCTYPE collation SYSTEM 'collationreg.dtd'> 1.808 + <collation rfc="YYYY" scope="global" intendedUse="common"> 1.809 + <identifier>collation identifier</identifier> 1.810 + <title>technical title for collation</title> 1.811 + <operations>equality order substring</operations> 1.812 + <specification>specification reference</specification> 1.813 + <owner>email address of owner or IETF</owner> 1.814 + <submitter>email address of submitter</submitter> 1.815 + <version>1</version> 1.816 + </collation> 1.817 + 1.818 +7.2.2. The Collation Element 1.819 + 1.820 + The root of the registration document MUST be a <collation> element. 1.821 + The collation element contains the other elements in the 1.822 + registration, which are described in the following sub-subsections, 1.823 + in the order given here. 1.824 + 1.825 + The <collation> element MAY include an "rfc=" attribute if the 1.826 + specification is in an RFC. 
The "rfc=" attribute gives only the 1.827 + number of the RFC, without any prefix, such as "RFC", or suffix, such 1.828 + as ".txt". 1.829 + 1.830 + The <collation> element MUST include a "scope=" attribute, which MUST 1.831 + have one of the values "global", "local", or "other". 1.832 + 1.833 + The <collation> element MUST include an "intendedUse=" attribute, 1.834 + which must have one of the values "common", "limited", "vendor", or 1.835 + "deprecated". Collation specifications intended for "common" use are 1.836 + expected to reference standards from standards bodies with 1.837 + significant experience dealing with the details of international 1.838 + character sets. 1.839 + 1.840 + Be aware that future revisions of this specification may add 1.841 + additional function types, as well as additional XML attributes, 1.842 + 1.843 + 1.844 + 1.845 +Newman, et al. Standards Track [Page 15] 1.846 + 1.847 +RFC 4790 Collation Registry March 2007 1.848 + 1.849 + 1.850 + values, and elements. Any system that automatically parses these XML 1.851 + documents MUST take this into account to preserve future 1.852 + compatibility. 1.853 + 1.854 +7.2.3. The Identifier Element 1.855 + 1.856 + The <identifier> element gives the precise identifier of the 1.857 + collation, e.g., i;ascii-casemap. The <identifier> element is 1.858 + mandatory. 1.859 + 1.860 +7.2.4. The Title Element 1.861 + 1.862 + The <title> element gives the title of the collation. The <title> 1.863 + element is mandatory. 1.864 + 1.865 +7.2.5. The Operations Element 1.866 + 1.867 + The <operations> element lists which of the three operations 1.868 + ("equality", "order" or "substring") the collation provides, 1.869 + separated by single spaces. The <operations> element is mandatory. 1.870 + 1.871 +7.2.6. The Specification Element 1.872 + 1.873 + The <specification> element describes where to find the 1.874 + specification. The <specification> element is mandatory. It MAY 1.875 + have a URI attribute. 
There may be more than one <specification> 1.876 + element, in which case, they together form the specification. 1.877 + 1.878 + If it is discovered that parts of a collation specification conflict, 1.879 + a new revision of the collation is necessary, and the 1.880 + collation@ietf.org mailing list should be notified. 1.881 + 1.882 +7.2.7. The Submitter Element 1.883 + 1.884 + The <submitter> element provides an RFC 2822 [12] email address for 1.885 + the person who submitted the registration. It is optional if the 1.886 + <owner> element contains an email address. 1.887 + 1.888 + There may be more than one <submitter> element. 1.889 + 1.890 +7.2.8. The Owner Element 1.891 + 1.892 + The <owner> element contains either the four letters "IETF" or an 1.893 + email address of the owner of the registration. The <owner> element 1.894 + is mandatory. There may be more than one <owner> element. If so, 1.895 + all owners are equal. Each owner can speak for all. 1.896 + 1.897 + 1.898 + 1.899 + 1.900 + 1.901 +Newman, et al. Standards Track [Page 16] 1.902 + 1.903 +RFC 4790 Collation Registry March 2007 1.904 + 1.905 + 1.906 +7.2.9. The Version Element 1.907 + 1.908 + The <version> element MUST be included when the registration is 1.909 + likely to be revised, or has been revised in such a way that the 1.910 + results change for one or more input strings. The <version> element 1.911 + is optional. 1.912 + 1.913 +7.2.10. The Variable Element 1.914 + 1.915 + The <variable> element specifies an optional variable to control the 1.916 + collation's behaviour, for example whether it is case sensitive. The 1.917 + <variable> element is optional. When <variable> is used, it must 1.918 + contain <name> and <default> elements, and it may contain one or more 1.919 + <value> elements. 1.920 + 1.921 +7.2.10.1. The Name Element 1.922 + 1.923 + The <name> element specifies the name value of a variable. The 1.924 + <name> element is mandatory. 1.925 + 1.926 +7.2.10.2. 
The Default Element 1.927 + 1.928 + The <default> element specifies the default value of a variable. The 1.929 + <default> element is mandatory. 1.930 + 1.931 +7.2.10.3. The Value Element 1.932 + 1.933 + The <value> element specifies a legal value of a variable. The 1.934 + <value> element is optional. If one or more <value> elements are 1.935 + present, only those values are legal. If none are, then the 1.936 + variable's legal values do not form an enumerated set, and the rules 1.937 + MUST be specified in an RFC accompanying the registration. 1.938 + 1.939 +7.3. Structure of Collation Registry 1.940 + 1.941 + Once the registration is approved, IANA will store each XML 1.942 + registration document in a URL of the form 1.943 + http://www.iana.org/assignments/collation/collation-id.xml, where 1.944 + collation-id is the content of the identifier element in the 1.945 + registration. Both the submitter and the designated expert are 1.946 + responsible for verifying that the XML is well-formed. The 1.947 + registration document should avoid using new elements. If any are 1.948 + necessary, it is important to be consistent with other registrations. 1.949 + 1.950 + IANA will also maintain a text summary of the registry under the name 1.951 + http://www.iana.org/assignments/collation/collation-index.html. This 1.952 + summary is divided into four sections. The first section is for 1.953 + collations intended for common use. This section is intended for 1.954 + 1.955 + 1.956 + 1.957 +Newman, et al. Standards Track [Page 17] 1.958 + 1.959 +RFC 4790 Collation Registry March 2007 1.960 + 1.961 + 1.962 + collation registrations published in IESG-approved RFCs, or for 1.963 + locally scoped collations from the primary standards body for that 1.964 + locale. 
The designated expert is encouraged to reject collation 1.965 + registrations with an intended use of "common" if the expert believes 1.966 + it should be "limited", as it is desirable to keep the number of 1.967 + "common" registrations small and of high quality. The second section 1.968 + is reserved for limited-use collations. The third section is 1.969 + reserved for registered vendor-specific collations. The final 1.970 + section is reserved for deprecated collations. 1.971 + 1.972 +7.4. Example Initial Registry Summary 1.973 + 1.974 + The following is an example of how IANA might structure the initial 1.975 + registry summary.html file: 1.976 + 1.977 + Collation Functions Scope Reference 1.978 + --------- --------- ----- --------- 1.979 + Common Use Collations: 1.980 + i;ascii-casemap e, o, s Local [RFC 4790] 1.981 + 1.982 + Limited Use Collations: 1.983 + i;octet e, o, s Other [RFC 4790] 1.984 + i;ascii-numeric e, o Other [RFC 4790] 1.985 + 1.986 + Vendor Collations: 1.987 + 1.988 + Deprecated Collations: 1.989 + 1.990 + 1.991 + References 1.992 + ---------- 1.993 + [RFC 4790] Newman, C., Duerst, M., Gulbrandsen, A., "Internet 1.994 + Application Protocol Collation Registry", RFC 4790, 1.995 + Sun Microsystems, March 2007. 1.996 + 1.997 +8. Guidelines for Expert Reviewer 1.998 + 1.999 + The expert reviewer appointed by the IESG has fairly broad latitude 1.1000 + for this registry. While a number of collations are expected 1.1001 + (particularly customizations of the UCA for localized use), an 1.1002 + explosion of collations (particularly common-use collations) is not 1.1003 + desirable for widespread interoperability. However, it is important 1.1004 + for the expert reviewer to provide cause when rejecting a 1.1005 + registration, and, when possible, to describe corrective action to 1.1006 + 1.1007 + 1.1008 + 1.1009 + 1.1010 + 1.1011 + 1.1012 + 1.1013 +Newman, et al. 
Standards Track [Page 18] 1.1014 + 1.1015 +RFC 4790 Collation Registry March 2007 1.1016 + 1.1017 + 1.1018 + permit the registration to proceed. The following list includes 1.1019 + some example reasons to reject a registration with cause: 1.1020 + 1.1021 + o The registration is not a well-formed XML document. 1.1022 + 1.1023 + o The registration has an intended use of "common", but there is no 1.1024 + evidence the collation will be widely deployed, so it should be 1.1025 + listed as "limited". 1.1026 + 1.1027 + o The registration has an intended use of "common", but it is 1.1028 + redundant with the functionality of a previously registered 1.1029 + "common" collation. 1.1030 + 1.1031 + o The registration has an intended use of "common", but the 1.1032 + specification is not detailed enough to allow interoperable 1.1033 + implementations by others. 1.1034 + 1.1035 + o The collation identifier fails to precisely identify the version 1.1036 + numbers of relevant tables to use. 1.1037 + 1.1038 + o The registration fails to meet one of the "MUST" requirements in 1.1039 + Section 4. 1.1040 + 1.1041 + o The collation identifier fails to meet the syntax in Section 3. 1.1042 + 1.1043 + o The collation specification referenced in the registration is 1.1044 + vague or has optional features without a clear behavior specified. 1.1045 + 1.1046 + o The referenced specification does not adequately address security 1.1047 + considerations specific to that collation. 1.1048 + 1.1049 + o The registration's operations are needlessly different from those 1.1050 + of traditional operations. 1.1051 + 1.1052 + o The registration's XML is needlessly different from that of 1.1053 + already registered collations. 1.1054 + 1.1055 +9. Initial Collations 1.1056 + 1.1057 + This section registers the three collations that were originally 1.1058 + defined in [11], and are implemented in most Sieve [14] engines.
Some of 1.1059 + the behavior of these collations is perhaps not ideal, such as 1.1060 + i;ascii-casemap accepting non-ASCII input. Compatibility with widely 1.1061 + deployed code was judged more important than fixing the collations. 1.1062 + Some of the aspects of these collations are necessary to maintain 1.1063 + compatibility with widely deployed code. 1.1064 + 1.1065 + 1.1066 + 1.1067 + 1.1068 + 1.1069 +Newman, et al. Standards Track [Page 19] 1.1070 + 1.1071 +RFC 4790 Collation Registry March 2007 1.1072 + 1.1073 + 1.1074 +9.1. ASCII Numeric Collation 1.1075 + 1.1076 +9.1.1. ASCII Numeric Collation Description 1.1077 + 1.1078 + The "i;ascii-numeric" collation is a simple collation intended for 1.1079 + use with arbitrarily-sized, unsigned decimal integer numbers stored 1.1080 + as octet strings. US-ASCII digits (0x30 to 0x39) represent digits of 1.1081 + the numbers. Before converting from string to integer, the input 1.1082 + string is truncated at the first non-digit character. All input is 1.1083 + valid; strings that do not start with a digit represent positive 1.1084 + infinity. 1.1085 + 1.1086 + The collation supports equality and ordering, but does not support 1.1087 + the substring operation. 1.1088 + 1.1089 + The equality operation returns "match" if the two strings represent 1.1090 + the same number (i.e., leading zeroes and trailing non-digits are 1.1091 + disregarded), and "no-match" if the two strings represent different 1.1092 + numbers. 1.1093 + 1.1094 + The ordering operation returns "less" if the first string represents 1.1095 + a smaller number than the second, "equal" if they represent the same 1.1096 + number, and "greater" if the first string represents a larger number 1.1097 + than the second. 1.1098 + 1.1099 + Some examples: "0" is less than "1", and "1" is less than 1.1100 + "4294967298". "4294967298", "04294967298", and "4294967298b" are all 1.1101 + equal. "04294967298" is less than "". "", "x", and "y" are equal. 
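As a rough illustration of the i;ascii-numeric rules in Section 9.1.1, the equality and ordering operations can be sketched in Python. This is a minimal sketch, not a reference implementation: the function names (`_leading_digits`, `ascii_numeric_order`, `ascii_numeric_equality`) are hypothetical, inputs are assumed to be Python `bytes`, and `None` stands in for positive infinity. Instead of converting to an integer, the sketch strips leading zeroes and compares digit strings by length and then octet by octet, so arbitrarily long numbers never need to be parsed.

```python
# Sketch of the i;ascii-numeric collation (RFC 4790, Section 9.1.1).
# Hypothetical helper names; inputs are octet strings (bytes).
from typing import Optional


def _leading_digits(s: bytes) -> Optional[bytes]:
    """Digits of the number s represents, without leading zeroes,
    or None if s does not start with a digit (positive infinity)."""
    end = 0
    while end < len(s) and 0x30 <= s[end] <= 0x39:  # US-ASCII '0'..'9'
        end += 1
    if end == 0:
        return None
    # Truncate at the first non-digit and drop leading zeroes,
    # keeping a single "0" for the value zero.
    return s[:end].lstrip(b"0") or b"0"


def ascii_numeric_order(a: bytes, b: bytes) -> str:
    """Ordering operation: 'less', 'equal', or 'greater'."""
    da, db = _leading_digits(a), _leading_digits(b)
    if da is None or db is None:
        if da is None and db is None:
            return "equal"      # both represent positive infinity
        return "greater" if da is None else "less"
    # Without leading zeroes, more digits means a larger number; numbers
    # with the same digit count compare octet by octet.
    ka, kb = (len(da), da), (len(db), db)
    if ka < kb:
        return "less"
    return "equal" if ka == kb else "greater"


def ascii_numeric_equality(a: bytes, b: bytes) -> str:
    """Equality operation; every input is valid, so never 'undefined'."""
    return "match" if ascii_numeric_order(a, b) == "equal" else "no-match"
```

The sketch reproduces the examples above: `ascii_numeric_order(b"04294967298", b"")` returns "less", and `b""`, `b"x"`, and `b"y"` all compare "equal". No substring function is provided, matching the collation's lack of a substring operation.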
1.1102 + 1.1103 +9.1.2. ASCII Numeric Collation Registration 1.1104 + 1.1105 + <?xml version='1.0'?> 1.1106 + <!DOCTYPE collation SYSTEM 'collationreg.dtd'> 1.1107 + <collation rfc="4790" scope="other" intendedUse="limited"> 1.1108 + <identifier>i;ascii-numeric</identifier> 1.1109 + <title>ASCII Numeric</title> 1.1110 + <operations>equality order</operations> 1.1111 + <specification>RFC 4790</specification> 1.1112 + <owner>IETF</owner> 1.1113 + <submitter>chris.newman@sun.com</submitter> 1.1114 + </collation> 1.1115 + 1.1116 + 1.1117 + 1.1118 + 1.1119 + 1.1120 + 1.1121 + 1.1122 + 1.1123 + 1.1124 + 1.1125 +Newman, et al. Standards Track [Page 20] 1.1126 + 1.1127 +RFC 4790 Collation Registry March 2007 1.1128 + 1.1129 + 1.1130 +9.2. ASCII Casemap Collation 1.1131 + 1.1132 +9.2.1. ASCII Casemap Collation Description 1.1133 + 1.1134 + The "i;ascii-casemap" collation is a simple collation that operates 1.1135 + on octet strings and treats US-ASCII letters case-insensitively. It 1.1136 + provides equality, substring, and ordering operations. All input is 1.1137 + valid. Note that letters outside ASCII are not treated case- 1.1138 + insensitively. 1.1139 + 1.1140 + Its equality, ordering, and substring operations are as for i;octet, 1.1141 + except that at first, the lower-case letters (octet values 97-122) in 1.1142 + each input string are changed to upper case (octet values 65-90). 1.1143 + 1.1144 + Care should be taken when using OS-supplied functions to implement 1.1145 + this collation, as it is not locale sensitive. Functions, such as 1.1146 + strcasecmp and toupper, are sometimes locale sensitive, and may 1.1147 + inappropriately map lower-case letters other than a-z to upper case. 1.1148 + 1.1149 + The i;ascii-casemap collation is well-suited for use with many 1.1150 + Internet protocols and computer languages. 
Use with natural language 1.1151 + is often inappropriate; even though the collation apparently supports 1.1152 + languages such as Swahili and English, in real-world use, it tends to 1.1153 + mis-sort a number of types of string: 1.1154 + 1.1155 + o people and place names containing non-ASCII, 1.1156 + 1.1157 + o words such as "naive" (if spelled with an accent, the accented 1.1158 + character could push the word to the wrong spot in a sorted list), 1.1159 + 1.1160 + o names such as "Lloyd" (which, in Welsh, sorts after "Lyon", unlike 1.1161 + in English), 1.1162 + 1.1163 + o strings containing euro and pound sterling symbols, quotation 1.1164 + marks other than '"', dashes/hyphens, etc. 1.1165 + 1.1166 + 1.1167 + 1.1168 + 1.1169 + 1.1170 + 1.1171 + 1.1172 + 1.1173 + 1.1174 + 1.1175 + 1.1176 + 1.1177 + 1.1178 + 1.1179 + 1.1180 + 1.1181 +Newman, et al. Standards Track [Page 21] 1.1182 + 1.1183 +RFC 4790 Collation Registry March 2007 1.1184 + 1.1185 + 1.1186 +9.2.2. ASCII Casemap Collation Registration 1.1187 + 1.1188 + <?xml version='1.0'?> 1.1189 + <!DOCTYPE collation SYSTEM 'collationreg.dtd'> 1.1190 + <collation rfc="4790" scope="local" intendedUse="common"> 1.1191 + <identifier>i;ascii-casemap</identifier> 1.1192 + <title>ASCII Casemap</title> 1.1193 + <operations>equality order substring</operations> 1.1194 + <specification>RFC 4790</specification> 1.1195 + <owner>IETF</owner> 1.1196 + <submitter>chris.newman@sun.com</submitter> 1.1197 + </collation> 1.1198 + 1.1199 +9.3. Octet Collation 1.1200 + 1.1201 +9.3.1. Octet Collation Description 1.1202 + 1.1203 + The "i;octet" collation is a simple and fast collation intended for 1.1204 + use on binary octet strings rather than on character data. Protocols 1.1205 + that want to make this collation available have to do so by 1.1206 + explicitly allowing it. If not explicitly allowed, it MUST NOT be 1.1207 + used. It never returns an "undefined" result. It provides equality, 1.1208 + substring, and ordering operations. 
1.1209 + 1.1210 + The ordering algorithm is as follows: 1.1211 + 1.1212 + 1. If both strings are the empty string, return the result "equal". 1.1213 + 1.1214 + 2. If the first string is empty and the second is not, return the 1.1215 + result "less". 1.1216 + 1.1217 + 3. If the second string is empty and the first is not, return the 1.1218 + result "greater". 1.1219 + 1.1220 + 4. If both strings begin with the same octet value, remove the first 1.1221 + octet from both strings and repeat this algorithm from step 1. 1.1222 + 1.1223 + 5. If the unsigned value (0 to 255) of the first octet of the first 1.1224 + string is less than the unsigned value of the first octet of the 1.1225 + second string, then return "less". 1.1226 + 1.1227 + 6. If this step is reached, return "greater". 1.1228 + 1.1229 + This algorithm is roughly equivalent to the C library function 1.1230 + memcmp, with appropriate length checks added. 1.1231 + 1.1232 + 1.1233 + 1.1234 + 1.1235 + 1.1236 + 1.1237 +Newman, et al. Standards Track [Page 22] 1.1238 + 1.1239 +RFC 4790 Collation Registry March 2007 1.1240 + 1.1241 + 1.1242 + The matching operation returns "match" if the sorting algorithm would 1.1243 + return "equal". Otherwise, the matching operation returns "no- 1.1244 + match". 1.1245 + 1.1246 + The substring operation returns "match" if the first string is the 1.1247 + empty string, or if there exists a substring of the second string of 1.1248 + length equal to the length of the first string, which would result in 1.1249 + a "match" result from the equality function. Otherwise, the 1.1250 + substring operation returns "no-match". 1.1251 + 1.1252 +9.3.2. Octet Collation Registration 1.1253 + 1.1254 + This collation is defined with intendedUse="limited" because it can 1.1255 + only be used by protocols that explicitly allow it. 
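The i;octet steps in Section 9.3.1 can be sketched in Python together with i;ascii-casemap, which Section 9.2.1 defines as i;octet applied after mapping octets 97-122 to 65-90. This is an illustrative sketch under stated assumptions, not the registered implementation: all function names are hypothetical, inputs are assumed to be `bytes`, and the substring check is a naive scan chosen for clarity rather than speed.

```python
# Sketch of i;octet (Section 9.3.1) and i;ascii-casemap (Section 9.2.1).
# Hypothetical helper names; inputs are octet strings (bytes).

def octet_order(a: bytes, b: bytes) -> str:
    """Ordering operation, following steps 1-6 of Section 9.3.1."""
    i = 0  # index of the first octet not yet removed from both strings
    while True:
        if i == len(a) and i == len(b):
            return "equal"      # step 1: both remaining strings are empty
        if i == len(a):
            return "less"       # step 2: only the first string is empty
        if i == len(b):
            return "greater"    # step 3: only the second string is empty
        if a[i] == b[i]:
            i += 1              # step 4: same first octet, remove it and repeat
            continue
        # Steps 5 and 6: indexing bytes yields unsigned values 0-255.
        return "less" if a[i] < b[i] else "greater"


def octet_equality(a: bytes, b: bytes) -> str:
    """'match' exactly when the ordering operation returns 'equal'."""
    return "match" if octet_order(a, b) == "equal" else "no-match"


def octet_substring(needle: bytes, haystack: bytes) -> str:
    """'match' if needle is empty or equals a same-length slice of haystack."""
    if not needle:
        return "match"
    n = len(needle)
    for start in range(len(haystack) - n + 1):
        if octet_equality(needle, haystack[start:start + n]) == "match":
            return "match"
    return "no-match"


# Map only octets 97-122 (a-z) to 65-90 (A-Z); all other octets are left
# unchanged, so the result never depends on the process locale.
_ASCII_UPPER = bytes.maketrans(b"abcdefghijklmnopqrstuvwxyz",
                               b"ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def ascii_casemap(s: bytes) -> bytes:
    return s.translate(_ASCII_UPPER)


def ascii_casemap_order(a: bytes, b: bytes) -> str:
    return octet_order(ascii_casemap(a), ascii_casemap(b))


def ascii_casemap_equality(a: bytes, b: bytes) -> str:
    return octet_equality(ascii_casemap(a), ascii_casemap(b))


def ascii_casemap_substring(needle: bytes, haystack: bytes) -> str:
    return octet_substring(ascii_casemap(needle), ascii_casemap(haystack))
```

For `bytes` values, Python's built-in `<` and `==` already compare by unsigned octet value, with a proper prefix sorting first. `octet_order` therefore serves mainly to make the six steps explicit. Because only a-z are folded, `ascii_casemap_equality("é".encode(), "É".encode())` returns "no-match".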
1.1256 + 1.1257 + <?xml version='1.0'?> 1.1258 + <!DOCTYPE collation SYSTEM 'collationreg.dtd'> 1.1259 + <collation rfc="4790" scope="global" intendedUse="limited"> 1.1260 + <identifier>i;octet</identifier> 1.1261 + <title>Octet</title> 1.1262 + <operations>equality order substring</operations> 1.1263 + <specification>RFC 4790</specification> 1.1264 + <owner>IETF</owner> 1.1265 + <submitter>chris.newman@sun.com</submitter> 1.1266 + </collation> 1.1267 + 1.1268 +10. IANA Considerations 1.1269 + 1.1270 + Section 7 defines how to register collations with IANA. Section 9 1.1271 + defines a list of predefined collations that have been registered 1.1272 + with IANA. 1.1273 + 1.1274 +11. Security Considerations 1.1275 + 1.1276 + Collations will normally be used with UTF-8 strings. Thus, the 1.1277 + security considerations for UTF-8 [3], stringprep [6], and Unicode 1.1278 + TR-36 [8] also apply, and are normative to this specification. 1.1279 + 1.1280 +12. Acknowledgements 1.1281 + 1.1282 + The authors want to thank all who have contributed to this document, 1.1283 + including Brian Carpenter, John Cowan, Dave Cridland, Mark Davis, 1.1284 + Spencer Dawkins, Lisa Dusseault, Lars Eggert, Frank Ellermann, Philip 1.1285 + Guenther, Tony Hansen, Ted Hardie, Sam Hartman, Kjetil Torgrim Homme, 1.1286 + Michael Kay, John Klensin, Alexey Melnikov, Jim Melton, and Abhijit 1.1287 + Menon-Sen. 1.1288 + 1.1289 + 1.1290 + 1.1291 + 1.1292 + 1.1293 +Newman, et al. Standards Track [Page 23] 1.1294 + 1.1295 +RFC 4790 Collation Registry March 2007 1.1296 + 1.1297 + 1.1298 +13. References 1.1299 + 1.1300 +13.1. Normative References 1.1301 + 1.1302 + [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement 1.1303 + Levels", BCP 14, RFC 2119, March 1997. 1.1304 + 1.1305 + [2] Crocker, D. and P. Overell, "Augmented BNF for Syntax 1.1306 + Specifications: ABNF", RFC 4234, October 2005. 
1.1307 + 1.1308 + [3] Yergeau, F., "UTF-8, a transformation format of ISO 10646", 1.1309 + STD 63, RFC 3629, November 2003. 1.1310 + 1.1311 + [4] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 1.1312 + Resource Identifier (URI): Generic Syntax", RFC 3986, 1.1313 + January 2005. 1.1314 + 1.1315 + [5] Phillips, A. and M. Davis, "Tags for Identifying Languages", 1.1316 + BCP 47, RFC 4646, September 2006. 1.1317 + 1.1318 + [6] Hoffman, P. and M. Blanchet, "Preparation of Internationalized 1.1319 + Strings ("stringprep")", RFC 3454, December 2002. 1.1320 + 1.1321 + [7] Davis, M. and K. Whistler, "Unicode Collation Algorithm version 1.1322 + 14", May 2005, 1.1323 + <http://www.unicode.org/reports/tr10/tr10-14.html>. 1.1324 + 1.1325 + [8] Davis, M. and M. Suignard, "Unicode Security Considerations", 1.1326 + February 2006, <http://www.unicode.org/reports/tr36/>. 1.1327 + 1.1328 +13.2. Informative References 1.1329 + 1.1330 + [9] Freed, N. and N. Borenstein, "Multipurpose Internet Mail 1.1331 + Extensions (MIME) Part One: Format of Internet Message Bodies", 1.1332 + RFC 2045, November 1996. 1.1333 + 1.1334 + [10] Melnikov, A., "Simple Authentication and Security Layer 1.1335 + (SASL)", RFC 4422, June 2006. 1.1336 + 1.1337 + [11] Newman, C. and J. Myers, "ACAP -- Application Configuration 1.1338 + Access Protocol", RFC 2244, November 1997. 1.1339 + 1.1340 + [12] Resnick, P., "Internet Message Format", RFC 2822, April 2001. 1.1341 + 1.1342 + [13] Freed, N. and J. Postel, "IANA Charset Registration 1.1343 + Procedures", BCP 19, RFC 2978, October 2000. 1.1344 + 1.1345 + 1.1346 + 1.1347 + 1.1348 + 1.1349 +Newman, et al. Standards Track [Page 24] 1.1350 + 1.1351 +RFC 4790 Collation Registry March 2007 1.1352 + 1.1353 + 1.1354 + [14] Showalter, T., "Sieve: A Mail Filtering Language", RFC 3028, 1.1355 + January 2001. 1.1356 + 1.1357 + [15] Crispin, M., "Internet Message Access Protocol - Version 1.1358 + 4rev1", RFC 3501, March 2003. 1.1359 + 1.1360 + [16] Crispin, M. 
and K. Murchison, "Internet Message Access Protocol 1.1361 + - Sort and Thread Extensions", Work in Progress, May 2004. 1.1362 + 1.1363 + [17] Newman, C. and A. Gulbrandsen, "Internet Message Access 1.1364 + Protocol Internationalization", Work in Progress, January 2006. 1.1365 + 1.1366 +Authors' Addresses 1.1367 + 1.1368 + Chris Newman 1.1369 + Sun Microsystems 1.1370 + 1050 Lakes Drive 1.1371 + West Covina, CA 91790 1.1372 + USA 1.1373 + 1.1374 + EMail: chris.newman@sun.com 1.1375 + 1.1376 + 1.1377 + Martin Duerst 1.1378 + Aoyama Gakuin University 1.1379 + 5-10-1 Fuchinobe 1.1380 + Sagamihara, Kanagawa 229-8558 1.1381 + Japan 1.1382 + 1.1383 + Phone: +81 42 759 6329 1.1384 + Fax: +81 42 759 6495 1.1385 + EMail: duerst@it.aoyama.ac.jp 1.1386 + URI: http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/ 1.1387 + 1.1388 + Note: Please write "Duerst" with u-umlaut wherever possible, for 1.1389 + example as "Dürst" in XML and HTML. 1.1390 + 1.1391 + 1.1392 + Arnt Gulbrandsen 1.1393 + Oryx Mail Systems GmbH 1.1394 + Schweppermannstr. 8 1.1395 + 81671 Munich 1.1396 + Germany 1.1397 + 1.1398 + Fax: +49 89 4502 9758 1.1399 + EMail: arnt@oryx.com 1.1400 + URI: http://www.oryx.com/arnt/ 1.1401 + 1.1402 + 1.1403 + 1.1404 + 1.1405 +Newman, et al. Standards Track [Page 25] 1.1406 + 1.1407 +RFC 4790 Collation Registry March 2007 1.1408 + 1.1409 + 1.1410 +Full Copyright Statement 1.1411 + 1.1412 + Copyright (C) The IETF Trust (2007). 1.1413 + 1.1414 + This document is subject to the rights, licenses and restrictions 1.1415 + contained in BCP 78, and except as set forth therein, the authors 1.1416 + retain all their rights. 
1.1417 + 1.1418 + This document and the information contained herein are provided on an 1.1419 + "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 1.1420 + OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 1.1421 + THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 1.1422 + OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 1.1423 + THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 1.1424 + WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 1.1425 + 1.1426 +Intellectual Property 1.1427 + 1.1428 + The IETF takes no position regarding the validity or scope of any 1.1429 + Intellectual Property Rights or other rights that might be claimed to 1.1430 + pertain to the implementation or use of the technology described in 1.1431 + this document or the extent to which any license under such rights 1.1432 + might or might not be available; nor does it represent that it has 1.1433 + made any independent effort to identify any such rights. Information 1.1434 + on the procedures with respect to rights in RFC documents can be 1.1435 + found in BCP 78 and BCP 79. 1.1436 + 1.1437 + Copies of IPR disclosures made to the IETF Secretariat and any 1.1438 + assurances of licenses to be made available, or the result of an 1.1439 + attempt made to obtain a general license or permission for the use of 1.1440 + such proprietary rights by implementers or users of this 1.1441 + specification can be obtained from the IETF on-line IPR repository at 1.1442 + http://www.ietf.org/ipr. 1.1443 + 1.1444 + The IETF invites any interested party to bring to its attention any 1.1445 + copyrights, patents or patent applications, or other proprietary 1.1446 + rights that may cover technology that may be required to implement 1.1447 + this standard. Please address the information to the IETF at 1.1448 + ietf-ipr@ietf.org. 
1.1449 + 1.1450 +Acknowledgement 1.1451 + 1.1452 + Funding for the RFC Editor function is currently provided by the 1.1453 + Internet Society. 1.1454 + 1.1455 + 1.1456 + 1.1457 + 1.1458 + 1.1459 + 1.1460 + 1.1461 +Newman, et al. Standards Track [Page 26] 1.1462 +