imapext-2007

diff docs/rfc/rfc4790.txt @ 0:ada5e610ab86

imap-2007e
author yuuji@gentei.org
date Mon, 14 Sep 2009 15:17:45 +0900
line diff
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/docs/rfc/rfc4790.txt	Mon Sep 14 15:17:45 2009 +0900
     1.3 @@ -0,0 +1,1459 @@
     1.4 +
     1.5 +
     1.6 +
     1.7 +
     1.8 +
     1.9 +
    1.10 +Network Working Group                                          C. Newman
    1.11 +Request for Comments: 4790                              Sun Microsystems
    1.12 +Category: Standards Track                                      M. Duerst
    1.13 +                                                Aoyama Gakuin University
    1.14 +                                                          A. Gulbrandsen
    1.15 +                                                                    Oryx
    1.16 +                                                              March 2007
    1.17 +
    1.18 +
    1.19 +            Internet Application Protocol Collation Registry
    1.20 +
    1.21 +Status of This Memo
    1.22 +
    1.23 +   This document specifies an Internet standards track protocol for the
    1.24 +   Internet community, and requests discussion and suggestions for
    1.25 +   improvements.  Please refer to the current edition of the "Internet
    1.26 +   Official Protocol Standards" (STD 1) for the standardization state
    1.27 +   and status of this protocol.  Distribution of this memo is unlimited.
    1.28 +
    1.29 +Copyright Notice
    1.30 +
    1.31 +   Copyright (C) The IETF Trust (2007).
    1.32 +
    1.33 +Abstract
    1.34 +
    1.35 +   Many Internet application protocols include string-based lookup,
    1.36 +   searching, or sorting operations.  However, the problem space for
    1.37 +   searching and sorting international strings is large, not fully
    1.38 +   explored, and is outside the area of expertise for the Internet
    1.39 +   Engineering Task Force (IETF).  Rather than attempt to solve such a
    1.40 +   large problem, this specification creates an abstraction framework so
    1.41 +   that application protocols can precisely identify a comparison
    1.42 +   function, and the repertoire of comparison functions can be extended
    1.43 +   in the future.
    1.44 +
    1.45 +
    1.46 +
    1.47 +
    1.48 +
    1.49 +
    1.50 +
    1.51 +
    1.52 +
    1.53 +
    1.54 +
    1.55 +
    1.56 +
    1.57 +
    1.58 +
    1.59 +
    1.60 +
    1.61 +Newman, et al.              Standards Track                     [Page 1]
    1.62 +
    1.63 +RFC 4790                   Collation Registry                 March 2007
    1.64 +
    1.65 +
    1.66 +Table of Contents
    1.67 +
    1.68 +   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
    1.69 +     1.1.  Conventions Used in This Document  . . . . . . . . . . . .  4
    1.70 +   2.  Collation Definition and Purpose . . . . . . . . . . . . . . .  4
    1.71 +     2.1.  Definition . . . . . . . . . . . . . . . . . . . . . . . .  4
    1.72 +     2.2.  Purpose  . . . . . . . . . . . . . . . . . . . . . . . . .  4
    1.73 +     2.3.  Some Other Terms Used in this Document . . . . . . . . . .  5
    1.74 +     2.4.  Sort Keys  . . . . . . . . . . . . . . . . . . . . . . . .  5
    1.75 +   3.  Collation Identifier Syntax  . . . . . . . . . . . . . . . . .  6
    1.76 +     3.1.  Basic Syntax . . . . . . . . . . . . . . . . . . . . . . .  6
    1.77 +     3.2.  Wildcards  . . . . . . . . . . . . . . . . . . . . . . . .  6
    1.78 +     3.3.  Ordering Direction . . . . . . . . . . . . . . . . . . . .  7
    1.79 +     3.4.  URIs . . . . . . . . . . . . . . . . . . . . . . . . . . .  7
    1.80 +     3.5.  Naming Guidelines  . . . . . . . . . . . . . . . . . . . .  7
    1.81 +   4.  Collation Specification Requirements . . . . . . . . . . . . .  8
    1.82 +     4.1.  Collation/Server Interface . . . . . . . . . . . . . . . .  8
    1.83 +     4.2.  Operations Supported . . . . . . . . . . . . . . . . . . .  8
    1.84 +       4.2.1.  Validity . . . . . . . . . . . . . . . . . . . . . . .  9
    1.85 +       4.2.2.  Equality . . . . . . . . . . . . . . . . . . . . . . .  9
    1.86 +       4.2.3.  Substring  . . . . . . . . . . . . . . . . . . . . . .  9
    1.87 +       4.2.4.  Ordering . . . . . . . . . . . . . . . . . . . . . . . 10
    1.88 +     4.3.  Sort Keys  . . . . . . . . . . . . . . . . . . . . . . . . 10
    1.89 +     4.4.  Use of Lookup Tables . . . . . . . . . . . . . . . . . . . 11
    1.90 +   5.  Application Protocol Requirements  . . . . . . . . . . . . . . 11
    1.91 +     5.1.  Character Encoding . . . . . . . . . . . . . . . . . . . . 11
    1.92 +     5.2.  Operations . . . . . . . . . . . . . . . . . . . . . . . . 11
    1.93 +     5.3.  Wildcards  . . . . . . . . . . . . . . . . . . . . . . . . 12
    1.94 +     5.4.  String Comparison  . . . . . . . . . . . . . . . . . . . . 12
    1.95 +     5.5.  Disconnected Clients . . . . . . . . . . . . . . . . . . . 12
    1.96 +     5.6.  Error Codes  . . . . . . . . . . . . . . . . . . . . . . . 13
    1.97 +     5.7.  Octet Collation  . . . . . . . . . . . . . . . . . . . . . 13
    1.98 +   6.  Use by Existing Protocols  . . . . . . . . . . . . . . . . . . 13
    1.99 +   7.  Collation Registration . . . . . . . . . . . . . . . . . . . . 14
   1.100 +     7.1.  Collation Registration Procedure . . . . . . . . . . . . . 14
   1.101 +     7.2.  Collation Registration Format  . . . . . . . . . . . . . . 15
   1.102 +       7.2.1.  Registration Template  . . . . . . . . . . . . . . . . 15
   1.103 +       7.2.2.  The Collation Element  . . . . . . . . . . . . . . . . 15
   1.104 +       7.2.3.  The Identifier Element . . . . . . . . . . . . . . . . 16
   1.105 +       7.2.4.  The Title Element  . . . . . . . . . . . . . . . . . . 16
   1.106 +       7.2.5.  The Operations Element . . . . . . . . . . . . . . . . 16
   1.107 +       7.2.6.  The Specification Element  . . . . . . . . . . . . . . 16
   1.108 +       7.2.7.  The Submitter Element  . . . . . . . . . . . . . . . . 16
   1.109 +       7.2.8.  The Owner Element  . . . . . . . . . . . . . . . . . . 16
   1.110 +       7.2.9.  The Version Element  . . . . . . . . . . . . . . . . . 17
   1.111 +       7.2.10. The Variable Element . . . . . . . . . . . . . . . . . 17
   1.112 +     7.3.  Structure of Collation Registry  . . . . . . . . . . . . . 17
   1.113 +     7.4.  Example Initial Registry Summary . . . . . . . . . . . . . 18
   1.114 +
   1.115 +
   1.116 +
   1.117 +Newman, et al.              Standards Track                     [Page 2]
   1.118 +
   1.119 +RFC 4790                   Collation Registry                 March 2007
   1.120 +
   1.121 +
   1.122 +   8.  Guidelines for Expert Reviewer . . . . . . . . . . . . . . . . 18
   1.123 +   9.  Initial Collations . . . . . . . . . . . . . . . . . . . . . . 19
   1.124 +     9.1.  ASCII Numeric Collation  . . . . . . . . . . . . . . . . . 20
   1.125 +       9.1.1.  ASCII Numeric Collation Description  . . . . . . . . . 20
   1.126 +       9.1.2.  ASCII Numeric Collation Registration . . . . . . . . . 20
   1.127 +     9.2.  ASCII Casemap Collation  . . . . . . . . . . . . . . . . . 21
   1.128 +       9.2.1.  ASCII Casemap Collation Description  . . . . . . . . . 21
   1.129 +       9.2.2.  ASCII Casemap Collation Registration . . . . . . . . . 22
   1.130 +     9.3.  Octet Collation  . . . . . . . . . . . . . . . . . . . . . 22
   1.131 +       9.3.1.  Octet Collation Description  . . . . . . . . . . . . . 22
   1.132 +       9.3.2.  Octet Collation Registration . . . . . . . . . . . . . 23
   1.133 +   10. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 23
   1.134 +   11. Security Considerations  . . . . . . . . . . . . . . . . . . . 23
   1.135 +   12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 23
   1.136 +   13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 24
   1.137 +     13.1. Normative References . . . . . . . . . . . . . . . . . . . 24
   1.138 +     13.2. Informative References . . . . . . . . . . . . . . . . . . 24
   1.139 +
   1.140 +
   1.141 +
   1.142 +
   1.143 +
   1.144 +
   1.145 +
   1.146 +
   1.147 +
   1.148 +
   1.149 +
   1.150 +
   1.151 +
   1.152 +
   1.153 +
   1.154 +
   1.155 +
   1.156 +
   1.157 +
   1.158 +
   1.159 +
   1.160 +
   1.161 +
   1.162 +
   1.163 +
   1.164 +
   1.165 +
   1.166 +
   1.167 +
   1.168 +
   1.169 +
   1.170 +
   1.171 +
   1.172 +
   1.173 +Newman, et al.              Standards Track                     [Page 3]
   1.174 +
   1.175 +RFC 4790                   Collation Registry                 March 2007
   1.176 +
   1.177 +
   1.178 +1.  Introduction
   1.179 +
   1.180 +   The Application Configuration Access Protocol (ACAP) [11]
   1.181 +   specification introduced the concept of a comparator (which we call
   1.182 +   collation in this document), but failed to create an IANA registry.
   1.183 +   With the introduction of stringprep [6] and the Unicode Collation
   1.184 +   Algorithm [7], it is now time to create that registry and populate it
   1.185 +   with some initial values appropriate for an international community.
   1.186 +   This specification replaces and generalizes the definition of a
   1.187 +   comparator in ACAP, and creates a collation registry.
   1.188 +
   1.189 +1.1.  Conventions Used in This Document
   1.190 +
   1.191 +   The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
   1.192 +   in this document are to be interpreted as defined in "Key words for
   1.193 +   use in RFCs to Indicate Requirement Levels" [1].
   1.194 +
   1.195 +   The attribute syntax specifications use the Augmented Backus-Naur
   1.196 +   Form (ABNF) [2] notation, including the core rules defined in
   1.197 +   Appendix A.  The ABNF production "Language-tag" is imported from
   1.198 +   Language Tags [5] and "reg-name" from URI: Generic Syntax [4].
   1.199 +
   1.200 +2.  Collation Definition and Purpose
   1.201 +
   1.202 +2.1.  Definition
   1.203 +
   1.204 +   A collation is a named function which takes two arbitrary length
   1.205 +   strings as input and can be used to perform one or more of three
   1.206 +   basic comparison operations: equality test, substring match, and
   1.207 +   ordering test.
   1.208 +
   1.209 +2.2.  Purpose
   1.210 +
   1.211 +   Collations are an abstraction for comparison functions so that these
   1.212 +   comparison functions can be used in multiple protocols.  The details
   1.213 +   of a particular comparison operation can be specified by someone with
   1.214 +   appropriate expertise, independent of the application protocols that
   1.215 +   use that collation.  This is similar to the way a charset [13]
   1.216 +   separates the details of octet to character mapping from a protocol
   1.217 +   specification, such as MIME [9], or the way SASL [10] separates the
   1.218 +   details of an authentication mechanism from a protocol specification,
   1.219 +   such as ACAP [11].
   1.220 +
   1.221 +
   1.222 +
   1.223 +
   1.224 +
   1.225 +
   1.226 +
   1.227 +
   1.228 +
   1.229 +Newman, et al.              Standards Track                     [Page 4]
   1.230 +
   1.231 +RFC 4790                   Collation Registry                 March 2007
   1.232 +
   1.233 +
   1.234 +   Here is a small diagram to help illustrate the value of this
   1.235 +   abstraction:
   1.236 +
   1.237 +   +-------------------+                         +-----------------+
   1.238 +   | IMAP i18n SEARCH  |--+                      | Basic           |
   1.239 +   +-------------------+  |                   +--| Collation Spec  |
   1.240 +                          |                   |  +-----------------+
   1.241 +   +-------------------+  |  +-------------+  |  +-----------------+
   1.242 +   | ACAP i18n SEARCH  |--+--| Collation   |--+--| A stringprep    |
   1.243 +   +-------------------+  |  | Registry    |  |  | Collation Spec  |
   1.244 +                          |  +-------------+  |  +-----------------+
   1.245 +   +-------------------+  |                   |  +-----------------+
   1.246 +   | ...other protocol |--+                   |  | locale-specific |
   1.247 +   +-------------------+                      +--| Collation Spec  |
   1.248 +                                                 +-----------------+
   1.249 +
   1.250 +   Thus IMAP, ACAP, and future application protocols with international
   1.251 +   search capability simply specify how to interface to the collation
   1.252 +   registry instead of each protocol specification having to specify all
   1.253 +   the collations it supports.
   1.254 +
   1.255 +2.3.  Some Other Terms Used in this Document
   1.256 +
   1.257 +   The terms client, server, and protocol are used in somewhat unusual
   1.258 +   senses.
   1.259 +
   1.260 +   Client means a user, or a program acting directly on behalf of a
   1.261 +   user.  This may be a mail reader acting as an IMAP client, or it may
   1.262 +   be an interactive shell, where the user can type protocol commands/
   1.263 +   requests directly, or it may be a script or program written by the
   1.264 +   user.
   1.265 +
   1.266 +   Server means a program that performs services requested by the
   1.267 +   client.  This may be a traditional server such as an HTTP server, or
   1.268 +   it may be a Sieve [14] interpreter running a Sieve script written by
   1.269 +   a user.  A server needs to use the operations provided by collations
   1.270 +   in order to fulfill the client's requests.
   1.271 +
   1.272 +   The protocol describes how the client tells the server what it wants
   1.273 +   done, and (if applicable) how the server tells the client about the
   1.274 +   results.  IMAP is a protocol by this definition, and so is the Sieve
   1.275 +   language.
   1.276 +
   1.277 +2.4.  Sort Keys
   1.278 +
   1.279 +   One component of a collation is a transformation, which turns a
   1.280 +   string into a sort key, which is then used while sorting.
   1.281 +
   1.282 +
   1.283 +
   1.284 +
   1.285 +Newman, et al.              Standards Track                     [Page 5]
   1.286 +
   1.287 +RFC 4790                   Collation Registry                 March 2007
   1.288 +
   1.289 +
   1.290 +   The transformation can range from an identity mapping (e.g., the
   1.291 +   i;octet collation, Section 9.3) to a mapping that makes the string
   1.292 +   unreadable to a human.
   1.293 +
   1.294 +   This is an implementation detail of collations or servers.  A
   1.295 +   protocol SHOULD NOT expose it to clients, since some collations leave
   1.296 +   the sort key's format up to the implementation, and current
   1.297 +   conformant implementations are known to use different formats.
   1.298 +
   1.299 +3.  Collation Identifier Syntax
   1.300 +
   1.301 +3.1.  Basic Syntax
   1.302 +
   1.303 +   The collation identifier itself is a single US-ASCII string.  The
   1.304 +   identifier MUST NOT be longer than 254 characters, and obeys the
   1.305 +   following grammar:
   1.306 +
   1.307 +     collation-char  = ALPHA / DIGIT / "-" / ";" / "=" / "."
   1.308 +
   1.309 +     collation-id    = collation-scope ";" collation-core-name
   1.310 +                       *collation-arg
   1.311 +
   1.312 +     collation-scope = Language-tag / "vnd-" reg-name
   1.313 +
   1.314 +     collation-core-name = ALPHA *( ALPHA / DIGIT / "-" )
   1.315 +
   1.316 +     collation-arg   = ";" ALPHA *( ALPHA / DIGIT ) "="
   1.317 +                       1*( ALPHA / DIGIT / "." )
   1.318 +
   1.319 +
   1.320 +   Note: the ABNF production "Language-tag" is imported from Language
   1.321 +   Tags [5] and "reg-name" from URI: Generic Syntax [4].
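
As an illustration of the grammar above, here is a minimal Python sketch of an identifier parser. It is not part of the RFC. The function name `parse_collation_id` is hypothetical, and the `SCOPE` pattern is a deliberately simplified stand-in for the imported "Language-tag" and "reg-name" productions, so it accepts some strings that a full implementation would reject.

```python
import re

# Simplified stand-in for Language-tag / "vnd-" reg-name (an assumption;
# the imported productions are considerably more detailed).
SCOPE = r"(?:vnd-[A-Za-z0-9.\-]+|[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*)"
# collation-core-name = ALPHA *( ALPHA / DIGIT / "-" )
CORE_NAME = r"[A-Za-z][A-Za-z0-9\-]*"
# collation-arg = ";" ALPHA *( ALPHA / DIGIT ) "=" 1*( ALPHA / DIGIT / "." )
ARG = r";[A-Za-z][A-Za-z0-9]*=[A-Za-z0-9.]+"

COLLATION_ID = re.compile(rf"({SCOPE});({CORE_NAME})((?:{ARG})*)")


def parse_collation_id(identifier):
    """Return (scope, core_name, args), or None if the identifier is malformed."""
    if len(identifier) > 254:  # the identifier MUST NOT exceed 254 characters
        return None
    m = COLLATION_ID.fullmatch(identifier)
    if m is None:
        return None
    scope, core, raw_args = m.groups()
    args = {}
    # raw_args looks like ";name=value;name=value"; the first split piece is empty.
    for piece in raw_args.split(";")[1:]:
        name, _, value = piece.partition("=")
        args[name] = value
    return scope, core, args
```

For example, `parse_collation_id("i;octet")` yields the scope `"i"`, the core name `"octet"`, and no arguments, while a string without a ";" separator is rejected.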
   1.322 +
   1.323 +   There is a special identifier called "default".  For protocols that
   1.324 +   have a default collation, "default" refers to that collation.  For
   1.325 +   other protocols, the identifier "default" MUST match no collations,
   1.326 +   and servers SHOULD treat it in the same way as they treat nonexistent
   1.327 +   collations.
   1.328 +
   1.329 +3.2.  Wildcards
   1.330 +
   1.331 +   The string a client uses to select a collation MAY contain one or
   1.332 +   more wildcard ("*") characters that match zero or more collation-
   1.333 +   chars.  Wildcard characters MUST NOT be adjacent.  If the wildcard
   1.334 +   string matches multiple collations, the server SHOULD attempt to
   1.335 +   select a widely useful collation in preference to a narrowly useful
   1.336 +   one.
   1.337 +
   1.338 +
   1.339 +
   1.340 +
   1.341 +Newman, et al.              Standards Track                     [Page 6]
   1.342 +
   1.343 +RFC 4790                   Collation Registry                 March 2007
   1.344 +
   1.345 +
   1.346 +     collation-wild  =  ("*" / (ALPHA ["*"])) *(collation-char ["*"])
   1.347 +                         ; MUST NOT exceed 254 characters total
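
One possible way for a server to evaluate such a wildcard string is to translate it into a regular expression. The Python sketch below is illustrative only: `wildcard_to_regex` and `matching_collations` are hypothetical names, the sketch does not check that the non-wildcard characters are valid collation-chars, and it makes no attempt to prefer "widely useful" collations when several match.

```python
import re

# collation-char = ALPHA / DIGIT / "-" / ";" / "=" / "."
COLLATION_CHAR = r"[A-Za-z0-9\-;=.]"


def wildcard_to_regex(pattern):
    """Translate a collation-wild string into a compiled regular expression."""
    if len(pattern) > 254 or "**" in pattern:
        # Too long, or adjacent wildcards, which the text forbids.
        raise ValueError("invalid wildcard pattern")
    parts = [COLLATION_CHAR + "*" if ch == "*" else re.escape(ch)
             for ch in pattern]
    return re.compile("".join(parts))


def matching_collations(pattern, available):
    """Return the identifiers in `available` that the pattern matches."""
    rx = wildcard_to_regex(pattern)
    return [name for name in available if rx.fullmatch(name)]
```

With the three initial collations as the available set, the pattern "i;ascii-*" selects i;ascii-casemap and i;ascii-numeric but not i;octet.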
   1.348 +
   1.349 +3.3.  Ordering Direction
   1.350 +
   1.351 +   When used as a protocol element for ordering, the collation
   1.352 +   identifier MAY be prefixed by either "+" or "-" to explicitly specify
   1.353 +   an ordering direction. "+" has no effect on the ordering operation,
   1.354 +   while "-" inverts the result of the ordering operation.  In general,
   1.355 +   collation-order is used when a client requests a collation, and
   1.356 +   collation-selected is used when the server informs the client of the
   1.357 +   selected collation.
   1.358 +
   1.359 +     collation-selected =  ["+" / "-"] collation-id
   1.360 +
   1.361 +     collation-order =  ["+" / "-"] collation-wild
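
The direction prefix can be handled separately from the collation itself. The following Python sketch, with the hypothetical helper names `split_direction` and `apply_direction`, shows one way to strip the prefix and to invert an ordering result. It assumes the ordering operation returns the strings "less", "equal", "greater", or "undefined".

```python
def split_direction(token):
    """Split a collation-selected / collation-order token into (reverse, rest)."""
    if token[:1] in ("+", "-"):
        return token[0] == "-", token[1:]
    return False, token  # no prefix behaves like "+"


def apply_direction(result, reverse):
    """Swap "less" and "greater" when reverse is set; other results pass through."""
    if reverse:
        return {"less": "greater", "greater": "less"}.get(result, result)
    return result
```

Note that "equal" and "undefined" are unaffected by the "-" prefix in this sketch; only the two strict outcomes are swapped.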
   1.362 +
   1.363 +3.4.  URIs
   1.364 +
   1.365 +   Some protocols are designed to use URIs [4] to refer to collations
   1.366 +   rather than simple tokens.  A special section of the IANA URL space
   1.367 +   is reserved for such usage.  The "collation-uri" form is used to
   1.368 +   refer to a specific named collation (the collation registration may
   1.369 +   not actually be present).  The "collation-auri" form is an abstract
   1.370 +   name for an ordering, a collation pattern or a vendor private
   1.371 +   collator.
   1.372 +
   1.373 +     collation-uri   =  "http://www.iana.org/assignments/collation/"
   1.374 +                        collation-id ".xml"
   1.375 +
   1.376 +     collation-auri  =  ( "http://www.iana.org/assignments/collation/"
   1.377 +                        collation-order ".xml" ) / other-uri
   1.378 +
   1.379 +     other-uri       =  <absoluteURI>
   1.380 +                     ;  excluding the IANA collation namespace.
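
As a small illustration of the "collation-uri" form, the Python sketch below builds a URI from an identifier and recovers the identifier from a URI. The function names are hypothetical, and the sketch performs no validation of the identifier itself.

```python
BASE = "http://www.iana.org/assignments/collation/"
SUFFIX = ".xml"


def collation_uri(collation_id):
    """Build the collation-uri for a collation identifier."""
    return BASE + collation_id + SUFFIX


def collation_id_from_uri(uri):
    """Return the identifier embedded in a collation-uri, or None for other URIs."""
    if uri.startswith(BASE) and uri.endswith(SUFFIX):
        return uri[len(BASE):-len(SUFFIX)]
    return None
```

For "i;octet" this produces http://www.iana.org/assignments/collation/i;octet.xml.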
   1.381 +
   1.382 +3.5.  Naming Guidelines
   1.383 +
   1.384 +   While this specification makes no absolute requirements on the
   1.385 +   structure of collation identifiers, naming consistency is important,
   1.386 +   so the following initial guidelines are provided.
   1.387 +
   1.388 +   Collation identifiers with an international audience typically begin
   1.389 +   with "i;".  Collation identifiers intended for a particular language
   1.390 +   or locale typically begin with a language tag [5] followed by a ";".
   1.391 +   After the first ";" is normally the name of the general collation
   1.392 +   algorithm, followed by a series of algorithm modifications separated
   1.393 +   by the ";" delimiter.  Parameterized modifications will use "=" to
   1.394 +
   1.395 +
   1.396 +
   1.397 +Newman, et al.              Standards Track                     [Page 7]
   1.398 +
   1.399 +RFC 4790                   Collation Registry                 March 2007
   1.400 +
   1.401 +
   1.402 +   delimit the parameter from the value.  The version numbers of any
   1.403 +   lookup tables used by the algorithm SHOULD be present as
   1.404 +   parameterized modifications.
   1.405 +
   1.406 +   Collation identifiers of the form *;vnd-hostname;* are reserved for
   1.407 +   vendor-specific collations created by the owner of the hostname
   1.408 +   following the "vnd-" prefix (e.g., vnd-example.com for the vendor
   1.409 +   example.com).  Registration of such collations (or the name space as
   1.410 +   a whole), with an intended use of "Vendor", is encouraged when a
   1.411 +   public specification or open-source implementation is available, but
   1.412 +   is not required.
   1.413 +
   1.414 +4.  Collation Specification Requirements
   1.415 +
   1.416 +4.1.  Collation/Server Interface
   1.417 +
   1.418 +   The collation itself defines what it operates on.  Most collations
   1.419 +   are expected to operate on character strings.  The i;octet
   1.420 +   (Section 9.3) collation operates on octet strings.  The i;ascii-
   1.421 +   numeric (Section 9.1) collation operates on numbers.
   1.422 +
   1.423 +   This specification defines the collation interface in terms of octet
   1.424 +   strings.  However, implementations may choose to use character
   1.425 +   strings instead.  Such implementations may not be able to implement
   1.426 +   e.g., i;octet.  Since i;octet is not currently mandatory to implement
   1.427 +   for any protocol, this should not be a problem.
   1.428 +
   1.429 +4.2.  Operations Supported
   1.430 +
   1.431 +   A collation specification MUST state which of the three basic
   1.432 +   operations are supported (equality, substring, ordering) and how to
   1.433 +   perform each of the supported operations on any two input character
   1.434 +   strings, including empty strings.  Collations must be deterministic,
   1.435 +   i.e., given a collation with a specific identifier, and any two fixed
   1.436 +   input strings, the result MUST be the same for the same operation.
   1.437 +
   1.438 +   In general, collation operations should behave as their names
   1.439 +   suggest.  While a collation may be new, the operations are not, so
   1.440 +   the new collation's operations should be similar to those of older
   1.441 +   collations.  For example, a date/time collation should not provide a
   1.442 +   "substring" operation that would morph IMAP substring SEARCH into
   1.443 +   e.g., a date-range search.
   1.444 +
   1.445 +   A non-obvious consequence of the rules for each collation operation
   1.446 +   is that, for any single collation, either none or all of the
   1.447 +   operations can return "undefined".  For example, it is not possible
   1.448 +   to have an equality operation that never returns "undefined", and a
   1.449 +   substring operation that occasionally does.
   1.450 +
   1.451 +
   1.452 +
   1.453 +Newman, et al.              Standards Track                     [Page 8]
   1.454 +
   1.455 +RFC 4790                   Collation Registry                 March 2007
   1.456 +
   1.457 +
   1.458 +4.2.1.  Validity
   1.459 +
   1.460 +   The validity test takes one string as argument.  It returns valid if
   1.461 +   its input string is a valid input to the collation's other
   1.462 +   operations, and invalid if not.  (In other words, a string is valid
   1.463 +   if it is equal to itself according to the collation's equality
   1.464 +   operation.)
   1.465 +
   1.466 +   The validity test is provided by all collations.  It MUST NOT be
   1.467 +   listed separately in the collation registration.
   1.468 +
   1.469 +4.2.2.  Equality
   1.470 +
   1.471 +   The equality test always returns "match" or "no-match" when it is
   1.472 +   supplied valid input, and MAY return "undefined" if one or both input
   1.473 +   strings are not valid.
   1.474 +
   1.475 +   The equality test MUST be reflexive and symmetric.  For valid input,
   1.476 +   it MUST be transitive.
   1.477 +
   1.478 +   If a collation provides either a substring or an ordering test, it
   1.479 +   MUST also provide an equality test.  The substring and/or ordering
   1.480 +   tests MUST be consistent with the equality test.
   1.481 +
   1.482 +   The return values of the equality test are called "match", "no-match"
   1.483 +   and "undefined" in this document.
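
The relationship between the validity test and the three-valued equality test can be sketched in Python as follows. The collation here is a toy invented for illustration, not one of the registered collations: it assumes that only US-ASCII octets are valid input and that equality ignores ASCII letter case.

```python
def is_valid(s: bytes) -> bool:
    """Toy validity rule (an assumption): only US-ASCII octets are valid."""
    return all(octet < 0x80 for octet in s)


def equal(a: bytes, b: bytes) -> str:
    """Return "match", "no-match", or "undefined" for the toy collation."""
    if not (is_valid(a) and is_valid(b)):
        return "undefined"
    # bytes.upper() maps only the ASCII letters a-z.
    return "match" if a.upper() == b.upper() else "no-match"
```

In this sketch a string is valid exactly when `equal(s, s)` returns "match", which mirrors the parenthetical remark in Section 4.2.1.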
   1.484 +
   1.485 +4.2.3.  Substring
   1.486 +
   1.487 +   The substring matching operation determines if the first string is a
   1.488 +   substring of the second string, i.e., if one or more substrings of
   1.489 +   the second string is equal to the first, as defined by the
   1.490 +   collation's equality operation.
   1.491 +
   1.492 +   A collation that supports substring matching will automatically
   1.493 +   support two special cases of substring matching: prefix and suffix
   1.494 +   matching, if those special cases are supported by the application
   1.495 +   protocol.  It returns "match" or "no-match" when it is supplied valid
   1.496 +   input and returns "undefined" when supplied invalid input.
   1.497 +
   1.498 +   Application protocols MAY return position information for substring
   1.499 +   matches.  If this is done, the position information SHOULD include
   1.500 +   both the starting offset and the ending offset for each match.  This
   1.501 +   is important because more sophisticated collations can match strings
   1.502 +   of unequal length (for example, a pre-composed accented character can
   1.503 +   match a decomposed accented character).  In general, overlapping
   1.504 +   matches SHOULD be reported (as when "ana" occurs twice within
   1.505 +   "banana"), although there are cases where a collation may decide not
   1.506 +
   1.507 +
   1.508 +
   1.509 +Newman, et al.              Standards Track                     [Page 9]
   1.510 +
   1.511 +RFC 4790                   Collation Registry                 March 2007
   1.512 +
   1.513 +
   1.514 +   to.  For example, in a collation which treats all whitespace
   1.515 +   sequences as identical, the substring operation could be defined such
   1.516 +   that " 1 " (SP "1" SP) is reported just once within "  1  " (SP SP
   1.517 +   "1" SP SP), not four times (SP SP "1" SP, SP "1" SP, SP "1" SP SP and
   1.518 +   SP SP "1" SP SP), since the four matches are, in a sense, the same
   1.519 +   match.
   1.520 +
   1.521 +   A string is a substring of itself.  The empty string is a substring
   1.522 +   of all strings.
   1.523 +
   1.524 +   Note that the substring operation of some collations can match
   1.525 +   strings of unequal length.  For example, a pre-composed accented
   1.526 +   character can match a decomposed accented character.  The Unicode
   1.527 +   Collation Algorithm [7] discusses this in more detail.
   1.528 +
   1.529 +   The return values of the substring operation are called "match", "no-
   1.530 +   match", and "undefined" in this document.
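
The "banana" example can be reproduced with a short Python sketch. It assumes the simplest possible equality (octet-for-octet comparison) and a hypothetical function name, `substring_matches`. It reports every occurrence, overlapping ones included, as a pair of starting and ending offsets.

```python
def substring_matches(needle: bytes, haystack: bytes):
    """Return (start, end) offsets of every occurrence, overlaps included."""
    matches = []
    start = haystack.find(needle)
    while start != -1:
        matches.append((start, start + len(needle)))
        # Advance by one octet, not by len(needle), so overlaps are found.
        start = haystack.find(needle, start + 1)
    return matches
```

Here `substring_matches(b"ana", b"banana")` returns the two overlapping matches at offsets (1, 4) and (3, 6). With octet-wise equality the ending offset is always the start plus the needle length; that does not hold for collations that can match strings of unequal length.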
   1.531 +
   1.532 +4.2.4.  Ordering
   1.533 +
   1.534 +   The ordering operation determines how two strings are ordered.  It
   1.535 +   MUST be reflexive.  For valid input, it MUST be transitive and
   1.536 +   trichotomous.
   1.537 +
   1.538 +   Ordering returns "less" if the first string is listed before the
   1.539 +   second string, according to the collation; "greater", if the second
   1.540 +   string is listed before the first string; and "equal", if the two
   1.541 +   strings are equal, as defined by the collation's equality operation.
   1.542 +   If one or both strings are invalid, the result of ordering is
   1.543 +   "undefined".
   1.544 +
   1.545 +   When the collation is used with a "+" prefix, the behavior is the
   1.546 +   same as when used with no prefix.  When the collation is used with a
   1.547 +   "-" prefix, the result of the ordering operation of the collation
   1.548 +   MUST be reversed.
   1.549 +
   1.550 +   The return values of the ordering operation are called "less",
   1.551 +   "equal", "greater", and "undefined" in this document.
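
To make the four return values concrete, here is a Python sketch of an ordering operation for a toy collation invented for this example (it is not a registered collation). It assumes that valid input consists solely of ASCII decimal digits and that strings are ordered by numeric value.

```python
def order(a: bytes, b: bytes) -> str:
    """Return "less", "equal", "greater", or "undefined" for the toy collation."""
    # bytes.isdigit() is False for empty input and for any non-ASCII-digit octet.
    if not (a.isdigit() and b.isdigit()):
        return "undefined"
    x, y = int(a), int(b)
    if x == y:
        return "equal"
    return "less" if x < y else "greater"
```

For valid input exactly one of "less", "equal", and "greater" is returned, which is the trichotomy requirement. In this toy collation b"007" and b"7" order as "equal" even though the octet strings differ.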
   1.552 +
   1.553 +4.3.  Sort Keys
   1.554 +
   1.555 +   A collation specification SHOULD describe the internal transformation
   1.556 +   algorithm to generate sort keys.  This algorithm can be applied to
   1.557 +   individual strings, and the result can be stored to potentially
   1.558 +   optimize future comparison operations.  A collation MAY specify that
   1.559 +   the sort key is generated by the identity function.  The sort key may
   1.560 +   have no meaning to a human.  The sort key may not be valid input to
   1.561 +   the collation.
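
The idea of transforming each string once and then comparing only the transformed values can be sketched in Python. The transformation below (ASCII upper-casing) and the function names are assumptions chosen for illustration; a real collation specifies its own transformation.

```python
def sort_key(s: bytes) -> bytes:
    """Toy transformation (an assumption): fold ASCII letters to upper case."""
    return s.upper()


def sorted_by_collation(strings):
    """Sort strings by their sort keys; each key is computed once per string."""
    return sorted(strings, key=sort_key)
```

Sorting `[b"banana", b"Cherry", b"apple"]` this way gives apple, banana, Cherry, whereas a plain octet-wise sort would place "Cherry" first. The keys themselves are never returned to the caller, in line with the advice that sort keys are an implementation detail.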
   1.562 +
   1.563 +
   1.564 +
   1.565 +Newman, et al.              Standards Track                    [Page 10]
   1.566 +
   1.567 +RFC 4790                   Collation Registry                 March 2007
   1.568 +
   1.569 +
   1.570 +4.4.  Use of Lookup Tables
   1.571 +
   1.572 +   Some collations use customizable lookup tables, e.g., because the
   1.573 +   tables depend on locale, and may be modified after shipping the
   1.574 +   software.  Collations that use more than one customizable lookup
   1.575 +   table in a documented format MUST assign numbers to the tables they
   1.576 +   use.  This permits an application protocol command to access the
   1.577 +   tables used by a server collation, so that clients and servers use
   1.578 +   the same tables.
   1.579 +
   1.580 +5.  Application Protocol Requirements
   1.581 +
   1.582 +   This section describes the requirements and issues that an
   1.583 +   application protocol needs to consider if it offers searching,
   1.584 +   substring matching and/or sorting, and permits the use of characters
   1.585 +   outside the US-ASCII charset.
   1.586 +
   1.587 +5.1.  Character Encoding
   1.588 +
   1.589 +   The protocol specification has to make sure that it is clear on which
   1.590 +   characters (rather than just octets) the collations are used.  This
   1.591 +   can be done by specifying the protocol itself in terms of characters
   1.592 +   (e.g., in the case of a query language), by specifying a single
   1.593 +   character encoding for the protocol (e.g., UTF-8 [3]), or by
   1.594 +   carefully describing the relevant issues of character encoding
   1.595 +   labeling and conversion.  In the latter case, details to consider
   1.596 +   include how to handle unknown charsets, any charsets that are
   1.597 +   mandatory-to-implement, any issues with byte-order that might apply,
   1.598 +   and any transfer encodings that need to be supported.
   1.599 +
   1.600 +5.2.  Operations
   1.601 +
   1.602 +   The protocol must specify which of the operations defined in this
   1.603 +   specification (equality matching, substring matching, and ordering)
   1.604 +   can be invoked in the protocol, and how they are invoked.  There may
   1.605 +   be more than one way to invoke an operation.
   1.606 +
   1.607 +   The protocol MUST provide a mechanism for the client to select the
   1.608 +   collation to use with equality matching, substring matching, and
   1.609 +   ordering.
   1.610 +
   1.611 +   If a protocol needs a total ordering and the collation chosen does
   1.612 +   not provide it because the ordering operation returns "undefined" at
   1.613 +   least once, the recommended fallback is to sort all invalid strings
   1.614 +   after the valid ones, and use i;octet to order the invalid strings.
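
The recommended fallback can be sketched in Python as follows. The function name `total_order` and its parameters are hypothetical. The sketch assumes the collation is supplied as a validity predicate plus a sort-key function, and it relies on Python comparing `bytes` objects octet by octet to stand in for i;octet ordering.

```python
def total_order(strings, is_valid, sort_key):
    """Sort valid strings by the collation, then invalid ones octet-wise."""
    strings = list(strings)  # allow any iterable; it is traversed twice
    valid = sorted((s for s in strings if is_valid(s)), key=sort_key)
    invalid = sorted(s for s in strings if not is_valid(s))
    return valid + invalid
```

For instance, with a toy numeric collation (`bytes.isdigit` as the validity test and `int` as the key), the input b"10", b"zeta", b"9", b"alpha" is ordered as 9, 10, alpha, zeta.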
   1.615 +
   1.616 +   Although the collation's substring function provides a list of
   1.617 +   matches, a protocol need not provide all that to the client.  It may
   1.618 +
   1.619 +
   1.620 +
   1.621 +Newman, et al.              Standards Track                    [Page 11]
   1.622 +
   1.623 +RFC 4790                   Collation Registry                 March 2007
   1.624 +
   1.625 +
   1.626 +   provide only the first matching substring, or even just the
   1.627 +   information that the substring search matched.  In this way,
   1.628 +   collations can be used with protocols that are defined such that "x
   1.629 +   is a substring of y" returns true-false.
   1.630 +
   1.631 +   If the protocol provides positional information for the results of a
   1.632 +   substring match, that positional information SHOULD fully specify the
   1.633 +   substring(s) in the result that matches, independent of the length of
   1.634 +   the search string.  For example, returning both the starting and
   1.635 +   ending offset of the match would suffice, as would the starting
   1.636 +   offset and a length.  Returning just the starting offset is not
   1.637 +   acceptable.  This rule is necessary because advanced collations can
   1.638 +   treat strings of different lengths as equal (for example, pre-
   1.639 +   composed and decomposed accented characters).
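
   A small Python sketch shows why a starting offset alone is not
   enough.  It assumes a hypothetical collation that treats canonically
   equivalent Unicode sequences as equal, approximated here by NFC
   normalization; the function name and the brute-force search are
   illustrative only, and offsets are counted in code points rather
   than octets.

```python
import unicodedata


def find_canonical(needle: str, haystack: str):
    """Return (start, end) of the first window of haystack that is
    canonically equivalent to needle, or None if there is none."""
    target = unicodedata.normalize("NFC", needle)
    for start in range(len(haystack) + 1):
        for end in range(start, len(haystack) + 1):
            window = unicodedata.normalize("NFC", haystack[start:end])
            if window == target:
                return start, end
    return None
```

   Searching for the pre-composed character U+00E9 in the text
   "cafe" + U+0301 + " au lait" yields a match that is two code points
   long even though the search string is one code point long, so the
   client needs the end offset (or a length) to locate the match.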
   1.640 +
   1.641 +5.3.  Wildcards
   1.642 +
   1.643 +   The protocol MUST specify whether it allows the use of wildcards in
   1.644 +   collation identifiers.  If the protocol allows wildcards, then:
   1.645 +      The protocol MUST specify how comparisons behave in the absence of
   1.646 +      explicit collation negotiation, or when a collation of "default"
   1.647 +      is requested.  The protocol MAY specify that the default collation
   1.648 +      used in such circumstances is sensitive to server configuration.
   1.649 +
   1.650 +      The protocol SHOULD provide a way to list available collations
   1.651 +      matching a given wildcard pattern, or patterns.
   1.652 +
   1.653 +5.4.  String Comparison
   1.654 +
   1.655 +   If a protocol compares strings in any nontrivial way, using a
   1.656 +   collation may be appropriate.  As an example, many protocols use
   1.657 +   case-independent strings.  In many cases, a simple ASCII mapping to
   1.658 +   upper/lower case works well.  In other cases, it may be better to use
   1.659 +   a specifiable collation; for example, so that a server can treat "i"
   1.660 +   and "I" as equivalent in Italy, and different in Turkey (Turkish also
   1.661 +   has a dotted upper-case "I" and a dotless lower-case "i").
   1.662 +
   1.663 +   Protocol designers should consider, in each case, whether to use a
   1.664 +   specifiable collation.  Keywords often have other needs than user
   1.665 +   variables, and search arguments may be different again.
   1.666 +
   1.667 +5.5.  Disconnected Clients
   1.668 +
   1.669 +   If the protocol supports disconnected clients, and a collation is
   1.670 +   used that can use configurable tables (e.g., to support
   1.671 +   locale-specific extensions), then the client may not be able to
   1.672 +   reproduce the server's collation operations while offline.
   1.673 +
   1.674 +
   1.675 +
   1.676 +
   1.677 +Newman, et al.              Standards Track                    [Page 12]
   1.678 +
   1.679 +RFC 4790                   Collation Registry                 March 2007
   1.680 +
   1.681 +
   1.682 +   A mechanism to download such tables has been discussed.  Such a
   1.683 +   mechanism is not included in the present specification, since the
   1.684 +   problem is not yet well understood.
   1.685 +
   1.686 +5.6.  Error Codes
   1.687 +
   1.688 +   The protocol specification should consider assigning protocol error
   1.689 +   codes for the following circumstances:
   1.690 +
   1.691 +   o  The client requests the use of a collation by identifier or
   1.692 +      pattern, but no implemented collation matches that pattern.
   1.693 +
   1.694 +   o  The client attempts to use a collation for an operation that is
   1.695 +      not supported by that collation -- for example, attempting to use
   1.696 +      the "i;ascii-numeric" collation for substring matching.
   1.697 +
   1.698 +   o  The client uses an equality or substring matching collation, and
   1.699 +      the result is an error.  It may be appropriate to distinguish
   1.700 +      between the two input strings, particularly when one is supplied
   1.701 +      by the client and the other is stored by the server.  It might
   1.702 +      also be appropriate to distinguish the specific case of an invalid
   1.703 +      UTF-8 string.
   1.704 +
   1.705 +5.7.  Octet Collation
   1.706 +
   1.707 +   The i;octet (Section 9.3) collation is only usable with protocols
   1.708 +   based on octet-strings.  Clients and servers MUST NOT use i;octet
   1.709 +   with other protocols.
   1.710 +
   1.711 +   If the protocol permits the use of collations with data structures
   1.712 +   other than strings, the protocol MUST describe the default behavior
   1.713 +   for a collation with those data structures.
   1.714 +
   1.715 +6.  Use by Existing Protocols
   1.716 +
   1.717 +   This section is informative.
   1.718 +
   1.719 +   Both ACAP [11] and Sieve [14] are standards track specifications that
   1.720 +   used collations prior to the creation of this specification and
   1.721 +   registry.  Those standards do not meet all the application protocol
   1.722 +   requirements described in Section 5.
   1.723 +
   1.724 +   These protocols allow the use of the i;octet (Section 9.3) collation
   1.725 +   working directly on UTF-8 data, as used in these protocols.
   1.726 +
   1.727 +
   1.728 +
   1.729 +
   1.730 +
   1.731 +
   1.732 +
   1.733 +Newman, et al.              Standards Track                    [Page 13]
   1.734 +
   1.735 +RFC 4790                   Collation Registry                 March 2007
   1.736 +
   1.737 +
   1.738 +   In Sieve, all matches are either true or false.  Accordingly, Sieve
   1.739 +   servers must treat "undefined" and "no-match" results of the equality
   1.740 +   and substring operations as false, and only "match" as true.
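
   The mapping from the three-valued result of this specification to
   Sieve's two-valued tests can be sketched in Python as below.  The
   enumeration and function names are hypothetical; only the three
   result values come from this document.

```python
from enum import Enum


class MatchResult(Enum):
    """Results of the equality and substring operations."""
    MATCH = "match"
    NO_MATCH = "no-match"
    UNDEFINED = "undefined"


def sieve_test_result(result: MatchResult) -> bool:
    """Only "match" is true; "no-match" and "undefined" are false."""
    return result is MatchResult.MATCH
```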
   1.741 +
   1.742 +   In ACAP and Sieve, there are no invalid strings.  In this document's
   1.743 +   terms, invalid strings sort after valid strings.
   1.744 +
   1.745 +   IMAP [15] also collates, although that is explicit only when the
   1.746 +   COMPARATOR [17] extension is used.  The built-in IMAP substring
   1.747 +   operation and the ordering provided by the SORT [16] extension may
   1.748 +   not meet the requirements made in this document.
   1.749 +
   1.750 +   Other protocols may be in a similar position.
   1.751 +
   1.752 +   In IMAP, the default collation is i;ascii-casemap, because its
   1.753 +   operations are understood to match IMAP's built-in operations.
   1.754 +
   1.755 +7.  Collation Registration
   1.756 +
   1.757 +7.1.  Collation Registration Procedure
   1.758 +
   1.759 +   The IETF will create a mailing list, collation@ietf.org, which can be
   1.760 +   used for public discussion of collation proposals prior to
   1.761 +   registration.  Use of the mailing list is strongly encouraged.  The
   1.762 +   IESG will appoint a designated expert who will monitor the
   1.763 +   collation@ietf.org mailing list and review registrations.
   1.764 +
   1.765 +   The registration procedure begins when a completed registration
   1.766 +   template is sent to iana@iana.org and collation@ietf.org.  The
   1.767 +   designated expert is expected to tell IANA and the submitter of the
   1.768 +   registration within two weeks whether the registration is approved,
   1.769 +   approved with minor changes, or rejected with cause.  When a
   1.770 +   registration is rejected with cause, it can be re-submitted if the
   1.771 +   concerns listed in the cause are addressed.  Decisions made by the
   1.772 +   designated expert can be appealed to the IESG Applications Area
   1.773 +   Director, then to the IESG.  They follow the normal appeals procedure
   1.774 +   for IESG decisions.
   1.775 +
   1.776 +   Collation registrations in a standards track, BCP, or IESG-approved
   1.777 +   experimental RFC are owned by the IETF, and changes to the
   1.778 +   registration follow normal procedures for updating such documents.
   1.779 +   Collation registrations in other RFCs are owned by the RFC author(s).
   1.780 +   Other collation registrations are owned by the individual(s) listed
   1.781 +   in the contact field of the registration, and IANA will preserve this
   1.782 +   information.
   1.783 +
   1.784 +   If the registration is a change of an existing collation, it MUST be
   1.785 +   approved by the owner.  In the event the owner cannot be contacted
   1.786 +
   1.787 +
   1.788 +
   1.789 +Newman, et al.              Standards Track                    [Page 14]
   1.790 +
   1.791 +RFC 4790                   Collation Registry                 March 2007
   1.792 +
   1.793 +
   1.794 +   for a period of one month, and the designated expert deems the change
   1.795 +   necessary, the IESG MAY re-assign ownership to an appropriate party.
   1.796 +
   1.797 +7.2.  Collation Registration Format
   1.798 +
   1.799 +   Registration of a collation is done by sending a well-formed XML
   1.800 +   document to collation@ietf.org and iana@iana.org.
   1.801 +
   1.802 +7.2.1.  Registration Template
   1.803 +
   1.804 +   Here is a template for the registration:
   1.805 +
   1.806 +   <?xml version='1.0'?>
   1.807 +   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   1.808 +   <collation rfc="YYYY" scope="global" intendedUse="common">
   1.809 +     <identifier>collation identifier</identifier>
   1.810 +     <title>technical title for collation</title>
   1.811 +     <operations>equality order substring</operations>
   1.812 +     <specification>specification reference</specification>
   1.813 +     <owner>email address of owner or IETF</owner>
   1.814 +     <submitter>email address of submitter</submitter>
   1.815 +     <version>1</version>
   1.816 +   </collation>
   1.817 +
   1.818 +7.2.2.  The Collation Element
   1.819 +
   1.820 +   The root of the registration document MUST be a <collation> element.
   1.821 +   The collation element contains the other elements in the
   1.822 +   registration, which are described in the following sub-subsections,
   1.823 +   in the order given here.
   1.824 +
   1.825 +   The <collation> element MAY include an "rfc=" attribute if the
   1.826 +   specification is in an RFC.  The "rfc=" attribute gives only the
   1.827 +   number of the RFC, without any prefix, such as "RFC", or suffix, such
   1.828 +   as ".txt".
   1.829 +
   1.830 +   The <collation> element MUST include a "scope=" attribute, which MUST
   1.831 +   have one of the values "global", "local", or "other".
   1.832 +
   1.833 +   The <collation> element MUST include an "intendedUse=" attribute,
   1.834 +   which must have one of the values "common", "limited", "vendor", or
   1.835 +   "deprecated".  Collation specifications intended for "common" use are
   1.836 +   expected to reference standards from standards bodies with
   1.837 +   significant experience dealing with the details of international
   1.838 +   character sets.
   1.839 +
   1.840 +   Be aware that future revisions of this specification may add
   1.841 +   additional function types, as well as additional XML attributes,
   1.842 +
   1.843 +
   1.844 +
   1.845 +Newman, et al.              Standards Track                    [Page 15]
   1.846 +
   1.847 +RFC 4790                   Collation Registry                 March 2007
   1.848 +
   1.849 +
   1.850 +   values, and elements.  Any system that automatically parses these XML
   1.851 +   documents MUST take this into account to preserve future
   1.852 +   compatibility.
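
   One way a parser might tolerate future additions is sketched below
   in Python, using the standard xml.etree.ElementTree module.  The
   function name, the returned dictionary layout, and the choice to
   report unrecognized operations separately are assumptions made for
   this example; the document does not prescribe a parsing strategy.

```python
import xml.etree.ElementTree as ET

# The three operations defined by this specification.
KNOWN_OPERATIONS = {"equality", "order", "substring"}


def parse_registration(xml_text: str) -> dict:
    """Read the fields this sketch understands and ignore the rest."""
    root = ET.fromstring(xml_text)
    if root.tag != "collation":
        raise ValueError("root element must be <collation>")
    operations = (root.findtext("operations") or "").split()
    return {
        "identifier": root.findtext("identifier"),
        "scope": root.get("scope"),
        "intended_use": root.get("intendedUse"),
        "rfc": root.get("rfc"),  # optional attribute, None if absent
        "operations": [op for op in operations if op in KNOWN_OPERATIONS],
        # Operation names added by a later revision are kept, not rejected.
        "unknown_operations": [
            op for op in operations if op not in KNOWN_OPERATIONS
        ],
    }
```

   Unknown elements and attributes are simply never looked up, so a
   registration that uses them still parses.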
   1.853 +
   1.854 +7.2.3.  The Identifier Element
   1.855 +
   1.856 +   The <identifier> element gives the precise identifier of the
   1.857 +   collation, e.g., i;ascii-casemap.  The <identifier> element is
   1.858 +   mandatory.
   1.859 +
   1.860 +7.2.4.  The Title Element
   1.861 +
   1.862 +   The <title> element gives the title of the collation.  The <title>
   1.863 +   element is mandatory.
   1.864 +
   1.865 +7.2.5.  The Operations Element
   1.866 +
   1.867 +   The <operations> element lists which of the three operations
   1.868 +   ("equality", "order" or "substring") the collation provides,
   1.869 +   separated by single spaces.  The <operations> element is mandatory.
   1.870 +
   1.871 +7.2.6.  The Specification Element
   1.872 +
   1.873 +   The <specification> element describes where to find the
   1.874 +   specification.  The <specification> element is mandatory.  It MAY
   1.875 +   have a URI attribute.  There may be more than one <specification>
   1.876 +   element, in which case, they together form the specification.
   1.877 +
   1.878 +   If it is discovered that parts of a collation specification conflict,
   1.879 +   a new revision of the collation is necessary, and the
   1.880 +   collation@ietf.org mailing list should be notified.
   1.881 +
   1.882 +7.2.7.  The Submitter Element
   1.883 +
   1.884 +   The <submitter> element provides an RFC 2822 [12] email address for
   1.885 +   the person who submitted the registration.  It is optional if the
   1.886 +   <owner> element contains an email address.
   1.887 +
   1.888 +   There may be more than one <submitter> element.
   1.889 +
   1.890 +7.2.8.  The Owner Element
   1.891 +
   1.892 +   The <owner> element contains either the four letters "IETF" or an
   1.893 +   email address of the owner of the registration.  The <owner> element
   1.894 +   is mandatory.  There may be more than one <owner> element.  If so,
   1.895 +   all owners are equal.  Each owner can speak for all.
   1.896 +
   1.897 +
   1.898 +
   1.899 +
   1.900 +
   1.901 +Newman, et al.              Standards Track                    [Page 16]
   1.902 +
   1.903 +RFC 4790                   Collation Registry                 March 2007
   1.904 +
   1.905 +
   1.906 +7.2.9.  The Version Element
   1.907 +
   1.908 +   The <version> element MUST be included when the registration is
   1.909 +   likely to be revised, or has been revised in such a way that the
   1.910 +   results change for one or more input strings.  The <version> element
   1.911 +   is optional.
   1.912 +
   1.913 +7.2.10.  The Variable Element
   1.914 +
   1.915 +   The <variable> element specifies an optional variable to control the
   1.916 +   collation's behaviour, for example whether it is case sensitive.  The
   1.917 +   <variable> element is optional.  When <variable> is used, it must
   1.918 +   contain <name> and <default> elements, and it may contain one or more
   1.919 +   <value> elements.
   1.920 +
   1.921 +7.2.10.1.  The Name Element
   1.922 +
   1.923 +   The <name> element specifies the name value of a variable.  The
   1.924 +   <name> element is mandatory.
   1.925 +
   1.926 +7.2.10.2.  The Default Element
   1.927 +
   1.928 +   The <default> element specifies the default value of a variable.  The
   1.929 +   <default> element is mandatory.
   1.930 +
   1.931 +7.2.10.3.  The Value Element
   1.932 +
   1.933 +   The <value> element specifies a legal value of a variable.  The
   1.934 +   <value> element is optional.  If one or more <value> elements are
   1.935 +   present, only those values are legal.  If none are, then the
   1.936 +   variable's legal values do not form an enumerated set, and the rules
   1.937 +   MUST be specified in an RFC accompanying the registration.
   1.938 +
   1.939 +7.3.  Structure of Collation Registry
   1.940 +
   1.941 +   Once the registration is approved, IANA will store each XML
   1.942 +   registration document in a URL of the form
   1.943 +   http://www.iana.org/assignments/collation/collation-id.xml, where
   1.944 +   collation-id is the content of the identifier element in the
   1.945 +   registration.  Both the submitter and the designated expert are
   1.946 +   responsible for verifying that the XML is well-formed.  The
   1.947 +   registration document should avoid using new elements.  If any are
   1.948 +   necessary, it is important to be consistent with other registrations.
   1.949 +
   1.950 +   IANA will also maintain a text summary of the registry under the name
   1.951 +   http://www.iana.org/assignments/collation/collation-index.html.  This
   1.952 +   summary is divided into four sections.  The first section is for
   1.953 +   collations intended for common use.  This section is intended for
   1.954 +
   1.955 +
   1.956 +
   1.957 +Newman, et al.              Standards Track                    [Page 17]
   1.958 +
   1.959 +RFC 4790                   Collation Registry                 March 2007
   1.960 +
   1.961 +
   1.962 +   collation registrations published in IESG-approved RFCs, or for
   1.963 +   locally scoped collations from the primary standards body for that
   1.964 +   locale.  The designated expert is encouraged to reject collation
   1.965 +   registrations with an intended use of "common" if the expert believes
   1.966 +   it should be "limited", as it is desirable to keep the number of
   1.967 +   "common" registrations small and of high quality.  The second section
   1.968 +   is reserved for limited-use collations.  The third section is
   1.969 +   reserved for registered vendor-specific collations.  The final
   1.970 +   section is reserved for deprecated collations.
   1.971 +
   1.972 +7.4.  Example Initial Registry Summary
   1.973 +
   1.974 +   The following is an example of how IANA might structure the initial
   1.975 +   registry summary.html file:
   1.976 +
   1.977 +     Collation                              Functions Scope Reference
   1.978 +     ---------                              --------- ----- ---------
   1.979 +   Common Use Collations:
   1.980 +     i;ascii-casemap                        e, o, s   Local [RFC 4790]
   1.981 +
   1.982 +   Limited Use Collations:
   1.983 +     i;octet                                e, o, s   Other [RFC 4790]
   1.984 +     i;ascii-numeric                        e, o      Other [RFC 4790]
   1.985 +
   1.986 +   Vendor Collations:
   1.987 +
   1.988 +   Deprecated Collations:
   1.989 +
   1.990 +
   1.991 +   References
   1.992 +   ----------
   1.993 +   [RFC 4790]  Newman, C., Duerst, M., Gulbrandsen, A., "Internet
   1.994 +               Application Protocol Collation Registry", RFC 4790,
   1.995 +               Sun Microsystems, March 2007.
   1.996 +
   1.997 +8.  Guidelines for Expert Reviewer
   1.998 +
   1.999 +   The expert reviewer appointed by the IESG has fairly broad latitude
  1.1000 +   for this registry.  While a number of collations are expected
  1.1001 +   (particularly customizations of the UCA for localized use), an
  1.1002 +   explosion of collations (particularly common-use collations) is not
  1.1003 +   desirable for widespread interoperability.  However, it is important
  1.1004 +   for the expert reviewer to provide cause when rejecting a
  1.1005 +   registration, and, when possible, to describe corrective action to
  1.1006 +
  1.1007 +
  1.1008 +
  1.1009 +
  1.1010 +
  1.1011 +
  1.1012 +
  1.1013 +Newman, et al.              Standards Track                    [Page 18]
  1.1014 +
  1.1015 +RFC 4790                   Collation Registry                 March 2007
  1.1016 +
  1.1017 +
  1.1018 +   permit the registration to proceed.  The following list includes
  1.1019 +   some example reasons to reject a registration with cause:
  1.1020 +
  1.1021 +   o  The registration is not a well-formed XML document.
  1.1022 +
  1.1023 +   o  The registration has an intended use of "common", but there is no
  1.1024 +      evidence the collation will be widely deployed, so it should be
  1.1025 +      listed as "limited".
  1.1026 +
  1.1027 +   o  The registration has an intended use of "common", but it is
  1.1028 +      redundant with the functionality of a previously registered
  1.1029 +      "common" collation.
  1.1030 +
  1.1031 +   o  The registration has an intended use of "common", but the
  1.1032 +      specification is not detailed enough to allow interoperable
  1.1033 +      implementations by others.
  1.1034 +
  1.1035 +   o  The collation identifier fails to precisely identify the version
  1.1036 +      numbers of relevant tables to use.
  1.1037 +
  1.1038 +   o  The registration fails to meet one of the "MUST" requirements in
  1.1039 +      Section 4.
  1.1040 +
  1.1041 +   o  The collation identifier fails to meet the syntax in Section 3.
  1.1042 +
  1.1043 +   o  The collation specification referenced in the registration is
  1.1044 +      vague or has optional features without a clear behavior specified.
  1.1045 +
  1.1046 +   o  The referenced specification does not adequately address security
  1.1047 +      considerations specific to that collation.
  1.1048 +
  1.1049 +   o  The registration's operations are needlessly different from those
  1.1050 +      of traditional operations.
  1.1051 +
  1.1052 +   o  The registration's XML is needlessly different from that of
  1.1053 +      already registered collations.
  1.1054 +
  1.1055 +9.  Initial Collations
  1.1056 +
  1.1057 +   This section registers the three collations that were originally
  1.1058 +   defined in ACAP [11], and are implemented in most Sieve [14]
  1.1059 +   engines.  Some of the behavior of these collations is perhaps not
  1.1060 +   ideal, such as i;ascii-casemap accepting non-ASCII input.
  1.1061 +   Compatibility with widely deployed code was judged more important
  1.1062 +   than fixing the collations, and some aspects of these collations
  1.1063 +   are necessary to maintain that compatibility.
  1.1064 +
  1.1065 +
  1.1066 +
  1.1067 +
  1.1068 +
  1.1069 +Newman, et al.              Standards Track                    [Page 19]
  1.1070 +
  1.1071 +RFC 4790                   Collation Registry                 March 2007
  1.1072 +
  1.1073 +
  1.1074 +9.1.  ASCII Numeric Collation
  1.1075 +
  1.1076 +9.1.1.  ASCII Numeric Collation Description
  1.1077 +
  1.1078 +   The "i;ascii-numeric" collation is a simple collation intended for
  1.1079 +   use with arbitrarily-sized, unsigned decimal integer numbers stored
  1.1080 +   as octet strings.  US-ASCII digits (0x30 to 0x39) represent digits of
  1.1081 +   the numbers.  Before converting from string to integer, the input
  1.1082 +   string is truncated at the first non-digit character.  All input is
  1.1083 +   valid; strings that do not start with a digit represent positive
  1.1084 +   infinity.
  1.1085 +
  1.1086 +   The collation supports equality and ordering, but does not support
  1.1087 +   the substring operation.
  1.1088 +
  1.1089 +   The equality operation returns "match" if the two strings represent
  1.1090 +   the same number (i.e., leading zeroes and trailing non-digits are
  1.1091 +   disregarded), and "no-match" if the two strings represent different
  1.1092 +   numbers.
  1.1093 +
  1.1094 +   The ordering operation returns "less" if the first string represents
  1.1095 +   a smaller number than the second, "equal" if they represent the same
  1.1096 +   number, and "greater" if the first string represents a larger number
  1.1097 +   than the second.
  1.1098 +
  1.1099 +   Some examples: "0" is less than "1", and "1" is less than
  1.1100 +   "4294967298". "4294967298", "04294967298", and "4294967298b" are all
  1.1101 +   equal. "04294967298" is less than "". "", "x", and "y" are equal.
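
   The description above can be sketched in Python as follows.  The
   function names are invented for this example.  To stay faithful to
   "arbitrarily-sized" numbers, the sketch compares digit strings by
   length and then octet by octet instead of converting them to a
   machine integer.

```python
def _digits(octets: bytes):
    """Leading run of ASCII digits with leading zeroes removed.

    Returns None when the string does not start with a digit, which
    represents positive infinity.  An empty result represents zero.
    """
    end = 0
    while end < len(octets) and 0x30 <= octets[end] <= 0x39:
        end += 1
    if end == 0:
        return None
    return octets[:end].lstrip(b"0")


def ascii_numeric_order(a: bytes, b: bytes) -> str:
    """Ordering operation: returns "less", "equal", or "greater"."""
    da, db = _digits(a), _digits(b)
    if da is None or db is None:
        if da is None and db is None:
            return "equal"
        return "greater" if da is None else "less"
    # Without leading zeroes, a longer digit string is a larger number;
    # equal lengths compare digit by digit.
    ka, kb = (len(da), da), (len(db), db)
    if ka < kb:
        return "less"
    if ka > kb:
        return "greater"
    return "equal"


def ascii_numeric_equal(a: bytes, b: bytes) -> str:
    """Equality operation: returns "match" or "no-match"."""
    return "match" if ascii_numeric_order(a, b) == "equal" else "no-match"
```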
  1.1102 +
  1.1103 +9.1.2.  ASCII Numeric Collation Registration
  1.1104 +
  1.1105 +   <?xml version='1.0'?>
  1.1106 +   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
  1.1107 +   <collation rfc="4790" scope="other" intendedUse="limited">
  1.1108 +     <identifier>i;ascii-numeric</identifier>
  1.1109 +     <title>ASCII Numeric</title>
  1.1110 +     <operations>equality order</operations>
  1.1111 +     <specification>RFC 4790</specification>
  1.1112 +     <owner>IETF</owner>
  1.1113 +     <submitter>chris.newman@sun.com</submitter>
  1.1114 +   </collation>
  1.1115 +
  1.1116 +
  1.1117 +
  1.1118 +
  1.1119 +
  1.1120 +
  1.1121 +
  1.1122 +
  1.1123 +
  1.1124 +
  1.1125 +Newman, et al.              Standards Track                    [Page 20]
  1.1126 +
  1.1127 +RFC 4790                   Collation Registry                 March 2007
  1.1128 +
  1.1129 +
  1.1130 +9.2.  ASCII Casemap Collation
  1.1131 +
  1.1132 +9.2.1.  ASCII Casemap Collation Description
  1.1133 +
  1.1134 +   The "i;ascii-casemap" collation is a simple collation that operates
  1.1135 +   on octet strings and treats US-ASCII letters case-insensitively.  It
  1.1136 +   provides equality, substring, and ordering operations.  All input is
  1.1137 +   valid.  Note that letters outside ASCII are not treated case-
  1.1138 +   insensitively.
  1.1139 +
  1.1140 +   Its equality, ordering, and substring operations are as for i;octet,
  1.1141 +   except that at first, the lower-case letters (octet values 97-122) in
  1.1142 +   each input string are changed to upper case (octet values 65-90).
  1.1143 +
  1.1144 +   Care should be taken when using OS-supplied functions to implement
  1.1145 +   this collation, as it is not locale sensitive.  Functions, such as
  1.1146 +   strcasecmp and toupper, are sometimes locale sensitive, and may
  1.1147 +   inappropriately map lower-case letters other than a-z to upper case.
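
   A locale-independent implementation can avoid OS case functions
   altogether by using an explicit octet table.  The Python sketch
   below is one possible approach; the names are invented for this
   example.

```python
# Translation table that maps only octets 97-122 (a-z) to 65-90 (A-Z)
# and leaves every other octet value unchanged.
_ASCII_UPPER = bytes(o - 32 if 97 <= o <= 122 else o for o in range(256))


def ascii_casemap_fold(octets: bytes) -> bytes:
    """Change lower-case US-ASCII letters to upper case."""
    return octets.translate(_ASCII_UPPER)


def ascii_casemap_equal(a: bytes, b: bytes) -> str:
    same = ascii_casemap_fold(a) == ascii_casemap_fold(b)
    return "match" if same else "no-match"


def ascii_casemap_order(a: bytes, b: bytes) -> str:
    fa, fb = ascii_casemap_fold(a), ascii_casemap_fold(b)
    if fa < fb:
        return "less"
    if fa > fb:
        return "greater"
    return "equal"


def ascii_casemap_substring(needle: bytes, haystack: bytes) -> str:
    found = ascii_casemap_fold(needle) in ascii_casemap_fold(haystack)
    return "match" if found else "no-match"
```

   Because the mapping is to upper case, an octet such as "_" (95)
   sorts after every letter, and the UTF-8 encodings of accented
   letters that differ only in case do not match.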
  1.1148 +
  1.1149 +   The i;ascii-casemap collation is well-suited for use with many
  1.1150 +   Internet protocols and computer languages.  Use with natural language
  1.1151 +   is often inappropriate; even though the collation apparently supports
  1.1152 +   languages such as Swahili and English, in real-world use, it tends to
  1.1153 +   mis-sort a number of types of string:
  1.1154 +
  1.1155 +   o  people and place names containing non-ASCII,
  1.1156 +
  1.1157 +   o  words such as "naive" (if spelled with an accent, the accented
  1.1158 +      character could push the word to the wrong spot in a sorted list),
  1.1159 +
  1.1160 +   o  names such as "Lloyd" (which, in Welsh, sorts after "Lyon", unlike
  1.1161 +      in English),
  1.1162 +
  1.1163 +   o  strings containing euro and pound sterling symbols, quotation
  1.1164 +      marks other than '"', dashes/hyphens, etc.
  1.1165 +
  1.1166 +
  1.1167 +
  1.1168 +
  1.1169 +
  1.1170 +
  1.1171 +
  1.1172 +
  1.1173 +
  1.1174 +
  1.1175 +
  1.1176 +
  1.1177 +
  1.1178 +
  1.1179 +
  1.1180 +
  1.1181 +Newman, et al.              Standards Track                    [Page 21]
  1.1182 +
  1.1183 +RFC 4790                   Collation Registry                 March 2007
  1.1184 +
  1.1185 +
  1.1186 +9.2.2.  ASCII Casemap Collation Registration
  1.1187 +
  1.1188 +   <?xml version='1.0'?>
  1.1189 +   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
  1.1190 +   <collation rfc="4790" scope="local" intendedUse="common">
  1.1191 +     <identifier>i;ascii-casemap</identifier>
  1.1192 +     <title>ASCII Casemap</title>
  1.1193 +     <operations>equality order substring</operations>
  1.1194 +     <specification>RFC 4790</specification>
  1.1195 +     <owner>IETF</owner>
  1.1196 +     <submitter>chris.newman@sun.com</submitter>
  1.1197 +   </collation>
  1.1198 +
  1.1199 +9.3.  Octet Collation
  1.1200 +
  1.1201 +9.3.1.  Octet Collation Description
  1.1202 +
  1.1203 +   The "i;octet" collation is a simple and fast collation intended for
  1.1204 +   use on binary octet strings rather than on character data.  Protocols
  1.1205 +   that want to make this collation available have to do so by
  1.1206 +   explicitly allowing it.  If not explicitly allowed, it MUST NOT be
  1.1207 +   used.  It never returns an "undefined" result.  It provides equality,
  1.1208 +   substring, and ordering operations.
  1.1209 +
  1.1210 +   The ordering algorithm is as follows:
  1.1211 +
  1.1212 +   1.  If both strings are the empty string, return the result "equal".
  1.1213 +
  1.1214 +   2.  If the first string is empty and the second is not, return the
  1.1215 +       result "less".
  1.1216 +
  1.1217 +   3.  If the second string is empty and the first is not, return the
  1.1218 +       result "greater".
  1.1219 +
  1.1220 +   4.  If both strings begin with the same octet value, remove the first
  1.1221 +       octet from both strings and repeat this algorithm from step 1.
  1.1222 +
  1.1223 +   5.  If the unsigned value (0 to 255) of the first octet of the first
  1.1224 +       string is less than the unsigned value of the first octet of the
  1.1225 +       second string, then return "less".
  1.1226 +
  1.1227 +   6.  If this step is reached, return "greater".
  1.1228 +
  1.1229 +   This algorithm is roughly equivalent to the C library function
  1.1230 +   memcmp, with appropriate length checks added.
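
   The six steps can be transcribed into Python roughly as follows.
   The function name is invented for this example, and an index is
   advanced instead of removing octets from the strings; the result
   is the same.

```python
def octet_order(a: bytes, b: bytes) -> str:
    """i;octet ordering: returns "less", "equal", or "greater"."""
    i = 0
    while True:
        a_empty = i >= len(a)
        b_empty = i >= len(b)
        if a_empty and b_empty:
            return "equal"      # step 1
        if a_empty:
            return "less"       # step 2
        if b_empty:
            return "greater"    # step 3
        if a[i] == b[i]:
            i += 1              # step 4: drop the first octet, repeat
            continue
        # Indexing bytes yields unsigned values in the range 0 to 255.
        if a[i] < b[i]:
            return "less"       # step 5
        return "greater"        # step 6
```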
  1.1231 +
  1.1232 +
  1.1233 +
  1.1234 +
  1.1235 +
  1.1236 +
  1.1237 +Newman, et al.              Standards Track                    [Page 22]
  1.1238 +
  1.1239 +RFC 4790                   Collation Registry                 March 2007
  1.1240 +
  1.1241 +
  1.1242 +   The equality operation returns "match" if the ordering algorithm
  1.1243 +   would return "equal".  Otherwise, the equality operation returns
  1.1244 +   "no-match".
  1.1245 +
  1.1246 +   The substring operation returns "match" if the first string is the
  1.1247 +   empty string, or if there exists a substring of the second string of
  1.1248 +   length equal to the length of the first string, which would result in
  1.1249 +   a "match" result from the equality function.  Otherwise, the
  1.1250 +   substring operation returns "no-match".
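
   The equality and substring operations for i;octet can be sketched
   in Python as below.  The function names are invented for this
   example, and the sliding-window search is written out to mirror
   the wording above rather than to be efficient.

```python
def octet_equal(a: bytes, b: bytes) -> str:
    """Two octet strings match exactly when they order as "equal",
    that is, when they have the same length and the same octets."""
    return "match" if a == b else "no-match"


def octet_substring(needle: bytes, haystack: bytes) -> str:
    """Return "match" if needle is empty or some window of haystack
    with the same length as needle equals it."""
    if len(needle) == 0:
        return "match"
    n = len(needle)
    for start in range(len(haystack) - n + 1):
        if octet_equal(needle, haystack[start:start + n]) == "match":
            return "match"
    return "no-match"
```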
  1.1251 +
  1.1252 +9.3.2.  Octet Collation Registration
  1.1253 +
  1.1254 +   This collation is defined with intendedUse="limited" because it can
  1.1255 +   only be used by protocols that explicitly allow it.
  1.1256 +
  1.1257 +   <?xml version='1.0'?>
  1.1258 +   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
  1.1259 +   <collation rfc="4790" scope="global" intendedUse="limited">
  1.1260 +     <identifier>i;octet</identifier>
  1.1261 +     <title>Octet</title>
  1.1262 +     <operations>equality order substring</operations>
  1.1263 +     <specification>RFC 4790</specification>
  1.1264 +     <owner>IETF</owner>
  1.1265 +     <submitter>chris.newman@sun.com</submitter>
  1.1266 +   </collation>
  1.1267 +
  1.1268 +10.  IANA Considerations
  1.1269 +
  1.1270 +   Section 7 defines how to register collations with IANA.  Section 9
  1.1271 +   defines a list of predefined collations that have been registered
  1.1272 +   with IANA.
  1.1273 +
  1.1274 +11.  Security Considerations
  1.1275 +
  1.1276 +   Collations will normally be used with UTF-8 strings.  Thus, the
  1.1277 +   security considerations for UTF-8 [3], stringprep [6], and Unicode
  1.1278 +   TR-36 [8] also apply, and are normative to this specification.
  1.1279 +
  1.1280 +12.  Acknowledgements
  1.1281 +
  1.1282 +   The authors want to thank all who have contributed to this document,
  1.1283 +   including Brian Carpenter, John Cowan, Dave Cridland, Mark Davis,
  1.1284 +   Spencer Dawkins, Lisa Dusseault, Lars Eggert, Frank Ellermann, Philip
  1.1285 +   Guenther, Tony Hansen, Ted Hardie, Sam Hartman, Kjetil Torgrim Homme,
  1.1286 +   Michael Kay, John Klensin, Alexey Melnikov, Jim Melton, and Abhijit
  1.1287 +   Menon-Sen.
  1.1288 +
  1.1289 +
  1.1290 +
  1.1291 +
  1.1292 +
  1.1293 +Newman, et al.              Standards Track                    [Page 23]
  1.1294 +
  1.1295 +RFC 4790                   Collation Registry                 March 2007
  1.1296 +
  1.1297 +
  1.1298 +13.  References
  1.1299 +
  1.1300 +13.1.  Normative References
  1.1301 +
  1.1302 +   [1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
  1.1303 +         Levels", BCP 14, RFC 2119, March 1997.
  1.1304 +
  1.1305 +   [2]   Crocker, D. and P. Overell, "Augmented BNF for Syntax
  1.1306 +         Specifications: ABNF", RFC 4234, October 2005.
  1.1307 +
  1.1308 +   [3]   Yergeau, F., "UTF-8, a transformation format of ISO 10646",
  1.1309 +         STD 63, RFC 3629, November 2003.
  1.1310 +
  1.1311 +   [4]   Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
  1.1312 +         Resource Identifier (URI): Generic Syntax", RFC 3986,
  1.1313 +         January 2005.
  1.1314 +
  1.1315 +   [5]   Phillips, A. and M. Davis, "Tags for Identifying Languages",
  1.1316 +         BCP 47, RFC 4646, September 2006.
  1.1317 +
  1.1318 +   [6]   Hoffman, P. and M. Blanchet, "Preparation of Internationalized
  1.1319 +         Strings ("stringprep")", RFC 3454, December 2002.
  1.1320 +
  1.1321 +   [7]   Davis, M. and K. Whistler, "Unicode Collation Algorithm version
  1.1322 +         14", May 2005,
  1.1323 +         <http://www.unicode.org/reports/tr10/tr10-14.html>.
  1.1324 +
  1.1325 +   [8]   Davis, M. and M. Suignard, "Unicode Security Considerations",
  1.1326 +         February 2006, <http://www.unicode.org/reports/tr36/>.
  1.1327 +
  1.1328 +13.2.  Informative References
  1.1329 +
  1.1330 +   [9]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
  1.1331 +         Extensions (MIME) Part One: Format of Internet Message Bodies",
  1.1332 +         RFC 2045, November 1996.
  1.1333 +
  1.1334 +   [10]  Melnikov, A., "Simple Authentication and Security Layer
  1.1335 +         (SASL)", RFC 4422, June 2006.
  1.1336 +
  1.1337 +   [11]  Newman, C. and J. Myers, "ACAP -- Application Configuration
  1.1338 +         Access Protocol", RFC 2244, November 1997.
  1.1339 +
  1.1340 +   [12]  Resnick, P., "Internet Message Format", RFC 2822, April 2001.
  1.1341 +
  1.1342 +   [13]  Freed, N. and J. Postel, "IANA Charset Registration
  1.1343 +         Procedures", BCP 19, RFC 2978, October 2000.
  1.1344 +
  1.1354 +   [14]  Showalter, T., "Sieve: A Mail Filtering Language", RFC 3028,
  1.1355 +         January 2001.
  1.1356 +
  1.1357 +   [15]  Crispin, M., "Internet Message Access Protocol - Version
  1.1358 +         4rev1", RFC 3501, March 2003.
  1.1359 +
  1.1360 +   [16]  Crispin, M. and K. Murchison, "Internet Message Access Protocol
  1.1361 +         - Sort and Thread Extensions", Work in Progress, May 2004.
  1.1362 +
  1.1363 +   [17]  Newman, C. and A. Gulbrandsen, "Internet Message Access
  1.1364 +         Protocol Internationalization", Work in Progress, January 2006.
  1.1365 +
  1.1366 +Authors' Addresses
  1.1367 +
  1.1368 +   Chris Newman
  1.1369 +   Sun Microsystems
  1.1370 +   1050 Lakes Drive
  1.1371 +   West Covina, CA  91790
  1.1372 +   USA
  1.1373 +
  1.1374 +   EMail: chris.newman@sun.com
  1.1375 +
  1.1376 +
  1.1377 +   Martin Duerst
  1.1378 +   Aoyama Gakuin University
  1.1379 +   5-10-1 Fuchinobe
  1.1380 +   Sagamihara, Kanagawa  229-8558
  1.1381 +   Japan
  1.1382 +
  1.1383 +   Phone: +81 42 759 6329
  1.1384 +   Fax:   +81 42 759 6495
  1.1385 +   EMail: duerst@it.aoyama.ac.jp
  1.1386 +   URI:   http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/
  1.1387 +
  1.1388 +   Note: Please write "Duerst" with u-umlaut wherever possible, for
  1.1389 +   example as "D&#252;rst" in XML and HTML.
  1.1390 +
  1.1391 +
  1.1392 +   Arnt Gulbrandsen
  1.1393 +   Oryx Mail Systems GmbH
  1.1394 +   Schweppermannstr. 8
  1.1395 +   81671 Munich
  1.1396 +   Germany
  1.1397 +
  1.1398 +   Fax:   +49 89 4502 9758
  1.1399 +   EMail: arnt@oryx.com
  1.1400 +   URI:   http://www.oryx.com/arnt/
  1.1401 +
  1.1410 +Full Copyright Statement
  1.1411 +
  1.1412 +   Copyright (C) The IETF Trust (2007).
  1.1413 +
  1.1414 +   This document is subject to the rights, licenses and restrictions
  1.1415 +   contained in BCP 78, and except as set forth therein, the authors
  1.1416 +   retain all their rights.
  1.1417 +
  1.1418 +   This document and the information contained herein are provided on an
  1.1419 +   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
  1.1420 +   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
  1.1421 +   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
  1.1422 +   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
  1.1423 +   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
  1.1424 +   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
  1.1425 +
  1.1426 +Intellectual Property
  1.1427 +
  1.1428 +   The IETF takes no position regarding the validity or scope of any
  1.1429 +   Intellectual Property Rights or other rights that might be claimed to
  1.1430 +   pertain to the implementation or use of the technology described in
  1.1431 +   this document or the extent to which any license under such rights
  1.1432 +   might or might not be available; nor does it represent that it has
  1.1433 +   made any independent effort to identify any such rights.  Information
  1.1434 +   on the procedures with respect to rights in RFC documents can be
  1.1435 +   found in BCP 78 and BCP 79.
  1.1436 +
  1.1437 +   Copies of IPR disclosures made to the IETF Secretariat and any
  1.1438 +   assurances of licenses to be made available, or the result of an
  1.1439 +   attempt made to obtain a general license or permission for the use of
  1.1440 +   such proprietary rights by implementers or users of this
  1.1441 +   specification can be obtained from the IETF on-line IPR repository at
  1.1442 +   http://www.ietf.org/ipr.
  1.1443 +
  1.1444 +   The IETF invites any interested party to bring to its attention any
  1.1445 +   copyrights, patents or patent applications, or other proprietary
  1.1446 +   rights that may cover technology that may be required to implement
  1.1447 +   this standard.  Please address the information to the IETF at
  1.1448 +   ietf-ipr@ietf.org.
  1.1449 +
  1.1450 +Acknowledgement
  1.1451 +
  1.1452 +   Funding for the RFC Editor function is currently provided by the
  1.1453 +   Internet Society.
  1.1454 +
