Network Working Group                                          C. Newman
Request for Comments: 4790                              Sun Microsystems
Category: Standards Track                                      M. Duerst
                                                Aoyama Gakuin University
                                                          A. Gulbrandsen
                                                                    Oryx
                                                              March 2007


            Internet Application Protocol Collation Registry

Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   Many Internet application protocols include string-based lookup,
   searching, or sorting operations. However, the problem space for
   searching and sorting international strings is large, not fully
   explored, and is outside the area of expertise for the Internet
   Engineering Task Force (IETF). Rather than attempt to solve such a
   large problem, this specification creates an abstraction framework so
   that application protocols can precisely identify a comparison
   function, and the repertoire of comparison functions can be extended
   in the future.

Newman, et al.              Standards Track                     [Page 1]

RFC 4790                   Collation Registry                 March 2007

Table of Contents

   1. Introduction
     1.1. Conventions Used in This Document
   2. Collation Definition and Purpose
     2.1. Definition
     2.2. Purpose
     2.3. Some Other Terms Used in this Document
     2.4. Sort Keys
   3. Collation Identifier Syntax
     3.1. Basic Syntax
     3.2. Wildcards
     3.3. Ordering Direction
     3.4. URIs
     3.5. Naming Guidelines
   4. Collation Specification Requirements
     4.1. Collation/Server Interface
     4.2. Operations Supported
       4.2.1. Validity
       4.2.2. Equality
       4.2.3. Substring
       4.2.4. Ordering
     4.3. Sort Keys
     4.4. Use of Lookup Tables
   5. Application Protocol Requirements
     5.1. Character Encoding
     5.2. Operations
     5.3. Wildcards
     5.4. String Comparison
     5.5. Disconnected Clients
     5.6. Error Codes
     5.7. Octet Collation
   6. Use by Existing Protocols
   7. Collation Registration
     7.1. Collation Registration Procedure
     7.2. Collation Registration Format
       7.2.1. Registration Template
       7.2.2. The Collation Element
       7.2.3. The Identifier Element
       7.2.4. The Title Element
       7.2.5. The Operations Element
       7.2.6. The Specification Element
       7.2.7. The Submitter Element
       7.2.8. The Owner Element
       7.2.9. The Version Element
       7.2.10. The Variable Element
     7.3. Structure of Collation Registry
     7.4. Example Initial Registry Summary
   8. Guidelines for Expert Reviewer
   9. Initial Collations
     9.1. ASCII Numeric Collation
       9.1.1. ASCII Numeric Collation Description
       9.1.2. ASCII Numeric Collation Registration
     9.2. ASCII Casemap Collation
       9.2.1. ASCII Casemap Collation Description
       9.2.2. ASCII Casemap Collation Registration
     9.3. Octet Collation
       9.3.1. Octet Collation Description
       9.3.2. Octet Collation Registration
   10. IANA Considerations
   11. Security Considerations
   12. Acknowledgements
   13. References
     13.1. Normative References
     13.2. Informative References

1. Introduction

   The Application Configuration Access Protocol (ACAP) [11]
   specification introduced the concept of a comparator (which we call
   collation in this document), but failed to create an IANA registry.
   With the introduction of stringprep [6] and the Unicode Collation
   Algorithm [7], it is now time to create that registry and populate it
   with some initial values appropriate for an international community.
   This specification replaces and generalizes the definition of a
   comparator in ACAP, and creates a collation registry.

1.1. Conventions Used in This Document

   The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
   in this document are to be interpreted as defined in "Key words for
   use in RFCs to Indicate Requirement Levels" [1].

   The attribute syntax specifications use the Augmented Backus-Naur
   Form (ABNF) [2] notation, including the core rules defined in
   Appendix A. The ABNF production "Language-tag" is imported from
   Language Tags [5] and "reg-name" from URI: Generic Syntax [4].

2. Collation Definition and Purpose

2.1. Definition

   A collation is a named function which takes two arbitrary length
   strings as input and can be used to perform one or more of three
   basic comparison operations: equality test, substring match, and
   ordering test.

2.2. Purpose

   Collations are an abstraction for comparison functions so that these
   comparison functions can be used in multiple protocols. The details
   of a particular comparison operation can be specified by someone with
   appropriate expertise, independent of the application protocols that
   use that collation. This is similar to the way a charset [13]
   separates the details of octet to character mapping from a protocol
   specification, such as MIME [9], or the way SASL [10] separates the
   details of an authentication mechanism from a protocol specification,
   such as ACAP [11].

   Here is a small diagram to help illustrate the value of this
   abstraction:

   +-------------------+                         +-----------------+
   | IMAP i18n SEARCH  |--+                      | Basic           |
   +-------------------+  |                   +--| Collation Spec  |
                          |                   |  +-----------------+
   +-------------------+  |  +-------------+  |  +-----------------+
   | ACAP i18n SEARCH  |--+--| Collation   |--+--| A stringprep    |
   +-------------------+  |  | Registry    |  |  | Collation Spec  |
                          |  +-------------+  |  +-----------------+
   +-------------------+  |                   |  +-----------------+
   | ...other protocol |--+                   |  | locale-specific |
   +-------------------+                      +--| Collation Spec  |
                                                 +-----------------+

   Thus IMAP, ACAP, and future application protocols with international
   search capability simply specify how to interface to the collation
   registry instead of each protocol specification having to specify all
   the collations it supports.

2.3. Some Other Terms Used in this Document

   The terms client, server, and protocol are used in somewhat unusual
   senses.

   Client means a user, or a program acting directly on behalf of a
   user. This may be a mail reader acting as an IMAP client, or it may
   be an interactive shell, where the user can type protocol
   commands/requests directly, or it may be a script or program written
   by the user.

   Server means a program that performs services requested by the
   client. This may be a traditional server such as an HTTP server, or
   it may be a Sieve [14] interpreter running a Sieve script written by
   a user. A server needs to use the operations provided by collations
   in order to fulfill the client's requests.

   The protocol describes how the client tells the server what it wants
   done, and (if applicable) how the server tells the client about the
   results. IMAP is a protocol by this definition, and so is the Sieve
   language.

2.4. Sort Keys

   One component of a collation is a transformation, which turns a
   string into a sort key, which is then used while sorting.

   The transformation can range from an identity mapping (e.g., the
   i;octet collation, Section 9.3) to a mapping that makes the string
   unreadable to a human.

   This is an implementation detail of collations or servers. A
   protocol SHOULD NOT expose it to clients, since some collations leave
   the sort key's format up to the implementation, and current
   conformant implementations are known to use different formats.

3. Collation Identifier Syntax

3.1. Basic Syntax

   The collation identifier itself is a single US-ASCII string. The
   identifier MUST NOT be longer than 254 characters, and obeys the
   following grammar:

   collation-char = ALPHA / DIGIT / "-" / ";" / "=" / "."

   collation-id = collation-prefix ";" collation-core-name
                  *collation-arg

   collation-scope = Language-tag / "vnd-" reg-name

   collation-core-name = ALPHA *( ALPHA / DIGIT / "-" )

   collation-arg = ";" ALPHA *( ALPHA / DIGIT ) "="
                   1*( ALPHA / DIGIT / "." )

   Note: the ABNF production "Language-tag" is imported from Language
   Tags [5] and "reg-name" from URI: Generic Syntax [4].

   There is a special identifier called "default". For protocols that
   have a default collation, "default" refers to that collation. For
   other protocols, the identifier "default" MUST match no collations,
   and servers SHOULD treat it in the same way as they treat nonexistent
   collations.

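   As a non-normative illustration of the grammar in Section 3.1, the
   following Python sketch checks whether a string has the shape of a
   collation-id. The grammar refers to "collation-prefix" without
   defining it, so the sketch assumes a loose prefix token that covers
   "i", language-tag-like tokens, and "vnd-" host names; it does not
   implement the full Language-tag or reg-name productions. The names
   is_collation_id and MAX_LENGTH are hypothetical.

```python
import re

# Assumption: "collation-prefix" is not defined by the grammar, so a loose
# token is accepted here ("i", "en", "vnd-example.com", ...).
_PREFIX = r"[A-Za-z][A-Za-z0-9.-]*"
_CORE_NAME = r"[A-Za-z][A-Za-z0-9-]*"           # collation-core-name
_ARG = r";[A-Za-z][A-Za-z0-9]*=[A-Za-z0-9.]+"   # collation-arg
_COLLATION_ID = re.compile(rf"{_PREFIX};{_CORE_NAME}(?:{_ARG})*")

MAX_LENGTH = 254  # the identifier MUST NOT be longer than this


def is_collation_id(identifier: str) -> bool:
    """Return True if identifier has the shape of a collation-id."""
    if len(identifier) > MAX_LENGTH:
        return False
    return _COLLATION_ID.fullmatch(identifier) is not None
```

   The special identifier "default" contains no ";" and is therefore
   rejected by this check; a caller would handle it separately.
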
3.2. Wildcards

   The string a client uses to select a collation MAY contain one or
   more wildcard ("*") characters that match zero or more
   collation-chars. Wildcard characters MUST NOT be adjacent. If the
   wildcard string matches multiple collations, the server SHOULD
   attempt to select a widely useful collation in preference to a
   narrowly useful one.

   collation-wild = ("*" / (ALPHA ["*"])) *(collation-char ["*"])
                    ; MUST NOT exceed 254 characters total

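   The wildcard rule of Section 3.2 can be sketched in Python as
   follows. This is only an illustration: the function names are
   hypothetical, and the sketch assumes that the server has already
   ordered its list of available collations from most widely useful to
   most narrowly useful, so that picking the first match honours the
   SHOULD above.

```python
import re
from typing import List, Optional

_COLLATION_CHAR = r"[A-Za-z0-9;=.-]"  # collation-char from Section 3.1


def wildcard_to_regex(pattern: str):
    """Compile a collation wildcard string into a regular expression."""
    if "**" in pattern:
        raise ValueError("wildcard characters must not be adjacent")
    if len(pattern) > 254:
        raise ValueError("pattern exceeds 254 characters")
    # Each "*" matches zero or more collation-chars; the rest is literal.
    pieces = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile(f"{_COLLATION_CHAR}*".join(pieces))


def select_collation(pattern: str, available: List[str]) -> Optional[str]:
    """Return the first identifier in `available` matching `pattern`.

    Assumption: `available` is ordered by the server, widest use first.
    """
    regex = wildcard_to_regex(pattern)
    for identifier in available:
        if regex.fullmatch(identifier):
            return identifier
    return None
```
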
3.3. Ordering Direction

   When used as a protocol element for ordering, the collation
   identifier MAY be prefixed by either "+" or "-" to explicitly specify
   an ordering direction. "+" has no effect on the ordering operation,
   while "-" inverts the result of the ordering operation. In general,
   collation-order is used when a client requests a collation, and
   collation-selected is used when the server informs the client of the
   selected collation.

   collation-selected = ["+" / "-"] collation-id

   collation-order = ["+" / "-"] collation-wild

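   A minimal Python sketch of how an implementation might separate the
   optional direction prefix of Section 3.3 from the rest of the string
   is shown below. The function name split_direction is hypothetical,
   and the sketch does not validate the remainder against
   collation-id or collation-wild.

```python
from typing import Tuple


def split_direction(value: str) -> Tuple[bool, str]:
    """Split a collation-order or collation-selected string.

    Returns (reverse, rest); reverse is True only for a "-" prefix,
    because "+" has no effect on the ordering operation.
    """
    if value.startswith("-"):
        return True, value[1:]
    if value.startswith("+"):
        return False, value[1:]
    return False, value
```
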
3.4. URIs

   Some protocols are designed to use URIs [4] to refer to collations
   rather than simple tokens. A special section of the IANA URL space
   is reserved for such usage. The "collation-uri" form is used to
   refer to a specific named collation (the collation registration may
   not actually be present). The "collation-auri" form is an abstract
   name for an ordering, a collation pattern or a vendor private
   collator.

   collation-uri = "http://www.iana.org/assignments/collation/"
                   collation-id ".xml"

   collation-auri = ( "http://www.iana.org/assignments/collation/"
                    collation-order ".xml" ) / other-uri

   other-uri = <absoluteURI>
               ; excluding the IANA collation namespace.

3.5. Naming Guidelines

   While this specification makes no absolute requirements on the
   structure of collation identifiers, naming consistency is important,
   so the following initial guidelines are provided.

   Collation identifiers with an international audience typically begin
   with "i;". Collation identifiers intended for a particular language
   or locale typically begin with a language tag [5] followed by a ";".
   After the first ";" is normally the name of the general collation
   algorithm, followed by a series of algorithm modifications separated
   by the ";" delimiter. Parameterized modifications will use "=" to
   delimit the parameter from the value. The version numbers of any
   lookup tables used by the algorithm SHOULD be present as
   parameterized modifications.

   Collation identifiers of the form *;vnd-hostname;* are reserved for
   vendor-specific collations created by the owner of the hostname
   following the "vnd-" prefix (e.g., vnd-example.com for the vendor
   example.com). Registration of such collations (or the name space as
   a whole), with intended use of the "Vendor", is encouraged when a
   public specification or open-source implementation is available, but
   is not required.

4. Collation Specification Requirements

4.1. Collation/Server Interface

   The collation itself defines what it operates on. Most collations
   are expected to operate on character strings. The i;octet
   (Section 9.3) collation operates on octet strings. The
   i;ascii-numeric (Section 9.1) collation operates on numbers.

   This specification defines the collation interface in terms of octet
   strings. However, implementations may choose to use character
   strings instead. Such implementations may not be able to implement,
   e.g., i;octet. Since i;octet is not currently mandatory to implement
   for any protocol, this should not be a problem.

4.2. Operations Supported

   A collation specification MUST state which of the three basic
   operations are supported (equality, substring, ordering) and how to
   perform each of the supported operations on any two input character
   strings, including empty strings. Collations must be deterministic,
   i.e., given a collation with a specific identifier, and any two fixed
   input strings, the result MUST be the same for the same operation.

   In general, collation operations should behave as their names
   suggest. While a collation may be new, the operations are not, so
   the new collation's operations should be similar to those of older
   collations. For example, a date/time collation should not provide a
   "substring" operation that would morph IMAP substring SEARCH into
   e.g., a date-range search.

   A non-obvious consequence of the rules for each collation operation
   is that, for any single collation, either none or all of the
   operations can return "undefined". For example, it is not possible
   to have an equality operation that never returns "undefined", and a
   substring operation that occasionally does.

4.2.1. Validity

   The validity test takes one string as argument. It returns valid if
   its input string is a valid input to the collation's other
   operations, and invalid if not. (In other words, a string is valid
   if it is equal to itself according to the collation's equality
   operation.)

   The validity test is provided by all collations. It MUST NOT be
   listed separately in the collation registration.

4.2.2. Equality

   The equality test always returns "match" or "no-match" when it is
   supplied valid input, and MAY return "undefined" if one or both input
   strings are not valid.

   The equality test MUST be reflexive and symmetric. For valid input,
   it MUST be transitive.

   If a collation provides either a substring or an ordering test, it
   MUST also provide an equality test. The substring and/or ordering
   tests MUST be consistent with the equality test.

   The return values of the equality test are called "match", "no-match"
   and "undefined" in this document.

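   To make the three return values of Sections 4.2.1 and 4.2.2
   concrete, the Python sketch below defines a toy collation. It is a
   hypothetical example and not one of the registered collations: it
   assumes that only ASCII strings are valid input and that equality
   ignores ASCII case.

```python
def is_valid(s: str) -> bool:
    """Validity test of the toy collation: only ASCII strings are valid."""
    return s.isascii()


def equality(a: str, b: str) -> str:
    """Return "match", "no-match", or "undefined"."""
    if not (is_valid(a) and is_valid(b)):
        return "undefined"
    # Comparing lower-cased forms keeps the test symmetric, and
    # reflexive and transitive for valid input.
    return "match" if a.lower() == b.lower() else "no-match"
```

   In this sketch a string is valid exactly when it is equal to itself,
   which agrees with the parenthetical remark in Section 4.2.1.
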
4.2.3. Substring

   The substring matching operation determines if the first string is a
   substring of the second string, i.e., if one or more substrings of
   the second string are equal to the first, as defined by the
   collation's equality operation.

   A collation that supports substring matching will automatically
   support two special cases of substring matching: prefix and suffix
   matching, if those special cases are supported by the application
   protocol. It returns "match" or "no-match" when it is supplied valid
   input and returns "undefined" when supplied invalid input.

   Application protocols MAY return position information for substring
   matches. If this is done, the position information SHOULD include
   both the starting offset and the ending offset for each match. This
   is important because more sophisticated collations can match strings
   of unequal length (for example, a pre-composed accented character can
   match a decomposed accented character). In general, overlapping
   matches SHOULD be reported (as when "ana" occurs twice within
   "banana"), although there are cases where a collation may decide not
   to. For example, in a collation which treats all whitespace
   sequences as identical, the substring operation could be defined such
   that " 1 " (SP "1" SP) is reported just once within "  1  " (SP SP
   "1" SP SP), not four times (SP SP "1" SP, SP "1" SP, SP "1" SP SP and
   SP SP "1" SP SP), since the four matches are, in a sense, the same
   match.

   A string is a substring of itself. The empty string is a substring
   of all strings.

   Note that the substring operation of some collations can match
   strings of unequal length. For example, a pre-composed accented
   character can match a decomposed accented character. The Unicode
   Collation Algorithm [7] discusses this in more detail.

   The return values of the substring operation are called "match",
   "no-match", and "undefined" in this document.

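   The following Python sketch illustrates a substring operation that
   reports overlapping matches together with their start and end
   offsets, as discussed in Section 4.2.3. It uses a hypothetical toy
   collation (ASCII-only input, case ignored), in which every match
   happens to be as long as the search string; a more sophisticated
   collation cannot assume this, which is why both offsets are
   returned. End offsets are exclusive in this sketch.

```python
from typing import List, Tuple


def substring(needle: str, haystack: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Return (result, positions) for the toy collation.

    result is "match", "no-match", or "undefined"; positions lists
    (start, end) pairs, including overlapping matches.
    """
    if not (needle.isascii() and haystack.isascii()):
        return "undefined", []
    n, h = needle.lower(), haystack.lower()
    positions = [
        (start, start + len(n))
        for start in range(len(h) - len(n) + 1)
        if h[start:start + len(n)] == n
    ]
    return ("match" if positions else "no-match"), positions
```

   With this sketch, searching for "ana" within "banana" yields two
   overlapping matches, and the empty string matches at every position
   of any valid string.
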
4.2.4. Ordering

   The ordering operation determines how two strings are ordered. It
   MUST be reflexive. For valid input, it MUST be transitive and
   trichotomous.

   Ordering returns "less" if the first string is listed before the
   second string, according to the collation; "greater", if the second
   string is listed before the first string; and "equal", if the two
   strings are equal, as defined by the collation's equality operation.
   If one or both strings are invalid, the result of ordering is
   "undefined".

   When the collation is used with a "+" prefix, the behavior is the
   same as when used with no prefix. When the collation is used with a
   "-" prefix, the result of the ordering operation of the collation
   MUST be reversed.

   The return values of the ordering operation are called "less",
   "equal", "greater", and "undefined" in this document.

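   As an illustration of Section 4.2.4, the Python sketch below
   implements an ordering operation for a hypothetical toy collation
   (ASCII-only input, case ignored). It assumes that reversing the
   result for a "-" prefix means swapping "less" and "greater" while
   leaving "equal" and "undefined" unchanged; this reading is an
   assumption of the sketch.

```python
def ordering(a: str, b: str, reverse: bool = False) -> str:
    """Return "less", "equal", "greater", or "undefined".

    `reverse` corresponds to a "-" prefix on the collation identifier.
    """
    if not (a.isascii() and b.isascii()):
        return "undefined"
    key_a, key_b = a.lower(), b.lower()
    if key_a == key_b:
        return "equal"
    result = "less" if key_a < key_b else "greater"
    if reverse:
        # Assumption: only "less" and "greater" are swapped.
        result = "greater" if result == "less" else "less"
    return result
```
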
4.3. Sort Keys

   A collation specification SHOULD describe the internal transformation
   algorithm to generate sort keys. This algorithm can be applied to
   individual strings, and the result can be stored to potentially
   optimize future comparison operations. A collation MAY specify that
   the sort key is generated by the identity function. The sort key may
   have no meaning to a human. The sort key may not be valid input to
   the collation.

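   The idea of computing a sort key once per string and storing it for
   later comparisons can be sketched in Python as follows. The
   transformation shown (lower-casing, then encoding as ASCII octets)
   belongs to a hypothetical toy collation and accepts ASCII input
   only; it is not the sort key of any registered collation.

```python
def sort_key(s: str) -> bytes:
    """Toy transformation: lower-case, then encode as ASCII octets."""
    return s.lower().encode("ascii")


names = ["delta", "Alpha", "charlie", "Bravo"]

# Each key is computed once and stored, so later comparisons only
# compare the stored octet strings.
keys = {name: sort_key(name) for name in names}
ordered = sorted(names, key=keys.__getitem__)
```

   Here the stored keys are plain octet strings, so sorting needs no
   further knowledge of the collation.
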
4.4. Use of Lookup Tables

   Some collations use customizable lookup tables, e.g., because the
   tables depend on locale, and may be modified after shipping the
   software. Collations that use more than one customizable lookup
   table in a documented format MUST assign numbers to the tables they
   use. This permits an application protocol command to access the
   tables used by a server collation, so that clients and servers use
   the same tables.

5. Application Protocol Requirements

   This section describes the requirements and issues that an
   application protocol needs to consider if it offers searching,
   substring matching and/or sorting, and permits the use of characters
   outside the US-ASCII charset.

5.1. Character Encoding

   The protocol specification has to make sure that it is clear on which
   characters (rather than just octets) the collations are used. This
   can be done by specifying the protocol itself in terms of characters
   (e.g., in the case of a query language), by specifying a single
   character encoding for the protocol (e.g., UTF-8 [3]), or by
   carefully describing the relevant issues of character encoding
   labeling and conversion. In the latter case, details to consider
   include how to handle unknown charsets, any charsets that are
   mandatory-to-implement, any issues with byte-order that might apply,
   and any transfer encodings that need to be supported.

yuuji@0
|
597 5.2. Operations
|
yuuji@0
|
598
|
yuuji@0
|
599 The protocol must specify which of the operations defined in this
|
yuuji@0
|
600 specification (equality matching, substring matching, and ordering)
|
yuuji@0
|
601 can be invoked in the protocol, and how they are invoked. There may
|
yuuji@0
|
602 be more than one way to invoke an operation.
|
yuuji@0
|
603
|
yuuji@0
|
604 The protocol MUST provide a mechanism for the client to select the
|
yuuji@0
|
605 collation to use with equality matching, substring matching, and
|
yuuji@0
|
606 ordering.
|
yuuji@0
|
607
|
yuuji@0
|
608 If a protocol needs a total ordering and the collation chosen does
|
yuuji@0
|
609 not provide it because the ordering operation returns "undefined" at
|
yuuji@0
|
610 least once, the recommended fallback is to sort all invalid strings
|
yuuji@0
|
611 after the valid ones, and use i;octet to order the invalid strings.
|
yuuji@0
|
612
|
yuuji@0
|
613 Although the collation's substring function provides a list of
|
yuuji@0
|
614 matches, a protocol need not provide all that to the client. It may
|
yuuji@0
|
615
|
yuuji@0
|
616
|
yuuji@0
|
617
|
yuuji@0
|
618 Newman, et al. Standards Track [Page 11]
|
yuuji@0
|
619
|
yuuji@0
|
620 RFC 4790 Collation Registry March 2007
|
yuuji@0
|
621
|
yuuji@0
|
622
|
yuuji@0
|
623 provide only the first matching substring, or even just the
|
yuuji@0
|
624 information that the substring search matched. In this way,
|
yuuji@0
|
625 collations can be used with protocols that are defined such that "x
|
yuuji@0
|
626 is a substring of y" returns only true or false.
|
yuuji@0
|
627
|
yuuji@0
|
628 If the protocol provides positional information for the results of a
|
yuuji@0
|
629 substring match, that positional information SHOULD fully specify the
|
yuuji@0
|
630 substring(s) in the result that matches, independent of the length of
|
yuuji@0
|
631 the search string. For example, returning both the starting and
|
yuuji@0
|
632 ending offset of the match would suffice, as would the starting
|
yuuji@0
|
633 offset and a length. Returning just the starting offset is not
|
yuuji@0
|
634 acceptable. This rule is necessary because advanced collations can
|
yuuji@0
|
635 treat strings of different lengths as equal (for example, pre-
|
yuuji@0
|
636 composed and decomposed accented characters).
|
yuuji@0
|
637
|
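The need for both offsets can be illustrated with a toy search that treats canonically equivalent Unicode sequences as equal. This is a sketch under stated assumptions: `find_equivalent` is a hypothetical helper, offsets are counted in code points, and the quadratic scan is chosen for clarity rather than efficiency.

```python
import unicodedata

def find_equivalent(needle: str, haystack: str):
    """Return (start, end) of the first span of `haystack` that is
    canonically equivalent to `needle`, or None if there is none."""
    target = unicodedata.normalize("NFC", needle)
    for start in range(len(haystack) + 1):
        for end in range(start, len(haystack) + 1):
            if unicodedata.normalize("NFC", haystack[start:end]) == target:
                return (start, end)
    return None

needle = "\u00e9"                 # pre-composed e with acute, one code point
haystack = "cafe\u0301 au lait"   # decomposed form: "e" + combining acute
span = find_equivalent(needle, haystack)
```

Here the search string is one code point long while the matched span covers two, so a starting offset alone would not tell the client where the match ends.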
yuuji@0
|
638 5.3. Wildcards
|
yuuji@0
|
639
|
yuuji@0
|
640 The protocol MUST specify whether it allows the use of wildcards in
|
yuuji@0
|
641 collation identifiers. If the protocol allows wildcards, then:
|
yuuji@0
|
642 The protocol MUST specify how comparisons behave in the absence of
|
yuuji@0
|
643 explicit collation negotiation, or when a collation of "default"
|
yuuji@0
|
644 is requested. The protocol MAY specify that the default collation
|
yuuji@0
|
645 used in such circumstances is sensitive to server configuration.
|
yuuji@0
|
646
|
yuuji@0
|
647 The protocol SHOULD provide a way to list available collations
|
yuuji@0
|
648 matching a given wildcard pattern, or patterns.
|
yuuji@0
|
649
|
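Listing the collations that match a pattern might look like the following sketch. It assumes the wildcard form in which a pattern is either a full identifier or a prefix followed by a single trailing "*"; the identifier syntax itself is defined outside this section, and `matching_collations` is a hypothetical helper.

```python
def matching_collations(pattern: str, available: list) -> list:
    """Return the identifiers in `available` selected by `pattern`.

    Assumption: a trailing "*" matches any suffix; no other
    character is treated as a wildcard.
    """
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return [c for c in available if c.startswith(prefix)]
    return [c for c in available if c == pattern]

implemented = ["i;octet", "i;ascii-casemap", "i;ascii-numeric"]
ascii_only = matching_collations("i;ascii-*", implemented)
everything = matching_collations("*", implemented)
```

An empty result corresponds to the case where no implemented collation matches the pattern.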
yuuji@0
|
650 5.4. String Comparison
|
yuuji@0
|
651
|
yuuji@0
|
652 If a protocol compares strings in any nontrivial way, using a
|
yuuji@0
|
653 collation may be appropriate. As an example, many protocols use
|
yuuji@0
|
654 case-independent strings. In many cases, a simple ASCII mapping to
|
yuuji@0
|
655 upper/lower case works well. In other cases, it may be better to use
|
yuuji@0
|
656 a specifiable collation; for example, so that a server can treat "i"
|
yuuji@0
|
657 and "I" as equivalent in Italy, and different in Turkey (Turkish also
|
yuuji@0
|
658 has a dotted upper-case "I" and a dotless lower-case "i").
|
yuuji@0
|
659
|
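The difference between the two treatments of "i" and "I" can be made concrete with a small sketch. Both key functions are hypothetical: `ascii_casemap_key` maps only a-z to upper case, and `turkish_key` is a deliberately simplified Turkish-aware folding written for this illustration, not a registered collation.

```python
_ASCII_UPPER = str.maketrans("abcdefghijklmnopqrstuvwxyz",
                             "ABCDEFGHIJKLMNOPQRSTUVWXYZ")

def ascii_casemap_key(s: str) -> str:
    """Simple ASCII mapping: "i" and "I" compare equal."""
    return s.translate(_ASCII_UPPER)

def turkish_key(s: str) -> str:
    """Simplified Turkish folding (an assumption for illustration):
    "I" pairs with dotless U+0131, and dotted U+0130 pairs with "i"."""
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()

same_in_ascii = ascii_casemap_key("i") == ascii_casemap_key("I")
same_in_turkish = turkish_key("i") == turkish_key("I")
```

A protocol that lets the client select the collation can offer either behavior without hard-coding one of them.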
yuuji@0
|
660 Protocol designers should consider, in each case, whether to use a
|
yuuji@0
|
661 specifiable collation. Keywords often have other needs than user
|
yuuji@0
|
662 variables, and search arguments may be different again.
|
yuuji@0
|
663
|
yuuji@0
|
664 5.5. Disconnected Clients
|
yuuji@0
|
665
|
yuuji@0
|
666 If the protocol supports disconnected clients, and a collation is
|
yuuji@0
|
667 used that can use configurable tables (e.g., to support
|
yuuji@0
|
668 locale-specific extensions), then the client may not be able to
|
yuuji@0
|
669 reproduce the server's collation operations while offline.
|
yuuji@0
|
670
|
yuuji@0
|
679 A mechanism to download such tables has been discussed. Such a
|
yuuji@0
|
680 mechanism is not included in the present specification, since the
|
yuuji@0
|
681 problem is not yet well understood.
|
yuuji@0
|
682
|
yuuji@0
|
683 5.6. Error Codes
|
yuuji@0
|
684
|
yuuji@0
|
685 The protocol specification should consider assigning protocol error
|
yuuji@0
|
686 codes for the following circumstances:
|
yuuji@0
|
687
|
yuuji@0
|
688 o The client requests the use of a collation by identifier or
|
yuuji@0
|
689 pattern, but no implemented collation matches that pattern.
|
yuuji@0
|
690
|
yuuji@0
|
691 o The client attempts to use a collation for an operation that is
|
yuuji@0
|
692 not supported by that collation -- for example, attempting to use
|
yuuji@0
|
693 the "i;ascii-numeric" collation for substring matching.
|
yuuji@0
|
694
|
yuuji@0
|
695 o The client uses an equality or substring matching collation, and
|
yuuji@0
|
696 the result is an error. It may be appropriate to distinguish
|
yuuji@0
|
697 between the two input strings, particularly when one is supplied
|
yuuji@0
|
698 by the client and the other is stored by the server. It might
|
yuuji@0
|
699 also be appropriate to distinguish the specific case of an invalid
|
yuuji@0
|
700 UTF-8 string.
|
yuuji@0
|
701
|
yuuji@0
|
702 5.7. Octet Collation
|
yuuji@0
|
703
|
yuuji@0
|
704 The i;octet (Section 9.3) collation is only usable with protocols
|
yuuji@0
|
705 based on octet-strings. Clients and servers MUST NOT use i;octet
|
yuuji@0
|
706 with other protocols.
|
yuuji@0
|
707
|
yuuji@0
|
708 If the protocol permits the use of collations with data structures
|
yuuji@0
|
709 other than strings, the protocol MUST describe the default behavior
|
yuuji@0
|
710 for a collation with those data structures.
|
yuuji@0
|
711
|
yuuji@0
|
712 6. Use by Existing Protocols
|
yuuji@0
|
713
|
yuuji@0
|
714 This section is informative.
|
yuuji@0
|
715
|
yuuji@0
|
716 Both ACAP [11] and Sieve [14] are standards track specifications that
|
yuuji@0
|
717 used collations prior to the creation of this specification and
|
yuuji@0
|
718 registry. Those standards do not meet all the application protocol
|
yuuji@0
|
719 requirements described in Section 5.
|
yuuji@0
|
720
|
yuuji@0
|
721 These protocols allow the use of the i;octet (Section 9.3) collation
|
yuuji@0
|
722 working directly on UTF-8 data, as used in these protocols.
|
yuuji@0
|
723
|
yuuji@0
|
735 In Sieve, all matches are either true or false. Accordingly, Sieve
|
yuuji@0
|
736 servers must treat "undefined" and "no-match" results of the equality
|
yuuji@0
|
737 and substring operations as false, and only "match" as true.
|
yuuji@0
|
738
|
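The Sieve rule amounts to a one-line mapping from the three possible results to a boolean. The function name is hypothetical; this is a sketch, not text from the Sieve specification.

```python
def sieve_test_result(collation_result: str) -> bool:
    """Map an equality or substring result onto Sieve's true/false outcome.

    Only "match" is true; "no-match" and "undefined" are both false.
    """
    return collation_result == "match"
```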
yuuji@0
|
739 In ACAP and Sieve, there are no invalid strings. In this document's
|
yuuji@0
|
740 terms, invalid strings sort after valid strings.
|
yuuji@0
|
741
|
yuuji@0
|
742 IMAP [15] also collates, although that is explicit only when the
|
yuuji@0
|
743 COMPARATOR [17] extension is used. The built-in IMAP substring
|
yuuji@0
|
744 operation and the ordering provided by the SORT [16] extension may
|
yuuji@0
|
745 not meet the requirements made in this document.
|
yuuji@0
|
746
|
yuuji@0
|
747 Other protocols may be in a similar position.
|
yuuji@0
|
748
|
yuuji@0
|
749 In IMAP, the default collation is i;ascii-casemap, because its
|
yuuji@0
|
750 operations are understood to match IMAP's built-in operations.
|
yuuji@0
|
751
|
yuuji@0
|
752 7. Collation Registration
|
yuuji@0
|
753
|
yuuji@0
|
754 7.1. Collation Registration Procedure
|
yuuji@0
|
755
|
yuuji@0
|
756 The IETF will create a mailing list, collation@ietf.org, which can be
|
yuuji@0
|
757 used for public discussion of collation proposals prior to
|
yuuji@0
|
758 registration. Use of the mailing list is strongly encouraged. The
|
yuuji@0
|
759 IESG will appoint a designated expert who will monitor the
|
yuuji@0
|
760 collation@ietf.org mailing list and review registrations.
|
yuuji@0
|
761
|
yuuji@0
|
762 The registration procedure begins when a completed registration
|
yuuji@0
|
763 template is sent to iana@iana.org and collation@ietf.org. The
|
yuuji@0
|
764 designated expert is expected to tell IANA and the submitter of the
|
yuuji@0
|
765 registration within two weeks whether the registration is approved,
|
yuuji@0
|
766 approved with minor changes, or rejected with cause. When a
|
yuuji@0
|
767 registration is rejected with cause, it can be re-submitted if the
|
yuuji@0
|
768 concerns listed in the cause are addressed. Decisions made by the
|
yuuji@0
|
769 designated expert can be appealed to the IESG Applications Area
|
yuuji@0
|
770 Director, then to the IESG. They follow the normal appeals procedure
|
yuuji@0
|
771 for IESG decisions.
|
yuuji@0
|
772
|
yuuji@0
|
773 Collation registrations in a standards track, BCP, or IESG-approved
|
yuuji@0
|
774 experimental RFC are owned by the IETF, and changes to the
|
yuuji@0
|
775 registration follow normal procedures for updating such documents.
|
yuuji@0
|
776 Collation registrations in other RFCs are owned by the RFC author(s).
|
yuuji@0
|
777 Other collation registrations are owned by the individual(s) listed
|
yuuji@0
|
778 in the contact field of the registration, and IANA will preserve this
|
yuuji@0
|
779 information.
|
yuuji@0
|
780
|
yuuji@0
|
781 If the registration is a change of an existing collation, it MUST be
|
yuuji@0
|
782 approved by the owner. In the event the owner cannot be contacted
|
yuuji@0
|
791 for a period of one month, and the designated expert deems the change
|
yuuji@0
|
792 necessary, the IESG MAY re-assign ownership to an appropriate party.
|
yuuji@0
|
793
|
yuuji@0
|
794 7.2. Collation Registration Format
|
yuuji@0
|
795
|
yuuji@0
|
796 Registration of a collation is done by sending a well-formed XML
|
yuuji@0
|
797 document to collation@ietf.org and iana@iana.org.
|
yuuji@0
|
798
|
yuuji@0
|
799 7.2.1. Registration Template
|
yuuji@0
|
800
|
yuuji@0
|
801 Here is a template for the registration:
|
yuuji@0
|
802
|
yuuji@0
|
803 <?xml version='1.0'?>
|
yuuji@0
|
804 <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
|
yuuji@0
|
805 <collation rfc="YYYY" scope="global" intendedUse="common">
|
yuuji@0
|
806 <identifier>collation identifier</identifier>
|
yuuji@0
|
807 <title>technical title for collation</title>
|
yuuji@0
|
808 <operations>equality order substring</operations>
|
yuuji@0
|
809 <specification>specification reference</specification>
|
yuuji@0
|
810 <owner>email address of owner or IETF</owner>
|
yuuji@0
|
811 <submitter>email address of submitter</submitter>
|
yuuji@0
|
812 <version>1</version>
|
yuuji@0
|
813 </collation>
|
yuuji@0
|
814
|
yuuji@0
|
815 7.2.2. The Collation Element
|
yuuji@0
|
816
|
yuuji@0
|
817 The root of the registration document MUST be a <collation> element.
|
yuuji@0
|
818 The collation element contains the other elements in the
|
yuuji@0
|
819 registration, which are described in the following sub-subsections,
|
yuuji@0
|
820 in the order given here.
|
yuuji@0
|
821
|
yuuji@0
|
822 The <collation> element MAY include an "rfc=" attribute if the
|
yuuji@0
|
823 specification is in an RFC. The "rfc=" attribute gives only the
|
yuuji@0
|
824 number of the RFC, without any prefix, such as "RFC", or suffix, such
|
yuuji@0
|
825 as ".txt".
|
yuuji@0
|
826
|
yuuji@0
|
827 The <collation> element MUST include a "scope=" attribute, which MUST
|
yuuji@0
|
828 have one of the values "global", "local", or "other".
|
yuuji@0
|
829
|
yuuji@0
|
830 The <collation> element MUST include an "intendedUse=" attribute,
|
yuuji@0
|
831 which must have one of the values "common", "limited", "vendor", or
|
yuuji@0
|
832 "deprecated". Collation specifications intended for "common" use are
|
yuuji@0
|
833 expected to reference standards from standards bodies with
|
yuuji@0
|
834 significant experience dealing with the details of international
|
yuuji@0
|
835 character sets.
|
yuuji@0
|
836
|
yuuji@0
|
837 Be aware that future revisions of this specification may add
|
yuuji@0
|
838 additional function types, as well as additional XML attributes,
|
yuuji@0
|
847 values, and elements. Any system that automatically parses these XML
|
yuuji@0
|
848 documents MUST take this into account to preserve future
|
yuuji@0
|
849 compatibility.
|
yuuji@0
|
850
|
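One way to read a registration while tolerating future additions is to pick out only the known elements and attributes and ignore the rest. The sketch below uses Python's standard xml.etree.ElementTree; `parse_registration`, the returned dictionary layout, and the `futureAttr` and `futureElement` names in the sample are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

MANDATORY = ("identifier", "title", "operations", "specification", "owner")

def parse_registration(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    if root.tag != "collation":
        raise ValueError("root element must be <collation>")

    def texts(tag: str) -> list:
        return [(e.text or "").strip() for e in root.findall(tag)]

    for tag in MANDATORY:
        if not texts(tag):
            raise ValueError(f"missing mandatory <{tag}> element")

    versions = texts("version")
    return {
        "identifier": texts("identifier")[0],
        "title": texts("title")[0],
        # Unknown operation names are kept, since new ones may be added.
        "operations": texts("operations")[0].split(),
        "specification": texts("specification"),
        "owner": texts("owner"),
        "submitter": texts("submitter"),
        "version": versions[0] if versions else None,
        "rfc": root.get("rfc"),
        "scope": root.get("scope"),
        "intendedUse": root.get("intendedUse"),
    }

# Sample with one unknown attribute and one unknown element, both ignored.
SAMPLE = """<?xml version='1.0'?>
<!DOCTYPE collation SYSTEM 'collationreg.dtd'>
<collation rfc="4790" scope="other" intendedUse="limited" futureAttr="x">
  <identifier>i;ascii-numeric</identifier>
  <title>ASCII Numeric</title>
  <operations>equality order</operations>
  <specification>RFC 4790</specification>
  <owner>IETF</owner>
  <submitter>chris.newman@sun.com</submitter>
  <futureElement>ignored</futureElement>
</collation>"""

registration = parse_registration(SAMPLE)
```

The parser checks only well-formedness and the mandatory elements; it does not validate against the DTD.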
yuuji@0
|
851 7.2.3. The Identifier Element
|
yuuji@0
|
852
|
yuuji@0
|
853 The <identifier> element gives the precise identifier of the
|
yuuji@0
|
854 collation, e.g., i;ascii-casemap. The <identifier> element is
|
yuuji@0
|
855 mandatory.
|
yuuji@0
|
856
|
yuuji@0
|
857 7.2.4. The Title Element
|
yuuji@0
|
858
|
yuuji@0
|
859 The <title> element gives the title of the collation. The <title>
|
yuuji@0
|
860 element is mandatory.
|
yuuji@0
|
861
|
yuuji@0
|
862 7.2.5. The Operations Element
|
yuuji@0
|
863
|
yuuji@0
|
864 The <operations> element lists which of the three operations
|
yuuji@0
|
865 ("equality", "order" or "substring") the collation provides,
|
yuuji@0
|
866 separated by single spaces. The <operations> element is mandatory.
|
yuuji@0
|
867
|
yuuji@0
|
868 7.2.6. The Specification Element
|
yuuji@0
|
869
|
yuuji@0
|
870 The <specification> element describes where to find the
|
yuuji@0
|
871 specification. The <specification> element is mandatory. It MAY
|
yuuji@0
|
872 have a URI attribute. There may be more than one <specification>
|
yuuji@0
|
873 element, in which case, they together form the specification.
|
yuuji@0
|
874
|
yuuji@0
|
875 If it is discovered that parts of a collation specification conflict,
|
yuuji@0
|
876 a new revision of the collation is necessary, and the
|
yuuji@0
|
877 collation@ietf.org mailing list should be notified.
|
yuuji@0
|
878
|
yuuji@0
|
879 7.2.7. The Submitter Element
|
yuuji@0
|
880
|
yuuji@0
|
881 The <submitter> element provides an RFC 2822 [12] email address for
|
yuuji@0
|
882 the person who submitted the registration. It is optional if the
|
yuuji@0
|
883 <owner> element contains an email address.
|
yuuji@0
|
884
|
yuuji@0
|
885 There may be more than one <submitter> element.
|
yuuji@0
|
886
|
yuuji@0
|
887 7.2.8. The Owner Element
|
yuuji@0
|
888
|
yuuji@0
|
889 The <owner> element contains either the four letters "IETF" or an
|
yuuji@0
|
890 email address of the owner of the registration. The <owner> element
|
yuuji@0
|
891 is mandatory. There may be more than one <owner> element. If so,
|
yuuji@0
|
892 all owners are equal. Each owner can speak for all.
|
yuuji@0
|
893
|
yuuji@0
|
903 7.2.9. The Version Element
|
yuuji@0
|
904
|
yuuji@0
|
905 The <version> element MUST be included when the registration is
|
yuuji@0
|
906 likely to be revised, or has been revised in such a way that the
|
yuuji@0
|
907 results change for one or more input strings. Otherwise, the
|
yuuji@0
|
908 <version> element is optional.
|
yuuji@0
|
909
|
yuuji@0
|
910 7.2.10. The Variable Element
|
yuuji@0
|
911
|
yuuji@0
|
912 The <variable> element specifies an optional variable to control the
|
yuuji@0
|
913 collation's behaviour, for example whether it is case sensitive. The
|
yuuji@0
|
914 <variable> element is optional. When <variable> is used, it must
|
yuuji@0
|
915 contain <name> and <default> elements, and it may contain one or more
|
yuuji@0
|
916 <value> elements.
|
yuuji@0
|
917
|
yuuji@0
|
918 7.2.10.1. The Name Element
|
yuuji@0
|
919
|
yuuji@0
|
920 The <name> element specifies the name value of a variable. The
|
yuuji@0
|
921 <name> element is mandatory.
|
yuuji@0
|
922
|
yuuji@0
|
923 7.2.10.2. The Default Element
|
yuuji@0
|
924
|
yuuji@0
|
925 The <default> element specifies the default value of a variable. The
|
yuuji@0
|
926 <default> element is mandatory.
|
yuuji@0
|
927
|
yuuji@0
|
928 7.2.10.3. The Value Element
|
yuuji@0
|
929
|
yuuji@0
|
930 The <value> element specifies a legal value of a variable. The
|
yuuji@0
|
931 <value> element is optional. If one or more <value> elements are
|
yuuji@0
|
932 present, only those values are legal. If none are, then the
|
yuuji@0
|
933 variable's legal values do not form an enumerated set, and the rules
|
yuuji@0
|
934 MUST be specified in an RFC accompanying the registration.
|
yuuji@0
|
935
|
yuuji@0
|
936 7.3. Structure of Collation Registry
|
yuuji@0
|
937
|
yuuji@0
|
938 Once the registration is approved, IANA will store each XML
|
yuuji@0
|
939 registration document in a URL of the form
|
yuuji@0
|
940 http://www.iana.org/assignments/collation/collation-id.xml, where
|
yuuji@0
|
941 collation-id is the content of the identifier element in the
|
yuuji@0
|
942 registration. Both the submitter and the designated expert are
|
yuuji@0
|
943 responsible for verifying that the XML is well-formed. The
|
yuuji@0
|
944 registration document should avoid using new elements. If any are
|
yuuji@0
|
945 necessary, it is important to be consistent with other registrations.
|
yuuji@0
|
946
|
yuuji@0
|
947 IANA will also maintain a text summary of the registry under the name
|
yuuji@0
|
948 http://www.iana.org/assignments/collation/collation-index.html. This
|
yuuji@0
|
949 summary is divided into four sections. The first section is for
|
yuuji@0
|
950 collations intended for common use. This section is intended for
|
yuuji@0
|
959 collation registrations published in IESG-approved RFCs, or for
|
yuuji@0
|
960 locally scoped collations from the primary standards body for that
|
yuuji@0
|
961 locale. The designated expert is encouraged to reject collation
|
yuuji@0
|
962 registrations with an intended use of "common" if the expert believes
|
yuuji@0
|
963 it should be "limited", as it is desirable to keep the number of
|
yuuji@0
|
964 "common" registrations small and of high quality. The second section
|
yuuji@0
|
965 is reserved for limited-use collations. The third section is
|
yuuji@0
|
966 reserved for registered vendor-specific collations. The final
|
yuuji@0
|
967 section is reserved for deprecated collations.
|
yuuji@0
|
968
|
yuuji@0
|
969 7.4. Example Initial Registry Summary
|
yuuji@0
|
970
|
yuuji@0
|
971 The following is an example of how IANA might structure the initial
|
yuuji@0
|
972 registry summary.html file:
|
yuuji@0
|
973
|
yuuji@0
|
974    Collation                        Functions Scope Reference
|
yuuji@0
|
975    ---------                        --------- ----- ---------
|
yuuji@0
|
976    Common Use Collations:
|
yuuji@0
|
977      i;ascii-casemap                e, o, s   Local [RFC 4790]
|
yuuji@0
|
978
|
yuuji@0
|
979    Limited Use Collations:
|
yuuji@0
|
980      i;octet                        e, o, s   Other [RFC 4790]
|
yuuji@0
|
981      i;ascii-numeric                e, o      Other [RFC 4790]
|
yuuji@0
|
982
|
yuuji@0
|
983 Vendor Collations:
|
yuuji@0
|
984
|
yuuji@0
|
985 Deprecated Collations:
|
yuuji@0
|
986
|
yuuji@0
|
987
|
yuuji@0
|
988 References
|
yuuji@0
|
989 ----------
|
yuuji@0
|
990 [RFC 4790] Newman, C., Duerst, M., Gulbrandsen, A., "Internet
|
yuuji@0
|
991 Application Protocol Collation Registry", RFC 4790,
|
yuuji@0
|
992 Sun Microsystems, March 2007.
|
yuuji@0
|
993
|
yuuji@0
|
994 8. Guidelines for Expert Reviewer
|
yuuji@0
|
995
|
yuuji@0
|
996 The expert reviewer appointed by the IESG has fairly broad latitude
|
yuuji@0
|
997 for this registry. While a number of collations are expected
|
yuuji@0
|
998 (particularly customizations of the UCA for localized use), an
|
yuuji@0
|
999 explosion of collations (particularly common-use collations) is not
|
yuuji@0
|
1000 desirable for widespread interoperability. However, it is important
|
yuuji@0
|
1001 for the expert reviewer to provide cause when rejecting a
|
yuuji@0
|
1002 registration, and, when possible, to describe corrective action to
|
yuuji@0
|
1015 permit the registration to proceed. The following table includes
|
yuuji@0
|
1016 some example reasons to reject a registration with cause:
|
yuuji@0
|
1017
|
yuuji@0
|
1018 o The registration is not a well-formed XML document.
|
yuuji@0
|
1019
|
yuuji@0
|
1020 o The registration has an intended use of "common", but there is no
|
yuuji@0
|
1021 evidence the collation will be widely deployed, so it should be
|
yuuji@0
|
1022 listed as "limited".
|
yuuji@0
|
1023
|
yuuji@0
|
1024 o The registration has an intended use of "common", but it is
|
yuuji@0
|
1025 redundant with the functionality of a previously registered
|
yuuji@0
|
1026 "common" collation.
|
yuuji@0
|
1027
|
yuuji@0
|
1028 o The registration has an intended use of "common", but the
|
yuuji@0
|
1029 specification is not detailed enough to allow interoperable
|
yuuji@0
|
1030 implementations by others.
|
yuuji@0
|
1031
|
yuuji@0
|
1032 o The collation identifier fails to precisely identify the version
|
yuuji@0
|
1033 numbers of relevant tables to use.
|
yuuji@0
|
1034
|
yuuji@0
|
1035 o The registration fails to meet one of the "MUST" requirements in
|
yuuji@0
|
1036 Section 4.
|
yuuji@0
|
1037
|
yuuji@0
|
1038 o The collation identifier fails to meet the syntax in Section 3.
|
yuuji@0
|
1039
|
yuuji@0
|
1040 o The collation specification referenced in the registration is
|
yuuji@0
|
1041 vague or has optional features without a clear behavior specified.
|
yuuji@0
|
1042
|
yuuji@0
|
1043 o The referenced specification does not adequately address security
|
yuuji@0
|
1044 considerations specific to that collation.
|
yuuji@0
|
1045
|
yuuji@0
|
1046 o The registration's operations are needlessly different from those
|
yuuji@0
|
1047 of traditional operations.
|
yuuji@0
|
1048
|
yuuji@0
|
1049 o The registration's XML is needlessly different from that of
|
yuuji@0
|
1050 already registered collations.
|
yuuji@0
|
1051
|
yuuji@0
|
1052 9. Initial Collations
|
yuuji@0
|
1053
|
yuuji@0
|
1054 This section registers the three collations that were originally
|
yuuji@0
|
1055 defined in ACAP [11], and are implemented in most Sieve [14] engines. Some of
|
yuuji@0
|
1056 the behavior of these collations is perhaps not ideal, such as
|
yuuji@0
|
1057 i;ascii-casemap accepting non-ASCII input. Compatibility with widely
|
yuuji@0
|
1058 deployed code was judged more important than fixing the collations.
|
yuuji@0
|
1059 Some of the aspects of these collations are necessary to maintain
|
yuuji@0
|
1060 compatibility with widely deployed code.
|
yuuji@0
|
1061
|
yuuji@0
|
1071 9.1. ASCII Numeric Collation
|
yuuji@0
|
1072
|
yuuji@0
|
1073 9.1.1. ASCII Numeric Collation Description
|
yuuji@0
|
1074
|
yuuji@0
|
1075 The "i;ascii-numeric" collation is a simple collation intended for
|
yuuji@0
|
1076 use with arbitrarily-sized, unsigned decimal integer numbers stored
|
yuuji@0
|
1077 as octet strings. US-ASCII digits (0x30 to 0x39) represent digits of
|
yuuji@0
|
1078 the numbers. Before converting from string to integer, the input
|
yuuji@0
|
1079 string is truncated at the first non-digit character. All input is
|
yuuji@0
|
1080 valid; strings that do not start with a digit represent positive
|
yuuji@0
|
1081 infinity.
|
yuuji@0
|
1082
|
yuuji@0
|
1083 The collation supports equality and ordering, but does not support
|
yuuji@0
|
1084 the substring operation.
|
yuuji@0
|
1085
|
yuuji@0
|
1086 The equality operation returns "match" if the two strings represent
|
yuuji@0
|
1087 the same number (i.e., leading zeroes and trailing non-digits are
|
yuuji@0
|
1088 disregarded), and "no-match" if the two strings represent different
|
yuuji@0
|
1089 numbers.
|
yuuji@0
|
1090
|
yuuji@0
|
1091 The ordering operation returns "less" if the first string represents
|
yuuji@0
|
1092 a smaller number than the second, "equal" if they represent the same
|
yuuji@0
|
1093 number, and "greater" if the first string represents a larger number
|
yuuji@0
|
1094 than the second.
|
yuuji@0
|
1095
|
yuuji@0
|
1096 Some examples: "0" is less than "1", and "1" is less than
|
yuuji@0
|
1097 "4294967298". "4294967298", "04294967298", and "4294967298b" are all
|
yuuji@0
|
1098 equal. "04294967298" is less than "". "", "x", and "y" are equal.
|
yuuji@0
|
1099
|
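The description above can be sketched in Python as follows. The function names are hypothetical, and positive infinity is represented here by `math.inf`, which Python compares correctly against integers of any size.

```python
import math

def ascii_numeric_value(s: bytes):
    """Value of an octet string under i;ascii-numeric.

    The string is truncated at the first non-digit octet; a string with
    no leading digit represents positive infinity.
    """
    digits = bytearray()
    for octet in s:
        if 0x30 <= octet <= 0x39:   # US-ASCII "0" to "9"
            digits.append(octet)
        else:
            break
    if not digits:
        return math.inf
    # Note: recent CPython versions cap, by default, the length of decimal
    # strings that int() accepts; very long inputs would need special care.
    return int(digits.decode("ascii"))

def ascii_numeric_order(a: bytes, b: bytes) -> str:
    x, y = ascii_numeric_value(a), ascii_numeric_value(b)
    if x < y:
        return "less"
    if x > y:
        return "greater"
    return "equal"

def ascii_numeric_equal(a: bytes, b: bytes) -> str:
    return "match" if ascii_numeric_order(a, b) == "equal" else "no-match"
```

Applied to the examples in this section, `ascii_numeric_order(b"04294967298", b"")` gives "less", and `ascii_numeric_equal(b"x", b"y")` gives "match".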
yuuji@0
|
1100 9.1.2. ASCII Numeric Collation Registration
|
yuuji@0
|
1101
|
yuuji@0
|
1102 <?xml version='1.0'?>
|
yuuji@0
|
1103 <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
|
yuuji@0
|
1104 <collation rfc="4790" scope="other" intendedUse="limited">
|
yuuji@0
|
1105 <identifier>i;ascii-numeric</identifier>
|
yuuji@0
|
1106 <title>ASCII Numeric</title>
|
yuuji@0
|
1107 <operations>equality order</operations>
|
yuuji@0
|
1108 <specification>RFC 4790</specification>
|
yuuji@0
|
1109 <owner>IETF</owner>
|
yuuji@0
|
1110 <submitter>chris.newman@sun.com</submitter>
|
yuuji@0
|
1111 </collation>
|
yuuji@0
|
1112
|
yuuji@0
|
1127 9.2. ASCII Casemap Collation
|
yuuji@0
|
1128
|
yuuji@0
|
1129 9.2.1. ASCII Casemap Collation Description
|
yuuji@0
|
1130
|
yuuji@0
|
1131 The "i;ascii-casemap" collation is a simple collation that operates
|
yuuji@0
|
1132 on octet strings and treats US-ASCII letters case-insensitively. It
|
yuuji@0
|
1133 provides equality, substring, and ordering operations. All input is
|
yuuji@0
|
1134 valid. Note that letters outside ASCII are not treated case-
|
yuuji@0
|
1135 insensitively.
|
yuuji@0
|
1136
|
yuuji@0
|
1137 Its equality, ordering, and substring operations are as for i;octet,
|
yuuji@0
|
1138 except that at first, the lower-case letters (octet values 97-122) in
|
yuuji@0
|
1139 each input string are changed to upper case (octet values 65-90).
|
yuuji@0
|
1140
|
yuuji@0
|
1141 Care should be taken when using OS-supplied functions to implement
|
yuuji@0
|
1142 this collation, as it is not locale sensitive. Functions, such as
|
yuuji@0
|
1143 strcasecmp and toupper, are sometimes locale sensitive, and may
|
yuuji@0
|
1144 inappropriately map lower-case letters other than a-z to upper case.
|
yuuji@0
|
1145
|
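A locale-independent implementation can avoid library case functions altogether by using a fixed translation table. The following is a minimal sketch with hypothetical function names; it maps octets 97-122 to 65-90 and then compares octet by octet.

```python
# Fixed table: only a-z are mapped, regardless of the process locale.
_TO_UPPER = bytes.maketrans(b"abcdefghijklmnopqrstuvwxyz",
                            b"ABCDEFGHIJKLMNOPQRSTUVWXYZ")

def casemap(s: bytes) -> bytes:
    return s.translate(_TO_UPPER)

def casemap_order(a: bytes, b: bytes) -> str:
    x, y = casemap(a), casemap(b)
    if x == y:
        return "equal"
    # Python compares bytes objects as sequences of unsigned octets.
    return "less" if x < y else "greater"

def casemap_equal(a: bytes, b: bytes) -> str:
    return "match" if casemap(a) == casemap(b) else "no-match"

def casemap_substring(needle: bytes, haystack: bytes) -> str:
    return "match" if casemap(needle) in casemap(haystack) else "no-match"
```

Because only ASCII letters are mapped, the UTF-8 encodings of a lower-case and an upper-case accented letter remain distinct under this collation.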
yuuji@0
|
1146 The i;ascii-casemap collation is well-suited for use with many
|
yuuji@0
|
1147 Internet protocols and computer languages. Use with natural language
|
yuuji@0
|
1148 is often inappropriate; even though the collation apparently supports
|
yuuji@0
|
1149 languages such as Swahili and English, in real-world use, it tends to
|
yuuji@0
|
1150 mis-sort a number of types of string:
|
yuuji@0
|
1151
|
yuuji@0
|
1152 o people and place names containing non-ASCII,
|
yuuji@0
|
1153
|
yuuji@0
|
1154 o words such as "naive" (if spelled with an accent, the accented
|
yuuji@0
|
1155 character could push the word to the wrong spot in a sorted list),
|
yuuji@0
|
1156
|
yuuji@0
|
1157 o names such as "Lloyd" (which, in Welsh, sorts after "Lyon", unlike
|
yuuji@0
|
1158 in English),
|
yuuji@0
|
1159
|
yuuji@0
|
1160 o strings containing euro and pound sterling symbols, quotation
|
yuuji@0
|
1161 marks other than '"', dashes/hyphens, etc.
|
yuuji@0
|
1162
|
yuuji@0
|
1183 9.2.2. ASCII Casemap Collation Registration
|
yuuji@0
|
1184
|
yuuji@0
|
1185 <?xml version='1.0'?>
|
yuuji@0
|
1186 <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
|
yuuji@0
|
1187 <collation rfc="4790" scope="local" intendedUse="common">
|
yuuji@0
|
1188 <identifier>i;ascii-casemap</identifier>
|
yuuji@0
|
1189 <title>ASCII Casemap</title>
|
yuuji@0
|
1190 <operations>equality order substring</operations>
|
yuuji@0
|
1191 <specification>RFC 4790</specification>
|
yuuji@0
|
1192 <owner>IETF</owner>
|
yuuji@0
|
1193 <submitter>chris.newman@sun.com</submitter>
|
yuuji@0
|
1194 </collation>
|
yuuji@0
|
1195
|
yuuji@0
|
1196 9.3. Octet Collation
|
yuuji@0
|
1197
|
yuuji@0
|
1198 9.3.1. Octet Collation Description
|
yuuji@0
|
1199
|
yuuji@0
|
1200 The "i;octet" collation is a simple and fast collation intended for
|
yuuji@0
|
1201 use on binary octet strings rather than on character data. Protocols
|
yuuji@0
|
1202 that want to make this collation available have to do so by
|
yuuji@0
|
1203 explicitly allowing it. If not explicitly allowed, it MUST NOT be
|
yuuji@0
|
1204 used. It never returns an "undefined" result. It provides equality,
|
yuuji@0
|
1205 substring, and ordering operations.
|
yuuji@0
|
1206
|
yuuji@0
|
1207 The ordering algorithm is as follows:
|
yuuji@0
|
1208
|
yuuji@0
|
1209 1. If both strings are the empty string, return the result "equal".
|
yuuji@0
|
1210
|
yuuji@0
|
1211 2. If the first string is empty and the second is not, return the
|
yuuji@0
|
1212 result "less".
|
yuuji@0
|
1213
|
yuuji@0
|
1214 3. If the second string is empty and the first is not, return the
|
yuuji@0
|
1215 result "greater".
|
yuuji@0
|
1216
|
yuuji@0
|
1217 4. If both strings begin with the same octet value, remove the first
|
yuuji@0
|
1218 octet from both strings and repeat this algorithm from step 1.
|
yuuji@0
|
1219
|
yuuji@0
|
1220 5. If the unsigned value (0 to 255) of the first octet of the first
|
yuuji@0
|
1221 string is less than the unsigned value of the first octet of the
|
yuuji@0
|
1222 second string, then return "less".
|
yuuji@0
|
1223
|
yuuji@0
|
1224 6. If this step is reached, return "greater".
|
yuuji@0
|
1225
|
yuuji@0
|
1226 This algorithm is roughly equivalent to the C library function
|
yuuji@0
|
1227 memcmp, with appropriate length checks added.
|
yuuji@0
|
1228
|
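The six steps can be written iteratively, advancing an index instead of removing the first octet from each string. This is a sketch with a hypothetical function name; indexing a Python bytes object yields the unsigned value (0 to 255) that step 5 calls for.

```python
def octet_order(a: bytes, b: bytes) -> str:
    i = 0
    while True:
        a_empty = i >= len(a)
        b_empty = i >= len(b)
        if a_empty and b_empty:
            return "equal"        # step 1
        if a_empty:
            return "less"         # step 2
        if b_empty:
            return "greater"      # step 3
        if a[i] == b[i]:
            i += 1                # step 4: drop the shared octet, repeat
            continue
        # steps 5 and 6: compare the first differing octets, unsigned
        return "less" if a[i] < b[i] else "greater"
```

The result agrees with Python's built-in comparison of bytes objects, which is also lexicographic over unsigned octet values.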
yuuji@0
|
1239 The equality operation returns "match" if the ordering algorithm
|
yuuji@0
|
1240 would return "equal". Otherwise, the equality operation returns
|
yuuji@0
|
1241 "no-match".
|
yuuji@0
|
1242
|
yuuji@0
|
1243 The substring operation returns "match" if the first string is the
|
yuuji@0
|
1244 empty string, or if there exists a substring of the second string of
|
yuuji@0
|
1245 length equal to the length of the first string, which would result in
|
yuuji@0
|
1246 a "match" result from the equality function. Otherwise, the
|
yuuji@0
|
1247 substring operation returns "no-match".
|
yuuji@0
|
1248
|
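The equality and substring operations for i;octet can be sketched directly from the two paragraphs above. The function names are hypothetical; the first argument of `octet_substring` is the search string and the second is the string being searched.

```python
def octet_equal(a: bytes, b: bytes) -> str:
    return "match" if a == b else "no-match"

def octet_substring(needle: bytes, haystack: bytes) -> str:
    if not needle:
        return "match"            # the empty string always matches
    n = len(needle)
    # Try every substring of the haystack with the needle's length.
    for start in range(len(haystack) - n + 1):
        if octet_equal(needle, haystack[start:start + n]) == "match":
            return "match"
    return "no-match"
```

A protocol that reports positions could return `start` and `start + n` instead of the bare "match" result.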
yuuji@0
|
1249 9.3.2. Octet Collation Registration
|
yuuji@0
|
1250
|
yuuji@0
|
1251 This collation is defined with intendedUse="limited" because it can
|
yuuji@0
|
1252 only be used by protocols that explicitly allow it.
|
yuuji@0
|
1253
|
yuuji@0
|
1254 <?xml version='1.0'?>
|
yuuji@0
|
1255 <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
|
yuuji@0
|
1256 <collation rfc="4790" scope="global" intendedUse="limited">
|
yuuji@0
|
1257 <identifier>i;octet</identifier>
|
yuuji@0
|
1258 <title>Octet</title>
|
yuuji@0
|
1259 <operations>equality order substring</operations>
|
yuuji@0
|
1260 <specification>RFC 4790</specification>
|
yuuji@0
|
1261 <owner>IETF</owner>
|
yuuji@0
|
1262 <submitter>chris.newman@sun.com</submitter>
|
yuuji@0
|
1263 </collation>
|
yuuji@0
|
1264
|
yuuji@0
|
1265 10. IANA Considerations
|
yuuji@0
|
1266
|
yuuji@0
|
1267 Section 7 defines how to register collations with IANA. Section 9
|
yuuji@0
|
1268 defines a list of predefined collations that have been registered
|
yuuji@0
|
1269 with IANA.
|
yuuji@0
|
1270
|
yuuji@0
|
1271 11. Security Considerations
|
yuuji@0
|
1272
|
yuuji@0
|
1273 Collations will normally be used with UTF-8 strings. Thus, the
|
yuuji@0
|
1274 security considerations for UTF-8 [3], stringprep [6], and Unicode
|
yuuji@0
|
1275 TR-36 [8] also apply, and are normative to this specification.
|
yuuji@0
|
1276
|
yuuji@0
|
1277 12. Acknowledgements
|
yuuji@0
|
1278
|
yuuji@0
|
1279 The authors want to thank all who have contributed to this document,
|
yuuji@0
|
1280 including Brian Carpenter, John Cowan, Dave Cridland, Mark Davis,
|
yuuji@0
|
1281 Spencer Dawkins, Lisa Dusseault, Lars Eggert, Frank Ellermann, Philip
|
yuuji@0
|
1282 Guenther, Tony Hansen, Ted Hardie, Sam Hartman, Kjetil Torgrim Homme,
|
yuuji@0
|
1283 Michael Kay, John Klensin, Alexey Melnikov, Jim Melton, and Abhijit
|
yuuji@0
|
1284 Menon-Sen.
|
yuuji@0
|
1285
|
yuuji@0
|
1295 13. References
|
yuuji@0
|
1296
|
yuuji@0
|
1297 13.1. Normative References
|
yuuji@0
|
1298
|
yuuji@0
|
1299 [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
|
yuuji@0
|
1300 Levels", BCP 14, RFC 2119, March 1997.
|
yuuji@0
|
1301
|
yuuji@0
|
1302 [2] Crocker, D. and P. Overell, "Augmented BNF for Syntax
|
yuuji@0
|
1303 Specifications: ABNF", RFC 4234, October 2005.
|
yuuji@0
|
1304
|
yuuji@0
|
1305 [3] Yergeau, F., "UTF-8, a transformation format of ISO 10646",
|
yuuji@0
|
1306 STD 63, RFC 3629, November 2003.
|
yuuji@0
|
1307
|
yuuji@0
|
1308 [4] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
|
yuuji@0
|
1309 Resource Identifier (URI): Generic Syntax", RFC 3986,
|
yuuji@0
|
1310 January 2005.
|
yuuji@0
|
1311
|
yuuji@0
|
1312 [5] Phillips, A. and M. Davis, "Tags for Identifying Languages",
|
yuuji@0
|
1313 BCP 47, RFC 4646, September 2006.
|
yuuji@0
|
1314
|
yuuji@0
|
1315 [6] Hoffman, P. and M. Blanchet, "Preparation of Internationalized
|
yuuji@0
|
1316 Strings ("stringprep")", RFC 3454, December 2002.
|
yuuji@0
|
1317
|
yuuji@0
|
1318 [7] Davis, M. and K. Whistler, "Unicode Collation Algorithm version
|
yuuji@0
|
1319 14", May 2005,
|
yuuji@0
|
1320 <http://www.unicode.org/reports/tr10/tr10-14.html>.
|
yuuji@0
|
1321
|
yuuji@0
|
1322 [8] Davis, M. and M. Suignard, "Unicode Security Considerations",
|
yuuji@0
|
1323 February 2006, <http://www.unicode.org/reports/tr36/>.
|
yuuji@0
|
1324
|
yuuji@0
|
1325 13.2. Informative References
|
yuuji@0
|
1326
|
yuuji@0
|
1327 [9] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
|
yuuji@0
|
1328 Extensions (MIME) Part One: Format of Internet Message Bodies",
|
yuuji@0
|
1329 RFC 2045, November 1996.
|
yuuji@0
|
1330
|
yuuji@0
|
1331 [10] Melnikov, A., "Simple Authentication and Security Layer
|
yuuji@0
|
1332 (SASL)", RFC 4422, June 2006.
|
yuuji@0
|
1333
|
yuuji@0
|
1334 [11] Newman, C. and J. Myers, "ACAP -- Application Configuration
|
yuuji@0
|
1335 Access Protocol", RFC 2244, November 1997.
|
yuuji@0
|
1336
|
yuuji@0
|
1337 [12] Resnick, P., "Internet Message Format", RFC 2822, April 2001.
|
yuuji@0
|
1338
|
yuuji@0
|
1339 [13] Freed, N. and J. Postel, "IANA Charset Registration
|
yuuji@0
|
1340 Procedures", BCP 19, RFC 2978, October 2000.
|
yuuji@0
|
1341
|
yuuji@0
|
1351 [14] Showalter, T., "Sieve: A Mail Filtering Language", RFC 3028,
|
yuuji@0
|
1352 January 2001.
|
yuuji@0
|
1353
|
yuuji@0
|
1354 [15] Crispin, M., "Internet Message Access Protocol - Version
|
yuuji@0
|
1355 4rev1", RFC 3501, March 2003.
|
yuuji@0
|
1356
|
yuuji@0
|
1357 [16] Crispin, M. and K. Murchison, "Internet Message Access Protocol
|
yuuji@0
|
1358 - Sort and Thread Extensions", Work in Progress, May 2004.
|
yuuji@0
|
1359
|
yuuji@0
|
1360 [17] Newman, C. and A. Gulbrandsen, "Internet Message Access
|
yuuji@0
|
1361 Protocol Internationalization", Work in Progress, January 2006.
|
yuuji@0
|
1362
|
yuuji@0
|
1363 Authors' Addresses
|
yuuji@0
|
1364
|
yuuji@0
|
1365 Chris Newman
|
yuuji@0
|
1366 Sun Microsystems
|
yuuji@0
|
1367 1050 Lakes Drive
|
yuuji@0
|
1368 West Covina, CA 91790
|
yuuji@0
|
1369 USA
|
yuuji@0
|
1370
|
yuuji@0
|
1371 EMail: chris.newman@sun.com
|
yuuji@0
|
1372
|
yuuji@0
|
1373
|
yuuji@0
|
1374 Martin Duerst
|
yuuji@0
|
1375 Aoyama Gakuin University
|
yuuji@0
|
1376 5-10-1 Fuchinobe
|
yuuji@0
|
1377 Sagamihara, Kanagawa 229-8558
|
yuuji@0
|
1378 Japan
|
yuuji@0
|
1379
|
yuuji@0
|
1380 Phone: +81 42 759 6329
|
yuuji@0
|
1381 Fax: +81 42 759 6495
|
yuuji@0
|
1382 EMail: duerst@it.aoyama.ac.jp
|
yuuji@0
|
1383 URI: http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/
|
yuuji@0
|
1384
|
yuuji@0
|
1385 Note: Please write "Duerst" with u-umlaut wherever possible, for
|
yuuji@0
|
1386 example as "D&#252;rst" in XML and HTML.
|
yuuji@0
|
1387
|
yuuji@0
|
1388
|
yuuji@0
|
1389 Arnt Gulbrandsen
|
yuuji@0
|
1390 Oryx Mail Systems GmbH
|
yuuji@0
|
1391 Schweppermannstr. 8
|
yuuji@0
|
1392 81671 Munich
|
yuuji@0
|
1393 Germany
|
yuuji@0
|
1394
|
yuuji@0
|
1395 Fax: +49 89 4502 9758
|
yuuji@0
|
1396 EMail: arnt@oryx.com
|
yuuji@0
|
1397 URI: http://www.oryx.com/arnt/
|
yuuji@0
|
1398
|
yuuji@0
|
1407 Full Copyright Statement
|
yuuji@0
|
1408
|
yuuji@0
|
1409 Copyright (C) The IETF Trust (2007).
|
yuuji@0
|
1410
|
yuuji@0
|
1411 This document is subject to the rights, licenses and restrictions
|
yuuji@0
|
1412 contained in BCP 78, and except as set forth therein, the authors
|
yuuji@0
|
1413 retain all their rights.
|
yuuji@0
|
1414
|
yuuji@0
|
1415 This document and the information contained herein are provided on an
|
yuuji@0
|
1416 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
|
yuuji@0
|
1417 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
|
yuuji@0
|
1418 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
|
yuuji@0
|
1419 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
|
yuuji@0
|
1420 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
|
yuuji@0
|
1421 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
|
yuuji@0
|
1422
|
yuuji@0
|
1423 Intellectual Property
|
yuuji@0
|
1424
|
yuuji@0
|
1425 The IETF takes no position regarding the validity or scope of any
|
yuuji@0
|
1426 Intellectual Property Rights or other rights that might be claimed to
|
yuuji@0
|
1427 pertain to the implementation or use of the technology described in
|
yuuji@0
|
1428 this document or the extent to which any license under such rights
|
yuuji@0
|
1429 might or might not be available; nor does it represent that it has
|
yuuji@0
|
1430 made any independent effort to identify any such rights. Information
|
yuuji@0
|
1431 on the procedures with respect to rights in RFC documents can be
|
yuuji@0
|
1432 found in BCP 78 and BCP 79.
|
yuuji@0
|
1433
|
yuuji@0
|
1434 Copies of IPR disclosures made to the IETF Secretariat and any
|
yuuji@0
|
1435 assurances of licenses to be made available, or the result of an
|
yuuji@0
|
1436 attempt made to obtain a general license or permission for the use of
|
yuuji@0
|
1437 such proprietary rights by implementers or users of this
|
yuuji@0
|
1438 specification can be obtained from the IETF on-line IPR repository at
|
yuuji@0
|
1439 http://www.ietf.org/ipr.
|
yuuji@0
|
1440
|
yuuji@0
|
1441 The IETF invites any interested party to bring to its attention any
|
yuuji@0
|
1442 copyrights, patents or patent applications, or other proprietary
|
yuuji@0
|
1443 rights that may cover technology that may be required to implement
|
yuuji@0
|
1444 this standard. Please address the information to the IETF at
|
yuuji@0
|
1445 ietf-ipr@ietf.org.
|
yuuji@0
|
1446
|
yuuji@0
|
1447 Acknowledgement
|
yuuji@0
|
1448
|
yuuji@0
|
1449 Funding for the RFC Editor function is currently provided by the
|
yuuji@0
|
1450 Internet Society.
|
yuuji@0
|
1451
|