XRI Forms and Transformations
1. Introduction
This page is for the XRI Syntax and \$ Dictionary editing teams to harmonize documentation of the different normal forms of XRI and the steps involved in transforming between them. These forms and transformation steps will now be explicitly called out in proposed new sections of the XRI Syntax 2.1 spec – see XriCd02/XriSyntax2dot1Outline.
2. Proposed XRI Forms
Following are the proposed XRI forms for XRI Syntax 2.1 based on the proposed ABNF at XriCd02/XriAbnf2dot1.
Form | Purpose |
Native Form | Display and transcription inside native applications or in non-digitized media |
XRI-Normal Form | Canonical comparison form for all XRI-aware applications |
XRI-UTF-8 Form | XRI-Normal Form encoded in UTF-8 (for directories and databases requiring UTF-8 encoding) |
IRI-Normal Form | For IRI compatibility |
IRI-UTF-8 Form | IRI-Normal Form encoded in UTF-8 |
URI-Normal Form | For URI compatibility |
HXRI Form | Adaptation of safe form for XRI proxy resolution (HTTP URI form recognized by non-XRI aware browsers and web applications) |
MXRI Form | Adaptation of safe form for email and messaging (RFC 2822 form recognized by non-XRI aware mail agents) |
The relationships between these forms and the transformation paths are shown in the following diagram:
3. Native Form
Native form is "step 0" of the XRI transformation ladder. It is defined as an XRI reference represented in the context of a computer application or in other media (print, audio, etc.) which, if the transformation to XRI-normal form is applied, will result in a valid XRI reference in XRI-normal form.
4. XRI-Normal Form
XRI-normal form applies a rigorous set of rules necessary to transform from native form into a valid XRI reference that can be understood by any other application that recognizes XRI-normal form. Starting with XRI Syntax 2.1, XRI-normal form is also suitable for character-by-character comparison of XRIs because the rules for XRI-normal form achieve a "canonical" XRI.
The steps required to transform an XRI reference in native form to XRI-normal form are:
- If the representation of the XRI reference in native form is not yet digitized, or if it is digitized but does not use an encoding based on the Unicode Character Set (UCS), it MUST be converted to a sequence of characters from the UCS. The UTF-8 encoding is RECOMMENDED. Programmatically this is typically a check to discover the character encoding of the identifier, followed by a library call to convert from any non-UCS-based encoding to a UCS-based encoding.
- If any information present in native form would be lost in the conversion to a sequence of UCS characters, this information MAY be encoded as identifier metadata in the form of cross-references from the XRI \$ Dictionary. For example, a \$lang (language) cross-reference may be inserted to indicate the glyph of a Kanji character. Note that this metadata becomes part of the formal XRI and thus is significant for comparison in all XRI forms. This step will also typically be a library call, although one needed only in applications that deal with XRIs subject to this type of ambiguity.
- This sequence of UCS characters MUST be normalized according to NFKC as defined in the Unicode specifications (UTR15). This is typically a library call to an NFKC normalization library.
- The authority segment of the XRI MUST be normalized to lowercase, unless it is overridden by \$ces or other \$ metadata specifying case.
- The target values of \$ip, \$dns, and \$uri authority segments MUST be normalized as specified in the XRI \$ Dictionary.
- All unnecessary percent-encoding MUST be removed, and all remaining percent-encoding MUST use uppercase A through F for hex digits.
- All /./ and /../ segments MUST be removed.
- All XRIs used as encapsulated references MUST be in XRI-normal form, i.e., an xri:// prefix MUST be removed.
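As a rough illustration, three of the more mechanical steps above (conversion to UCS characters, NFKC normalization, and uppercasing the hex digits of percent-encodings) can be sketched in Python. This is only a partial sketch under stated assumptions: the source encoding is assumed to be already known, the function names are hypothetical, and the steps that require parsing the XRI (lowercasing the authority segment, \$ metadata, removing unnecessary percent-encoding, dot segments, encapsulated references) are deliberately left out.

```python
import re
import unicodedata

# Matches one percent-encoded octet such as "%2f" or "%2F".
_PCT_OCTET = re.compile(r"%[0-9A-Fa-f]{2}")


def to_ucs(raw: bytes, encoding: str) -> str:
    """Decode bytes in a known source encoding into Unicode (UCS) characters."""
    return raw.decode(encoding)


def nfkc(text: str) -> str:
    """Normalize the character sequence to NFKC."""
    return unicodedata.normalize("NFKC", text)


def uppercase_percent_hex(text: str) -> str:
    """Use uppercase A through F in every percent-encoded octet."""
    return _PCT_OCTET.sub(lambda m: m.group(0).upper(), text)


def partial_xri_normalize(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: applies only the three steps sketched here."""
    return uppercase_percent_hex(nfkc(to_ucs(raw, encoding)))


# Fullwidth letters fold to ASCII under NFKC, and "%2f" becomes "%2F".
print(partial_xri_normalize("=ｅｘａｍｐｌｅ%2f".encode("utf-8"), "utf-8"))
```

The result is a Python `str`, i.e. a sequence of Unicode characters with no particular byte encoding chosen yet.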
4.1. XRI-UTF-8 Form
This is a subset of XRI-normal form for use in directory and database applications which require UTF-8 encoding. The only additional requirement is that the UCS encoding be UTF-8.
5. IRI-Normal Form
IRI-normal form is required for full compatibility with the IRI specification (RFC 3987). The steps for transformation of an XRI reference into IRI-normal form are currently defined in section 2.3.1 of XRI Syntax 2.0. The corrected set of steps for XRI Syntax 2.1 will be:
- If the XRI reference is not in XRI-normal form, first transform it into XRI-normal form.
- If the XRI reference is not relative (i.e., if it matches the "xri" ABNF production), prepend "xri://" (in all lowercase) to the XRI reference.
- Apply the XRI escaping rules defined below.
5.1. XRI Escaping Rules
These rules are currently defined in section 2.3.2 of XRI Syntax 2.0.
Note that this step is not idempotent (i.e., it may yield a different result if applied more than once), so it is very important that implementers not apply this step more than once to avoid changing the semantics of the identifier.
- Percent-encode all percent “%” characters as “%25” across the entire XRI reference.
- Percent-encode all number sign “#” characters that appear within a cross-reference as “%23”.
- Percent-encode all question mark “?” characters that appear within a cross-reference as “%3F”.
- Percent-encode all slash “/” characters that appear within a cross-reference as “%2F”.
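The four escaping rules above can be sketched as a single pass over the XRI reference. This is a minimal sketch, not a reference implementation: it assumes cross-references are delimited by balanced parentheses (possibly nested) and does not validate the input against the ABNF. The function name `xri_escape` is hypothetical.

```python
# Characters that are escaped only inside a cross-reference.
_XREF_ESCAPES = {"#": "%23", "?": "%3F", "/": "%2F"}


def xri_escape(xri: str) -> str:
    """Apply the XRI escaping rules once (assumes parenthesized cross-references)."""
    out = []
    depth = 0  # current nesting depth of cross-references
    for ch in xri:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        if ch == "%":
            # "%" is escaped everywhere in the XRI reference.
            out.append("%25")
        elif depth > 0 and ch in _XREF_ESCAPES:
            # "#", "?" and "/" are escaped only within a cross-reference.
            out.append(_XREF_ESCAPES[ch])
        else:
            out.append(ch)
    return "".join(out)


once = xri_escape("xri://@example/(http://example.com/a?b#c)/100%")
print(once)   # xri://@example/(http:%2F%2Fexample.com%2Fa%3Fb%23c)/100%25
twice = xri_escape(once)
print(twice == once)  # False: a second pass re-escapes every "%"
```

The second call shows why the step is not idempotent: each "%" introduced by the first pass is itself escaped to "%25" on the second.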
5.2. IRI-UTF-8 Form
This is a subset of IRI-normal form for use in directory and database applications which require both an IRI and UTF-8 encoding. The only additional requirement is that the UCS encoding be UTF-8.
6. URI-Normal Form
URI-normal form is required for full compatibility with the URI specification (RFC 3986). The rules for transformation of a valid IRI into a valid URI are defined by section 3.1 of RFC 3987. The steps required to transform an XRI reference to URI-normal form are:
- If the XRI reference is not in IRI-UTF-8 form, first transform it into IRI-UTF-8 form.
- Percent-encode each UTF-8 octet of every character that is not allowed in a URI, as specified in section 3.1 of RFC 3987.
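The percent-encoding step can be sketched as follows. As a simplifying assumption, this sketch treats every non-ASCII character as one that must be encoded, and it expects input that is already in IRI-normal form; the function name `iri_to_uri` is hypothetical.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 octets of every non-ASCII character."""
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            # ASCII characters are passed through unchanged.
            out.append(ch)
        else:
            # Each UTF-8 octet becomes "%XX" with uppercase hex digits.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


print(iri_to_uri("xri://=caf\u00e9"))  # xri://=caf%C3%A9
```

Existing percent-encodings are left untouched here, because the "%" characters were already handled by the XRI escaping rules during the transformation to IRI-normal form.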
7. HXRI Form
HXRI (HTTP XRI) form is URI-normal form with the "xri://" prefix replaced by an XRI proxy resolver prefix so the XRI is expressed as a valid HTTP(S) URI.
- If the XRI reference is not in URI-normal form, first transform it into URI-normal form.
- Remove the "xri://" prefix.
- Add a valid HXRI proxy resolver prefix – see XriCd02/Xri2dot1Formats.
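The prefix swap in the last two steps can be sketched as below. The proxy resolver prefix used in the example, `https://proxy.example/`, is purely a placeholder and not a real resolver; valid prefixes are defined in XriCd02/Xri2dot1Formats. The function name `to_hxri` is hypothetical, and the input is assumed to be in URI-normal form already.

```python
XRI_PREFIX = "xri://"


def to_hxri(uri_normal_xri: str, proxy_prefix: str) -> str:
    """Replace the "xri://" prefix with a proxy resolver prefix."""
    if uri_normal_xri.startswith(XRI_PREFIX):
        uri_normal_xri = uri_normal_xri[len(XRI_PREFIX):]
    return proxy_prefix + uri_normal_xri


# "https://proxy.example/" is a placeholder prefix for illustration only.
print(to_hxri("xri://=example/path", "https://proxy.example/"))
# https://proxy.example/=example/path
```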
8. MXRI Form
MXRI (Mail XRI) form is URI-normal form with the "xri://" prefix removed and additional encoding (if necessary) that allows an XRI to be expressed as a valid RFC 2822 email address.
- If the XRI reference is not in URI-normal form, first transform it into URI-normal form.
- Remove the "xri://" prefix.
- If the XRI does not end in a valid MXRI mail agent suffix ("@" plus a valid RFC 2822 domain name), append one.
- Apply the escaping rules required to be compatible with RFC 2822. These escaping rules are still to be completed – see XriCd02/Xri2dot1Formats.