For the list of requirements, see OneContentModel/Requirements.

Comparison

This section shows the different requirements, along with examples of how they would be represented in each option.

Must be able to represent standalone codes

Original: Before <br/> after

Option A: Before <ic id="1" data="&lt;br/>"/> after

Option D: Before <ph id="1">&lt;br/></ph> after (with native data) or Before <ph id="1"/> after (without native data)
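The two ways of carrying native data for a standalone code (in an attribute for Option A, as element content for Option D) can be compared with a small Python sketch using the standard xml.etree.ElementTree module. It assumes the fragment is wrapped in a hypothetical <source> element so it parses as XML; the function names are illustrative only.

```python
import xml.etree.ElementTree as ET


def native_of(code: ET.Element) -> str:
    """Return the native data carried by a standalone code ('' if none)."""
    if code.tag == "ic":   # Option A: native data in the 'data' attribute
        return code.get("data", "")
    if code.tag == "ph":   # Option D: native data as the element's content
        return code.text or ""
    raise ValueError(f"not a standalone code: {code.tag}")


def to_native(fragment: str) -> str:
    """Rebuild the original text from a fragment containing standalone codes."""
    # Hypothetical <source> wrapper so the fragment is well-formed XML.
    root = ET.fromstring(f"<source>{fragment}</source>")
    parts = [root.text or ""]
    for child in root:
        parts.append(native_of(child))
        parts.append(child.tail or "")  # text following the code
    return "".join(parts)


print(to_native('Before <ic id="1" data="&lt;br/>"/> after'))  # Before <br/> after
print(to_native('Before <ph id="1">&lt;br/></ph> after'))      # Before <br/> after
```

Without native data (Option D's <ph id="1"/>), the sketch yields "Before  after": the code itself would have to be regenerated from some other source.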

Must be able to represent balanced paired-codes

Original: <b>bold text</b>

Option A: <pc id="1" start="&lt;b>" end="&lt;/b>">bold text</pc>

Option D: <pc id="1">bold text</pc> (only without native data)
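Because <pc> wraps the text it applies to, rebuilding native content from Option A is naturally recursive: emit the start attribute, the rendered content, then the end attribute. A minimal Python sketch, assuming a hypothetical <source> wrapper and handling only <pc> and <ic>:

```python
import xml.etree.ElementTree as ET


def render(elem: ET.Element) -> str:
    """Serialize an element's content back to native markup (Option A)."""
    out = [elem.text or ""]
    for child in elem:
        if child.tag == "pc":
            # Paired code: native opening code, nested content, native closing code.
            out.append(child.get("start", "") + render(child) + child.get("end", ""))
        elif child.tag == "ic":
            out.append(child.get("data", ""))
        out.append(child.tail or "")
    return "".join(out)


src = ET.fromstring(
    '<source><pc id="1" start="&lt;b>" end="&lt;/b>">bold text</pc></source>'
)
print(render(src))  # <b>bold text</b>
```

Option D's <pc> carries no start/end values, so the same walk would only recover the text, which matches the "only without native data" note for Option D.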

Must be able to represent paired codes that have been separated

Original:

Option A:

Must be able to represent paired codes that are overlapping each other

Original: <b>Bold <i>bold+italic</b> italic</i>

Option A: <sc id="1" data="&lt;b>"/>Bold <sc id="2" data="&lt;i>"/>bold+italic<ec id="1" data="&lt;/b>"/> italic<ec id="2" data="&lt;/i>"/>
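Since overlapping codes cannot nest, <sc> and <ec> have to be matched by id rather than with a stack. The Python sketch below reports codes that lack a partner; it assumes, as in the example above, that an <ec> reuses the id of its <sc>, and the <source> wrapper is hypothetical.

```python
import xml.etree.ElementTree as ET


def unmatched_codes(fragment: str) -> set[str]:
    """Return the ids of <sc>/<ec> codes that have no partner."""
    root = ET.fromstring(f"<source>{fragment}</source>")
    open_ids: set[str] = set()
    orphans: set[str] = set()
    for el in root.iter():  # document order
        if el.tag == "sc":
            open_ids.add(el.get("id"))
        elif el.tag == "ec":
            code_id = el.get("id")
            # Any open code may be closed, not just the most recent one:
            # that is what allows the pairs to overlap.
            if code_id in open_ids:
                open_ids.remove(code_id)
            else:
                orphans.add(code_id)  # end code without a preceding start code
    return open_ids | orphans
```

Applied to the overlapping bold/italic example, the function returns an empty set; dropping either <ec> would leave its id in the result.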

Must allow spans of content to be associated with metadata

text <mrk ...>text</mrk>

Metadata here can be either XLIFF-specific or custom. <mrk> is used as a general-purpose span marker: it can hold XLIFF attributes or attributes from an extension.

Should it have a mandatory attribute such as an id? Probably not.

Important objects (such as terms) and their associated information (such as a source reference).

Structural information: some metadata, such as comments, may need to point to an external element.

Should a <mrk> be able to hold multiple pieces of metadata, or just one (with several <mrk> elements used when more are needed)?

Some span-related metadata:

Modularity: should we have more specific elements? They would give more control over each layer.

It seems a specialization of <mrk> into different elements would be better: it would allow better validation and a clear separation of the data categories.

flag indicating the span must not be translated

Original: MonkeyWrench is a powerful application that does everything (where "MonkeyWrench" needs to be protected).

Option A: <mrk type="protect">MonkeyWrench</mrk> is a powerful application that does everything

Option B: <mrk its:translate="no">MonkeyWrench</mrk> is a powerful application that does everything

flag indicating the span is a term

Original: I have a doppelganger (where "doppelganger" is a term).

Option A: I have a <mrk type="term">doppelganger</mrk>.

Option B: I have a <mrk its:term="yes">doppelganger</mrk>.

reference ID used to point to an external annotation

Original: TODO

Option A: <mrk rid="1">...</mrk>

Maybe we should have a type associated with the <mrk> here.

translator comment annotation

Original: TODO

Option A: <mrk comment="some text">...</mrk>

tool-specific processing instructions

Original: TODO

Option A: <mrk myNS:myAttr="value"/>
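To compare how a tool would consume Options A and B for the two flags (do not translate, term), here is a minimal Python sketch that collects the text of each flagged <mrk>. It assumes the ITS namespace URI http://www.w3.org/2005/11/its for the its: prefix, declared on a hypothetical <source> wrapper; the function name is illustrative.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"


def flagged_spans(fragment: str) -> dict[str, list[str]]:
    """Group <mrk> texts by flag, accepting both Option A and Option B syntax."""
    root = ET.fromstring(f'<source xmlns:its="{ITS_NS}">{fragment}</source>')
    found: dict[str, list[str]] = {"protect": [], "term": []}
    for mrk in root.iter("mrk"):
        text = "".join(mrk.itertext())
        # Option A relies on a generic 'type' value; Option B on namespaced ITS attributes.
        if mrk.get("type") == "protect" or mrk.get(f"{{{ITS_NS}}}translate") == "no":
            found["protect"].append(text)
        if mrk.get("type") == "term" or mrk.get(f"{{{ITS_NS}}}term") == "yes":
            found["term"].append(text)
    return found


print(flagged_spans('I have a <mrk its:term="yes">doppelganger</mrk>.'))
```

Option B needs a namespace-aware lookup, while Option A dispatches on the value of a single attribute; specialized elements would instead dispatch on the element name.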

Must be able to store a display-friendly representation of an inline code for informational purposes

Original:

Option A: <ic id="1" data="blah..." disp="bold"/>

Must be able to store a text-equivalent representation of an inline code for linguistic processes

Original: &File (in a resource string, where & indicates that F is a hot key)

Option A: <ic id="1" data="&amp;" equiv=""/>File...
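A linguistic process (spell-checking, word counting, term lookup) would see the text with each code replaced by its equiv value, so "&File" becomes the word "File". A minimal Python sketch, assuming a hypothetical <source> wrapper and that a missing equiv means the code contributes no text:

```python
import xml.etree.ElementTree as ET


def linguistic_text(fragment: str) -> str:
    """Return the segment text with each <ic> replaced by its 'equiv' value."""
    root = ET.fromstring(f"<source>{fragment}</source>")  # hypothetical wrapper
    parts = [root.text or ""]
    for child in root:
        if child.tag == "ic":
            parts.append(child.get("equiv", ""))  # assumption: no equiv -> no text
        parts.append(child.tail or "")
    return "".join(parts)


print(linguistic_text('<ic id="1" data="&amp;" equiv=""/>File'))  # File
```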

Must be able to uniquely identify an inline code within a segment

Original:

Option A:

Must be able to associate each code of the source with its corresponding code in the target

Original: TODO

Option A:
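Assuming the association is made by giving corresponding source and target codes the same id (which is why ids must be unique within a segment), a tool could pair them as in the Python sketch below. The <seg> wrapper and function names are hypothetical, and <ec> is skipped because it reuses the id of its <sc> in the overlapping-codes example.

```python
import xml.etree.ElementTree as ET

CODE_TAGS = {"ic", "pc", "sc"}  # <ec> shares its <sc>'s id, so it is not counted


def codes_by_id(fragment: str) -> dict[str, ET.Element]:
    """Index the inline codes of a segment by id, rejecting duplicates."""
    root = ET.fromstring(f"<seg>{fragment}</seg>")
    codes: dict[str, ET.Element] = {}
    for el in root.iter():
        if el.tag in CODE_TAGS:
            code_id = el.get("id")
            if code_id in codes:
                raise ValueError(f"duplicate inline code id: {code_id}")
            codes[code_id] = el
    return codes


def align(source: str, target: str) -> tuple[list[str], list[str], list[str]]:
    """Return (matched, missing from target, added in target) code ids."""
    src, trg = codes_by_id(source), codes_by_id(target)
    matched = sorted(src.keys() & trg.keys())
    missing = sorted(src.keys() - trg.keys())
    added = sorted(trg.keys() - src.keys())
    return matched, missing, added


print(align('<pc id="1">bold</pc> text<ic id="2"/>',
            '<ic id="2"/>texte <pc id="1">gras</pc>'))
```

Reordering the codes in the target does not affect the pairing, since only the ids are compared.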

Must be able to represent the duplication of inline codes in the segment

Original: TODO

Option A:

Must be able to represent inline codes added to the segment

Original: TODO

Option A:

Must allow three ways of dealing with the native data corresponding to an XLIFF inline code

to store only the XLIFF representation, discarding the native data

Original: Line 1<br/>Line 2

Option A: Line 1<ic id="1" type="lb"/>Line 2

to store it along with its XLIFF representation

Original: Line 1<br/>Line 2

Option A: Line 1<ic id="1" data="&lt;br/>"/>Line 2

to store a pointer to it along with its XLIFF representation

Original: Line 1<br/>Line 2

Option A: Line 1<ic id="1" rid="c123"/>Line 2
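From the point of view of a tool merging the translation back, the three strategies differ only in where the native data comes from. The Python sketch below resolves an <ic> under each of them; the pointer store and the type-to-markup defaults are hypothetical stand-ins for whatever a real filter would use.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical external store for the "pointer" case (rid="c123").
NATIVE_STORE = {"c123": "<br/>"}
# Hypothetical defaults a filter might use to regenerate discarded native data.
DEFAULTS_BY_TYPE = {"lb": "<br/>"}


def native_data(ic: ET.Element) -> Optional[str]:
    """Resolve the native data of an <ic> under the three storage strategies."""
    if "data" in ic.attrib:   # stored along with the XLIFF representation
        return ic.get("data")
    if "rid" in ic.attrib:    # stored elsewhere, reached through a pointer
        return NATIVE_STORE.get(ic.get("rid"))
    # Native data discarded: regenerate it from the code's type, if known.
    return DEFAULTS_BY_TYPE.get(ic.get("type"))


for code in ('<ic id="1" type="lb"/>', '<ic id="1" data="&lt;br/>"/>',
             '<ic id="1" rid="c123"/>'):
    print(native_data(ET.fromstring(code)))  # <br/> in all three cases
```

The first strategy is only lossless when the type value is enough to regenerate the original code, which holds for a plain line break but not for codes with arbitrary native attributes.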

Must be able to represent different flows of text and codes separately when they are mixed together in the original format

Original: Text with <img alt="Alt text" src="img.png"/>

Option A:

Should be able to represent the mutual relationships between a nested flow of text and its parent

Original: TODO

Option A:

Should be able to represent illegal XML characters in the content

Original: Text with \u0003 (e.g. in a Java properties file)

Option A: Text with _#x3;
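The _#xN; notation above can be produced and reversed with a pair of regular expressions. The Python sketch below assumes that exact form (uppercase hexadecimal, no leading zeros) and covers the characters excluded by the XML 1.0 Char production; a real scheme would also need to escape a literal "_#x" already present in the text, which this sketch does not do.

```python
import re

# Characters excluded by the XML 1.0 Char production: most C0 controls,
# surrogate code points, and U+FFFE/U+FFFF.
_ILLEGAL = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F\uD800-\uDFFF\uFFFE\uFFFF]")
_ESCAPED = re.compile(r"_#x([0-9A-Fa-f]+);")


def escape_illegal(text: str) -> str:
    """Replace each character that XML 1.0 forbids with the _#xN; notation."""
    return _ILLEGAL.sub(lambda m: f"_#x{ord(m.group()):X};", text)


def unescape_illegal(text: str) -> str:
    """Turn _#xN; sequences back into the characters they stand for."""
    return _ESCAPED.sub(lambda m: chr(int(m.group(1), 16)), text)


escaped = escape_illegal("Text with \u0003")
print(escaped)  # Text with _#x3;
```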

Inline codes should have a way to store information about their effect on segmentation

(Note: Not sure how any kind of tag-based marker can be useful with SRX)

Original: Line 1<br/>Line 2

Option A: Line 1<ic id="1" data="&lt;br/>" type="lb"/>Line 2
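If the type value is what carries the segmentation information, a segmenter could treat codes typed "lb" as break points. The Python sketch below assumes that convention (none of the options defines it as written) and keeps other codes inside the segment as serialized XML; the <source> wrapper is hypothetical.

```python
import xml.etree.ElementTree as ET


def split_at_breaks(fragment: str) -> list[str]:
    """Split a fragment into segments at <ic> codes whose type is 'lb'."""
    root = ET.fromstring(f"<source>{fragment}</source>")  # hypothetical wrapper
    segments = [root.text or ""]
    for child in root:
        if child.tag == "ic" and child.get("type") == "lb":
            # A line-break code ends the current segment; its tail starts the next.
            segments.append(child.tail or "")
        else:
            # ET.tostring() serializes the element *and* its tail text.
            segments[-1] += ET.tostring(child, encoding="unicode")
    return segments


print(split_at_breaks('Line 1<ic id="1" data="&lt;br/>" type="lb"/>Line 2'))
# ['Line 1', 'Line 2']
```

The break code itself is dropped from the output here; a real segmenter would keep it between the segments so it can be restored on merge.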

Should preserve span-like structures

Original: Line 1<br/><b>Line 2</b>.

Option A: Line 1<ic id="1" data="&lt;br/>"/><pc id="2" start="&lt;b>" end="&lt;/b>">Line 2</pc>.

Details for Option A

This option uses four elements:

The native data can be represented in different ways:

Issues

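Line breaks in attribute values are preserved only when written as explicit character references (e.g. &#10;): a conforming XML parser replaces literal line breaks and tabs in attribute values with spaces. Spaces may also be collapsed, depending on the attribute type.

Declaring the attributes holding the data (start, end, data) as CDATA complements this: the parser then leaves sequences of spaces untouched, whereas for other attribute types it also trims and collapses them. CDATA alone does not protect literal line breaks, so character references are still needed. See Attribute-Value Normalization in the XML specification.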
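The difference between a literal line break and a character reference can be checked with any conforming parser. For instance, with Python's xml.etree.ElementTree (the <ic> elements are simply the examples from this page):

```python
import xml.etree.ElementTree as ET

# A literal newline inside an attribute value is normalized to a space...
literal = ET.fromstring('<ic id="1" data="line 1\nline 2"/>')
# ...whereas the character reference &#10; survives as a real newline.
ncr = ET.fromstring('<ic id="1" data="line 1&#10;line 2"/>')
# Without a DTD the attribute is treated as CDATA, so runs of spaces are kept.
spaces = ET.fromstring('<ic id="1" data="a  b"/>')

print(repr(literal.get("data")))  # 'line 1 line 2'
print(repr(ncr.get("data")))      # 'line 1\nline 2'
print(repr(spaces.get("data")))   # 'a  b'
```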

Tentative schema

Details for Option B

This option (re)uses namespaces/vocabularies such as the W3C Internationalization Tag Set.

Examples:

* For translatability: <mrk its:translate="no">MonkeyWrench</mrk>

* For marking terms: <mrk its:term="yes">doppelganger</mrk>

Details for Option C

This option is patterned after the mechanisms of the W3C Internationalization Tag Set (ITS), and thus includes what ITS calls the "local" and "global" approaches.

Examples:

If an identifier exists or can be added (for example, Before <br id="a"/> after), then the following is possible ('gim' abbreviates 'generic inline markup', and 'ic' abbreviates 'inline code').

Details for Option D

TODO

Summary of the Possible Options for Representing Native Data

Storing in the element

Pros:

Cons:

Storing in attributes

Pros:

Cons:

Store outside the content

Pros:

Cons:

Outside: Option 1: Global

In this option, the native data are stored as a list of entries, all grouped together somewhere at the top or the bottom of the document.

For example, something like:

<nativeCodes>
 <native id='120'>&lt;br></native>
 <native id='121'><start>&lt;b></start><end>&lt;/b></end></native>
</nativeCodes>
...
<trans-unit id='1'>
 <source>Text in <code id='1' ref='121'>bold</code>.<code id='2' ref='120'/></source>
</trans-unit>

or

<nativeCodes>
 <native id='120'>&lt;br></native>
 <native id='121'>&lt;b></native>
 <native id='122'>&lt;/b></native>
</nativeCodes>
...
<trans-unit id='1'>
 <source>Text in <code id='1' scref='121' ecref='122'>bold</code>.<code id='2' ref='120'/></source>
</trans-unit>
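Resolving Option 1 means loading the global table once and looking up each <code> while walking the content. The Python sketch below handles the first variant (a ref attribute, with <start>/<end> children for paired codes); the <doc> root element is a hypothetical stand-in for the real document structure.

```python
import xml.etree.ElementTree as ET

# First variant of Option 1, inside a hypothetical <doc> root element.
DOC = """<doc>
 <nativeCodes>
  <native id='120'>&lt;br></native>
  <native id='121'><start>&lt;b></start><end>&lt;/b></end></native>
 </nativeCodes>
 <trans-unit id='1'>
  <source>Text in <code id='1' ref='121'>bold</code>.<code id='2' ref='120'/></source>
 </trans-unit>
</doc>"""


def load_natives(root: ET.Element) -> dict[str, tuple[str, str]]:
    """Map each global native id to its (start, end) native data."""
    table = {}
    for native in root.iter("native"):
        start, end = native.find("start"), native.find("end")
        if start is not None and end is not None:  # paired code
            table[native.get("id")] = (start.text or "", end.text or "")
        else:                                      # standalone code
            table[native.get("id")] = (native.text or "", "")
    return table


def rebuild(elem: ET.Element, natives: dict[str, tuple[str, str]]) -> str:
    """Rebuild native content, resolving each <code ref> in the global table."""
    out = [elem.text or ""]
    for code in elem:
        start, end = natives[code.get("ref")]
        out.append(start + rebuild(code, natives) + end)
        out.append(code.tail or "")
    return "".join(out)


root = ET.fromstring(DOC)
print(rebuild(next(root.iter("source")), load_natives(root)))  # Text in <b>bold</b>.<br>
```

Because the table is global, a single unit cannot be extracted or merged on its own without carrying the referenced entries along.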

Outside: Option 2: Within the translation unit

In this option, the native codes are outside the source/target content but still within the translation unit. This allows sharing between source and target while keeping some level of self-containment: splitting/merging operations would be fine at the translation unit level.

For example, something like:

<trans-unit>
 <nativeCodes>
  <native id='1'><start>&lt;b></start><end>&lt;/b></end></native>
  <native id='2'>&lt;br></native>
 </nativeCodes>
 <source>Text in <code id='1' ref='1'>bold</code>.<code id='2' ref='2'/></source>
</trans-unit>

or like:

<trans-unit>
 <nativeCodes>
  <native id='1'>&lt;b></native>
  <native id='2'>&lt;/b></native>
  <native id='3'>&lt;br></native>
 </nativeCodes>
 <source>Text in <code id='1' scref='1' ecref='2'>bold</code>.<code id='2' ref='3'/></source>
</trans-unit>
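Since each unit carries its own table, the lookup can be scoped to the unit, and native ids only need to be unique within it. The Python sketch below resolves the second variant (scref/ecref for paired codes, ref for standalone ones); the <file> root and the second unit reusing the same ids are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Second variant of Option 2, plus a second unit reusing the same native ids.
UNITS = """<file>
<trans-unit>
 <nativeCodes>
  <native id='1'>&lt;b></native>
  <native id='2'>&lt;/b></native>
  <native id='3'>&lt;br></native>
 </nativeCodes>
 <source>Text in <code id='1' scref='1' ecref='2'>bold</code>.<code id='2' ref='3'/></source>
</trans-unit>
<trans-unit>
 <nativeCodes>
  <native id='1'>&lt;i></native>
  <native id='2'>&lt;/i></native>
 </nativeCodes>
 <source>An <code id='1' scref='1' ecref='2'>italic</code> word.</source>
</trans-unit>
</file>"""


def rebuild_unit(unit: ET.Element) -> str:
    """Rebuild a unit's source, resolving references only within that unit."""
    natives = {n.get("id"): n.text or "" for n in unit.iter("native")}

    def render(elem: ET.Element) -> str:
        out = [elem.text or ""]
        for code in elem:
            if "ref" in code.attrib:  # standalone code
                out.append(natives[code.get("ref")])
            else:                     # paired code: start ref, content, end ref
                out.append(natives[code.get("scref")] + render(code)
                           + natives[code.get("ecref")])
            out.append(code.tail or "")
        return "".join(out)

    return render(unit.find("source"))


for unit in ET.fromstring(UNITS).iter("trans-unit"):
    print(rebuild_unit(unit))
# Text in <b>bold</b>.<br>
# An <i>italic</i> word.
```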

OneContentModel/Comparison (last edited 2011-05-09 22:10:50 by ysavourel)