This document has been produced and approved by the Localisation Industry Standards (LIS) ETSI Industry Specification Group (ISG) and represents the views of those members who participated in this ISG. It does not necessarily represent the views of the entire ETSI membership.
The present document may be made available in more than one electronic version or in print. In any case of existing or perceived difference in contents between such versions, the reference version is the Portable Document Format (PDF). In case of dispute, the reference shall be the printing on ETSI printers of the PDF version kept on a specific network drive within ETSI Secretariat.
Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at http://portal.etsi.org/tb/status/status.asp
If you find errors in the present document, please send your comment to one of the following services: http://portal.etsi.org/chaircor/ETSI_support.asp
No part may be reproduced except as authorized by written permission. The copyright and the foregoing restriction extend to reproduction in all media.
Copyright © European Telecommunications Standards Institute 2013. All rights reserved.
This document defines the XML based Text Memory specification (xml:tm). The purpose of this vocabulary is to store text memory information within an XML document using the XML namespace syntax.
XML based Text Memory (xml:tm) is an entirely new approach to the problem of how to store and use translation memory. It is totally integrated with XML and uses XML syntax to define memory segments. It borrows from the LISA TMX 2.0 specification, mandates the use of the LISA SRX segmentation standard as well as the W3C ITS Document Rules definition, and is designed to integrate tightly with the OASIS XLIFF 1.2 specification for the extraction and actual translation of text. However, its goals and requirements are different enough from any of the above to warrant its own format.
The xml:tm namespace is added to a document during a process that is called Text Memory Namespace Application. The xml:tm namespace application is driven by a W3C ITS Document Rules definition document. The W3C ITS Document Rules is an XML vocabulary that specifies the following:
An additional aspect of the xml:tm namespace application process is the subdivision of text into identifiable sentences which are referred to as individual text units. The text unit is the lowest level of granularity within xml:tm enabled documents. The xml:tm standard mandates that this segmentation process is driven by SRX rules. Unique identifier attributes are allocated to each text unit in a document. These identifiers are designed to be immutable for the lifetime of the document.
During the document's life cycle the xml:tm namespace and the unique text unit identifiers are maintained by a process referred to as DOM differencing. This process compares the current and previous versions of the document using the Document Object Model (DOM) structure of the document and allocates unique identifiers accordingly.
For the purposes of translation the xml:tm namespace greatly simplifies the creation of an XLIFF form of the document's translatable text as well as a skeleton file for merging the translated text. xml:tm is designed to work tightly with XLIFF and to simplify the process of translating XML documents.
Once the XLIFF text has been translated and merged with the skeleton file to form a target version of the document, the source and target documents are perfectly aligned at the text unit level. In the next iteration of the document when it comes to translation those text units that remain unchanged are automatically allocated the target language text in a process known as Exact Matching.
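The Exact Matching step described above can be sketched as follows. This is a minimal illustration only: the function name `exact_match`, the dictionary-based data model, and the rule that a text unit counts as unchanged when both its identifier and its crc value are the same in the previous and current source versions are assumptions made for this sketch, not requirements stated by the specification.

```python
def exact_match(previous_source, previous_target, current_source):
    """Carry translations over for text units that are unchanged.

    previous_source and current_source map a text unit id to its crc value;
    previous_target maps a text unit id to the translated text.
    Returns a dict of text unit id -> target text for the unchanged units.
    """
    matches = {}
    for tu_id, crc in current_source.items():
        # Assumption: same id and same crc means the text unit is unchanged.
        if previous_source.get(tu_id) == crc and tu_id in previous_target:
            matches[tu_id] = previous_target[tu_id]
    return matches


if __name__ == "__main__":
    # Hypothetical data, invented for this illustration.
    prev_src = {"u1.1": "3275b242", "u1.2": "306bf701"}
    prev_tgt = {"u1.1": "translation of u1.1", "u1.2": "translation of u1.2"}
    cur_src = {"u1.1": "3275b242", "u1.2": "0badf00d", "u1.3": "851603a2"}
    print(exact_match(prev_src, prev_tgt, cur_src))
```

Text units that are new or whose crc differs are left out of the result and would go to translation in the usual way.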
The xml:tm namespace can be stripped from the document by means of a simple XSLT transformation if non-namespace versions of the document are required for specific forms of processing such as print composition.
Version 2.0 deprecates the use of the original top level tm:tm element in favour of placing all of the original tm:tm element attributes in the document root element after the xmlns:tm namespace declaration. This change facilitates the use of XPath expressions to address individual xml:tm end nodes without disrupting normal XPath expressions addressing original document elements.
Text Memory is an XML namespace application, and as such it is designed to co-exist within any well formed XML document. The Text Memory namespace needs to be declared before the top level Text Memory element is inserted, ideally within the top level element of the document using the following namespace declaration syntax:
xmlns:tm="urn:X-etsi-xml-tm-tags"
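As an illustration of how a namespace-aware parser sees this declaration, the following sketch uses Python's standard `xml.etree.ElementTree` module to list the text units of a document. The sample document, the `doc` and `p` element names and the `list_text_units` function are hypothetical and were invented for this example; only the namespace URI and the `tm:te` and `tm:tu` element names come from this specification.

```python
import xml.etree.ElementTree as ET

# Namespace URI as given in this specification.
TM_NS = "urn:X-etsi-xml-tm-tags"

# Hypothetical minimal document, invented for this illustration.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<doc xmlns:tm="urn:X-etsi-xml-tm-tags" tm:id="8fba2f33" tm:version="1.0">
  <p><tm:te id="e1" tu="2" version="1.0">
    <tm:tu id="u1.1" version="1.0">First sentence.</tm:tu>
    <tm:tu id="u1.2" version="1.0">Second sentence.</tm:tu>
  </tm:te></p>
</doc>"""


def list_text_units(xml_text):
    """Return (id, text) pairs for every tm:tu element, in document order."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    units = []
    # ElementTree addresses namespaced names as "{namespace-uri}local-name".
    for tu in root.iter(f"{{{TM_NS}}}tu"):
        units.append((tu.get("id"), "".join(tu.itertext()).strip()))
    return units


if __name__ == "__main__":
    for tu_id, text in list_text_units(SAMPLE):
        print(tu_id, text)
```

Because the lookup uses the namespace URI rather than the prefix, the same code would work if a document chose a prefix other than `tm`.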
For consistency it is recommended that all Text Memory namespace elements are prefixed with the text memory namespace identifier tm:. If the tm: namespace has already been used in the document then another appropriate namespace identifier should be chosen. For the purposes of this specification it is assumed that the tm: namespace is used.
All ID attribute values for xml:tm namespace elements are deemed to be immutable for the life of the document. If an element is deleted its ID is never reused.
It is recommended that the encoding scheme for xml:tm documents be either Unicode UTF-8 or UTF-16, as this simplifies handling of the character data in the source and translated documents. Other encodings may be used if supported by the actual XML parser and DOM and SAX libraries used to process the document. The actual encoding must be declared using the XML encoding declaration at the very start of the document. The values to use for the encoding declaration are defined in the [IANA Charsets] listing. For example:
<?xml version="1.0" encoding="utf-8"?>
The Text Memory hierarchical structure view of a document is relatively flat compared to that of a normal XML document. The following is a list of the xml:tm elements:
document root element
- carries the xml:tm namespace declaration and the key administrative xml:tm attributes.
tm:tm
- the deprecated top level element. The tm:tm element can contain any number of tm:te text element elements and any number of tm:ta non-inline translatable attributes. The use of this element is to be avoided in favour of placing all of the original tm:tm element attributes in the document root element after the xmlns:tm namespace declaration.
tm:vh
- version history element, added as a direct child of the document root element each time the document is updated.
tm:te
- text elements, allocated to each element that contains PCDATA text. They will contain one or more tm:tu text unit elements.
tm:ta
- translatable attributes of non-inline elements are placed in a tm:ta element which is the immediate child of the element within which the translatable attribute occurred.
tm:tu
- the tm:tu element contains the actual text of a tm:te element or any subdivision of the text into recognizable sentences. The LISA SRX standard should be used for segmenting tm:te element text into tm:tu elements.
tm:ti
- translatable attributes of inline elements within tm:tu elements are pulled out as the immediate child of their element into a tm:ti element.
tm:span
- replaces the start and end tags of an inline element that spans multiple tm:tu elements.
tm:attr
- for tm:span elements, each tm:attr element contains the name and value of each original attribute of the parent span element.
tm:mh
- where a tm:tu text unit element has changed by less than 30%, a tm:mh modification history is established for that text unit.
tm:th
- previous versions of tm:tu elements are held in tm:th elements.
An example of an XML document with Text Memory namespace:
Given the following sample XML document:
<?xml version="1.0" encoding="UTF-8" ?>
<office-document xmlns:text="http://openoffice.org/2000/text">
..........
<text:p text:style-name="Text body" text:index-qualifier="xml:tm description">
xml:tm is a radical new approach
<text:index name="radical new approach"/>
to dealing with the problems of translation
memory for XML documents by using XML syntax to embed memory
directly into the XML documents themselves.
It makes extensive use of XML namespace.
</text:p>
<text:p text:style-name="Text body">
The “tm” stands for “text memory”.
There are two aspects to text memory:
</text:p>
<text:ordered-list text:continue-numbering="false" text:style-name="L1">
<text:list-item>
<text:p text:style-name="P3">
Author memory
</text:p>
</text:list-item>
<text:list-item>
<text:p text:style-name="P3">
Translation memory
</text:p>
</text:list-item>
..........
</office-document>
The same XML document with xml:tm namespace would look like this:
<?xml version="1.0" encoding="UTF-8" ?>
<office-document xmlns:text="http://openoffice.org/2000/text"
xmlns:tm="urn:X-etsi-xml-tm-tags" tm:te="543" tm:ta="41" tm:version="2.0" tm:id="8fba2f33"
tm:source-language="en-US" tm:date="20031218T13:06:52Z"
tm:tool-name="XYZ Tool" tm:tool-version="1.23">
<tm:vh version="1.0" date="20030502T14:15:03Z"/>
..........
<text:p text:style-name="Text body">
<tm:ta id="a1" name="text:index-qualifier" version="1.0">
xml:tm description
</tm:ta>
<tm:te id="e1" tu="2" version="1.0">
<tm:tu id="u1.1" ti="1" crc="3275b242" version="1.0">
xml:tm is a radical new approach
<text:index>
<tm:ti id="i1.1.1" name="text:name" crc="9114ce48" version="2.0">
radical new approach
</tm:ti>
</text:index>
to dealing with the problems of translation
memory for XML documents by using XML syntax to embed memory
directly into the XML documents themselves.
</tm:tu>
<tm:tu id="u1.2" crc="306bf701" version="1.0">
It makes extensive use of XML namespace.
</tm:tu>
</tm:te>
</text:p>
<text:p text:style-name="Text body">
<tm:te id="e2" tu="2" version="1.0">
<tm:tu id="u2.1" crc="f8c012ff" version="1.0">
The “tm” stands for “text memory”.
</tm:tu>
<tm:tu id="u2.2" crc="270af770" version="1.0">
There are two aspects to text memory:
</tm:tu>
</tm:te>
</text:p>
<text:ordered-list text:continue-numbering="false" text:style-name="L1">
<text:list-item>
<text:p text:style-name="P3">
<tm:te id="e3" tu="1" version="1.0">
<tm:tu id="u3.1" crc="851603a2" version="1.0">
Author memory
</tm:tu>
</tm:te>
</text:p>
</text:list-item>
<text:list-item>
<text:p text:style-name="P3">
<tm:te id="e4" tu="1" version="1.0">
<tm:tu id="u4.1" crc="313af159" version="1.0">
Translation memory
</tm:tu>
</tm:te>
</text:p>
</text:list-item>
..........
</office-document>
The complete tree structure view is available in Appendix A.
The deprecated <tm:tm> element was used as the top level of the xml:tm hierarchy.
The <tm:tm> element is deprecated in xml:tm version 2.0. The use of the <tm:tm> element restricted the use of XPath expressions to address non <tm:te> elements and subsequent xml:tm namespace child objects, and therefore limited the usability of xml:tm. The <tm:tm> element attributes should be placed in the document root element following the xmlns:tm namespace declaration.
Each time a source language xml:tm namespace document is updated and the IDs of its xml:tm elements are updated through the DOM differencing process, a new <tm:vh> element is added as a direct child of the document root element. The <tm:vh> elements have no content. The version number and date of the history element are specified via the "version" and "date" attributes.
Each XML element that contains PCDATA (parsable character data) is allocated a <tm:te> element. Each <tm:te> element has a unique ID attribute value and the version number when it was first created. The version number corresponds to the value of the current document root element "tm:version" attribute or the <tm:vh> "version" attribute. Each <tm:te> element is allocated a unique ID value for the life of the document.
Every <tm:te> text element must have at least one <tm:tu> text unit element. If there is more than one identifiable sentence in the PCDATA of a <tm:te> element, then each sentence will have a separate <tm:tu> element. Each <tm:tu> element is allocated a unique ID value for the life of the document as well as the "version" attribute from the current document root element "tm:version" attribute when it was created.
Any translatable attributes for inline elements are placed in their own <tm:ti> element as the direct child of the inline element. If that element had no content, then it is deemed to have the content of <tm:ti> for the purposes of the xml:tm namespace. Each <tm:ti> element is allocated a unique ID value for the life of the document as well as the "version" attribute from the current document root element "tm:version" attribute when it was created.
Any translatable attributes for non-inline elements are placed in their own <tm:ta> element as the direct child of their element. If that element had no content, then it is deemed to have the content of <tm:ta> for the purposes of the xml:tm namespace. Each <tm:ta> element is allocated a unique ID value for the life of the document as well as the "version" attribute from the current document root element "tm:version" attribute when it was created.
Where during DOM differencing it is found that the contents of a <tm:tu> element have changed by less than 30% (70% or more of the contents are unchanged), then a modification history is created for that element. The role of <tm:mh> elements is to provide previous source and target versions of the <tm:tu> during translation as a form of in-document fuzzy matching.
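The threshold test above can be sketched as follows. The specification does not state how the percentage of change is measured, so this example assumes, purely for illustration, that the similarity ratio of Python's standard `difflib.SequenceMatcher` is an acceptable measure. The function name `needs_modification_history` is hypothetical.

```python
from difflib import SequenceMatcher


def needs_modification_history(old_text, new_text, unchanged_threshold=0.70):
    """Return True when a text unit changed, but 70% or more is unchanged.

    Assumption: SequenceMatcher.ratio() stands in for the similarity
    measure, which the specification does not define.
    """
    if old_text == new_text:
        # Nothing changed, so no modification history is needed.
        return False
    similarity = SequenceMatcher(None, old_text, new_text).ratio()
    return similarity >= unchanged_threshold


if __name__ == "__main__":
    before = "The generated format can then be applied to the output."
    after = "The generated format can then be applied to the result."
    print(needs_modification_history(before, after))
```

A production tool would more likely compare at word level and ignore inline markup; the character-level comparison here only keeps the sketch short.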
Where an inline element spans multiple segments it is necessary to convert the start and end tags for the element into tm:span elements, otherwise it would not be possible to segment the data into multiple tm:tu elements. In this way the effect of the inline element on segmentation is neutralized. If the spanning element has one or more attributes, then the name and value of the attributes are held in tm:attr inline span attribute elements. This allows for the complete reconstruction of the original spanning element together with its attributes when creating a non tm namespace version of the document.
For example the following XML content:
<text:p text:style-name="P11">
This is used to generate a format <text:span text:style-name="T5" text:emph="normal"> as required.
The generated format can then be applied to the output.
See Appendix C for details</text:span>.
</text:p>
Would generate the following tm: code:
<text:p text:style-name="P11">
<tm:te id="e9" tu="5" version="1.0">
<tm:tu crc="ebda4c34" id="u9.1" version="1.0">
This is used to generate a format
<tm:span name="text:span" type="start">
<tm:attr name="text:style-name" value="T5"/>
<tm:attr name="text:emph" value="normal"/>
</tm:span> as required.
</tm:tu>
<tm:tu crc="a0bcf600" id="u9.2" version="1.0">
The generated format can then be applied to the output.
</tm:tu>
<tm:tu crc="7e489ac6" id="u9.3" version="1.0">
See Appendix C for details<tm:span name="text:span" type="end"/>.
</tm:tu>
</tm:te>
</text:p>
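The construction of the tm:span pair in the example above can be sketched with Python's standard `xml.etree.ElementTree` module. The helper name `make_span_pair` is hypothetical, and the sketch only builds the two marker elements; deciding where an inline element crosses a segment boundary is left out.

```python
import xml.etree.ElementTree as ET

# Namespace URI as given in this specification.
TM_NS = "urn:X-etsi-xml-tm-tags"
ET.register_namespace("tm", TM_NS)


def make_span_pair(element_name, attributes):
    """Build the start and end tm:span markers for a spanning inline element.

    element_name is the qualified name of the original element and
    attributes maps its original attribute names to their values.
    """
    start = ET.Element(f"{{{TM_NS}}}span", {"name": element_name, "type": "start"})
    for name, value in attributes.items():
        # Each original attribute is preserved as a tm:attr child element.
        ET.SubElement(start, f"{{{TM_NS}}}attr", {"name": name, "value": value})
    end = ET.Element(f"{{{TM_NS}}}span", {"name": element_name, "type": "end"})
    return start, end


if __name__ == "__main__":
    start, end = make_span_pair(
        "text:span", {"text:style-name": "T5", "text:emph": "normal"}
    )
    print(ET.tostring(start, encoding="unicode"))
    print(ET.tostring(end, encoding="unicode"))
```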
Inline spanning attribute elements tm:attr are used to hold the details of the span element's attributes and their values. For full details and examples please refer to Section 2.8 Inline Spanning Elements.
Text Memory only exists as a namespace within another XML document. It is not designed to have an independent existence. The Text Memory namespace must be declared as an attribute of any preceding element of that document, although for clarity it is recommended that this declaration be placed within the attributes of the top document element:
xmlns:tm="urn:X-etsi-xml-tm-tags"
xml:tm elements can be divided into the following categories: top-level elements, text elements, versioning elements and span elements. Some core attributes are shared among them.
Top level elements | document root element, <tm:tm>, <tm:te>.
Text elements | <tm:ta>, <tm:tu>, <tm:ti>, <tm:th>.
Versioning elements | <tm:vh>, <tm:mh>.
Span elements | <tm:span>, <tm:attr>.
The document root element replaces the version 1.0 tm:tm element. The xml:tm attributes can now be added to the document root element, rather than as part of the tm:tm element.
Document root element
The document root element has the following xml:tm attributes:
Required attributes:
tm:id
- the unique document ID.
tm:te
- the next unique text element tm:te identifier.
tm:ta
- the next unique main element translatable attribute tm:ta identifier.
tm:version
- the current version identifier for the xml:tm namespace for this document.
tm:date
- the date that the current version of the xml:tm namespace was created for this document.
tm:source-language
- the language in which this document is authored.
tm:xmltm-version
- the version of the xml:tm specification.
tm:tool-name
- the name of the tool that generated the text memory.
tm:tool-version
- the version identifier of the tool that generated the text memory.
Optional attributes:
tm:target-language
- if present signifies the language of this document.
tm:doctype-public
- the public doctype identifier of the original document's DOCTYPE declaration if any.
tm:doctype-system
- the system doctype details of the original document's DOCTYPE declaration if any.
Contents:
Zero or more <tm:vh> elements, zero or more <tm:ta> elements, zero or more <tm:te> elements.
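A tool consuming xml:tm documents might check the required root attributes listed above before doing anything else. The following is a sketch under that assumption; the function name `missing_root_attributes` and the sample root element are hypothetical, while the attribute names and the namespace URI are taken from this specification.

```python
import xml.etree.ElementTree as ET

# Namespace URI as given in this specification.
TM_NS = "urn:X-etsi-xml-tm-tags"

# Required xml:tm attributes of the document root element, as listed above.
REQUIRED_ROOT_ATTRIBUTES = [
    "id", "te", "ta", "version", "date", "source-language",
    "xmltm-version", "tool-name", "tool-version",
]


def missing_root_attributes(root):
    """Return the names of required tm: attributes absent from the root."""
    return [
        name for name in REQUIRED_ROOT_ATTRIBUTES
        if root.get(f"{{{TM_NS}}}{name}") is None
    ]


if __name__ == "__main__":
    # Hypothetical root element that deliberately omits tm:xmltm-version.
    root = ET.fromstring(
        '<doc xmlns:tm="urn:X-etsi-xml-tm-tags" tm:id="8fba2f33" tm:te="543" '
        'tm:ta="41" tm:version="2.0" tm:date="20031218T130652Z" '
        'tm:source-language="en-US" tm:tool-name="XYZ Tool" '
        'tm:tool-version="1.23"/>'
    )
    print(missing_root_attributes(root))
```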
The deprecated text memory element has the following format:
<tm:tm>
From version 2.0 the document root element replaces the tm:tm element. The use of this element is now deprecated and is included only for backwards compatibility.
Text Memory Element - The <tm:tm> element encloses all the other xml:tm elements of the document.
Required attributes:
id
- the unique document ID.
te
- the next unique text element tm:te identifier.
ta
- the next unique main element translatable attribute tm:ta identifier.
version
- the current version identifier for the xml:tm namespace for this document.
date
- the date that the current version of the xml:tm namespace was created for this document.
source-language
- the language in which this document is authored.
xmltm-version
- the version of the xml:tm specification.
tool-name
- the name of the tool that generated the text memory.
tool-version
- the version identifier of the tool that generated the text memory.
Optional attributes:
target-language
- if present signifies the language of this document.
doctype-public
- the public doctype identifier of the original document's DOCTYPE declaration if any.
doctype-system
- the system doctype details of the original document's DOCTYPE declaration if any.
Contents:
Zero or more <tm:vh> elements, zero or more <tm:ta> elements, zero or more <tm:te> elements.
The use of this element is now deprecated. Its role has been taken by the document root element.
The xml:tm text memory version histories of this document:
<tm:vh>
Version History Element - There is one <tm:vh> element for each version of the xml:tm namespace for the document.
Required attributes:
version
- the version identifier of the document root element tm:version attribute, or tm:tm element version attribute, when this element was created.
date
- the date that the version of the xml:tm namespace was created for this document.
Optional attributes:
source-language
- the original source language of this document.
Contents:
EMPTY
The xml:tm text memory element tag for elements that contain text:
<tm:te>
Text Element - There is one <tm:te> element within each native element that has text (PCDATA) content.
Required attributes:
id
- A unique immutable identifier for this text element. This will always begin with the character 'e' followed by digits.
version
- the version identifier of the document root element tm:version attribute, or tm:tm element version attribute, when this element was created.
tu
- The maximum current identifier for any tm:tu elements that are children of this tm:te element.
Optional attributes:
xml:lang
- A language code as described in RFC 4646. Denotes that the text contained in the child tm:tu elements relates to a language other than that declared in the source-language or, for target language documents, declared in the target-language attribute for the document.
xml:space
- specifies how white spaces (ASCII spaces, tabs and line-breaks) should be treated.
Contents:
One or more <tm:tu> elements.
The xml:tm text memory element for individual tm:te element contents or subdivision of the same into recognizable sentences:
<tm:tu>
Text Unit - There is one <tm:tu> element within each native element that has text (PCDATA) content, or one for each subdivision of the contents into recognizable sentences.
Required attributes:
id
- A unique immutable identifier for this text unit element. This will always begin with the character 'u' followed by digits.
crc
- The AUTODIN II polynomial crc hex value for the text contents of this element.
version
- the version identifier of the document root element tm:version attribute, or tm:tm element version attribute, when this element was created.
flag
- Used in the target language version of the element to indicate if the translation for the element is merged with the preceding element, or not to be used for leveraged matching.
type
- The general type of this text unit. Possible values are "text", "alphanumeric", "numeric", "measurement", "punctuation", "markup" or "notrans".
Optional attributes:
ti
- The maximum current identifier for any inline <tm:ti> translatable attribute elements that are children of this <tm:tu> element.
translate
- indicates the 'translatability' of the contents. The default value is "yes".
xml:lang
- A language code as described in RFC 4646. Denotes that the text contained in the element relates to a language other than that declared in the source-language or, for target language documents, declared in the target-language attribute for the document.
xml:space
- specifies how white spaces (ASCII spaces, tabs and line-breaks) should be treated.
Contents:
<tm:mh> text unit modification elements.
<tm:ti> translatable attribute elements.
<tm:span> elements.
PCDATA and any inline elements and inline element <tm:ti> translatable attribute elements.
The xml:tm text memory element for individual inline translatable attributes and subflows:
<tm:ti>
Translatable inline attribute and subflow text - Any translatable attributes of inline elements are expanded into a direct child of the inline element. The same applies to any inline element content that should be treated as a subflow. A 'subflow' is text that occurs within the current text, but linguistically does not form part of the current text stream. Examples are footnote text, index text etc. Both translatable inline element attributes and subflow text should be extracted separately from the text within which they occur.
Required attributes:
id
- A unique immutable identifier for this element. This will always begin with the character 'i' followed by digits.
crc
- The AUTODIN II polynomial crc hex value for the text contents of this element.
version
- the version identifier of the document root element tm:version attribute, or tm:tm element version attribute, when this element was created.
flag
- Used in the target language version of the element to indicate if the translation for the element is not to be used for leveraged matching.
type
- The general type of this inline element attribute contents. Possible values are "text", "alphanumeric", "numeric", "measurement", "punctuation", "markup" or "notrans".
name
- the name of the attribute.
subflow
- indicates that the element is an inline subflow as opposed to a translatable attribute. A value of "yes" indicates the contents of an inline subflow element. The default value is "no".
Optional attributes:
translate
- indicates the 'translatability' of the contents. The default value is "yes".
xml:lang
- A language code as described in RFC 4646. Denotes that the text contained in the element relates to a language other than that declared in the source-language or, for target language documents, declared in the target-language attribute for the document.
xml:space
- specifies how white spaces (ASCII spaces, tabs and line-breaks) should be treated.
Contents:
The translatable attribute text.
The xml:tm text memory element for non-inline translatable attributes:
<tm:ta>
Translatable non-inline attribute - Any translatable attributes of non-inline elements are expanded into a direct child of their element.
Required attributes:
id
- A unique immutable identifier for this element. This will always begin with the character 'a' followed by digits.
crc
- The AUTODIN II polynomial crc hex value for the text contents of this element.
version
- the version identifier of the document root element tm:version attribute, or tm:tm element version attribute, when this element was created.
flag
- Used in the target language version of the element to indicate if the translation for the element is not to be used for leveraged matching.
type
- The general type of this attribute's contents. Possible values are "text", "alphanumeric", "numeric", "measurement", "punctuation", "markup" or "notrans".
name
- the name of the attribute.
Optional attributes:
translate
- indicates the 'translatability' of the contents. The default value is "yes".
xml:lang
- A language code as described in RFC 4646. Denotes that the text contained in the element relates to a language other than that declared in the source-language or, for target language documents, declared in the target-language attribute for the document.
xml:space
- specifies how white spaces (ASCII spaces, tabs and line-breaks) should be treated.
Contents:
The translatable attribute text.
The xml:tm text unit modification history:
<tm:mh>
Text Unit Modification History - If an individual text unit has changed by less than 30% between revisions (70% or more of its contents are unchanged), a note is made of previous versions of this text unit.
Required attributes:
NONE
Contents:
One or more <tm:th> text unit history elements.
The xml:tm text unit history:
<tm:th>
Text Unit History - Previous id of the text unit. Where a text unit has been modified this element contains the id of the text unit's previous incarnation. There must be at least one <tm:th> element for a <tm:mh> element.
Required attributes:
id
- A unique immutable identifier for this text unit element. This will always begin with the character 'u' followed by digits.
crc
- The AUTODIN II polynomial crc hex value for the text contents of this element.
version
- the version identifier of the document root element tm:version attribute, or tm:tm element version attribute, when this element was created.
Optional attributes:
NONE
Contents:
EMPTY
The tm:span element for inline elements that span across multiple tm:tu elements:
<tm:span>
Spanning inline elements - Where an inline element spans multiple segment boundaries it can be escaped by use of a tm:span element. The effect of the tm:span element is to remove its role as an element with content and replace this with an inline no-content element for both the start and end elements. Without use of the tm:span element it would not be possible to break the text into the appropriate segments. tm:span elements must always occur in pairs to cover the start and end elements.
Required attributes:
name
- the name of the element that is being escaped.
type
- The type of span element. This can only have the value start where the start code is being replaced, or end where the closing element is being replaced.
Optional attributes:
original attributes - The original attributes of the start element appear in their original form in the tm:span element.
Contents:
Zero, one or more tm:attr elements.
The tm:attr element contains the attribute name and values of its parent tm:span element's original attributes:
<tm:attr>
Spanning attribute elements - Where an inline element spans multiple segment boundaries it can be escaped by use of a tm:span element. The effect of the tm:span element is to remove its role as an element with content and replace this with an inline no-content element for both the start and end elements. Where an element is being escaped in this way its attribute names and values are stored in child tm:attr elements.
Required attributes:
name
- the name of the attribute that is being stored.
value
- The value of the attribute that is being stored.
Optional attributes:
NONE
Contents:
EMPTY
This section lists the various attributes used in the xml:tm elements. An attribute is never specified more than once for each element. Along with some of the attributes are the "Recommended Attribute Values". Values for these attributes are case sensitive. These lists are purely informative; the goal is to specify a preferred syntax so tools can have some level of compatibility.
xml:tm attributes | crc, date, doctype-public, doctype-system, flag, id, name, source-language, subflow, target-language, ta, te, ti, translate, tu, tool-name, tool-version, type, value, version, xmltm-version.
XML namespace attributes | xml:lang, xml:space.
CRC - The crc hash value of the serialized contents of this element.
Value description:
The AUTODIN II polynomial crc hex value of the serialized contents of this element. The crc attribute is critical for validating the consistency of xml:tm memory alignment.
Default value:
Undefined
Used in:
<tm:ta>, <tm:th>, <tm:ti>, <tm:tu>.
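The AUTODIN II generator polynomial is the one used by the common CRC-32 checksum, so a crc value of this general kind can be sketched with Python's standard `zlib.crc32`. Several points are assumptions here rather than statements of the specification: that the CRC-32 parameters used by zlib are the intended ones, that the serialized contents are encoded as UTF-8 without whitespace normalization, and that the result is written as eight lower-case hex digits. The function name `tu_crc` is hypothetical, and the sketch is not claimed to reproduce the crc values shown in the examples of this document.

```python
import zlib


def tu_crc(serialized_contents):
    """Return a CRC-32 value as eight lower-case hex digits.

    Assumptions: zlib's CRC-32 variant, UTF-8 encoding of the serialized
    contents, and no whitespace normalization before hashing.
    """
    checksum = zlib.crc32(serialized_contents.encode("utf-8")) & 0xFFFFFFFF
    return format(checksum, "08x")


if __name__ == "__main__":
    print(tu_crc("Author memory"))
```

Whatever conventions a tool adopts, the same ones need to be applied to the source and target documents so that crc values stay comparable.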
Date - The date attribute indicates when a given element was created or modified.
Value description:
Date in [ISO 8601] format. The recommended pattern to use is: YYYYMMDDThhmmssZ
Where: YYYY is the year (4 digits), MM is the month (2 digits), DD is the day (2 digits), hh is the hours (2 digits), mm is the minutes (2 digits), ss is the seconds (2 digits), and Z indicates the time is UTC time. For example:
date="20020125T210600Z"
is January 25, 2002 at 9:06pm GMT
is January 25, 2002 at 2:06pm US Mountain Time
is January 26, 2002 at 6:06am Japan time
Default value:
Undefined
Used in:
document root element, <tm:tm>, <tm:vh>.
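The recommended pattern can be produced with Python's standard `datetime` module, as in the following sketch. The function name `xmltm_date` is hypothetical, and the fixed UTC offsets used for US Mountain Time and Japan time are simplifications chosen for this illustration.

```python
from datetime import datetime, timedelta, timezone


def xmltm_date(moment):
    """Format a timezone-aware datetime as YYYYMMDDThhmmssZ in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


if __name__ == "__main__":
    # Fixed offsets, assumed here for simplicity (no daylight saving rules).
    mountain = timezone(timedelta(hours=-7))
    japan = timezone(timedelta(hours=9))
    print(xmltm_date(datetime(2002, 1, 25, 14, 6, 0, tzinfo=mountain)))
    print(xmltm_date(datetime(2002, 1, 26, 6, 6, 0, tzinfo=japan)))
```

Both calls describe the same instant, so they yield the same date value.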
DOCTYPE PUBLIC IDENTIFIER - the PUBLIC identifier of the original document's DOCTYPE declaration.
Value description:
For the main <tm:tm> element this attribute holds the value of the original document's DOCTYPE declaration if a DOCTYPE was declared for the original document.
Default value:
Undefined
Used in:
document root element, <tm:tm>.
DOCTYPE SYSTEM DETAILS - the SYSTEM URI of the original document's DOCTYPE declaration.
Value description:
For the main <tm:tm> element this attribute holds the value of the original document's DOCTYPE declaration SYSTEM URI if a DOCTYPE was declared for the original document.
Default value:
Undefined
Used in:
document root element, <tm:tm>.
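Flag - Used to indicate if the text unit has been merged with the preceding text unit, or if the translation should not be used for leveraged memory.
The possible values are:
normal
- the translation is neither merged nor excluded from leveraged matching.
merged
- the translation for this text unit is merged with that of the preceding text unit. Not used for <tm:ta> and <tm:ti> elements.
noequiv
- the translation is not to be used for leveraged matching.
Default value:
normal
Used in:
<tm:ta>, <tm:ti>, <tm:tu>.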
ID - The unique ID identifier for this element.
Value description:
For the main <tm:tm> element this is a unique value based on the CRC of the whole file plus a random 64 bit number in hex character form. For other instances it is the first character of the element name plus a unique sequential number. The sequential number is based on the number of the parent plus a unique number for the current element. The following list details how the individual identifiers are constructed for the very first occurrence of that element:
<tm:tm>
- a unique value constructed as described above.
<tm:ta>
- "a1": formed by the letter 'a' followed by the value of the "tm:ta" attribute of the document root element plus one.
<tm:te>
- "e1": formed by the letter 'e' followed by the value of the "tm:te" attribute of the document root element plus one.
<tm:th>
- "u1.1": inherited from the original <tm:tu> element.
<tm:ti>
- "i1.1.1": formed by the letter 'i' followed by the "id" value of the parent <tm:tu> element less the leading letter, plus the period character, plus the value of the "ti" attribute of the <tm:tu> element plus one.
<tm:tu>
- "u1.1": formed by the letter 'u' followed by the "id" value of the parent <tm:te> element less the leading letter, plus the period character, plus the value of the "tu" attribute of the <tm:te> element plus one.
Default value:
Undefined
Used in:
<tm:ta>, <tm:th>, <tm:ti>, <tm:tu>.
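The identifier construction rules above can be sketched as small helper functions. The function names are hypothetical, and the counters are assumed to be passed in as integers holding the current value of the relevant "tm:te", "tu" or "ti" attribute, starting from zero for a new document.

```python
def next_te_id(root_te_counter):
    """Letter 'e' followed by the root element's tm:te value plus one."""
    return f"e{root_te_counter + 1}"


def next_tu_id(te_id, te_tu_counter):
    """Letter 'u', the parent tm:te id without its leading letter,
    a period, and the tm:te element's tu value plus one."""
    return f"u{te_id[1:]}.{te_tu_counter + 1}"


def next_ti_id(tu_id, tu_ti_counter):
    """Letter 'i', the parent tm:tu id without its leading letter,
    a period, and the tm:tu element's ti value plus one."""
    return f"i{tu_id[1:]}.{tu_ti_counter + 1}"


if __name__ == "__main__":
    te_id = next_te_id(0)
    tu_id = next_tu_id(te_id, 0)
    ti_id = next_ti_id(tu_id, 0)
    print(te_id, tu_id, ti_id)
```

Because identifiers are never reused, a tool would increment the stored counter after each allocation rather than counting the elements currently present.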
Name
tm:ta
and tm:ti
elements denotes the attribute name of translatable attributes.
Value description:
The name of the attribute for translatable attributes
that have been pulled out as a direct child of their element.
Translatable attributes for inline (<tm:ti>) and non-inline (<tm:ta>) elements are pulled out from their element into an xml:tm element as a direct child.
For example:
Original document
<elm trans="Please translate this"/>
xml:tm
version of the document
<elm><tm:ta id="a1" name="trans">Please translate this</tm:ta></elm>
Default value:
Undefined
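The transformation shown in the example can be sketched with the Python standard library. This is an illustrative sketch rather than a reference implementation: the function name pull_out_attribute is hypothetical, the namespace URI is the targetNamespace declared in the schema at the end of this section, and only the "id" and "name" attributes shown in the example are written.

```python
import xml.etree.ElementTree as ET

# Namespace URI as declared by targetNamespace in the tm.xsd listing.
TM_NS = "urn:X-etsi-xml-tm-tags"
ET.register_namespace("tm", TM_NS)


def pull_out_attribute(elem, attr_name, ta_counter):
    """Move a translatable attribute of `elem` into a <tm:ta> child.

    `ta_counter` is the current value of the root element's counter;
    the incremented counter is returned so the caller can store it.
    """
    value = elem.attrib.pop(attr_name)      # remove the original attribute
    ta_counter += 1
    ta = ET.Element(
        f"{{{TM_NS}}}ta",
        {"id": f"a{ta_counter}", "name": attr_name},
    )
    ta.text = value
    elem.insert(0, ta)                      # direct child of its element
    return ta_counter


elem = ET.fromstring('<elm trans="Please translate this"/>')
counter = pull_out_attribute(elem, "trans", 0)
serialised = ET.tostring(elem, encoding="unicode")
```

The serialised result carries an xmlns:tm declaration in addition to the markup shown in the example, because the fragment is processed on its own here.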
For tm:attr, denotes the name of one of the attributes of the parent tm:span element.
Value description:
The name of the attribute that the child tm:attr
element represents regarding its parent tm:span
element.
For example:
Original document
<text:p text:style-name="P11">
This is used to generate a format <text:span text:style-name="T5" text:emph="normal"> as required.
The generated format can then be applied to the output.
See Appendix C for details</text:span>.
</text:p>
xml:tm
version of the document
<text:p text:style-name="P11">
<tm:te id="e9" tu="5" version="1.0">
<tm:tu crc="ebda4c34" id="u9.1" version="1.0">
This is used to generate a format
<tm:span name="text:span" type="start">
<tm:attr name="text:style-name" value="T5"/>
<tm:attr name="text:emph" value="normal"/>
</tm:span> as required.
</tm:tu>
<tm:tu crc="a0bcf600" id="u9.2" version="1.0">
The generated format can then be applied to the output.
</tm:tu>
<tm:tu crc="7e489ac6" id="u9.3" version="1.0">
See Appendix C for details<tm:span name="text:span" type="end"/>.
</tm:tu>
</tm:te>
</text:p>
Default value:
Undefined
For tm:span, denotes the name of the original element that is being spanned.
Value description:
The name of the original element that is being replaced by the tm:span
element pair. Please refer to the previous item for examples.
Default value:
Undefined
Used in:
<tm:ta>
, <tm:ti>
, <tm:span>
, <tm:attr>
.
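The tm:span and tm:attr example for the Name attribute can be sketched as follows. The helper name make_span_pair is hypothetical; the sketch assumes the original element name and its attributes are already available as strings and only builds the start and end tm:span elements, leaving segmentation into text units aside.

```python
import xml.etree.ElementTree as ET

# Namespace URI as declared by targetNamespace in the tm.xsd listing.
TM_NS = "urn:X-etsi-xml-tm-tags"


def make_span_pair(name, attributes):
    """Build the start/end <tm:span> pair that replaces an inline element.

    `name` is the qualified name of the original element and `attributes`
    maps its attribute names to values; each becomes a <tm:attr> child
    of the start span.
    """
    start = ET.Element(f"{{{TM_NS}}}span", {"name": name, "type": "start"})
    for attr_name, attr_value in attributes.items():
        ET.SubElement(
            start,
            f"{{{TM_NS}}}attr",
            {"name": attr_name, "value": attr_value},
        )
    end = ET.Element(f"{{{TM_NS}}}span", {"name": name, "type": "end"})
    return start, end


start, end = make_span_pair(
    "text:span",
    {"text:style-name": "T5", "text:emph": "normal"},
)
```

Keeping the original attributes on the start span only mirrors the example, where the end span is an empty element.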
Source language - The language for the main <tm:tm>
element.
Value description:
A language code as described in RFC 4646. The values for this attribute follow the same rules as the values for xml:lang
. Unlike the other xml:tm
attributes, the values for xml:lang
are not case-sensitive. For more information see the section on xml:lang
in the XML specification, and the erratum E11 (which replaces RFC 1766 by RFC 4646).
Default value:
Undefined
Used in:
document root element, <tm:tm>
, <tm:vh>
.
Subflow indicator - defined for a tm:ti
element if it is an inline element that is to be treated as a subflow
for translation, rather than an inline translatable attribute.
Value description:
For use with inline elements such as footnotes or index
markers to indicate that the contents of the inline element are to be
treated separately for translation purposes (appear in their own XLIFF trans-unit
element) and do not form part of the linguistic text unit entity.
Default value:
no
Used in:
<tm:ti>
.
For example:
<footnote><tm:ti subflow="yes" id="i1.1.1" crc="3dedf1">footnote text</tm:ti></footnote>
Target language - The language for the current document if it is a translation of the source language document.
Value description:
A language code as described in RFC 4646. The values for this attribute follow the same rules as the values for xml:lang
. Unlike the other xml:tm
attributes, the values for xml:lang
are not case-sensitive. For more information see the section on xml:lang
in the XML specification, and the erratum E11 (which replaces RFC 1766 by RFC 4646).
Default value:
Undefined
Used in:
document root element, <tm:tm>
.
Non-inline translatable element attribute counter - The maximum value of the <tm:ta>
"id"
attribute within this document.
Value description:
Each time a new non-inline translatable attribute
element is created it is allocated a unique ID identifier formed from
the character 'a'
plus the integer value of the document root element, <tm:tm>
"tm:ta"
attribute value plus one. The document root element, <tm:tm>
"tm:ta"
attribute is then also incremented to reflect the new maximum value.
Default value:
0
Used in:
document root element, <tm:tm>
.
Translatable text element attribute counter - The maximum value of the <tm:te>
"id"
attribute within this document.
Value description:
Each time a new element containing text and/or inline
elements is created it is allocated a unique ID identifier formed from
the character 'e'
plus the integer value of the document root element, <tm:tm>
"tm:te"
attribute value plus one. The document root element, <tm:tm>
"tm:te"
attribute is then also incremented to reflect the new maximum value.
Default value:
0
Used in:
document root element, <tm:tm>
.
Inline translatable element attribute counter - The maximum value of the <tm:ti>
"id"
attribute within this <tm:tu>
element.
Value description:
Each time a new inline translatable attribute element is encountered within a <tm:tu>
element it is allocated a unique ID identifier formed from the character 'i' plus the integer value of the <tm:tu>
"ti"
attribute value plus one. The <tm:tu>
"ti"
attribute is then also incremented to reflect the new maximum value.
Default value:
0
Used in:
<tm:tu>
.
Tool Name - The identifier of the tool used to create the text memory.
Value description:
The name of the xml:tm
creation tool.
Default value:
Undefined
Used in:
document root element, <tm:tm>
.
Tool Version - The version identifier of the tool used to create the text memory.
Value description:
The version identifier of the xml:tm
creation tool.
Default value:
Undefined
Used in:
document root element, <tm:tm>
.
Translatability indicator - Indicates whether the contents of the <tm:tu>
, <tm:ta>
or <tm:ti>
element are translatable.
Value description:
This attribute has the possible values "yes"
or "no"
.
Default value:
yes
Used in:
<tm:ta>
, <tm:ti>
, <tm:tu>
.
Text unit element attribute counter - The maximum value of the <tm:tu>
"id"
attribute within a <tm:te>
element.
Value description:
This attribute contains the maximum value of <tm:tu>
id
attributes allocated for this <tm:te>
element. Each time a new text unit element is created within a <tm:te>
element it is allocated a unique ID identifier formed from the character 'u' plus the integer value of the <tm:te>
"tu"
attribute value plus one. The <tm:te>
"tu"
attribute is then also incremented to reflect the new maximum value.
Default value:
0
Used in:
<tm:te>
.
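The interplay between a counter attribute and the identifiers allocated from it can be sketched for the "tu" counter as follows. The function name add_text_unit is hypothetical, and the sketch writes only the "id" attribute of the new text unit, omitting the other attributes the schema requires.

```python
import xml.etree.ElementTree as ET

# Namespace URI as declared by targetNamespace in the tm.xsd listing.
TM_NS = "urn:X-etsi-xml-tm-tags"


def add_text_unit(te, text):
    """Append a new <tm:tu> to a <tm:te> element and update its counter."""
    next_number = int(te.get("tu", "0")) + 1     # counter value plus one
    parent_number = te.get("id")[1:]             # parent id less its leading letter
    tu = ET.SubElement(
        te,
        f"{{{TM_NS}}}tu",
        {"id": f"u{parent_number}.{next_number}"},
    )
    tu.text = text
    te.set("tu", str(next_number))               # reflect the new maximum value
    return tu


te = ET.Element(f"{{{TM_NS}}}te", {"id": "e9", "tu": "0"})
first = add_text_unit(te, "The generated format can then be applied to the output.")
second = add_text_unit(te, "See Appendix C for details.")
```

Because the counter records the maximum value ever allocated rather than the number of children, identifiers are not reused when a text unit is later removed.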
Type - For tm:tu, tm:ti and tm:ta, denotes the basic translation classification of the PCDATA text content of this element.
Value description:
The possible values for the text type are:
alphanumeric - for example "104AGC".
numeric - for example "10.254".
measurement - for example "10.52 mm".
punctuation - for example "-".
markup - for example <inline/>.
notrans
User-defined values - these begin with "x-" and will be assumed to be non-translatable.
text
Default value:
text
For tm:span, denotes if this is the 'start' or 'end' element of the tm:span element pair.
Value description:
The possible values are:
start - this tm:span is the start of the tm:span pair.
end - this tm:span is the end of the tm:span pair.
Default value:
NONE
Used in:
<tm:ta>
, <tm:ti>
, <tm:tu>
, <tm:span>
.
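The specification lists the text type values but does not prescribe how a tool decides between them. The following is one possible heuristic, offered purely as an illustration: the function name classify, the regular expressions and the small list of units are all assumptions, and the "markup" and "notrans" values are not covered because they cannot be inferred from the text alone.

```python
import re

# Assumed, deliberately small set of unit symbols for the sketch.
UNITS = ("mm", "cm", "m", "km", "g", "kg", "ms", "s", "in", "ft", "%")

NUMBER = r"[+-]?\d+(?:[.,]\d+)*"
UNIT_ALTERNATION = "|".join(re.escape(u) for u in UNITS)

NUMERIC_RE = re.compile(NUMBER)
MEASUREMENT_RE = re.compile(NUMBER + r"\s*(?:" + UNIT_ALTERNATION + r")")
# Letters and digits only, with at least one of each.
ALPHANUMERIC_RE = re.compile(r"(?=.*\d)(?=.*[^\W\d_])[^\W_]+")


def classify(text: str) -> str:
    """Return a candidate "type" attribute value for a text unit."""
    s = text.strip()
    if NUMERIC_RE.fullmatch(s):
        return "numeric"
    if MEASUREMENT_RE.fullmatch(s):
        return "measurement"
    if s and all(not ch.isalnum() and not ch.isspace() for ch in s):
        return "punctuation"
    if ALPHANUMERIC_RE.fullmatch(s):
        return "alphanumeric"
    return "text"   # the default value


samples = ["104AGC", "10.254", "10.52 mm", "-", "Please translate this"]
types = [classify(s) for s in samples]
```

Applied to the sample values given in the list above, the heuristic reproduces the corresponding type names; real tools would need language-aware rules.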
Value - The value of the tm:attr
element attribute.
Value description:
The value of the attribute denoted by the tm:attr
element.
Default value:
Undefined
Used in:
<tm:attr>
.
Version - The version number.
Value description:
The version number of this tm
namespace element.
Default value:
Undefined
Used in:
document root element, <tm:tm>
, <tm:vh>
, <tm:te>
, <tm:ta>
, <tm:ti>
, <tm:th>
, <tm:tu>
.
xml:tm Version - The version number of the xml:tm specification implemented.
Value description:
The version number of this specification
Default value:
Undefined
Used in:
document root element, <tm:tm>
.
Language - The xml:lang
attribute specifies the locale language variant of the xml:tm
document.
Value description:
A language code as described in RFC 4646.
This declared value is considered to apply to all elements within the
content of the element where it is specified, unless overridden with
another instance of the xml:lang
attribute. Unlike the other xml:tm
attributes, the values for xml:lang
are not case-sensitive. For more information see the section on xml:lang
in the XML specification, and the erratum E11 (which replaces RFC 1766 by RFC 4646).
The use of xml:lang for the Document root element or deprecated <tm:tm>
element is not recommended. The source-language
attribute must be used to denote the original source language of the document. The target-language
attribute must be used to denote the target language of the document where applicable. The use of xml:lang
would therefore be ambiguous with regard to target language documents.
The use of xml:lang for xml:tm
elements apart from document root element, <tm:tm>
denotes that the text contained in the element relates to a language other than that declared in the source-language
or, for target language documents, declared in the target-language
attribute for the document.
Default value:
Undefined
Used in:
<tm:ta>
, <tm:ti>
, <tm:te>
, <tm:tu>
.
White spaces - The xml:space
attribute specifies how white spaces (ASCII spaces, tabs and line-breaks) should be treated.
Value description:
default
or preserve
. The value default
signals that applications' default white-space processing modes are acceptable for this element; the value preserve
indicates the intent that applications preserve all the white space.
This declared intent is considered to apply to all elements within the
content of the element where it is specified, unless overridden with
another instance of the xml:space
attribute.
For more information see the section on xml:space
in the XML specification.
Default value:
default
.
Used in:
<tm:ta>
, <tm:ti>
, <tm:te>
, <tm:tu>
.
The use of the xml:tm
namespace facilitates various types of advanced translation memory matching at the XML document level:
Once a document has been translated there is an exact alignment
between the source language version and the target versions. If the
source language version is subsequently updated, then there is an exact
link between the source and target language tm:tu
elements which is maintained at the 'id'
attribute level. This information, along with the other xml:tm
namespace facilities, allows for translation memory matching at the document level.
Within xml:tm
it is assumed that all translation memory
matching will be done during the creation of an XLIFF file and that all
translation operations will occur at the XLIFF file level.
Where a tm:tu
element has the same id
attribute as the previous version then an Exact Match
can be declared. Please note that if the text unit immediately before
an unchanged text unit has been changed or deleted, then the exact match
should be degraded to a leveraged match as it will need to be reviewed
for correctness as the change could affect the translation of the
unchanged text unit.
The tm:tu
'crc'
attribute value can be used for in-document leveraged matching.
If a tm:tu
element has a modification history (tm:mh
), then the first text unit history element (tm:th
) can be used to locate the previous target translation as an in-document fuzzy match.
If a tm:tu
element has a 'type'
attribute value other than 'text'
then the contents can be described as non-translatable in the resultant XLIFF file and no match needs to take place.
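The matching rules described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the normative procedure: the TextUnit class, the match_kind function and the returned labels are hypothetical, and the sketch assumes that a changed text unit receives a new id while an unchanged one keeps its id, so that a changed or deleted predecessor shows up as an id missing from the current version.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextUnit:
    id: str
    crc: str
    type: str = "text"
    # ids of the <tm:th> entries in the modification history, first entry first
    history: List[str] = field(default_factory=list)


def match_kind(unit, current, previous):
    """Classify the in-document match available for one text unit."""
    if unit.type != "text":
        return "non-translatable"            # no match needs to take place
    previous_ids = [u.id for u in previous]
    current_ids = {u.id for u in current}
    if unit.id in previous_ids:
        position = previous_ids.index(unit.id)
        old_predecessor = previous_ids[position - 1] if position > 0 else None
        # Degrade when the unit immediately before was changed or deleted.
        if old_predecessor is not None and old_predecessor not in current_ids:
            return "leveraged"
        return "exact"
    if any(p.crc == unit.crc for p in previous):
        return "leveraged"                   # same text found via the crc value
    if unit.history:
        return "fuzzy"                       # previous translation via first <tm:th>
    return "none"


previous = [
    TextUnit("u1.1", "aaaa"),
    TextUnit("u1.2", "bbbb"),
    TextUnit("u1.3", "cccc"),
    TextUnit("u1.4", "dddd"),
]
current = [
    TextUnit("u1.1", "aaaa"),
    TextUnit("u1.5", "eeee", history=["u1.2"]),   # u1.2 was modified
    TextUnit("u1.3", "cccc"),
    TextUnit("u1.4", "dddd"),
    TextUnit("u1.6", "1234", type="numeric"),
    TextUnit("u1.7", "aaaa"),                     # repeats the text of u1.1
]
kinds = [match_kind(u, current, previous) for u in current]
```

In this sample, "u1.3" is unchanged but follows a modified unit, so its exact match is degraded to a leveraged match for review.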
The following figure shows the possible structure as a tree. Each element is followed by notation indicating its possible occurrence according to the corresponding legend.
(legend: 1 = one, + = one or more, ? = zero or one, * = zero, one or more)

document root element 1
|
+--- <tm:vh>*
|
+--- <tm:ta>*
|
+--- <tm:te>*
     |
     +--- <tm:tu>+
          |
          +--- <tm:ti>*
          |
          +--- <tm:span>*
          |    |
          |    +--- <tm:attr>*
          |
          +--- <tm:mh>?
               |
               +--- <tm:th>+
Please note that the use of the tm:tm
element is now deprecated in favour of using the Document root element to store the key xml:tm
document attributes.
<?xml version="1.0" encoding="UTF-8"?>
<!--
  Document    : tm.xsd
  Version     : 1.0
  Created on  : 26 February 2007
  Authors     : azydron@xml-intl.com, rmraya@heartsome.net
  Description : This XML Schema defines the structure of the xml:tm namespace
  Note        : Final version approved 26 February 2007
  Copyright © 2007 The Localisation Industry Standards Association [LISA].
  All Rights Reserved.
-->
<xs:schema xmlns:tm="urn:X-etsi-xml-tm-tags" targetNamespace="urn:X-etsi-xml-tm-tags" xml:lang="en" xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2001/xml.xsd"/>

  <!--
    ==================================================
    Restrictions
    ==================================================
  -->

  <!-- Restrictions for "type" attribute -->
  <xs:simpleType name="type">
    <xs:restriction base="xs:token">
      <xs:enumeration value="text"/>
      <xs:enumeration value="alphanumeric"/>
      <xs:enumeration value="numeric"/>
      <xs:enumeration value="measurement"/>
      <xs:enumeration value="punctuation"/>
      <xs:enumeration value="markup"/>
      <xs:enumeration value="notrans"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- "flag" attribute for text units -->
  <xs:simpleType name="flag1">
    <xs:restriction base="xs:token">
      <xs:enumeration value="normal"/>
      <xs:enumeration value="merged"/>
      <xs:enumeration value="noequiv"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- "flag" attribute for tm:ta and tm:ti -->
  <xs:simpleType name="flag2">
    <xs:restriction base="xs:token">
      <xs:enumeration value="normal"/>
      <xs:enumeration value="noequiv"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- "yes/no" values -->
  <xs:simpleType name="YesNo">
    <xs:restriction base="xs:token">
      <xs:enumeration value="yes"/>
      <xs:enumeration value="no"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- "start/end" values -->
  <xs:simpleType name="StartEnd">
    <xs:restriction base="xs:token">
      <xs:enumeration value="start"/>
      <xs:enumeration value="end"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Restrictions for "id" attribute in tm:tu -->
  <xs:simpleType name="UnitID">
    <xs:restriction base="xs:string">
      <xs:pattern value="u[0-9]+\.[0-9]+"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Restrictions for "crc" "id" attribute in tm:tm -->
  <xs:simpleType name="CRC">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9a-fA-F]+"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Restrictions for "id" attribute in tm:te -->
  <xs:simpleType name="ElementID">
    <xs:restriction base="xs:string">
      <xs:pattern value="u[0-9]+"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Restrictions for "id" attribute in tm:ti -->
  <xs:simpleType name="InlineID">
    <xs:restriction base="xs:string">
      <xs:pattern value="i[0-9]+(\.[0-9]+)+"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Restrictions for "id" attribute in tm:ta -->
  <xs:simpleType name="AttributeID">
    <xs:restriction base="xs:string">
      <xs:pattern value="a[0-9]+"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Restrictions for "date" attributes -->
  <xs:simpleType name="Date">
    <xs:restriction base="xs:string">
      <xs:pattern value="[1-2][0|9][0-9][0-9][0-1][0-9][0-3][0-9]T[0-2][0-9]([0-5][0-9]){2}Z"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Restrictions for user-defined attribute values -->
  <xs:simpleType name="Custom">
    <xs:restriction base="xs:string">
      <xs:pattern value="x-[^\s]+"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Restrictions for xml:space attribute -->
  <xs:simpleType name="space">
    <xs:restriction base="xs:token">
      <xs:enumeration value="default"/>
      <xs:enumeration value="preserve"/>
    </xs:restriction>
  </xs:simpleType>

  <!--
    ==================================================
    Structural Elements
    ==================================================
  -->

  <!-- The main tm object -->
  <xs:element name="tm">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="tm:ta"/>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="tm:te"/>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="tm:vh"/>
      </xs:sequence>
      <xs:attribute name="id" use="required" type="tm:CRC"/>
      <xs:attribute name="te" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="0"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="ta" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="0"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="version" use="required" type="xs:ID"/>
      <xs:attribute name="date" use="required" type="tm:Date"/>
      <xs:attribute name="source-language" use="required" type="xs:NMTOKEN"/>
      <xs:attribute name="xmltm-version" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="1.0"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="tool-name" use="required" type="xs:NMTOKEN"/>
      <xs:attribute name="tool-version" use="required" type="xs:NMTOKEN"/>
      <xs:attribute name="target-language" use="optional" type="xs:NMTOKEN"/>
      <xs:attribute name="doctype-public" use="optional" type="xs:string"/>
      <xs:attribute name="doctype-system" use="optional" type="xs:string"/>
    </xs:complexType>
  </xs:element>

  <!-- The version history for this object -->
  <xs:element name="vh">
    <xs:complexType>
      <xs:attribute name="version" use="required" type="xs:ID"/>
      <xs:attribute name="date" use="required" type="tm:Date"/>
      <xs:attribute name="source-language" use="optional" type="xs:NMTOKEN"/>
    </xs:complexType>
  </xs:element>

  <!-- Translatable attributes for non-inline elements -->
  <xs:element name="ta">
    <xs:complexType mixed="true">
      <xs:attribute name="id" use="required" type="tm:AttributeID"/>
      <xs:attribute name="crc" use="required" type="tm:CRC"/>
      <xs:attribute name="version" use="required" type="xs:IDREF"/>
      <xs:attribute name="flag" use="required" type="tm:flag2"/>
      <xs:attribute name="type" use="required">
        <xs:simpleType>
          <xs:union memberTypes="tm:type tm:Custom"/>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="name" use="required" type="xs:NMTOKEN"/>
      <xs:attribute name="translate" use="optional" default="yes" type="tm:YesNo"/>
      <xs:attribute ref="xml:space" default="default"/>
      <xs:attribute ref="xml:lang"/>
    </xs:complexType>
  </xs:element>

  <!-- Text elements -->
  <xs:element name="te">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="1" maxOccurs="unbounded" ref="tm:tu"/>
      </xs:sequence>
      <xs:attribute name="id" use="required" type="tm:ElementID"/>
      <xs:attribute name="version" use="required" type="xs:IDREF"/>
      <xs:attribute name="tu" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="1"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute ref="xml:space" default="default"/>
      <xs:attribute ref="xml:lang"/>
    </xs:complexType>
  </xs:element>

  <!-- Text units -->
  <xs:element name="tu">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="tm:ti"/>
        <xs:element minOccurs="0" maxOccurs="1" ref="tm:mh"/>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="tm:span"/>
      </xs:choice>
      <xs:attribute name="id" use="required" type="tm:UnitID"/>
      <xs:attribute name="crc" use="required" type="tm:CRC"/>
      <xs:attribute name="version" use="required" type="xs:IDREF"/>
      <xs:attribute name="flag" use="required" type="tm:flag1"/>
      <xs:attribute name="type" use="required">
        <xs:simpleType>
          <xs:union memberTypes="tm:type tm:Custom"/>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="ti">
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="0"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="translate" use="optional" default="yes" type="tm:YesNo"/>
      <xs:attribute ref="xml:space" default="default"/>
      <xs:attribute ref="xml:lang"/>
    </xs:complexType>
  </xs:element>

  <!-- Translatable in-line element attributes -->
  <xs:element name="ti">
    <xs:complexType mixed="true">
      <xs:attribute name="id" use="required" type="tm:InlineID"/>
      <xs:attribute name="crc" use="required" type="tm:CRC"/>
      <xs:attribute name="version" use="required" type="xs:IDREF"/>
      <xs:attribute name="flag" use="required" type="tm:flag2"/>
      <xs:attribute name="type" use="required">
        <xs:simpleType>
          <xs:union memberTypes="tm:type tm:Custom"/>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="name" use="required" type="xs:NMTOKEN"/>
      <xs:attribute name="subflow" use="required" type="tm:YesNo"/>
      <xs:attribute name="translate" use="optional" default="yes" type="tm:YesNo"/>
      <xs:attribute ref="xml:space" default="default"/>
      <xs:attribute ref="xml:lang"/>
    </xs:complexType>
  </xs:element>

  <!-- Modification history -->
  <xs:element name="mh">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="1" maxOccurs="unbounded" ref="tm:th"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Text history -->
  <xs:element name="th">
    <xs:complexType>
      <xs:attribute name="id" use="required" type="tm:UnitID"/>
      <xs:attribute name="crc" use="required" type="tm:CRC"/>
      <xs:attribute name="version" use="required" type="xs:IDREF"/>
    </xs:complexType>
  </xs:element>

  <!-- Span element -->
  <xs:element name="span">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="tm:attr"/>
      </xs:sequence>
      <xs:attribute name="name" use="required" type="xs:NMTOKEN"/>
      <xs:attribute name="type" use="required">
        <xs:simpleType>
          <xs:union memberTypes="tm:type tm:Custom"/>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

  <!-- Attribute element -->
  <xs:element name="attr">
    <xs:complexType>
      <xs:attribute name="name" use="required" type="xs:NMTOKEN"/>
      <xs:attribute name="value" use="required" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
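The pattern facets in the schema can be exercised outside a schema validator for quick checks. The sketch below copies several patterns verbatim from the listing and applies them with Python's re module; the dictionary and the function name is_valid are hypothetical. It relies on the fact that XML Schema patterns must match the whole value, which corresponds to re.fullmatch, and on these particular patterns using only syntax that both regular expression dialects share.

```python
import re

# Patterns copied verbatim from the simpleType definitions in tm.xsd.
PATTERNS = {
    "UnitID": r"u[0-9]+\.[0-9]+",
    "InlineID": r"i[0-9]+(\.[0-9]+)+",
    "AttributeID": r"a[0-9]+",
    "CRC": r"[0-9a-fA-F]+",
    "Date": r"[1-2][0|9][0-9][0-9][0-1][0-9][0-3][0-9]T[0-2][0-9]([0-5][0-9]){2}Z",
    "Custom": r"x-[^\s]+",
}


def is_valid(type_name: str, value: str) -> bool:
    """Check a value against one of the schema's pattern restrictions."""
    # XSD patterns are implicitly anchored, hence fullmatch.
    return re.fullmatch(PATTERNS[type_name], value) is not None


checks = {
    ("UnitID", "u9.1"): is_valid("UnitID", "u9.1"),
    ("InlineID", "i1.1.1"): is_valid("InlineID", "i1.1.1"),
    ("AttributeID", "a1"): is_valid("AttributeID", "a1"),
    ("CRC", "ebda4c34"): is_valid("CRC", "ebda4c34"),
    ("Date", "20070226T120000Z"): is_valid("Date", "20070226T120000Z"),
    ("Date", "2007-02-26"): is_valid("Date", "2007-02-26"),
}
```

The date pattern accepts the compact form YYYYMMDDThhmmssZ only, so an ISO 8601 date written with hyphens is rejected.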