ITS Conformance

This page discusses Mallard's conformance to the requirements in the W3C Internationalization and Localization Markup Requirements, as well as its usage of attributes and elements from the W3C Internationalization Tag Set.

At the time of this writing, the requirements document lists 26 requirements, not all of which are complete. This page discusses a selection of them. Future versions may discuss more.

R002: Span-Like Element

[R002] A span-like element is required to allow authors to mark sections of text that may have special properties, from a localization and internationalization point of view.

Mallard provides the span element, a general-purpose span-like element. The span element accepts attributes from external namespaces, allowing attributes such as xml:lang and its:translate to be used in Mallard documents.
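As a minimal sketch, the following page uses span to mark a product name as untranslatable and to label a phrase as French. The page id, title, product name, and text are illustrative assumptions. The ITS namespace URI is the one defined by ITS 1.0.

```xml
<page xmlns="http://projectmallard.org/1.0/"
      xmlns:its="http://www.w3.org/2005/11/its"
      type="topic" id="span-example">
  <title>Span example</title>
  <!-- its:translate="no" asks translation tools to leave the name untouched -->
  <p>Start <span its:translate="no">Beanstalk</span> from the main menu.</p>
  <!-- xml:lang labels a phrase whose language differs from the rest of the page -->
  <p>The dialog is titled <span xml:lang="fr">Bienvenue</span>.</p>
</page>
```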

R004: Unique Identifier

[R004] It should be possible to attach a unique identifier to any localizable item. This identifier should be unique within a document set, but should be identical across all translations of the same item.

While the id attribute is only allowed on page and section elements, Mallard does allow attributes from external namespaces to be used on all elements. If necessary for translation purposes, any attribute from an external namespace may be used as a unique identifier. In particular, Mallard does not use the common xml:id for page and section IDs, but it may be used on any element to provide a unique identifier for translation or any other purposes.
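A short sketch of this approach, assuming a translation workflow that keys items on xml:id. All identifier values and text here are hypothetical.

```xml
<page xmlns="http://projectmallard.org/1.0/" type="topic" id="backup">
  <title>Back up your files</title>
  <!-- id is only allowed on page and section; xml:id may go on any element -->
  <p xml:id="backup-intro">Back up your files regularly.</p>
  <section id="restore">
    <title>Restore</title>
    <p xml:id="restore-intro">Restoring files takes a few minutes.</p>
  </section>
</page>
```

To satisfy the requirement, a translated copy of this page would keep the same xml:id values on the corresponding paragraphs.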

R006: Identifying Language/Locale

[R006] Any document at its beginning should declare a language/locale that is applied to both main content and external content stored separately. While the language/locale may be declared for the whole document, when an element or a text span is in a different language/locale from the document-level language, it should be labeled appropriately. Therefore, DTD/Schema should allow any elements to have a language/locale specifying attribute. The language/locale declaration should use industry standard approaches.

Mallard allows the standard xml:lang attribute to be used on all elements.
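For illustration, the sketch below declares a language for the whole page and overrides it on an inline span and on a block element. The page id and text are assumptions made for the example.

```xml
<page xmlns="http://projectmallard.org/1.0/"
      type="topic" id="greetings" xml:lang="en">
  <title>Greetings</title>
  <!-- Inline override: only the span is German -->
  <p>The German greeting is <span xml:lang="de">Guten Tag</span>.</p>
  <!-- Block override: the whole paragraph is Spanish -->
  <p xml:lang="es">Buenos días.</p>
</page>
```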

Note that there are two different methods of identifying language and locale information that are likely to be encountered by those working with Mallard. Since Mallard is an XML format, language identifiers are expected to conform to IETF RFC 3066 (for example, pt-BR). Since Mallard is designed to be used in a desktop help system, however, POSIX locale identifiers (for example, pt_BR) are often more convenient in practice. This is a potentially serious interchange issue, and this document currently offers no solutions to this problem.

R007: Identifying Terms

[R007] It should be possible to identify terms inside an element or a span and to provide data for terminology management and index generation. Terms should be either associated with attributes for related term information or linked to external terminology data.

Mallard does not currently provide a means of marking up terms and definitions. When necessary for translation purposes, the its:term and its:termInfoRef attributes may be used on any elements to indicate such a relationship.
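One possible use of these attributes is sketched below. The its:termInfoRef URI is a placeholder, not a real terminology resource, and the page content is invented for the example.

```xml
<page xmlns="http://projectmallard.org/1.0/"
      xmlns:its="http://www.w3.org/2005/11/its"
      type="topic" id="mount-volume">
  <title>Mounting a volume</title>
  <!-- its:term marks the span as a term; its:termInfoRef points to
       external information about it (placeholder URI) -->
  <p>Before you can read a disk, you must
  <span its:term="yes"
        its:termInfoRef="http://example.com/terms#mount">mount</span>
  it.</p>
</page>
```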

R008: Purpose Specification/Mapping

[R008] Currently, it does not appear to be realistic that all XML vocabularies tag localization-relevant information identical (e.g. all use the "term" tag for terms). One way to take care of diverse localization-relevant markup in localization environments is a mapping mechanism which maps localization-relevant markup onto a canonical representation (such as the Internationalization Tag Set).

Any purpose mapping that can be encoded using the its:rules element can be included in a Mallard document. The its:rules element may be used in any info element. See also Associating ITS Data Categories with Existing Markup.
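The sketch below shows what such a mapping might look like. It assumes a hypothetical authoring convention in which terms are marked with a "term" style hint on span. Mallard itself does not define this convention. The mal prefix is declared on its:rules so that the XPath selector can refer to Mallard elements.

```xml
<page xmlns="http://projectmallard.org/1.0/"
      xmlns:its="http://www.w3.org/2005/11/its"
      type="topic" id="rules-example">
  <info>
    <its:rules version="1.0" xmlns:mal="http://projectmallard.org/1.0/">
      <!-- Map a hypothetical style hint onto the ITS Terminology category -->
      <its:termRule selector="//mal:span[@style='term']" term="yes"/>
    </its:rules>
  </info>
  <title>Rules example</title>
  <p>A <span style="term">volume</span> is a storage area.</p>
</page>
```

The selector above matches only when the style attribute is exactly "term". A document that combines several style hints in one attribute would need a more tolerant XPath expression.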

R011: Bidirectional Text Support

[R011] Markup should be available to support the needs of bidirectional scripts.

Mallard allows attributes from external namespaces to be used on all elements. Consequently, the its:dir attribute may be used to specify text directionality.
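As a small sketch, the following page embeds a right-to-left Hebrew word in left-to-right English text. The page id and text are illustrative.

```xml
<page xmlns="http://projectmallard.org/1.0/"
      xmlns:its="http://www.w3.org/2005/11/its"
      type="topic" id="dir-example" xml:lang="en">
  <title>Directionality example</title>
  <!-- its:dir="rtl" marks the span as right-to-left text -->
  <p>The Hebrew word for peace is
  <span xml:lang="he" its:dir="rtl">שלום</span>.</p>
</page>
```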

R012: Indicator of Translatability

[R012] Methods must exist to allow to specify the parts of a document that are to be translated or not.

Mallard allows attributes from external namespaces to be used on all elements. Consequently, the its:translate attribute may be used to specify whether parts of a document are to be translated.

Additionally, the its:rules element may be used in any info element to provide translatability rules for a page or section.
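Both mechanisms can be sketched together as follows. The choice to exclude all code elements is an assumption made for the example, not a Mallard recommendation, and the paragraph content is invented.

```xml
<page xmlns="http://projectmallard.org/1.0/"
      xmlns:its="http://www.w3.org/2005/11/its"
      type="topic" id="translate-example">
  <info>
    <its:rules version="1.0" xmlns:mal="http://projectmallard.org/1.0/">
      <!-- Global rule: do not translate the content of code elements -->
      <its:translateRule selector="//mal:code" translate="no"/>
    </its:rules>
  </info>
  <title>Translatability example</title>
  <p>Run <code>ls -l</code> to list files.</p>
  <!-- Local attribute: this one paragraph is excluded from translation -->
  <p its:translate="no">ACME-1234-XYZ</p>
</page>
```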

R014: Limited Impact

[R014] All solutions proposed should be designed to have as less impact as possible on the tree structure of the original document and on the content models in the original schema.

Mallard allows tool-specific extensibility using attributes and elements from external namespaces. Mallard has clearly defined rules for how attributes and elements from external namespaces are to be processed in various contexts. Tool writers are expected to be aware of these issues. Whenever possible, this document notes issues that can arise from extensions, including those for translation purposes.

While it is impossible to predict all issues one might encounter, Mallard was developed after years of developing translation tools for other formats. Internationalization and localization were primary concerns in the design of Mallard.

R015: Attributes and Translatable Text

[R015] Provisions must be taken to ensure that attributes with translatable values do not impair the localization process.

Mallard never places translatable text in attribute values.

R017: Localization Notes

[R017] A method must exist for authors to communicate information to localizers about a particular item of content.

Mallard allows attributes from external namespaces to be used on all elements. Consequently, the its:locNote and its:locNoteRef attributes may be used to provide localization notes.

If more extensive localization notes are needed, the comment element may be used. Using an its:rules element in an info element, one can clearly specify which editorial comments are localization notes.
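The sketch below shows one way this might be done. It assumes a hypothetical "locnote" style hint to distinguish localization notes from other editorial comments, and it attaches each such note to the section that contains it. The rule syntax follows ITS 1.0. The selector and pointer expressions are only one possible choice.

```xml
<page xmlns="http://projectmallard.org/1.0/"
      xmlns:its="http://www.w3.org/2005/11/its"
      type="topic" id="locnote-example">
  <info>
    <its:rules version="1.0" xmlns:mal="http://projectmallard.org/1.0/">
      <!-- Treat comments carrying the hypothetical "locnote" style hint
           as localization notes for the section that contains them -->
      <its:locNoteRule selector="//mal:section[mal:comment[@style='locnote']]"
                       locNoteType="description"
                       locNotePointer="mal:comment[@style='locnote']/mal:p"/>
    </its:rules>
  </info>
  <title>Localization notes example</title>
  <!-- Short note given directly in an attribute -->
  <p its:locNote="Save is a button label. Keep it short.">Click Save.</p>
  <section id="units">
    <title>Units</title>
    <comment style="locnote">
      <cite>Author Name</cite>
      <p>Convert measurements to the units used in the target locale.</p>
    </comment>
    <p>The margin is one inch.</p>
  </section>
</page>
```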

R020: Annotation Markup

[R020] There must be a way to support markup up of text annotations of the 'ruby' type.

All translatable content in Mallard is placed in element content, which allows annotation markup to be used. Mallard never places translatable content in attribute values. Note, however, that Mallard documents will often be displayed by converting them to a format such as HTML. If the display format places textual content in attribute values (such as the alt attribute of the img tag in HTML), then annotations could be lost in rendering.

Elements from external namespaces may be used in all inline contexts. While this allows Ruby annotations to be embedded within a Mallard document, the fallback processing expectations are unlikely to produce satisfactory results for tools that do not support Ruby. Future versions of this document should address this issue.

R022: Nested Elements

[R022] Great care must be taken when defining or using nested translatable elements.

Mallard explicitly disallows mixing block and inline content, except in well-defined cases which can easily be detected and handled. In Mallard, any block element which can contain text directly is considered to be a translation unit. Since these elements do not allow general block content to be mixed into the inline content, translation units can always be presented to translators without the need for placeholders.

Note that this may not be the case if a translation tool chooses to treat certain container elements as translation units. For example, under some circumstances a translation tool might choose to present tables or lists as translatable to allow translators to reorder the rows or items. In these cases, the block content inside the entries or items would still constitute discrete units of translation, making placeholders necessary.

R025: Elements and Segmentation

[R025] Methods, independent of the semantic, of the elements must exist to provide hints on how to break down document content into meaningful runs of text.

Making meaningful distinctions is ultimately the job of a processing tool, although the design of an XML vocabulary can have a significant impact on implementation difficulty. The following notes will be relevant to most tool implementers.

  • In Mallard, the content of any element, taken in context, is unambiguously general inline content, general block content, or some particular type of structured content. It is never the case that processing tools must probe the contents to determine the content model.

    Note that, since some element names are used in both block and inline contexts, such ambiguous content models would be particularly problematic for Mallard. Ambiguous content models could lead to situations where it is not possible to determine the function of an element such as code. Thus, ambiguous content models are explicitly avoided. This makes most processing tasks simpler.

  • In Mallard, elements generally contain either block content or inline content. Thus, for example, you cannot place a paragraph inside a paragraph. This is simpler for translators, as well as for translation tool implementers, because it reduces the need to use placeholders for separate translation units.

  • One notable exception to the above is the item element in tree lists. To simplify writing, tree list items simply take inline content followed by any number of nested tree list items. Since the block-like items are not interspersed with the inline content, however, translation tools should be able to handle this case without placeholders.

  • It is noteworthy that Mallard reuses some element names in both block and inline contexts. The code and media elements are two examples of this. Since Mallard never allows general block content to be mixed with general inline content, the purpose of these elements is unambiguous when processed in context. Thus, it is important that tools always process elements in context to handle them correctly.
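The notes above can be illustrated with a short sketch. The page content is invented for the example. It shows code in an inline context, code in a block context, and a tree list whose items take inline content followed by nested items.

```xml
<page xmlns="http://projectmallard.org/1.0/" type="topic" id="context-example">
  <title>Context example</title>
  <!-- Inline context: code is a child of p, so it is part of the
       paragraph's translation unit -->
  <p>Type <code>make install</code> and press Enter.</p>
  <!-- Block context: code is a child of page, so it is a block of its own -->
  <code>make
make install</code>
  <!-- Tree list: each item has inline content first, then nested items -->
  <tree>
    <item>usr
      <item>bin</item>
      <item>share</item>
    </item>
  </tree>
</page>
```

A tool that looked only at the element name code could not tell these two uses apart. The parent element decides which content model applies.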

© 2009 Shaun McCance

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.

As a special exception, the copyright holders give you permission to copy, modify, and distribute the example code contained in this document under the terms of your choosing, without restriction.
