1.2 (2019-02-09)
Internationalization Notes
This is a draft specification. Changes are likely before the final specification is published.
This page provides details for people creating translation tools and other internationalization systems that support Mallard documents. Mallard is an XML format, so there is a wealth of translation tools and techniques that can be applied. XML does not immediately solve all problems, however, and Mallard was designed from the beginning to support features that are important to proper document localization.
Message-based Translation
One of the most difficult aspects of document translation is tracking changes to the source document. For this reason, message-based translation systems such as PO files or XLIFF are popular. XML formats lend themselves well to message-based translation systems, but developers must decide what type of content constitutes a message.
In Mallard, block and inline content are always distinct. You can never, for example, place a paragraph inside the text content of another paragraph. This was done in large part to avoid the need for placeholders in messages. Some elements, such as code and media, can occur in either a block or an inline context. When processing a Mallard page, it is important to keep track of context so you know how to treat an element.
Mallard never places translatable text in attribute values.
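As a rough illustration of context tracking, the following Python sketch walks a page and reports whether each code or media element occurs in a block or an inline context. The element sets are an illustrative subset chosen for this example, not a complete list from the schema, and the function name classify is hypothetical.

```python
import xml.etree.ElementTree as ET

MAL = "http://projectmallard.org/1.0/"

# Assumption: illustrative subsets only, not the full Mallard schema.
DUAL_CONTEXT = {"code", "media"}                 # valid as block or inline
INLINE_CONTAINERS = {"p", "title", "desc", "cite", "screen"}  # hold inline content


def local(tag):
    """Strip the '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def classify(elem, inline=False, out=None):
    """Collect (name, context) for every dual-context element, in document order."""
    if out is None:
        out = []
    name = local(elem.tag)
    if name in DUAL_CONTEXT:
        out.append((name, "inline" if inline else "block"))
    # Content below an inline container (or below code) is inline.
    # A block-context media element is assumed to hold block fallback content,
    # so it does not switch the context by itself.
    child_inline = inline or name in INLINE_CONTAINERS or name == "code"
    for child in elem:
        classify(child, child_inline, out)
    return out
```

A tool built along these lines would treat each inline container as one message and keep inline elements as markup inside that message, rather than splitting them out as placeholders.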
Internationalization Attributes
Mallard does not contain its own elements to specify language or text directionality. To specify language, use the standard xml:lang attribute. To override text directionality, use the its:dir attribute from the W3C Internationalization Tag Set (ITS).
It is strongly recommended that you support ITS in translation tools. For example, supporting the its:translate attribute allows authors to mark certain elements as non-translatable.
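A minimal sketch of honoring the local its:translate attribute is shown below. It assumes the ITS defaults, where element content is translatable unless marked otherwise and a local value is inherited by child elements. Global ITS rules are not handled, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"
TRANSLATE = f"{{{ITS}}}translate"


def translatable_elements(elem, inherited=True):
    """Yield (element, translatable) pairs, inheriting its:translate downward."""
    value = elem.get(TRANSLATE)
    if value == "yes":
        current = True
    elif value == "no":
        current = False
    else:
        current = inherited  # no local override: keep the ancestor's setting
    yield elem, current
    for child in elem:
        yield from translatable_elements(child, current)
```

A message extractor could skip, or mark as locked, any element for which the second value is False.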
Localization Notes
Processing tools should support ITS mechanisms for providing notes to translators. At a minimum, tools should support the standalone its:locNote attribute for localization notes.
If more extensive localization notes are needed, the comment element may be used. In tools that support global ITS rules, particular comment elements can be designated as localization notes using an its:locNoteRule element inside an its:rules element within an info element.
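The minimum level of support, the standalone its:locNote attribute, can be sketched as follows. This example assumes a note applies to the element that carries it and to its descendants until another note overrides it. It does not process its:locNoteRule or other global rules, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"
LOC_NOTE = f"{{{ITS}}}locNote"


def collect_loc_notes(elem, inherited=None):
    """Return (element, note) pairs for every element covered by a note."""
    # A local attribute overrides whatever note was inherited from an ancestor.
    note = elem.get(LOC_NOTE, inherited)
    pairs = [(elem, note)] if note is not None else []
    for child in elem:
        pairs.extend(collect_loc_notes(child, note))
    return pairs
```

In a PO-based workflow, each collected note would typically be written as an extracted comment on the corresponding message.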
Locales
There are two different methods of identifying language and locale information that are likely to be encountered by those working with Mallard. Since Mallard is an XML format, language identifiers are expected to conform to IETF RFC 3066. Since Mallard is often used in a desktop help system, POSIX locale identifiers are often more convenient. Processing tools should convert between these two formats, and should generally prefer RFC 3066 identifiers, except where compatibility with other systems takes priority.
The language and region portions of these two locale identifier schemes are identical (although by convention they often differ in case). Other modifiers, such as script, nearly always use different codes, necessitating a conversion table.
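The conversion described above might look like the following sketch. The two-entry modifier table is purely illustrative, and a real tool would need a fuller table. Placing the script subtag between language and region follows later IETF language tag practice and is an assumption of this example. Unknown POSIX modifiers and the codeset are simply dropped.

```python
# Assumption: a tiny illustrative table mapping POSIX modifiers to script subtags.
POSIX_MODIFIER_TO_SCRIPT = {"latin": "Latn", "cyrillic": "Cyrl"}
SCRIPT_TO_POSIX_MODIFIER = {v: k for k, v in POSIX_MODIFIER_TO_SCRIPT.items()}


def posix_to_tag(locale):
    """Convert a POSIX locale such as 'sr_RS.UTF-8@latin' to a language tag."""
    base, _, modifier = locale.partition("@")
    base = base.split(".", 1)[0]          # discard the codeset, if any
    lang, _, region = base.partition("_")
    parts = [lang.lower()]
    script = POSIX_MODIFIER_TO_SCRIPT.get(modifier.lower())
    if script:
        parts.append(script)
    if region:
        parts.append(region.upper())
    return "-".join(parts)


def tag_to_posix(tag):
    """Convert a language tag such as 'sr-Latn-RS' to a POSIX locale."""
    lang, *rest = tag.split("-")
    region = modifier = ""
    for sub in rest:
        if len(sub) == 4 and sub.isalpha():       # looks like a script subtag
            modifier = SCRIPT_TO_POSIX_MODIFIER.get(sub.title(), "")
        elif len(sub) == 2 and sub.isalpha():     # looks like a region subtag
            region = sub.upper()
    result = lang.lower()
    if region:
        result += "_" + region
    if modifier:
        result += "@" + modifier
    return result
```

The case normalization reflects the conventions mentioned above: lowercase language, uppercase region, and an underscore separator on the POSIX side.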
Link Text
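Mallard can automatically generate text for a link using the title of the target page or section. In many formats, automatically generated link text presents a serious problem for languages with declensions, because the title may need a different grammatical form depending on where the link appears in a sentence.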
Mallard allows any number of extra link titles to be provided for a page or section. Each of these link titles uses the role attribute to specify where it should be used. Links can then select a title with the role attribute on the link element.
You should give translators a way to add link titles for each page or section, and to specify the appropriate roles on inline links.
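To make the role mechanism concrete, here is a sketch of how a tool might choose link text for a target page. It assumes link titles are title elements with type="link" inside info, and that the fallback order is: the link title with a matching role, then a link title with no role, then the primary title. Both the fallback order and the function name are assumptions of this example.

```python
import xml.etree.ElementTree as ET

MAL = "http://projectmallard.org/1.0/"
NS = {"mal": MAL}


def link_text(page, role=None):
    """Pick the text for a link to `page`, honoring the link's role if given."""
    link_titles = page.findall("mal:info/mal:title[@type='link']", NS)
    if role is not None:
        for title in link_titles:
            if title.get("role") == role:
                return "".join(title.itertext())
    # Fall back to a link title that is not tied to any role.
    for title in link_titles:
        if title.get("role") is None:
            return "".join(title.itertext())
    # Last resort: the primary title of the page.
    primary = page.find("mal:title", NS)
    return "".join(primary.itertext()) if primary is not None else None
```

A translator working in a language with cases could add one link title per case, each under its own role, and set the matching role on each inline link.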
Media Elements
Mallard allows audio, video, and images to be inserted into pages using the media element. These cannot generally be translated using textual translation tools. You should give translators a way to supply localized multimedia files and to see when the original files have changed.
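One possible way to show translators when an original file has changed is to record a digest of each source file at the time it was localized and compare it later. The sketch below assumes such a record is kept as a mapping from relative path to SHA-256 digest. The storage format and function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def stale_media(source_dir, recorded):
    """List media files whose source changed since the digest was recorded.

    `recorded` maps a relative path to the digest of the source file at the
    time its localized counterpart was produced.
    """
    stale = []
    for rel_path, old_digest in sorted(recorded.items()):
        src = Path(source_dir) / rel_path
        # A missing source file also needs the translator's attention.
        if not src.exists() or file_digest(src) != old_digest:
            stale.append(rel_path)
    return stale
```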
Translation Credits
Mallard allows contributors to be credited using the credit element. The type attribute can be set to "translator" to credit translators. Since translator credits aren't direct translations of existing elements, they can't be provided using simple message-based translation alone.
You should provide a mechanism to insert translator credits into translated output pages.
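Such a mechanism could be as simple as the following sketch, which appends a credit element with type="translator" to the info element of a translated page, creating info as the first child if it is missing. The name and email child elements and the function name are assumptions of this example.

```python
import xml.etree.ElementTree as ET

MAL = "http://projectmallard.org/1.0/"


def add_translator_credit(page, name, email=None):
    """Append a translator credit to the page's info element and return it."""
    info = page.find(f"{{{MAL}}}info")
    if info is None:
        # Assumption: info belongs at the start of the page.
        info = ET.Element(f"{{{MAL}}}info")
        page.insert(0, info)
    credit = ET.SubElement(info, f"{{{MAL}}}credit", {"type": "translator"})
    ET.SubElement(credit, f"{{{MAL}}}name").text = name
    if email:
        ET.SubElement(credit, f"{{{MAL}}}email").text = email
    return credit
```

The translator's name and email would come from the translation system itself, for example from the metadata of a PO file, rather than from a translated message.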