Cache Files 1.0

Mallard features a unique automatic linking mechanism that requires a processing tool to know metadata about every page in a document in order to process any single page correctly. There are many ways this information could be stored. For example, it could be kept in memory by a program, stored in a relational database, or serialized to a file. This specification defines an XML cache file format that is easy to generate and is suitable for use with technologies like XSLT and XQuery.

Mallard processing tools are not required to support cache files. A tool may use a completely different method of storing and reading page metadata.

Cache files are not necessarily intended for interchange between different tools. They may be used for interchange, but tools that do so should account for the potential interchange issues noted throughout this specification.

This is a candidate specification. Changes are unlikely, but may still be made before the final specification.

Specification

Mallard cache files use a mixture of elements from the core Mallard namespace and the cache namespace. Throughout this specification, whenever the cache namespace is referenced, or the namespace prefix cache is used, the namespace is:

http://projectmallard.org/cache/1.0/

The root element of a cache file is a cache:cache element. The cache:cache element may have a version attribute. The version attribute can be used to validate the cache file by combining schemas using the method specified for the version attribute on Mallard pages. Because the cache files schema is designed to be combined with the core Mallard schema, you should use at minimum two version tokens: the first to specify the cache file version, and the second to specify the Mallard version. If the version attribute is omitted, it is assumed to be cache/1.0 1.0.
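The default described above can be sketched in a few lines of Python. This is only an illustration: the function name version_tokens is hypothetical, and the sketch assumes that version tokens are separated by whitespace, as they are in the version attribute on Mallard pages.

```python
# Default stated by this specification for a missing version attribute.
DEFAULT_VERSION = "cache/1.0 1.0"


def version_tokens(version_attr):
    """Return the list of version tokens for a cache:cache element.

    version_attr is the raw attribute value, or None when the attribute
    is absent. Tokens are assumed to be whitespace-separated.
    """
    if version_attr is None:
        version_attr = DEFAULT_VERSION
    return version_attr.split()
```

A tool could pass the result to whatever schema-combining logic it already uses for Mallard pages. That logic is outside the scope of this sketch.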

The cache:cache element contains one or more page elements, one for each page in the document. Each page element contains a copy of the attributes found on the source page element. In particular, a page element must have an id attribute. Cache generators may add external-namespace attributes or add style hints to the style attribute. Relying on this behavior is a potential interchange issue.

In addition to the source attributes, each page element contains a cache:href attribute giving a URI that identifies the location of the source page file. The URI may be absolute or relative. A relative URI has the advantage that the cache file remains correct if it is moved along with the document to a different location, potentially on a different computer. An absolute URI has the advantage that the cache file can be moved independently of the document and continue to be valid, as long as the document is not moved.
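As a rough illustration of the trade-off above, the following Python sketch resolves a cache:href value. It assumes that a relative URI is resolved against the location of the cache file itself, which this specification implies but does not state outright. The function name resolve_page_href is hypothetical.

```python
from urllib.parse import urljoin


def resolve_page_href(cache_file_uri, href):
    """Resolve a cache:href value to the location of the source page.

    Assumption: relative values are resolved against the URI of the
    cache file. Absolute values are returned unchanged by urljoin.
    """
    return urljoin(cache_file_uri, href)
```

With a relative href, moving the cache file and the document together changes cache_file_uri and the resolved location follows it. With an absolute href, the result does not depend on where the cache file lives.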

Each page element contains an optional info element, a title element, an optional subtitle element, and zero or more section elements. These correspond directly to the elements in the source page.

Each section element contains a copy of the attributes found on the source section element, with one modification: the id attribute is modified to include the id of the containing page, using the syntax page_id#section_id. This is the same syntax used to link to the section from another page. Cache generators may add external-namespace attributes or add style hints to the style attribute. Relying on this behavior is a potential interchange issue.

Each section element contains an optional info element, a title element, an optional subtitle element, and zero or more section elements. These correspond directly to the elements in the source section.
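The structure described so far can be sketched as a small cache generator. This is a minimal sketch in Python using the standard xml.etree.ElementTree module, not a reference implementation. The function names (mal, cache_page, build_cache) are hypothetical, and the sketch copies info elements whole without adding anything to them.

```python
import copy
import xml.etree.ElementTree as ET

MAL_NS = "http://projectmallard.org/1.0/"
CACHE_NS = "http://projectmallard.org/cache/1.0/"


def mal(name):
    """Return the ElementTree tag for a name in the core Mallard namespace."""
    return f"{{{MAL_NS}}}{name}"


def _copy_metadata(source, target, page_id):
    """Copy info, title, subtitle, and nested sections from source to target."""
    for name in ("info", "title", "subtitle"):
        child = source.find(mal(name))
        if child is not None:
            target.append(copy.deepcopy(child))
    for section in source.findall(mal("section")):
        cached = ET.SubElement(target, mal("section"), dict(section.attrib))
        # Qualify the section id with the id of the containing page.
        if section.get("id") is not None:
            cached.set("id", f"{page_id}#{section.get('id')}")
        _copy_metadata(section, cached, page_id)


def cache_page(source_page, href):
    """Build the cached page element for one parsed source page."""
    cached = ET.Element(mal("page"), dict(source_page.attrib))
    cached.set(f"{{{CACHE_NS}}}href", href)
    _copy_metadata(source_page, cached, source_page.get("id"))
    return cached


def build_cache(pages):
    """Build a cache:cache element from (source_page, href) pairs."""
    root = ET.Element(f"{{{CACHE_NS}}}cache")
    for source_page, href in pages:
        root.append(cache_page(source_page, href))
    return root
```

Block content such as paragraphs is deliberately left out, since only info, title, subtitle, and section elements appear in the cache. The result can be serialized with ET.tostring(root, encoding="unicode").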

Each info element (for pages or sections) should contain a copy of the elements in the source info element. However, cache generators may exclude some elements for efficiency. Since processing tools may make use of extra info child elements in unexpected ways, excluding any elements is a potential interchange issue. At a minimum, cache generators should copy all link, title, and desc elements.
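A generator that chooses to copy only the recommended minimum might filter the info element as in the following Python sketch. The function name minimal_info is hypothetical, and dropping everything other than link, title, and desc is just one possible policy, with the interchange caveat noted above.

```python
import copy
import xml.etree.ElementTree as ET

MAL_NS = "http://projectmallard.org/1.0/"

# Elements this specification says should always be copied.
KEPT_TAGS = {f"{{{MAL_NS}}}{name}" for name in ("link", "title", "desc")}


def minimal_info(source_info):
    """Return a new info element holding only link, title, and desc children."""
    cached = ET.Element(source_info.tag, dict(source_info.attrib))
    for child in source_info:
        if child.tag in KEPT_TAGS:
            cached.append(copy.deepcopy(child))
    return cached
```

The source info element is left untouched, so the same parsed page can still be used for full processing.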

Cache generators may also add elements to the info element, either from external namespaces, or to make default Mallard behavior explicit. For example, if the source element does not contain a link title, a cache generator might add one with the value of the primary title. This could make a processing tool simpler, but relying on this redundant information in the cache file is a potential interchange issue.
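The link title example above could be implemented along the following lines. This Python sketch is illustrative only: the function name add_explicit_link_title is hypothetical, and it assumes the core Mallard convention that a link title is an info child title element with type="link".

```python
import copy
import xml.etree.ElementTree as ET

MAL_NS = "http://projectmallard.org/1.0/"


def add_explicit_link_title(cached):
    """Ensure a cached page or section has an explicit link title.

    Returns the link title element, or None if there is no primary title
    to copy from.
    """
    info = cached.find(f"{{{MAL_NS}}}info")
    if info is not None:
        for title in info.findall(f"{{{MAL_NS}}}title"):
            if title.get("type") == "link":
                return title  # already explicit, leave it alone
    primary = cached.find(f"{{{MAL_NS}}}title")
    if primary is None:
        return None
    if info is None:
        # info comes first in a cached page or section.
        info = ET.Element(f"{{{MAL_NS}}}info")
        cached.insert(0, info)
    link_title = copy.deepcopy(primary)
    link_title.set("type", "link")
    link_title.tail = None
    info.append(link_title)
    return link_title
```

A tool that reads such a cache can look up the link title directly, but as noted above, a tool that depends on it being present may fail on caches written by other generators.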

Schema

The formal definition of the Mallard Cache Files extension is maintained in RELAX NG Compact Syntax in code blocks within this specification.

default namespace mal = "http://projectmallard.org/1.0/"
namespace cache = "http://projectmallard.org/cache/1.0/"

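# Patterns prefixed with mal_ are not defined here. They are expected to come
# from the core Mallard schema that this schema is combined with.
start = cache_cache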

cache_cache = element cache:cache {
  attribute version { text } ?,
  cache_page +
}

cache_page = element page {
  mal_page_attr,
  attribute cache:href { text },

  mal_info ?,
  mal_block_title,
  mal_block_subtitle ?,
  cache_section *
}

cache_section = element section {
  mal_section_attr,

  mal_info ?,
  mal_block_title,
  mal_block_subtitle ?,
  cache_section *
}