Merging glossary entry titles

Shaun McCance <shaunm at gnome.org>
Sun Jul 10 21:57:32 EDT 2011

When merging glossary entries with the same ID, I'm trying to
figure out the best thing to do with multiple titles. I want
to allow multiple titles per term, and display those, but I
also want to remove duplicates.

Assume these two definitions (probably on different pages):

<gloss:term id="mallard">
  <title>Mallard</title>
  <p>A dynamic, topic-oriented help language</p>
</gloss:term>
<gloss:term id="mallard">
  <title>Mallard</title>
  <p>An extensible XML format for help documents</p>
</gloss:term>

"Mallard" == "Mallard", so let's only show it once. Consider
these definitions:

<gloss:term id="mallard">
  <title>Mallard</title>
  <p>A dynamic, topic-oriented help language</p>
</gloss:term>
<gloss:term id="mallard">
  <title>Mallard XML</title>
  <p>An extensible XML format for help documents</p>
</gloss:term>

"Mallard" != "Mallard XML", so we show both titles on the entry.
But now consider this:

<gloss:term id="mallard">
  <title>Mallard</title>
  <p>A dynamic, topic-oriented help language</p>
</gloss:term>
<gloss:term id="mallard">
  <title>
    Mallard
  </title>
  <p>An extensible XML format for help documents</p>
</gloss:term>

"Mallard" != "\n    Mallard\n  ". So I could just say titles
are compared by their normalize-space() values. But that
could interact oddly with <code>, <sys>, or similar inline
markup in titles. (I don't think <code> in titles is
unreasonable, but maybe I'm hunting corner cases too hard by
worrying about whitespace differences there.)
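
As a rough illustration of the comparison (a Python sketch, not
the actual stylesheet code; the helper names normalize_space and
title_key are made up here), comparing titles by their
whitespace-normalized string value treats the two titles above
as equal:

```python
import re
import xml.etree.ElementTree as ET


def normalize_space(text):
    """Mimic XPath normalize-space(): collapse runs of space, tab,
    CR, and LF to a single space, then trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


def title_key(title_xml):
    """Return the normalized string value of a <title> element.

    The string value is the concatenation of all descendant text,
    so inline markup such as <code> or <em> is ignored.
    """
    title = ET.fromstring(title_xml)
    return normalize_space("".join(title.itertext()))


a = title_key("<title>Mallard</title>")
b = title_key("<title>\n    Mallard\n  </title>")
print(a == b)  # True
```

Whitespace inside inline elements is collapsed the same way, which
is where the <code> concern comes from.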

OK, normalize space. But now what about this one?

<gloss:term id="mallard">
  <title><em>Mallard</em> XML</title>
  <p>A dynamic, topic-oriented help language</p>
</gloss:term>
<gloss:term id="mallard">
  <title>Mallard <em>XML</em></title>
  <p>An extensible XML format for help documents</p>
</gloss:term>

These have the same string value, so we'd drop one. But they
render differently. And I can't provide any sort of guarantee
as to which one will be dropped. I know, I know, don't do this
in your documents. But maybe you use some markup in the terms
in your main document, and some plugin page gets dropped in
that doesn't mark up the term, and the processor happens to
use that term title instead, so your markup is lost.
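
To make the problem concrete, here is a small Python sketch
(illustrative only; string_key is a hypothetical helper, not
anything in the real tools) showing that the two titles compare
equal by string value even though their markup differs:

```python
import re
import xml.etree.ElementTree as ET


def string_key(elem):
    """Whitespace-normalized string value of an element (markup ignored)."""
    text = "".join(elem.itertext())
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


first = ET.fromstring("<title><em>Mallard</em> XML</title>")
second = ET.fromstring("<title>Mallard <em>XML</em></title>")

# Equal as strings, so one of them would be dropped as a duplicate...
same_text = string_key(first) == string_key(second)

# ...but the markup, and therefore the rendering, is different.
same_markup = ET.tostring(first) == ET.tostring(second)

print(same_text, same_markup)  # True False
```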

The one potential saving grace here is that, with IDs, titles
only need to be provided by one definition. But there's still
some potential for odd corner cases when people do provide
titles multiple times.

I think "normalize space, compare string values, drop dups,
which dup is dropped is undefined" is the sanest thing to do,
but it has some oddities, so I'd like opinions.
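
A minimal sketch of that rule in Python, under some loud
assumptions: the namespace URI is a placeholder rather than the
real gloss namespace, <title> is left un-namespaced for brevity,
and merged_titles is a hypothetical name. The sketch happens to
keep the first title it sees for each string value; the proposal
itself leaves that choice undefined.

```python
import re
import xml.etree.ElementTree as ET

GLOSS_NS = "urn:example:gloss"  # placeholder, NOT the real gloss namespace


def normalize_space(text):
    """Mimic XPath normalize-space() for space, tab, CR, and LF."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


def merged_titles(root):
    """Map each term id to its titles, one per normalized string value."""
    merged = {}
    for term in root.iter(f"{{{GLOSS_NS}}}term"):
        seen = merged.setdefault(term.get("id"), {})
        for title in term.findall("title"):
            key = normalize_space("".join(title.itertext()))
            # First title seen wins in this sketch; the proposed rule
            # does not promise which duplicate survives.
            seen.setdefault(key, title)
    return {term_id: list(seen.values()) for term_id, seen in merged.items()}


doc = ET.fromstring(f"""
<page xmlns:gloss="{GLOSS_NS}">
  <gloss:term id="mallard"><title>Mallard</title></gloss:term>
  <gloss:term id="mallard"><title>
    Mallard
  </title></gloss:term>
  <gloss:term id="mallard"><title>Mallard XML</title></gloss:term>
</page>
""")

for term_id, titles in merged_titles(doc).items():
    print(term_id, [normalize_space("".join(t.itertext())) for t in titles])
```

With the three definitions above, the entry for "mallard" ends up
with two titles, "Mallard" and "Mallard XML".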

--
Shaun