Duck syntax

Petr Kovar <pknbe at volny.cz>
Wed Sep 10 17:35:29 EDT 2014

Hi Shaun,

Thanks a lot for kicking this off. As I said earlier, I believe that, once
finished, this will be another strong selling point of Mallard.

On Sat, 30 Aug 2014 00:09:09 -0400
Shaun McCance <shaunm at gnome.org> wrote:

> Some of you know I've toyed around with a non-XML syntax for Mallard.
> I'm going to start tossing out what's been in my head in the hopes that
> others can follow along.

I'm not exactly an expert in the field of syntax design, but let me try...

> An explicit design goal is to not have too many syntactical constructs.
> One of the nice things about XML is that you know that < and & are the
> only things that are syntactically significant in the middle of text.
> Drives me nuts when I have to remember which non-alphanums will turn my
> text flashing magenta, and how to escape that particular thing. It also
> drives me nuts when I have to escape my own last name.
> 
> Another design goal is to not arbitrarily limit what you can put inside
> other things. Mallard is perfectly fine with inline markup inside code
> blocks, or list items inside table cells, or whatever. These are things
> that are often impossible in lightweight languages.
> 
> Like other lightweight languages (markdown, asciidoc, rst, wikis),
> paragraphs are implicit from an empty line. Note that there are only
> three blocks in Mallard that take inline text: para, code, and screen.
> That'll be relevant later. But when you're just writing paragraphs, you
> make new ones the same way I do in this email.
> 
> Unlike many other languages, indentation matters. A lot. Similarly to
> Python, indentation indicates where you are. So if I start a bulleted
> list using a leading "* ", then I'm inside that list item as long as I
> maintain that indentation.
> 
> * This is a paragraph inside the list item.
>   This is still the same paragraph.
> 
>   This is a new paragraph because of the empty line, but it's
>   still inside the same list item because of the indentation.
> 
> Page and section headers start with a leading "= ", "== ", "=== ", etc.
> I'm not going with the style of "underlining" with "===", "---", "~~~",
> etc for two reasons: One, I dislike having to look at the next line to
> parse the line I'm on. Two, I can never remember which fancy character
> indicates which level.
> 
> Any type of block other than a paragraph uses an opening tag with square
> brackets, like so:
> 
> [note]
> This is a note.
> 
> This is reminiscent of section groupings in config files. I played with
> a few different syntaxes, manually converting bunches of pages, and I
> really liked this one. Note that, unless it's [code] or [screen], you
> implicitly get a paragraph, unless the next line is another opening
> block tag.
> 
> If you don't indent, as in the example above, you just get the following
> paragraph (or tagged block). If you want a multi-paragraph block, use
> indentation, like so:
> 
> [note]
>   This is a note.
> 
>   This is a new paragraph in the note.
> 
>   * What the heck,
>     now we have a bulleted list item in the note too.
> 
> Sometimes those tags need attributes. We can add them the same way we do
> for opening XML tags:
> 
> [note style="tip"]
> 
> Except let's go ahead and not require quotes if there's no space.
> 
> [note style=tip]
> 
> Except while we're at it, let's go ahead and introduce a shortcut for
> style hints, because they get used a bunch.
> 
> [note.tip]
> 
> And for IDs (pending block IDs making it into Mallard 1.1):
> 
> [note#thisnoteid]
> 
> If you need a block title, it goes on the line after the block opening
> with a leading ". ".
> 
> [note]
> . The note title
> This is the note

These all look good to me.
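
To check my own reading of the block tag rules, here is a minimal Python
sketch of how an opening line could be parsed. The name parse_block_tag and
the regular expression are my own assumptions for illustration, not anything
from your converter or from the Mallard tooling. Joining several ".hint"
shortcuts into one space-separated style value is also just my guess.

```python
import re
import shlex

# Hypothetical pattern: [name], optional .style and #id shortcuts,
# then optional name=value attributes before the closing bracket.
TAG_RE = re.compile(
    r'^\[(?P<name>[A-Za-z][\w:-]*)'
    r'(?P<short>(?:[.#][\w-]+)*)'
    r'(?P<rest>(?:\s+.*)?)\]\s*$'
)


def parse_block_tag(line):
    """Return (name, attrs) for a line like '[note.tip]', else None."""
    match = TAG_RE.match(line)
    if match is None:
        return None
    attrs = {}
    # Shortcuts: ".tip" is a style hint, "#someid" is an ID.
    for kind, value in re.findall(r'([.#])([\w-]+)', match.group('short')):
        if kind == '.':
            attrs['style'] = (attrs.get('style', '') + ' ' + value).strip()
        else:
            attrs['id'] = value
    # Regular attributes: shlex strips the quotes, so style="tip" and
    # style=tip both end up as the same key and value.
    for token in shlex.split(match.group('rest')):
        key, sep, value = token.partition('=')
        if sep:
            attrs[key] = value
    return match.group('name'), attrs
```

With this, '[note style="tip"]', '[note style=tip]' and '[note.tip]' all
come out as the same element and attributes, which is how I understand the
three forms are meant to relate.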

> This is stolen from asciidoc. I like stealing pieces of syntax.

Stealing is good (but don't quote me on this), especially when you want to
introduce a new syntax that people are not familiar with yet. They
will probably be reluctant to learn it if it differs too much from other
popular lightweight languages out there. 

> For terms lists, list items themselves can have titles. Do these with a
> leading "- ".
> 
> [terms]
> - term 1 title
> * term 1 description
> - term 2 title
> * term 2 description
> 
> Inline markup looks like &gui(Button Name). You can escape an ampersand
> with &&. You can escape [ with &[ if you need to use [ at the beginning
> of a line. And you can do XML entities like &aacute; and &#x00E1;. But
> the character sequence "& " is literally ampersand, space, so you don't
> have to escape it for normal prose use. Yes, this means escaping for C
> and other language code blocks. I'm open to something like CDATA.

Yes, I think support for something like CDATA would be helpful for
cases where you need to include code examples, etc.
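
For what it's worth, the escaping rules as I read them seem simple enough
to handle in a single pass. Below is a rough Python sketch under my own
assumptions: unescape_duck is a made-up name, and I resolve named entities
with Python's html.unescape (the HTML5 entity table), which may well be
broader than what you intend by "XML entities". Inline elements such as
&gui(...) are deliberately left untouched here.

```python
import html
import re

# Only these sequences are treated as escapes; a bare "& " never matches,
# so an ampersand followed by a space stays literal.
TOKEN_RE = re.compile(
    r'&&|&\[|&#x[0-9A-Fa-f]+;|&#[0-9]+;|&[A-Za-z][A-Za-z0-9]*;'
)


def unescape_duck(text):
    """Resolve &&, &[ and entity references in a run of inline text."""
    def replace(match):
        token = match.group(0)
        if token == '&&':
            return '&'
        if token == '&[':
            return '['
        # Named or numeric entity; unknown names are returned unchanged.
        return html.unescape(token)
    return TOKEN_RE.sub(replace, text)
```

The point of matching only complete tokens is that ordinary prose like
"salt & pepper" needs no escaping at all, which I take to be the goal.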
 
> If you want to do attributes on inlines, put them in [] between the
> element name and the content in parens. &gui[style="button"](OK). Same
> shortcuts for style hints. &gui[.button](OK). And let's add a shortcut
> for linking, so &gui[xref=ok-buttons](OK) is &gui[>ok-buttons](OK).
> 
> (I'm open to suggestions on changing any of these special characters.
> Some I've grown rather fond of in playing around with different ideas.
> Some are just what happen to be in some of my test files right now. Not
> sure ">" for "xref=" is my favorite, TBH.)
> 
> Then there's info elements. I don't have this fully fleshed out yet, but
> I'd like to do something like asciidoc's use of :tagname: lines after
> headers. So something like this:
> 
> :revision:[pkgversion=3.7.1 version=0.2 date=2012-11-16 status=outdated]
> :credit:[type=editor]
>   :name:(Michael Hill)
>   :email:[its:translate=no](mdhillca at gmail.com)
> 
> This needs to be fleshed out more. One problem I came across when trying
> to convert non-trivial pages was that, in info elements, we don't know
> quite as easily when we're moving into inline text. What happens when
> arbitrarily complex block and inline elements make their way into info?
> It can happen inside license elements. It's a staple of my long-dormant
> experimental glossary extension.
> 
> Extensions get tricky, because there's no telling when to implicitly
> insert a paragraph. For core Mallard, we know not to for code and screen
> blocks. But look at Mallard+TTML. You should be using the tt:p elements,
> and those get mixed content. Maybe once you leave the core namespace,
> you don't get paragraphs for free, but have to explicitly mark them with
> an opening tag [para]? Not sure yet.

TBH, I wouldn't personally worry about all of these things at this point. I
can imagine Duck being used primarily in environments where both rapid
delivery and ease of use are key. But then again, with recent changes in
our industry, this is probably how all publishing will end up anyway.
One more plus for Duck :)

The huge advantage of the Duck syntax is that by using it, writers could
also get support for modular and topic-based content that can be easily
integrated with existing XML content, if needed. So from my POV, supporting
all the fundamental things in Mallard design, such as automatic links, is
important, as is allowing people to create a simple topic (or two...)
with basic formatting (titles, paras, lists, admonitions, tables, maybe
more...). Everything else (especially extensions) is probably a
nice-to-have.

> For your viewing pleasure, I've attached a duck syntax conversion of a
> (possibly old and outdated) gnome-user-docs page.

Looks great. :)

Thanks,
pk