mdown \- convert mdown-formatted plain text to various formats
mdown [options] [file]
mdown converts the input text formatted in the mdown lightweight markup language into one of the supported output formats. The mdown language is a plain text language, meaning that it tries to imitate natural plain text formatting with widely understood conventions.
It is similar to Markdown, from which it derives its name, but is geared towards producing documents for multiple media rather than writing web pages. This man page, of course, is written in mdown.
Options can be grouped. For example:
$ mdown -fbo xhtml utf8 foo.html foo.text
is equivalent to:
$ mdown -f xhtml -b utf8 -o foo.html foo.text
Select output format format. See below for a list of output formats available.
Output to ofile instead of standard output.
Set the tab width to tabwidth instead of the default 8. mdown will try to respect the tab width when it encounters a tab and does a reasonable job at it, but there is one case where the behavior may surprise you. If a tab character is read as part of the prefix to the current line, it will be considered part of the prefix as a whole. Generally speaking, it is a very bad idea to have tabs crossing prefix boundaries in your input files.
If you don’t quite get this explanation or just don’t want to bother, only use spaces and everything will be fine.
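As a rough illustration of what respecting the tab width means, the Python sketch below expands each tab to the next tab stop. It assumes tab stops fall at multiples of tabwidth, which is the usual convention but is not confirmed by this manual; expand_tabs is a hypothetical helper and not part of mdown.

```python
def expand_tabs(line: str, tabwidth: int = 8) -> str:
    """Replace each tab with spaces up to the next multiple of tabwidth.

    Assumption: tab stops are at columns 0, tabwidth, 2*tabwidth, ...
    """
    out = []
    col = 0  # current output column
    for ch in line:
        if ch == "\t":
            pad = tabwidth - (col % tabwidth)
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)
```

With a prefix such as "- " in front of a tab, the number of spaces the tab stands for depends on where the prefix ends, which is why tabs crossing prefix boundaries are best avoided.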
Set the reasonable length of span elements, that is, the length (in bytes) that no span element should ever exceed. This is to help detect cases of unclosed span tags. Default: 160.
Select various fancy options. flags is a string containing one or more of the following characters:
The default is tdqL.
Select between UTF-8 or XML entity output for ligature characters. Currently, this distinction is only meaningful to XHTML output.
Read environment associations from efile. You can specify multiple files; they will be read in order. See the Files section for details.
Associate environment env to class class. Environment association is not just about creating aliases. See the Environments section for details.
If class is omitted, the environment is disabled. Any environment node named env will be suppressed from output completely, including all its children.
Prepend the contents of hfile to output. Without this option, the backends produce naked code that does not have the proper headers or footers to even function. See the individual backends sections for details of which declarations or packages are needed.
For advanced usage, though, it may be useful to preprocess the header in some user-defined way then use something like cat(1) to assemble the pieces of output.
Append the contents of ffile to output. Without this option, the backends produce naked code that does not have the proper headers or footers to even function.
Read link type definitions from file. Link types let you use special prefixes on link targets that get translated to user-formatted strings. You could use this to structure your own website with a greater degree of abstraction, or to link other structured sites, like wikis, blogs, etc. See the Files section for details.
Read overriding tag table from file. This option is meant for advanced users who are willing to investigate a bit on their own.
Enable compatibility mode. For the 1.x series,
version is the minor revision number. To select mdown 1.122
behavior, use -C 122.
Output standard XHTML 1.0 Strict.
Output LaTeX for pdflatex(1). The output is intended for pdflatex(1) in that it uses hypertext references and the graphicx package to include images; graphicx is compatible with latex(1) but common image formats are not.
Output man page compatible with the tmac.an macro package. Please note that only a subset of the mdown language is supported by this backend. See the man backend section for details.
Output ZCode, for http://www.siteduzero.com.
Dump an s-expression representing the parse tree.
Dump a human-readable representation of the parse tree.
The following sections provide a reference to the mdown language. It is not intended for tutoring. Please refer to the samples for a quick overview of the syntax, or specific documentation for use cases.
mdown recognizes a variety of syntaxes because it tries to imitate natural plain text formatting. Still, constructs can be roughly divided into two categories: block elements and inline elements.
Inline elements are the simpler of the two. Each inline element may
contain text and they do not nest. That is, you cannot put a sample
element inside an emphasis element. This could change in the future if
I see a real need for it that would make up for the increased complexity
of parsing, checking, and generating correct code. Most inline
elements are delimited by special punctuation characters enclosing the
decorated text. For example, inputting /hello/ would get you an
emphasized hello.
Block elements let you structure your text in logical units such as paragraphs, lists and tables. Some block constructs can nest; the nesting level is determined by looking at the indentation of a line: a deeper indentation means a child element. Consecutive block elements with the same indentation are children of the same parent element. The visual indentation step is currently set to 2. Whenever this manual mentions the indentation step, it is to be taken as 2 spaces wide.
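The relation between indentation and nesting can be sketched in Python as follows. This is only an illustration under simple assumptions (indentation made of spaces only, no prefixes taken into account); nesting_level and INDENT_STEP are hypothetical names, not part of mdown.

```python
INDENT_STEP = 2  # the indentation step stated in this manual


def nesting_level(line: str) -> int:
    """Return how many indentation steps precede the first non-space character.

    Assumption: only spaces are used for indentation. A half indentation
    (one space) rounds down to the enclosing level.
    """
    indent = len(line) - len(line.lstrip(" "))
    return indent // INDENT_STEP
```

Two consecutive lines for which this function returns the same value would belong to the same parent; a larger value would start a child element.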
mdown relies heavily on punctuation and space characters to delimit
elements in the input text; anywhere in the text, the backslash
character \ may be used to escape the next character, that is to get
its literal meaning. This can be applied to normal characters (except
for white space characters), too; even though it is no different from
inputting them directly, it does no harm, so if you are unsure whether
a character is special in a certain context, you should just escape
it.
Escaping white space works a bit differently, though. Escaping a white space simply sends it to the backend as is. Thus, escaping a newline character prevents it from making a new paragraph. But, apart from this, most backends (all but the man backend) don’t honor consecutive spaces. The use of escaped white space is discouraged and may have its behavior normalized in the future.
The following characters have a special meaning to mdown, in one context or another:
\ ^ _ / * ' " [ ] ( ) ` ~ : . | % = > - + #
as well as the following spacing characters: space, tab, line feed and form feed.
Comments are completely ignored by mdown. They are introduced by the
percent character %, at the beginning of a line, and extend to the
end of the line.
Even though all lines starting with % are ignored, please use the
%- prefix for proper comments, in order to reserve other
combinations for macro processors.
Please note that the indentation level of comments is significant. Putting a comment at a different level will end the deeper levels. This can be used to achieve some desirable side effects. See the section on consecutive blocks for more information on this use.
There are various ways to decorate a piece of text. In order to be
triggered, a decoration character must appear after a non-alphanumeric
character. It is then used to surround the decorated text. The four
decoration delimiters are /, *, + and ., which respectively
correspond to emphasis, strong emphasis, foreign phrases and
abbreviations.
Please note that abbreviations should not contain white space or punctuation and must not begin with any such character; otherwise, the dot will be treated as a regular period.
Prior to version 1.123, ' and " were special as well and meant
inline quotation and sample text. They were removed because ' was
felt to be highly ambiguous and redundant with ` and " was
both difficult to implement in a language-neutral way and easily
replaced with traditional quotes. The right way to quote now is to use
your preferred quotes. To get somewhat poor compatibility with older
versions, you can use the -C option (with a version number less than
123), though it is not advised.
A single word preceded by an underscore _ or a caret ^ is
respectively taken as a subscript or superscript. The word ends at the
next white space character, excluded.
Verbatim code spans can be inserted into the document by surrounding
them with as many backquote characters ` as necessary in order
to make the delimiter unambiguous. A single leading (resp. ending)
space at the beginning (resp. end) of the span is removed. This lets
you input backquotes at the start of a verbatim span.
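One way to pick an unambiguous delimiter when generating mdown text is sketched below in Python. It assumes that a delimiter one backquote longer than the longest run inside the text is sufficient, which is a common convention but is not spelled out in this manual; wrap_verbatim is a hypothetical helper.

```python
import re


def wrap_verbatim(text: str) -> str:
    """Surround text with a backquote delimiter that cannot occur inside it."""
    # Assumption: one more backquote than the longest inner run is enough.
    longest = max((len(run) for run in re.findall(r"`+", text)), default=0)
    fence = "`" * (longest + 1)
    # A single space at each end is removed by mdown, so add one whenever the
    # text starts or ends with a backquote (to keep it apart from the
    # delimiter) or with a space (to keep that space in the output).
    if text[:1] in ("`", " "):
        text = " " + text
    if text[-1:] in ("`", " "):
        text = text + " "
    return fence + text + fence
```

For instance, wrapping a text that itself contains single backquotes yields a two-backquote delimiter.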
A link has the form [text](location). mdown automatically
detects if the location contains the sequence ://, in which case
it is considered an external link and output differently.
A location beginning with a hash # is a reference to some label in
the present document. A label is declared simply as
[name]. Labels must be unique among themselves.
A location beginning with an exclamation mark ! indicates that the
link is to be replaced, if possible, with the contents of the object
at location.
Instead of parentheses, angle brackets can be put around the location. This indicates that location is the name of an indirect link defined somewhere else in the document. An indirect link is defined by an entire line of the form:
[name] location
The line begins with a half indentation (one space) the same way description list items do.
When writing an indirect link, the hash, if present, must go with the definition, while the exclamation mark belongs to the reference.
If the text part of the link is empty, it is automatically filled with
the trailing part of the address, that is, the part after the first
: character in the address part, or the whole address if there is
none. This is to better support prefixed links. See the
Files section for details.
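The rule for filling in an empty link text can be sketched in Python as follows. The function name default_link_text is hypothetical; the logic only restates the rule given above.

```python
def default_link_text(address: str) -> str:
    """Return the text used for a link whose text part is empty.

    The part after the first ':' is used, or the whole address if the
    address contains no ':' at all.
    """
    _prefix, sep, rest = address.partition(":")
    return rest if sep else address
```

With a prefixed address such as wiki:Main_Page (an illustrative prefix, not one defined by mdown), the link text would become Main_Page.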
A tie, or non-breakable space, is inserted by the ~ character.
In some cases, however, mdown is able to place ties on its own, before and after some punctuation characters in French and other languages that leave a space before them.
mdown also performs other substitutions automatically; these include
ligatures and usual combinations such as --- for an em-dash.
Which transformations to perform is selected via the -B option.
A line break can be forced by appending white space at the end of a line.
A page break can be obtained by using the character ^L. It will
render as a horizontal rule if the backend does not support paging
meaningfully.
The simplest and most useful block element is the paragraph. Paragraphs contain text and other inline elements. They are delimited by blank lines.
A variation on the paragraph is the span. It is the exact same thing except it is not usually delimited by blank lines. Wherever you can put inline contents that do not qualify for a paragraph, it is a span. Technically, it appears in some places where paragraphs are not allowed (e.g. in headings), but from the user’s point of view, its sole use is to make short lists and as the contents of table cells. Spans are intended to be rendered as adjacent boxes of text without the margins and padding that usually separate paragraphs.
Other blocks that accept only inline contents are the several
heading constructs: they all start with as many (from 1 to 5)
=’s as their level, the more =’s, the smaller the heading. If
the heading spans several source lines, subsequent lines must be
indented with one indentation step.
Another simple structure is the verbatim block, which should be rendered as a copy of the contained input. Inside a verbatim block, every character is read literally. It takes the form of an indented block (indented by one indentation step relative to the current indentation). It stretches from the first indented line (included) to the first unindented line (excluded), disregarding blank lines (which are treated as blank lines in the verbatim block itself). Trailing blank lines are suppressed.
Similar in syntax but very different in use, the quotation block
encloses a quoted text, itself written in mdown. Each line of
a quotation block is prefixed with the > character, which should
be followed by exactly one space (but this is not
enforced). Contrary to a verbatim block, inside a quotation block, all
mdown constructs will be parsed and converted.
There are two families of multi-contents blocks (or collections): lists and tables. While mdown only supports simple tables, it knows about a wide variety of lists. A collection is made of consecutive items. All collections follow the principle of the prefixed indented block: each item is introduced by a prefix and subsequent lines inside the item element are indented by one indentation step. Unless otherwise mentioned, the rest of the first, prefixed, line is treated as any other line of the block, minus the prefix, meaning that you can start another block right away, inside the first element, although it is not really all that useful in practice.
Unordered lists use a - followed by a space as a prefix. Any
element is allowed inside a list item, but keep in mind that if
a text block is not followed by a blank line, it will be treated as
a span, not a paragraph. Besides, the following rule applies: if the
list contains only single-content items whose children are spans or
paragraphs, then all these children shall be converted to the type
of the child of the first item. Informally, this makes simple lists
carry only spans or only paragraphs.
The following code is a simple list of fruits.
- banana
- apple
- pear
will render as:
Hierarchical lists are obtained by substituting the previous -
prefix with a + prefix. They behave the same as normal lists
except that children of hierarchical list items are never converted
to paragraphs, even if they contain blank lines (they will be
ignored). Their purpose is to help in writing trees and tree-like
structures, such as tables of contents, etc.
Ordered list items begin with a prefix of the form number. (a
number followed by a dot), followed by one space. An ordered list is
otherwise identical to an unordered list.
Description list items have two syntaxes. The first form takes
a prefix of the form :label: (some text enclosed in
colons). In this form, label can be surrounded by white space and
it will be stripped. The second form looks like label: preceded
by a space. In this form, label cannot start with white space (or
else it would be mistaken for a verbatim block). The second colon
can also be replaced with a tab, in both forms.
Contrary to the previous lists, space after the label part is ignored until a non white space character or a blank line is read. This implies that you can only put inline contents right after the label, in the first body part. Subsequent blocks inside the list item can have other types though.
Please note that if the label starts with a left square bracket [
or any of the prefix characters mentioned in this section, it must
be escaped, or else it will be understood as a block element of some
sort. As a rule of thumb, if the first character may be special
and the effect is unwanted, escape it.
The following example shows a list of options for some fictitious program.
*-a* Print "abr".
*-b* Print "bar".
It will produce:
Tables are the last type of collection. They are made of rows,
which are themselves made of cells. A row starts with the character
| followed by a space, as a prefix. Cells are separated by
vertical bars as well; such a vertical bar cannot start a line. The
support for tables is very limited in mdown; only inline contents
can be inserted into cells.
Although they are technically block elements, environments are different enough to deserve their own section.
Environments are named blocks. The syntax itself is very
natural. The name goes on one line, starting with # and one
space. Then the body follows, indented by one indentation step, on
subsequent lines.
The name is made of two parts, separated by a colon. The first one is used in deciding the class of the environment (see below). The second part is a caption for the block and may be used by backends to label environments.
For example, a table of contents looks like:
# Contents
+ Top
+ Topic 1
+ Sub topic 1
+ ...
+ Topic 2
+ ...
Valid names can be anything supported by the backend; generally speaking, all backends will support alphabetic characters, so you should stick with them or use associations (see below). Case matters.
Environments are translated to some sort of named block in each
backend (e.g. a <div> in XHTML, an environment in LaTeX, etc.). This lets you add custom
blocks in a natural way. Moreover, some formats predefine
environments. See the specific sections for details.
Yet, it may not be natural to input the name of the backend environment (hereafter called the class) directly, and it would not translate properly to other backends either. The solution is to create environment associations. An association is not quite an alias, but it can be thought of as a one-time alias. Once an environment name is associated with a string, this string will be used as the real class every time such an environment is invoked.
Only the first part of the name will be used for the purpose of finding an association. Spaces around that part are not meaningful in this process.
Using associations, you can internationalize your sources by using names that are meaningful in other languages. The other use is to allow arbitrary characters in environment names to make them more user-friendly.
The files named std.*.env contain standard environment associations
for the intrinsic environments supported by some backends.
Due to the syntax, there is no clear way to distinguish between two consecutive blocks at some level and one block at that level and one indented block from its parent level.
If you have to, you can put a comment between two blocks. The indentation level of the comment will play as a block delimiter.
See also the bugs section for implementation details.
Each of the implemented backends has its own specificities, although they will all try to be compatible with one another, in order to ease porting.
The following notes are copied verbatim from the sources and will hopefully remain synchronized with them.
The XHTML backend fully supports all language constructs. It
translates environments to <div> tags, with a CSS class matching the environment class.
The XHTML produced is naked. You will need to add a proper header and footer.
Environments get translated to LaTeX environments.
The LaTeX code produced is naked; you will need to provide a proper header and footer, and load the following packages, in order to run the output through pdflatex(1). All of them are standard, except ’subscript.sty’, which is needed only if you have an “old” LaTeX distribution.
The LaTeX backend supports the intrinsic environment tableofcontents
(or simply Contents if you use the standard environments) that
should contain a list of topics with links to parts of the document,
for other formats. In LaTeX, however, it will simply be discarded and
replaced by a \tableofcontents command.
The man backend is intended for writing man pages, not general *roff documents.
Special care must be taken when writing man pages. They follow specific rules and do not allow most fancy constructs other backends may support.
Remember that, as with all other backends, the output encoding is the same as the input encoding.
As the scope of a man page is not as general as that of the other backends, only a subset of the mdown language is supported. This backend is intended for writing plain old man pages that follow the established conventions.
There are only four levels of heading (that translate to .SH
and .SS, and some cheap emulation for further headings).
Tables are not available. I know there is tbl(1) but I’m just not willing to support that unless there is a real need.
Lists cannot be directly nested; that is, the first element in
a list item cannot be another list. This would require computing
complex prefixes to pass to .IP while the use is not all that
great.
Decorations are not available in headings, and super and subscripts are not available at all.
Neither internal links nor embedded objects are available. Internal links get their pointer part discarded and their text part emphasized. Labels are completely discarded. This is so that, when the same source is translated to another format, links will still work.
Use the title (or the Title standard association) environment to
specify a .TH line for your man page. Currently, not much prevents
you from putting block contents into this environment, but be warned it
will just produce buggy output.
The first \- combination in the file shall be treated as
a literal \- intended for the name line of the man page and will
be output as is.
The ZCode backend has no support for internal links; they are rendered as plain text for the sake of compatibility with the other backends.
Environments get translated to plain zcode tags; this lets you insert pretty much any ugly thing you want.
The following environments are supported: flottantgauche,
flottantdroit and centre, with their obvious meaning.
Please take a look at the source of this man page for a concrete
example, or the sample.text file for some random constructs. The
contents of the sample.text file are in French as it used to be
a quick reference during the development of mdown.
mdown can accept various information from files besides the source file. This section describes the syntax of the various files.
The syntax of this file is line-oriented. Each line contains a class name followed by a tab and a friendly name, like this:
myclass My Class
mysuperclass My Super Class
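To make the file layout concrete, here is a minimal Python sketch that reads such associations into a table mapping friendly names to classes. It is only an illustration: parse_associations is a hypothetical name, and skipping blank lines or lines without a tab is an assumption, since this manual does not say how mdown treats them.

```python
def parse_associations(text: str) -> dict[str, str]:
    """Map each friendly name to its class name.

    Each line holds a class name, a tab, then a friendly name.
    """
    table = {}
    for line in text.splitlines():
        cls, sep, friendly = line.partition("\t")
        if not sep or not cls.strip():
            # Assumption: blank or malformed lines are silently skipped.
            continue
        # Spaces around the name are not meaningful when looking it up.
        table[friendly.strip()] = cls.strip()
    return table
```

Looking up the first part of an environment name in the resulting table then gives the class to pass to the backend.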
Link addresses of the form prefix:rest are matched against
a table of prefixes for special formatting rules. This table is loaded
from link type definition files.
The syntax is as follows:
"prefix" "format"
prefix and format need to be quoted by double quotes. Standard
escape sequences \', \", \t, \r, \n and \f are
accepted inside these quotes. Also, the string %t will be replaced
by the rest, while %% can be used to output a single %.
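The substitution of %t and %% can be sketched in Python as below. This is not mdown's implementation; expand_link is a hypothetical helper, and leaving any other %x sequence untouched is an assumption. The format is scanned once from left to right so that the % produced by %% is never reinterpreted.

```python
def expand_link(fmt: str, rest: str) -> str:
    """Expand %t to rest and %% to a single % in a link format string."""
    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt) and fmt[i + 1] in "t%":
            out.append(rest if fmt[i + 1] == "t" else "%")
            i += 2
        else:
            # Assumption: any other character, including a stray %, is kept.
            out.append(fmt[i])
            i += 1
    return "".join(out)
```

For example, with a format of http://example.org/wiki/%t (an illustrative address), the rest Main_Page would expand to http://example.org/wiki/Main_Page.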
The only default prefix supported by mdown is the empty prefix. It simply expands to the trailing part, without any decoration. This prefix should not be overridden as it is the standard mechanism through which one can put a plain URL without conflicting with any prefix.
These files should not be provided by the user. The information provided here can be useful if you are planning on hacking small modifications into mdown to support “sub-backends” (a different doctype for XML documents, or a not-too-different document class for LaTeX).
The syntax of this file is line-oriented. Each line contains a tab-separated list of fields, which are, in order: the name of the tag to override, a double-quoted string for the opening tag and another double-quoted string for the closing tag.
Every % character in those strings is a format specifier. %s will
be replaced by the text field of the node (see sources for details),
%c by the class or the induced class (for environments), %t by the
trail (the part after the first :, with white spaces at start
stripped), and %% by a single %. You can escape a character by
prepending a backslash, just like in mdown.
Instead of a string, you can put a single dot, which is equivalent to
setting the field to NULL, in C. Setting the opening tag to NULL
will disable support for this tag. Setting the closing tag to NULL
will disable output for children of the node. If both fields should be
set to NULL, only one dot must be specified.
You can use mdown -f dump on a file to get a dump of its tree
structure that will also provide the names of the tags used. See also
the plainxml.tags file for an example.
There is no way to tell a continuation paragraph of a block apart from a verbatim block belonging to its parent. Consider the following example:
- This is a paragraph.
- This is a second paragraph.
Is this a verbatim block from the parent node or the
continuation of the above list item?
This is treated as the continuation of the list item.
Due to the way > prefixes are treated, this is also true for quotations.
The different files do not share a single syntax. This may change in the future, but a conversion tool would need to be written.
Nhat Minh Lê. Please send comments and bug reports to
nhat <dot> minh <dot> le <at> gmail <dot> com.
awk(1), m4(1), pdflatex(1), man(7)