author mb21 <mb21@users.noreply.github.com> 2017-12-27 12:33:40 +0100
committer mb21 <mb21@users.noreply.github.com> 2017-12-27 17:11:23 +0100
commit 44e504853f62ed9383cf6e1e6dabb548637d3f53
tree a7af53a528d6257bcd8ddd38acf8bf6e2b6da881
parent d71165c8e2b348ee65c8d92d6ca5f8a24b1cfa92
MANUAL.txt introduce dedicated extensions section
-rw-r--r-- MANUAL.txt 471
1 files changed, 284 insertions, 187 deletions
diff --git a/MANUAL.txt b/MANUAL.txt
index 78bd057ed..b75385871 100644
--- a/MANUAL.txt
+++ b/MANUAL.txt
@@ -284,16 +284,9 @@ General options
(`markdown_github` provides deprecated and less accurate support
for Github-Flavored Markdown; please use `gfm` instead, unless you
need to use extensions other than `smart`.)
- If `+lhs` is appended to `markdown`, `rst`, `latex`, or
- `html`, the input will be treated as literate Haskell source: see
- [Literate Haskell support], below. Markdown
- syntax extensions can be individually enabled or disabled by
- appending `+EXTENSION` or `-EXTENSION` to the format name. So, for
- example, `markdown_strict+footnotes+definition_lists` is strict
- Markdown with footnotes and definition lists enabled, and
- `markdown-pipe_tables+hard_line_breaks` is pandoc's Markdown
- without pipe tables and with hard line breaks. See [Pandoc's
- Markdown], below, for a list of extensions and
+ Extensions can be individually enabled or disabled by
+ appending `+EXTENSION` or `-EXTENSION` to the format name.
+ See [Extensions], below, for a list of extensions and
their names. See `--list-input-formats` and `--list-extensions`,
below.
@@ -327,13 +320,10 @@ General options
unless you use extensions that do not work with `gfm`.) Note that
`odt`, `epub`, and `epub3` output will not be directed to
*stdout*; an output filename must be specified using the
- `-o/--output` option. If `+lhs` is appended to `markdown`, `rst`,
- `latex`, `beamer`, `html4`, or `html5`, the output will be
- rendered as literate Haskell source: see [Literate Haskell
- support], below. Markdown syntax extensions can be individually
- enabled or disabled by appending `+EXTENSION` or `-EXTENSION` to
- the format name, as described above under `-f`. See
- `--list-output-formats` and `--list-extensions`, below.
+ `-o/--output` option. Extensions can be individually enabled or
+ disabled by appending `+EXTENSION` or `-EXTENSION` to the format
+ name. See [Extensions], below, for a list of extensions and their
+ names. See `--list-output-formats` and `--list-extensions`, below.
`-o` *FILE*, `--output=`*FILE*
@@ -1698,6 +1688,269 @@ will be treated as a comment and ignored.
[pandoc-templates]: https://github.com/jgm/pandoc-templates
+Extensions
+==========
+
+The behavior of some of the readers and writers can be adjusted by
+enabling or disabling various extensions.
+
+An extension can be enabled by adding `+EXTENSION`
+to the format name and disabled by adding `-EXTENSION`. For example,
+`--from markdown_strict+footnotes` is strict Markdown with footnotes
+enabled, while `--from markdown-footnotes-pipe_tables` is pandoc's
+Markdown without footnotes or pipe tables.
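As a rough illustration of how such a format string breaks down, here is a minimal Python sketch. The functions `parse_format` and `apply_toggles` are hypothetical and not part of pandoc; the sketch assumes that format and extension names consist of lowercase letters, digits, and underscores, and that toggles are applied left to right.

```python
import re

# Hypothetical sketch, not pandoc's parser: a base format name such as
# `markdown_strict`, followed by zero or more `+name` / `-name` toggles.
_SPEC = re.compile(r"([a-z0-9_]+)((?:[+-][a-z0-9_]+)*)")
_TOGGLE = re.compile(r"([+-])([a-z0-9_]+)")


def parse_format(spec):
    """Split e.g. 'markdown-footnotes-pipe_tables' into base and toggles."""
    match = _SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"malformed format spec: {spec!r}")
    base, rest = match.groups()
    # findall returns ordered (sign, name) pairs, e.g. [('-', 'footnotes')].
    return base, _TOGGLE.findall(rest)


def apply_toggles(defaults, toggles):
    """Apply toggles left to right to a set of default extensions."""
    enabled = set(defaults)
    for sign, name in toggles:
        if sign == "+":
            enabled.add(name)
        else:
            enabled.discard(name)
    return enabled


if __name__ == "__main__":
    base, toggles = parse_format("markdown-footnotes-pipe_tables")
    # The default set here is made up for illustration only.
    print(base, sorted(apply_toggles({"footnotes", "pipe_tables", "smart"}, toggles)))
```

The default extension set in the usage above is invented; each real pandoc format has its own defaults, which `--list-extensions` reports.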
+
+The markdown reader and writer make by far the most use of extensions.
+Extensions used only by them are therefore covered in the
+section [Pandoc's Markdown], below (see [Markdown variants] for
+`commonmark` and `gfm`). The following sections cover extensions
+that also work with other formats.
+
+Typography
+----------
+
+#### Extension: `smart` ####
+
+Interpret straight quotes as curly quotes, `---` as em-dashes,
+`--` as en-dashes, and `...` as ellipses. Nonbreaking spaces are
+inserted after certain abbreviations, such as "Mr."
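The dash and ellipsis part of this rule can be sketched in a few lines of Python. This is a simplified, hypothetical helper, not pandoc's implementation: it ignores quotes and nonbreaking spaces, and only shows why `---` has to be matched before `--`.

```python
def smarten_dashes_and_ellipses(text):
    """Simplified sketch: dashes and ellipses only (no quotes, no nbsp)."""
    # Replace the longer pattern first; otherwise `---` would become
    # an en dash followed by a stray hyphen.
    text = text.replace("---", "\u2014")  # em dash
    text = text.replace("--", "\u2013")   # en dash
    return text.replace("...", "\u2026")  # horizontal ellipsis
```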
+
+This extension can be enabled/disabled for the following formats:
+
+input formats
+: `markdown`, `commonmark`, `latex`, `mediawiki`, `org`, `rst`, `twiki`
+
+output formats
+: `markdown`, `latex`, `context`, `rst`
+
+enabled by default in
+: `markdown`, `latex`, `context` (both input and output)
+
+Note: If you are *writing* Markdown, then the `smart` extension
+has the reverse effect: what would have been curly quotes comes
+out straight.
+
+In LaTeX, `smart` means to use the standard TeX ligatures
+for quotation marks (` `` ` and ` '' ` for double quotes,
+`` ` `` and `` ' `` for single quotes) and dashes (`--` for
+en-dash and `---` for em-dash). If `smart` is disabled,
+then in reading LaTeX pandoc will parse these characters
+literally. In writing LaTeX, enabling `smart` tells pandoc
+to use the ligatures when possible; if `smart` is disabled
+pandoc will use Unicode quotation mark and dash characters.
+
+Headers and sections
+--------------------
+
+#### Extension: `auto_identifiers` ####
+
+A header without an explicitly specified identifier will be
+automatically assigned a unique identifier based on the header text.
+
+This extension can be enabled/disabled for the following formats:
+
+input formats
+: `markdown`, `latex`, `rst`, `mediawiki`, `textile`
+
+output formats
+: `markdown`, `muse`
+
+enabled by default in
+: `markdown`, `muse`
+
+The algorithm used to derive the identifier from the header text is:
+
+ - Remove all formatting, links, etc.
+ - Remove all footnotes.
+ - Remove all punctuation, except underscores, hyphens, and periods.
+ - Replace all spaces and newlines with hyphens.
+ - Convert all alphabetic characters to lowercase.
+ - Remove everything up to the first letter (identifiers may
+ not begin with a number or punctuation mark).
+ - If nothing is left after this, use the identifier `section`.
+
+Thus, for example,
+
+ Header Identifier
+ ------------------------------- ----------------------------
+ `Header identifiers in HTML` `header-identifiers-in-html`
+ `*Dogs*?--in *my* house?` `dogs--in-my-house`
+ `[HTML], [S5], or [RTF]?` `html-s5-or-rtf`
+ `3. Applications` `applications`
+ `33` `section`
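The rules above can be approximated in Python as follows. This is a sketch under stated assumptions, not pandoc's code: `header_identifier` is a hypothetical name; it expects header text whose formatting, links, and footnotes have already been removed (the raw strings from the table still work, because `*`, `[`, and `]` are dropped as punctuation); it treats any character that is not alphanumeric, whitespace, `_`, `-`, or `.` as punctuation; and it collapses runs of whitespace into a single hyphen.

```python
def header_identifier(text):
    """Derive an identifier from plain header text (sketch of the rules above)."""
    # Keep only alphanumerics, whitespace, and the punctuation `_`, `-`, `.`.
    kept = "".join(c for c in text if c.isalnum() or c.isspace() or c in "_-.")
    # Turn runs of whitespace (spaces, newlines) into hyphens, then lowercase.
    ident = "-".join(kept.split()).lower()
    # Drop everything before the first letter.
    i = 0
    while i < len(ident) and not ident[i].isalpha():
        i += 1
    ident = ident[i:]
    # Fall back to `section` if nothing is left.
    return ident or "section"
```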
+
+These rules should, in most cases, allow one to determine the identifier
+from the header text. The exception is when several headers have the
+same text; in this case, the first will get an identifier as described
+above; the second will get the same identifier with `-1` appended; the
+third with `-2`; and so on.
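The numbering of repeated identifiers can be sketched like this. `unique_identifiers` is a hypothetical helper that takes the identifiers already derived from each header, in document order. When a suffixed name is itself already taken (for example, because another header's text yields `intro-1`), this sketch moves on to the next free number; that detail is an assumption, not documented pandoc behavior.

```python
def unique_identifiers(base_ids):
    """Make identifiers unique by appending -1, -2, ... to repeats (sketch)."""
    used = set()
    result = []
    for base in base_ids:
        ident, n = base, 0
        # The first occurrence keeps its name; later ones get the first
        # free numeric suffix.
        while ident in used:
            n += 1
            ident = f"{base}-{n}"
        used.add(ident)
        result.append(ident)
    return result
```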
+
+These identifiers are used to provide link targets in the table of
+contents generated by the `--toc|--table-of-contents` option. They
+also make it easy to provide links from one section of a document to
+another. A link to this section, for example, might look like this:
+
+ See the section on
+ [header identifiers](#extension-auto_identifiers).
+
+Note, however, that this method of providing links to sections works
+only in HTML, LaTeX, and ConTeXt formats.
+
+If the `--section-divs` option is specified, then each section will
+be wrapped in a `div` (or a `section`, if `html5` was specified),
+and the identifier will be attached to the enclosing `<div>`
+(or `<section>`) tag rather than the header itself. This allows entire
+sections to be manipulated using JavaScript or treated differently in
+CSS.
+
+#### Extension: `ascii_identifiers` ####
+
+Causes the identifiers produced by `auto_identifiers` to be pure ASCII.
+Accents are stripped from accented Latin letters, and non-Latin
+letters are omitted.
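One common way to approximate this in Python is Unicode decomposition: split each accented letter into its base letter plus combining marks, then keep only ASCII characters. This is a hypothetical sketch, not pandoc's code; in particular, letters with no ASCII decomposition (such as `ø` or `ß`) are simply dropped here, which may not match pandoc's own handling.

```python
import unicodedata


def asciify(identifier):
    """Strip accents from Latin letters and drop non-ASCII characters (sketch)."""
    # NFKD turns e.g. "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", identifier)
    # Combining marks and non-Latin letters are non-ASCII, so both are dropped.
    return "".join(c for c in decomposed if c.isascii())
```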
+
+Math input
+----------
+
+The extensions [`tex_math_dollars`](#extension-tex_math_dollars),
+[`tex_math_single_backslash`](#extension-tex_math_single_backslash), and
+[`tex_math_double_backslash`](#extension-tex_math_double_backslash)
+are described in the section about Pandoc's Markdown.
+
+However, they can also be used with HTML input. This is handy for
+reading web pages formatted using MathJax, for example.
+
+Raw HTML/TeX
+------------
+
+The following extensions (especially how they affect Markdown
+input/output) are also described in more detail in their respective
+sections of [Pandoc's Markdown].
+
+#### [Extension: `raw_html`] {#raw_html}
+
+When converting from HTML, parse elements that are not representable
+in pandoc's AST as raw HTML.
+By default, this is disabled for HTML input.
+
+#### [Extension: `raw_tex`] {#raw_tex}
+
+Allows raw LaTeX, TeX, and ConTeXt to be included in a document.
+
+This extension can be enabled/disabled for the following formats
+(in addition to `markdown`):
+
+input formats
+: `latex`, `org`, `textile`
+
+output formats
+: `textile`
+
+#### [Extension: `native_divs`] {#native_divs}
+
+This extension is enabled by default for HTML input. This means that
+`div`s are parsed as native pandoc elements. (Alternatively, you
+can parse them as raw HTML using `-f html-native_divs+raw_html`.)
+
+When converting HTML to Markdown, for example, you may want to drop all
+`div`s and `span`s:
+
+ pandoc -f html-native_divs-native_spans -t markdown
+
+#### [Extension: `native_spans`] {#native_spans}
+
+Analogous to `native_divs` above.
+
+
+Literate Haskell support
+------------------------
+
+#### Extension: `literate_haskell` ####
+
+Treat the document as literate Haskell source.
+
+This extension can be enabled/disabled for the following formats:
+
+input formats
+: `markdown`, `rst`, `latex`
+
+output formats
+: `markdown`, `rst`, `latex`, `html`
+
+If you append `+lhs` (or `+literate_haskell`) to one of the formats
+above, pandoc will treat the document as literate Haskell source.
+This means that
+
+ - In Markdown input, "bird track" sections will be parsed as Haskell
+ code rather than block quotations. Text between `\begin{code}`
+ and `\end{code}` will also be treated as Haskell code. For
+ ATX-style headers the character '=' will be used instead of '#'.
+
+ - In Markdown output, code blocks with classes `haskell` and `literate`
+ will be rendered using bird tracks, and block quotations will be
+ indented one space, so they will not be treated as Haskell code.
+ In addition, headers will be rendered setext-style (with underlines)
+ rather than ATX-style (with '#' characters). (This is because GHC
+ treats '#' characters in column 1 as introducing line numbers.)
+
+ - In reStructuredText input, "bird track" sections will be parsed
+ as Haskell code.
+
+ - In reStructuredText output, code blocks with class `haskell` will
+ be rendered using bird tracks.
+
+ - In LaTeX input, text in `code` environments will be parsed as
+ Haskell code.
+
+ - In LaTeX output, code blocks with class `haskell` will be rendered
+ inside `code` environments.
+
+ - In HTML output, code blocks with class `haskell` will be rendered
+ with class `literatehaskell` and bird tracks.
+
+Examples:
+
+ pandoc -f markdown+lhs -t html
+
+reads literate Haskell source formatted with Markdown conventions and writes
+ordinary HTML (without bird tracks).
+
+ pandoc -f markdown+lhs -t html+lhs
+
+writes HTML with the Haskell code in bird tracks, so it can be copied
+and pasted as literate Haskell source.
+
+Note that GHC expects the bird tracks in the first column, so indented literate
+code blocks (e.g. inside a list) will not be picked up by the
+Haskell compiler.
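To illustrate why indentation matters, here is a hypothetical Python sketch that collects only bird-track lines starting in column 1, roughly what a compiler reading the file treats as code. It is deliberately simplified: it ignores `\begin{code}`/`\end{code}` blocks and GHC's requirement of blank lines around bird-track blocks.

```python
def bird_track_code(source):
    """Return the code lines a column-1 bird-track reader would see (sketch)."""
    code = []
    for line in source.splitlines():
        # Only a `>` in the very first column marks code; an indented
        # `    > ...` line (e.g. inside a list) counts as ordinary text.
        if line.startswith(">"):
            # Strip the bird track and the single space that usually follows it.
            code.append(line[2:] if line.startswith("> ") else line[1:])
    return code
```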
+
+Other extensions
+----------------
+
+#### Extension: `empty_paragraphs` ####
+
+Allows empty paragraphs. By default empty paragraphs are
+omitted.
+
+This extension can be enabled/disabled for the following formats:
+
+input formats
+: `docx`, `html`
+
+output formats
+: `markdown`, `docx`, `odt`, `opendocument`, `html`
+
+#### Extension: `amuse` ####
+
+In the `muse` input format, this enables Text::Amuse
+extensions to Emacs Muse markup.
+
+#### Extension: `citations` {#org-citations}
+
+Some aspects of [Pandoc's Markdown citation syntax](#citations) are also accepted
+in `org` input.
+
+
Pandoc's Markdown
=================
@@ -1705,11 +1958,9 @@ Pandoc understands an extended and slightly revised version of
John Gruber's [Markdown] syntax. This document explains the syntax,
noting differences from standard Markdown. Except where noted, these
differences can be suppressed by using the `markdown_strict` format instead
-of `markdown`. An extensions can be enabled by adding `+EXTENSION`
-to the format name and disabled by adding `-EXTENSION`. For example,
-`markdown_strict+footnotes` is strict Markdown with footnotes
-enabled, while `markdown-footnotes-pipe_tables` is pandoc's
-Markdown without footnotes or pipe tables.
+of `markdown`. Extensions can be enabled or disabled to specify the
+behavior more granularly. They are described below. See also
+[Extensions], above, for extensions that also work with other formats.
Philosophy
----------
@@ -1801,6 +2052,8 @@ pandoc does require the space.
### Header identifiers ###
+See also the [`auto_identifiers` extension](#extension-auto_identifiers) above.
+
#### Extension: `header_attributes` ####
Headers can be assigned attributes using this syntax at the end
@@ -1837,55 +2090,6 @@ is just the same as
# My header {.unnumbered}
-#### Extension: `auto_identifiers` ####
-
-A header without an explicitly specified identifier will be
-automatically assigned a unique identifier based on the header text.
-To derive the identifier from the header text,
-
- - Remove all formatting, links, etc.
- - Remove all footnotes.
- - Remove all punctuation, except underscores, hyphens, and periods.
- - Replace all spaces and newlines with hyphens.
- - Convert all alphabetic characters to lowercase.
- - Remove everything up to the first letter (identifiers may
- not begin with a number or punctuation mark).
- - If nothing is left after this, use the identifier `section`.
-
-Thus, for example,
-
- Header Identifier
- ------------------------------- ----------------------------
- `Header identifiers in HTML` `header-identifiers-in-html`
- `*Dogs*?--in *my* house?` `dogs--in-my-house`
- `[HTML], [S5], or [RTF]?` `html-s5-or-rtf`
- `3. Applications` `applications`
- `33` `section`
-
-These rules should, in most cases, allow one to determine the identifier
-from the header text. The exception is when several headers have the
-same text; in this case, the first will get an identifier as described
-above; the second will get the same identifier with `-1` appended; the
-third with `-2`; and so on.
-
-These identifiers are used to provide link targets in the table of
-contents generated by the `--toc|--table-of-contents` option. They
-also make it easy to provide links from one section of a document to
-another. A link to this section, for example, might look like this:
-
- See the section on
- [header identifiers](#header-identifiers-in-html-latex-and-context).
-
-Note, however, that this method of providing links to sections works
-only in HTML, LaTeX, and ConTeXt formats.
-
-If the `--section-divs` option is specified, then each section will
-be wrapped in a `div` (or a `section`, if `html5` was specified),
-and the identifier will be attached to the enclosing `<div>`
-(or `<section>`) tag rather than the header itself. This allows entire
-sections to be manipulated using JavaScript or treated differently in
-CSS.
-
#### Extension: `implicit_header_references` ####
Pandoc behaves as if reference links have been defined for each header.
@@ -3028,8 +3232,6 @@ HTML, Slidy, DZSlides, S5, EPUB
command-line options selected. Therefore see [Math rendering in HTML]
above.
-This extension can be used with both `markdown` and `html` input.
-
[interpreted text role `:math:`]: http://docutils.sourceforge.net/docs/ref/rst/roles.html#math
Raw HTML
@@ -3457,33 +3659,6 @@ they cannot contain multiple paragraphs). The syntax is as follows:
Inline and regular footnotes may be mixed freely.
-Typography
-----------
-
-#### Extension: `smart` ####
-
-Interpret straight quotes as curly quotes, `---` as em-dashes,
-`--` as en-dashes, and `...` as ellipses. Nonbreaking spaces are
-inserted after certain abbreviations, such as "Mr." This
-option currently affects the input formats `markdown`,
-`commonmark`, `latex`, `mediawiki`, `org`, `rst`, and `twiki`,
-and the output formats `markdown`, `latex`, and `context`.
-It is enabled by default for `markdown`, `latex`, and `context`
-(in both input and output).
-
-Note: If you are *writing* Markdown, then the `smart` extension
-has the reverse effect: what would have been curly quotes comes
-out straight.
-
-In LaTeX, `smart` means to use the standard TeX ligatures
-for quotation marks (` `` ` and ` '' ` for double quotes,
-`` ` `` and `` ' `` for single quotes) and dashes (`--` for
-en-dash and `---` for em-dash). If `smart` is disabled,
-then in reading LaTeX pandoc will parse these characters
-literally. In writing LaTeX, enabling `smart` tells pandoc
-to use the ligatures when possible; if `smart` is disabled
-pandoc will use unicode quotation mark and dash characters.
-
Citations
---------
@@ -3746,8 +3921,6 @@ TeX math, and anything between `\[` and `\]` to be interpreted
as display TeX math. Note: a drawback of this extension is that
it precludes escaping `(` and `[`.
-This extension can be used with both `markdown` and `html` input.
-
#### Extension: `tex_math_double_backslash` ####
Causes anything between `\\(` and `\\)` to be interpreted as inline
@@ -3790,12 +3963,6 @@ simply skipped (as opposed to being parsed as paragraphs).
Makes all absolute URIs into links, even when not surrounded by
pointy braces `<...>`.
-#### Extension: `ascii_identifiers` ####
-
-Causes the identifiers produced by `auto_identifiers` to be pure ASCII.
-Accents are stripped off of accented Latin letters, and non-Latin
-letters are omitted.
-
#### Extension: `mmd_link_attributes` ####
Parses multimarkdown style key-value attributes on link
@@ -3839,12 +4006,6 @@ in several respects:
we must either disallow lazy wrapping or require a blank line between
list items.
-#### Extension: `empty_paragraphs` ####
-
-Allows empty paragraphs. By default empty paragraphs are
-omitted. This affects the `docx` reader and writer, the
-`opendocument` and `odt` writer, and all HTML-based readers and writers.
-
Markdown variants
-----------------
@@ -3878,34 +4039,21 @@ variants are supported:
: `raw_html`, `shortcut_reference_links`,
`spaced_reference_links`.
-We also support `gfm` (GitHub-Flavored Markdown) as a set of
-extensions on `commonmark`:
+We also support `commonmark` and `gfm` (GitHub-Flavored Markdown,
+which is implemented as a set of extensions on `commonmark`).
+
+Note, however, that `commonmark` and `gfm` have limited support
+for extensions. Only those listed below (and `smart` and
+`raw_tex`) will work; each of them can be individually disabled.
+Also, `raw_tex` only affects `gfm` output, not input.
+
+`gfm` (GitHub-Flavored Markdown)
: `pipe_tables`, `raw_html`, `fenced_code_blocks`, `auto_identifiers`,
`ascii_identifiers`, `backtick_code_blocks`, `autolink_bare_uris`,
`intraword_underscores`, `strikeout`, `hard_line_breaks`, `emoji`,
`shortcut_reference_links`, `angle_brackets_escapable`.
- These can all be individually disabled. Note, however, that
- `commonmark` and `gfm` have limited support for extensions:
- extensions other than those listed above (and `smart` and
- `raw_tex`) will have no effect on `commonmark` or `gfm`.
- And `raw_tex` only affects `gfm` output, not input.
-
-Extensions with formats other than Markdown
--------------------------------------------
-
-Some of the extensions discussed above can be used with formats
-other than Markdown:
-
-* `auto_identifiers` can be used with `latex`, `rst`, `mediawiki`,
- and `textile` input (and is used by default).
-
-* `tex_math_dollars`, `tex_math_single_backslash`, and
- `tex_math_double_backslash` can be used with `html` input.
- (This is handy for reading web pages formatted using MathJax,
- for example.)
-
Producing slide shows with pandoc
=================================
@@ -4257,57 +4405,6 @@ with the `src` attribute. For example:
</source>
</audio>
-Literate Haskell support
-========================
-
-If you append `+lhs` (or `+literate_haskell`) to an appropriate input or output
-format (`markdown`, `markdown_strict`, `rst`, or `latex` for input or output;
-`beamer`, `html4` or `html5` for output only), pandoc will treat the document as
-literate Haskell source. This means that
-
- - In Markdown input, "bird track" sections will be parsed as Haskell
- code rather than block quotations. Text between `\begin{code}`
- and `\end{code}` will also be treated as Haskell code. For
- ATX-style headers the character '=' will be used instead of '#'.
-
- - In Markdown output, code blocks with classes `haskell` and `literate`
- will be rendered using bird tracks, and block quotations will be
- indented one space, so they will not be treated as Haskell code.
- In addition, headers will be rendered setext-style (with underlines)
- rather than ATX-style (with '#' characters). (This is because ghc
- treats '#' characters in column 1 as introducing line numbers.)
-
- - In restructured text input, "bird track" sections will be parsed
- as Haskell code.
-
- - In restructured text output, code blocks with class `haskell` will
- be rendered using bird tracks.
-
- - In LaTeX input, text in `code` environments will be parsed as
- Haskell code.
-
- - In LaTeX output, code blocks with class `haskell` will be rendered
- inside `code` environments.
-
- - In HTML output, code blocks with class `haskell` will be rendered
- with class `literatehaskell` and bird tracks.
-
-Examples:
-
- pandoc -f markdown+lhs -t html
-
-reads literate Haskell source formatted with Markdown conventions and writes
-ordinary HTML (without bird tracks).
-
- pandoc -f markdown+lhs -t html+lhs
-
-writes HTML with the Haskell code in bird tracks, so it can be copied
-and pasted as literate Haskell source.
-
-Note that GHC expects the bird tracks in the first column, so indentend literate
-code blocks (e.g. inside an itemized environment) will not be picked up by the
-Haskell compiler.
-
Syntax highlighting
===================