author | mb21 <mb21@users.noreply.github.com> | 2017-12-27 12:33:40 +0100 |
---|---|---|
committer | mb21 <mb21@users.noreply.github.com> | 2017-12-27 17:11:23 +0100 |
commit | 44e504853f62ed9383cf6e1e6dabb548637d3f53 (patch) | |
tree | a7af53a528d6257bcd8ddd38acf8bf6e2b6da881 | |
parent | d71165c8e2b348ee65c8d92d6ca5f8a24b1cfa92 (diff) | |
download | pandoc-44e504853f62ed9383cf6e1e6dabb548637d3f53.tar.gz |
MANUAL.txt introduce dedicated extensions section
-rw-r--r-- | MANUAL.txt | 471 |
1 files changed, 284 insertions, 187 deletions
diff --git a/MANUAL.txt b/MANUAL.txt
index 78bd057ed..b75385871 100644
--- a/MANUAL.txt
+++ b/MANUAL.txt
@@ -284,16 +284,9 @@ General options
     (`markdown_github` provides deprecated and less accurate support
     for Github-Flavored Markdown; please use `gfm` instead, unless you
     need to use extensions other than `smart`.)
-    If `+lhs` is appended to `markdown`, `rst`, `latex`, or
-    `html`, the input will be treated as literate Haskell source: see
-    [Literate Haskell support], below. Markdown
-    syntax extensions can be individually enabled or disabled by
-    appending `+EXTENSION` or `-EXTENSION` to the format name. So, for
-    example, `markdown_strict+footnotes+definition_lists` is strict
-    Markdown with footnotes and definition lists enabled, and
-    `markdown-pipe_tables+hard_line_breaks` is pandoc's Markdown
-    without pipe tables and with hard line breaks. See [Pandoc's
-    Markdown], below, for a list of extensions and
+    Extensions can be individually enabled or disabled by
+    appending `+EXTENSION` or `-EXTENSION` to the format name.
+    See [Extensions] below, for a list of extensions and
     their names. See `--list-input-formats` and `--list-extensions`,
     below.
 
@@ -327,13 +320,10 @@ General options
     unless you use extensions that do not work with `gfm`.) Note that
     `odt`, `epub`, and `epub3` output will not be directed to
     *stdout*; an output filename must be specified using the
-    `-o/--output` option. If `+lhs` is appended to `markdown`, `rst`,
-    `latex`, `beamer`, `html4`, or `html5`, the output will be
-    rendered as literate Haskell source: see [Literate Haskell
-    support], below. Markdown syntax extensions can be individually
-    enabled or disabled by appending `+EXTENSION` or `-EXTENSION` to
-    the format name, as described above under `-f`. See
-    `--list-output-formats` and `--list-extensions`, below.
+    `-o/--output` option. Extensions can be individually enabled or
+    disabled by appending `+EXTENSION` or `-EXTENSION` to the format
+    name. See [Extensions] below, for a list of extensions and their
+    names. See `--list-output-formats` and `--list-extensions`, below.
 
 `-o` *FILE*, `--output=`*FILE*
 
@@ -1698,6 +1688,269 @@ will be treated as a comment and ignored.
 
 [pandoc-templates]: https://github.com/jgm/pandoc-templates
 
+Extensions
+==========
+
+The behavior of some of the readers and writers can be adjusted by
+enabling or disabling various extensions.
+
+An extension can be enabled by adding `+EXTENSION`
+to the format name and disabled by adding `-EXTENSION`. For example,
+`--from markdown_strict+footnotes` is strict Markdown with footnotes
+enabled, while `--from markdown-footnotes-pipe_tables` is pandoc's
+Markdown without footnotes or pipe tables.
+
+The markdown reader and writer make by far the most use of extensions.
+Extensions only used by them are therefore covered in the
+section [Pandoc's Markdown] below (See [Markdown variants] for
+`commonmark` and `gfm`.) In the following, extensions that also work
+for other formats are covered.
+
+Typography
+----------
+
+#### Extension: `smart` ####
+
+Interpret straight quotes as curly quotes, `---` as em-dashes,
+`--` as en-dashes, and `...` as ellipses. Nonbreaking spaces are
+inserted after certain abbreviations, such as "Mr."
+
+This extension can be enabled/disabled for the following formats:
+
+input formats
+: `markdown`, `commonmark`, `latex`, `mediawiki`, `org`, `rst`, `twiki`
+
+output formats
+: `markdown`, `latex`, `context`, `rst`
+
+enabled by default in
+: `markdown`, `latex`, `context` (both input and output)
+
+Note: If you are *writing* Markdown, then the `smart` extension
+has the reverse effect: what would have been curly quotes comes
+out straight.
+
+In LaTeX, `smart` means to use the standard TeX ligatures
+for quotation marks (` `` ` and ` '' ` for double quotes,
+`` ` `` and `` ' `` for single quotes) and dashes (`--` for
+en-dash and `---` for em-dash). If `smart` is disabled,
+then in reading LaTeX pandoc will parse these characters
+literally. In writing LaTeX, enabling `smart` tells pandoc
+to use the ligatures when possible; if `smart` is disabled
+pandoc will use unicode quotation mark and dash characters.
+
+Headers and sections
+--------------------
+
+#### Extension: `auto_identifiers` ####
+
+A header without an explicitly specified identifier will be
+automatically assigned a unique identifier based on the header text.
+
+This extension can be enabled/disabled for the following formats:
+
+input formats
+: `markdown`, `latex`, `rst`, `mediawiki`, `textile`
+
+output formats
+: `markdown`, `muse`
+
+enabled by default in
+: `markdown`, `muse`
+
+The algorithm used to derive the identifier from the header text is:
+
+  - Remove all formatting, links, etc.
+  - Remove all footnotes.
+  - Remove all punctuation, except underscores, hyphens, and periods.
+  - Replace all spaces and newlines with hyphens.
+  - Convert all alphabetic characters to lowercase.
+  - Remove everything up to the first letter (identifiers may
+    not begin with a number or punctuation mark).
+  - If nothing is left after this, use the identifier `section`.
+
+Thus, for example,
+
+  Header                            Identifier
+  -------------------------------   ----------------------------
+  `Header identifiers in HTML`      `header-identifiers-in-html`
+  `*Dogs*?--in *my* house?`         `dogs--in-my-house`
+  `[HTML], [S5], or [RTF]?`         `html-s5-or-rtf`
+  `3. Applications`                 `applications`
+  `33`                              `section`
+
+These rules should, in most cases, allow one to determine the identifier
+from the header text. The exception is when several headers have the
+same text; in this case, the first will get an identifier as described
+above; the second will get the same identifier with `-1` appended; the
+third with `-2`; and so on.
+
+These identifiers are used to provide link targets in the table of
+contents generated by the `--toc|--table-of-contents` option. They
+also make it easy to provide links from one section of a document to
+another. A link to this section, for example, might look like this:
+
+    See the section on
+    [header identifiers](#header-identifiers-in-html-latex-and-context).
+
+Note, however, that this method of providing links to sections works
+only in HTML, LaTeX, and ConTeXt formats.
+
+If the `--section-divs` option is specified, then each section will
+be wrapped in a `div` (or a `section`, if `html5` was specified),
+and the identifier will be attached to the enclosing `<div>`
+(or `<section>`) tag rather than the header itself. This allows entire
+sections to be manipulated using JavaScript or treated differently in
+CSS.
+
+#### Extension: `ascii_identifiers` ####
+
+Causes the identifiers produced by `auto_identifiers` to be pure ASCII.
+Accents are stripped off of accented Latin letters, and non-Latin
+letters are omitted.
+
+Math Input
+----------
+
+The extensions [`tex_math_dollars`](#extension-tex_math_dollars),
+[`tex_math_single_backslash`](#extension-tex_math_single_backslash), and
+[`tex_math_double_backslash`](#extension-tex_math_double_backslash)
+are described in the section about Pandoc's Markdown.
+
+However, they can also be used with HTML input. This is handy for
+reading web pages formatted using MathJax, for example.
+
+Raw HTML/TeX
+------------
+
+The following extensions (especially how they affect Markdown
+input/output) are also described in more detail in their respective
+sections of [Pandoc's Markdown].
+
+#### [Extension: `raw_html`] {#raw_html}
+
+When converting from HTML, parse elements to raw HTML which are not
+representable in pandoc's AST.
+By default, this is disabled for HTML input.
+
+#### [Extension: `raw_tex`] {#raw_tex}
+
+Allows raw LaTeX, TeX, and ConTeXt to be included in a document.
+
+This extension can be enabled/disabled for the following formats
+(in addition to `markdown`):
+
+input formats
+: `latex`, `org`, `textile`
+
+output formats
+: `textile`
+
+#### [Extension: `native_divs`] {#native_divs}
+
+This extension is enabled by default for HTML input. This means that
+`div`s are parsed to pandoc native elements. (Alternatively, you
+can parse them to raw HTML using `-f html-native_divs+raw_html`.)
+
+When converting HTML to Markdown, for example, you may want to drop all
+`div`s and `span`s:
+
+    pandoc -f html-native_divs-native_spans -t markdown
+
+#### [Extension: `native_spans`] {#native_spans}
+
+Analogous to `native_divs` above.
+
+
+Literate Haskell support
+------------------------
+
+#### Extension: `literate_haskell` ####
+
+Treat the document as literate Haskell source.
+
+This extension can be enabled/disabled for the following formats:
+
+input formats
+: `markdown`, `rst`, `latex`
+
+output formats
+: `markdown`, `rst`, `latex`, `html`
+
+If you append `+lhs` (or `+literate_haskell`) to one of the formats
+above, pandoc will treat the document as literate Haskell source.
+This means that
+
+  - In Markdown input, "bird track" sections will be parsed as Haskell
+    code rather than block quotations. Text between `\begin{code}`
+    and `\end{code}` will also be treated as Haskell code. For
+    ATX-style headers the character '=' will be used instead of '#'.
+
+  - In Markdown output, code blocks with classes `haskell` and `literate`
+    will be rendered using bird tracks, and block quotations will be
+    indented one space, so they will not be treated as Haskell code.
+    In addition, headers will be rendered setext-style (with underlines)
+    rather than ATX-style (with '#' characters). (This is because ghc
+    treats '#' characters in column 1 as introducing line numbers.)
+
+  - In restructured text input, "bird track" sections will be parsed
+    as Haskell code.
+
+  - In restructured text output, code blocks with class `haskell` will
+    be rendered using bird tracks.
+
+  - In LaTeX input, text in `code` environments will be parsed as
+    Haskell code.
+
+  - In LaTeX output, code blocks with class `haskell` will be rendered
+    inside `code` environments.
+
+  - In HTML output, code blocks with class `haskell` will be rendered
+    with class `literatehaskell` and bird tracks.
+
+Examples:
+
+    pandoc -f markdown+lhs -t html
+
+reads literate Haskell source formatted with Markdown conventions and writes
+ordinary HTML (without bird tracks).
+
+    pandoc -f markdown+lhs -t html+lhs
+
+writes HTML with the Haskell code in bird tracks, so it can be copied
+and pasted as literate Haskell source.
+
+Note that GHC expects the bird tracks in the first column, so indentend literate
+code blocks (e.g. inside an itemized environment) will not be picked up by the
+Haskell compiler.
+
+Other extensions
+----------------
+
+#### Extension: `empty_paragraphs` ####
+
+Allows empty paragraphs. By default empty paragraphs are
+omitted.
+
+This extension can be enabled/disabled for the following formats:
+
+input formats
+: `docx`, `html`
+
+output formats
+: `markdown`, `docx`, `odt`, `opendocument`, `html`
+
+#### Extension: `amuse` ####
+
+In the `muse` input format, this enables Text::Amuse
+extensions to Emacs Muse markup.
+
+#### Extension: `citations` {#org-citations}
+
+Some aspects of [Pandoc's Markdown citation syntax](#citations) are also accepted
+in `org` input.
+
+
 Pandoc's Markdown
 =================
 
@@ -1705,11 +1958,9 @@ Pandoc understands an extended and slightly revised version of
 John Gruber's [Markdown] syntax. This document explains the syntax,
 noting differences from standard Markdown. Except where noted, these
 differences can be suppressed by using the `markdown_strict` format instead
-of `markdown`. An extensions can be enabled by adding `+EXTENSION`
-to the format name and disabled by adding `-EXTENSION`. For example,
-`markdown_strict+footnotes` is strict Markdown with footnotes
-enabled, while `markdown-footnotes-pipe_tables` is pandoc's
-Markdown without footnotes or pipe tables.
+of `markdown`. Extensions can be enabled or disabled to specify the
+behavior more granularly. They are described in the following. See also
+[Extensions] above, for extensions that work also on other formats.
 
 Philosophy
 ----------
@@ -1801,6 +2052,8 @@ pandoc does require the space.
 
 ### Header identifiers ###
 
+See also the [`auto_identifiers` extension](#extension-auto_identifiers) above.
+
 #### Extension: `header_attributes` ####
 
 Headers can be assigned attributes using this syntax at the end
@@ -1837,55 +2090,6 @@ is just the same as
 
     # My header {.unnumbered}
 
-#### Extension: `auto_identifiers` ####
-
-A header without an explicitly specified identifier will be
-automatically assigned a unique identifier based on the header text.
-To derive the identifier from the header text,
-
-  - Remove all formatting, links, etc.
-  - Remove all footnotes.
-  - Remove all punctuation, except underscores, hyphens, and periods.
-  - Replace all spaces and newlines with hyphens.
-  - Convert all alphabetic characters to lowercase.
-  - Remove everything up to the first letter (identifiers may
-    not begin with a number or punctuation mark).
-  - If nothing is left after this, use the identifier `section`.
-
-Thus, for example,
-
-  Header                            Identifier
-  -------------------------------   ----------------------------
-  `Header identifiers in HTML`      `header-identifiers-in-html`
-  `*Dogs*?--in *my* house?`         `dogs--in-my-house`
-  `[HTML], [S5], or [RTF]?`         `html-s5-or-rtf`
-  `3. Applications`                 `applications`
-  `33`                              `section`
-
-These rules should, in most cases, allow one to determine the identifier
-from the header text. The exception is when several headers have the
-same text; in this case, the first will get an identifier as described
-above; the second will get the same identifier with `-1` appended; the
-third with `-2`; and so on.
-
-These identifiers are used to provide link targets in the table of
-contents generated by the `--toc|--table-of-contents` option. They
-also make it easy to provide links from one section of a document to
-another. A link to this section, for example, might look like this:
-
-    See the section on
-    [header identifiers](#header-identifiers-in-html-latex-and-context).
-
-Note, however, that this method of providing links to sections works
-only in HTML, LaTeX, and ConTeXt formats.
-
-If the `--section-divs` option is specified, then each section will
-be wrapped in a `div` (or a `section`, if `html5` was specified),
-and the identifier will be attached to the enclosing `<div>`
-(or `<section>`) tag rather than the header itself. This allows entire
-sections to be manipulated using JavaScript or treated differently in
-CSS.
-
 #### Extension: `implicit_header_references` ####
 
 Pandoc behaves as if reference links have been defined for each header.
@@ -3028,8 +3232,6 @@ HTML, Slidy, DZSlides, S5, EPUB
     command-line options selected. Therefore see [Math rendering in HTML]
     above.
 
-This extension can be used with both `markdown` and `html` input.
-
 [interpreted text role `:math:`]: http://docutils.sourceforge.net/docs/ref/rst/roles.html#math
 
 Raw HTML
@@ -3457,33 +3659,6 @@ they cannot contain multiple paragraphs). The syntax is as follows:
 
 Inline and regular footnotes may be mixed freely.
 
-Typography
------------
-
-#### Extension: `smart` ####
-
-Interpret straight quotes as curly quotes, `---` as em-dashes,
-`--` as en-dashes, and `...` as ellipses. Nonbreaking spaces are
-inserted after certain abbreviations, such as "Mr." This
-option currently affects the input formats `markdown`,
-`commonmark`, `latex`, `mediawiki`, `org`, `rst`, and `twiki`,
-and the output formats `markdown`, `latex`, and `context`.
-It is enabled by default for `markdown`, `latex`, and `context`
-(in both input and output).
-
-Note: If you are *writing* Markdown, then the `smart` extension
-has the reverse effect: what would have been curly quotes comes
-out straight.
-
-In LaTeX, `smart` means to use the standard TeX ligatures
-for quotation marks (` `` ` and ` '' ` for double quotes,
-`` ` `` and `` ' `` for single quotes) and dashes (`--` for
-en-dash and `---` for em-dash). If `smart` is disabled,
-then in reading LaTeX pandoc will parse these characters
-literally. In writing LaTeX, enabling `smart` tells pandoc
-to use the ligatures when possible; if `smart` is disabled
-pandoc will use unicode quotation mark and dash characters.
-
 Citations
 ---------
 
@@ -3746,8 +3921,6 @@ TeX math, and anything between `\[` and `\]` to be interpreted
 as display TeX math. Note: a drawback of this extension is that
 it precludes escaping `(` and `[`.
 
-This extension can be used with both `markdown` and `html` input.
-
 #### Extension: `tex_math_double_backslash` ####
 
 Causes anything between `\\(` and `\\)` to be interpreted as inline
@@ -3790,12 +3963,6 @@ simply skipped (as opposed to being parsed as paragraphs).
 Makes all absolute URIs into links, even when not surrounded by
 pointy braces `<...>`.
 
-#### Extension: `ascii_identifiers` ####
-
-Causes the identifiers produced by `auto_identifiers` to be pure ASCII.
-Accents are stripped off of accented Latin letters, and non-Latin
-letters are omitted.
-
 #### Extension: `mmd_link_attributes` ####
 
 Parses multimarkdown style key-value attributes on link
@@ -3839,12 +4006,6 @@ in several respects:
     we must either disallow lazy wrapping or require a blank line between
     list items.
 
-#### Extension: `empty_paragraphs` ####
-
-Allows empty paragraphs. By default empty paragraphs are
-omitted. This affects the `docx` reader and writer, the
-`opendocument` and `odt` writer, and all HTML-based readers and writers.
-
 Markdown variants
 -----------------
 
@@ -3878,34 +4039,21 @@ variants are supported:
 :   `raw_html`, `shortcut_reference_links`,
     `spaced_reference_links`.
 
-We also support `gfm` (GitHub-Flavored Markdown) as a set of
-extensions on `commonmark`:
+We also support `commonmark` and `gfm` (GitHub-Flavored Markdown,
+which is implemented as a set of extensions on `commonmark`).
+
+Note, however, that `commonmark` and `gfm` have limited support
+for extensions. Only those listed below (and `smart` and
+`raw_tex`) will work. The extensions can, however, all be
+individually disabled.
+Also, `raw_tex` only affects `gfm` output, not input.
 
+`gfm` (GitHub-Flavored Markdown)
 :   `pipe_tables`, `raw_html`, `fenced_code_blocks`, `auto_identifiers`,
     `ascii_identifiers`, `backtick_code_blocks`, `autolink_bare_uris`,
     `intraword_underscores`, `strikeout`, `hard_line_breaks`, `emoji`,
     `shortcut_reference_links`, `angle_brackets_escapable`.
 
-    These can all be individually disabled. Note, however, that
-    `commonmark` and `gfm` have limited support for extensions:
-    extensions other than those listed above (and `smart` and
-    `raw_tex`) will have no effect on `commonmark` or `gfm`.
-    And `raw_tex` only affects `gfm` output, not input.
-
-Extensions with formats other than Markdown
---------------------------------------------
-
-Some of the extensions discussed above can be used with formats
-other than Markdown:
-
-* `auto_identifiers` can be used with `latex`, `rst`, `mediawiki`,
-  and `textile` input (and is used by default).
-
-* `tex_math_dollars`, `tex_math_single_backslash`, and
-  `tex_math_double_backslash` can be used with `html` input.
-  (This is handy for reading web pages formatted using MathJax,
-  for example.)
-
 Producing slide shows with pandoc
 =================================
 
@@ -4257,57 +4405,6 @@ with the `src` attribute. For example:
       </source>
     </audio>
 
-Literate Haskell support
-========================
-
-If you append `+lhs` (or `+literate_haskell`) to an appropriate input or output
-format (`markdown`, `markdown_strict`, `rst`, or `latex` for input or output;
-`beamer`, `html4` or `html5` for output only), pandoc will treat the document as
-literate Haskell source. This means that
-
-  - In Markdown input, "bird track" sections will be parsed as Haskell
-    code rather than block quotations. Text between `\begin{code}`
-    and `\end{code}` will also be treated as Haskell code. For
-    ATX-style headers the character '=' will be used instead of '#'.
-
-  - In Markdown output, code blocks with classes `haskell` and `literate`
-    will be rendered using bird tracks, and block quotations will be
-    indented one space, so they will not be treated as Haskell code.
-    In addition, headers will be rendered setext-style (with underlines)
-    rather than ATX-style (with '#' characters). (This is because ghc
-    treats '#' characters in column 1 as introducing line numbers.)
-
-  - In restructured text input, "bird track" sections will be parsed
-    as Haskell code.
-
-  - In restructured text output, code blocks with class `haskell` will
-    be rendered using bird tracks.
-
-  - In LaTeX input, text in `code` environments will be parsed as
-    Haskell code.
-
-  - In LaTeX output, code blocks with class `haskell` will be rendered
-    inside `code` environments.
-
-  - In HTML output, code blocks with class `haskell` will be rendered
-    with class `literatehaskell` and bird tracks.
-
-Examples:
-
-    pandoc -f markdown+lhs -t html
-
-reads literate Haskell source formatted with Markdown conventions and writes
-ordinary HTML (without bird tracks).
-
-    pandoc -f markdown+lhs -t html+lhs
-
-writes HTML with the Haskell code in bird tracks, so it can be copied
-and pasted as literate Haskell source.
-
-Note that GHC expects the bird tracks in the first column, so indentend literate
-code blocks (e.g. inside an itemized environment) will not be picked up by the
-Haskell compiler.
-
 Syntax highlighting
 ===================
 
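
Reviewer note: the identifier rules that this patch moves into the new `auto_identifiers` section can be checked against the table in the manual with a short sketch. This is not pandoc's implementation (pandoc is written in Haskell); it is a minimal Python approximation under stated assumptions. The function names `auto_identifier` and `assign_identifiers` are hypothetical, the input is assumed to be header text with formatting, links, and footnotes already removed, and runs of whitespace are assumed to collapse to a single hyphen.

```python
from typing import Dict, List


def auto_identifier(plain_header: str) -> str:
    """Derive an identifier from header text that is already plain text.

    Hypothetical helper: formatting, links, and footnotes are assumed to
    have been stripped before this function is called.
    """
    # Keep letters, digits, whitespace, and the three allowed punctuation
    # marks (underscore, hyphen, period); drop all other punctuation.
    kept = "".join(
        ch for ch in plain_header
        if ch.isalnum() or ch.isspace() or ch in "_-."
    )
    # Replace spaces and newlines with hyphens, then lowercase.
    # Assumption: a run of whitespace becomes one hyphen.
    slug = "-".join(kept.split()).lower()
    # Remove everything up to the first letter.
    for index, ch in enumerate(slug):
        if ch.isalpha():
            return slug[index:]
    # Nothing usable is left: fall back to the fixed identifier.
    return "section"


def assign_identifiers(plain_headers: List[str]) -> List[str]:
    """Make repeated identifiers unique by appending -1, -2, and so on.

    Simplification: does not handle a header whose own identifier already
    looks like an earlier identifier plus a numeric suffix.
    """
    seen: Dict[str, int] = {}
    result: List[str] = []
    for header in plain_headers:
        base = auto_identifier(header)
        count = seen.get(base, 0)
        result.append(base if count == 0 else f"{base}-{count}")
        seen[base] = count + 1
    return result
```

With the formatting already stripped, the five rows of the manual's example table map to the identifiers listed there, and three headers with the same text receive the base identifier followed by the `-1` and `-2` variants.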