pandoc (2.5)

  * Text.Pandoc.App: split into several unexported submodules (Albert
    Krewinkel): Text.Pandoc.App.FormatHeuristics, Text.Pandoc.App.Opt,
    Text.Pandoc.App.CommandLineOptions, Text.Pandoc.App.OutputSettings.
    This is motivated partly by the desire to reduce recompilations
    when something is modified, since App previously depended on
    virtually every other module.

  * Text.Pandoc.Extensions

    + Semantically, `gfm_auto_identifiers` is now a modifier of
      `auto_identifiers`; for identifiers to be set, `auto_identifiers`
      must be turned on, and then the type of identifier produced
      depends on whether `gfm_auto_identifiers` and `ascii_identifiers`
      are set. Accordingly, `auto_identifiers` is now added to
      `githubMarkdownExtensions` (#5057).
    + Remove `ascii_identifiers` from `githubMarkdownExtensions`.
      GitHub doesn't seem to strip non-ascii characters any more.

  * Text.Pandoc.Lua.Module.Utils (Albert Krewinkel)

    + Test AST object equality via Haskell (#5092). Equality of Lua
      objects representing pandoc AST elements is tested by
      unmarshalling the objects and comparing the result in Haskell.
      A new function `equals` which performs this test has been added
      to the `pandoc.utils` module.
    + Improve stringify. Meta value strings (MetaString) and booleans
      (MetaBool) are now converted to the literal string and the
      lowercase boolean name, respectively. Previously, all values of
      these types were converted to the empty string.

  * Text.Pandoc.Parsing: Remove Functor and Applicative constraints
    where Monad already exists (Alexander Krotov).

  * Text.Pandoc.Pretty: Don't render BreakingSpace at end of line or
    beginning of line (#5050).

  * Text.Pandoc.Readers.Markdown

    + Fix parsing of citations, quotes, and underline emphasis after
      symbols. Starting with pandoc 2.4, citations, quoted inlines,
      and underline emphasis were no longer recognized after certain
      symbols, like parentheses (#5099, #5053).
    + In pandoc 2.4, a soft break after an abbreviation would be
      relocated before it to allow for insertion of a nonbreaking
      space after the abbreviation. This behavior is here reverted.
      A soft break after an abbreviation will remain, and no
      nonbreaking space will be added. Those who care about this issue
      should take care not to end lines with an abbreviation, or to
      insert nonbreaking spaces manually.

  * Text.Pandoc.Readers.FB2: Do not throw error for unknown elements in `
    ` (Alexander Krotov). Some libraries include custom elements in
    their FB2 files.

  * Text.Pandoc.Readers.HTML

    + Allow `tfoot` before body rows (#5079).
    + Parse `<small>` as a Span with class "small" (#5080).
    + Allow thead containing a row with `td` rather than `th` (#5014).

  * Text.Pandoc.Readers.LaTeX

    + Cleaned up handling of dimension arguments. Allow decimal
      points, preceding space.
    + Don't allow arguments for verbatim, etc.
    + Allow space before bracketed options.
    + Allow optional arguments after `\\` in tables.
    + Improve parsing of `\tiny`, `\scriptsize`, etc. Parse as raw,
      but know that these font changing commands take no arguments.

  * Text.Pandoc.Readers.Muse

    + Trim whitespace before parsing grid table cells (Alexander
      Krotov).
    + Add grid tables support (Alexander Krotov).

  * Text.Pandoc.Shared

    + For bibliography match Div with id `refs`, not class
      `references`. This was a mismatch between pandoc's docx, epub,
      latex, and markdown writers and the behavior of pandoc-citeproc,
      which actually looks for a div with id `refs` rather than one
      with class `references`.
    + Exactly match GitHub's identifier generating algorithm (#5057).
    + Add parameter for `Extensions` to `uniqueIdent` and
      `inlineListToIdentifier` (#5057). [API change] This allows these
      functions to be sensitive to the settings of
      `Ext_gfm_auto_identifiers` and `Ext_ascii_identifiers`, and
      allows us to use `uniqueIdent` in the CommonMark reader,
      replacing custom code. It also means that `gfm_auto_identifiers`
      can now be used in all formats.

  * Text.Pandoc.Writers.AsciiDoc

    + Use `.`+ as list markers to support nested ordered lists
      (#5087).
    + Support list number styles (#5089).
    + Render Spans using `[#id .class]#contents#` (#5080).

  * Text.Pandoc.Writers.CommonMark

    + Respect `--ascii` (#5043, quasicomputational).
    + Make sure `--ascii` affects quotes, super/subscript.

  * Text.Pandoc.Writers.Docx

    + Fix bookmarks to headers with long titles (#5091). Word has a
      40 character limit for bookmark names.
      In addition, bookmarks must begin with a letter. Since pandoc's
      auto-generated identifiers may not respect these constraints,
      some internal links did not work. With this change, pandoc uses
      a bookmark name based on the SHA1 hash of the identifier when
      the identifier isn't a legal bookmark name.
    + Add bookmarks to code blocks (Nikolay Yakimov).
    + Add bookmarks to images (Nikolay Yakimov).
    + Refactor common bookmark creation code into a function
      (Nikolay Yakimov).

  * Text.Pandoc.Writers.EPUB: Handle calibre metadata (#5098). Nodes
    of the form are now included from an epub XML metadata file. You
    can also include this information in your YAML metadata, like so:

        calibre:
          series: Classics on War and Politics

    In addition, ibooks-specific metadata can now be included via an
    XML file. (Previously, it could only be included via YAML
    metadata, see #2693.)

  * Text.Pandoc.Writers.HTML: Use plain `"` instead of `&quot;`
    outside of attributes.

  * Text.Pandoc.Writers.ICML: Consolidate adjacent strings, inc.
    spaces. This avoids splitting up the output unnecessarily into
    separate elements.

  * Text.Pandoc.Writers.LaTeX: Don't emit `[<+->]` unless beamer
    output, even if `writerIncremental` is True (#5072).

  * Text.Pandoc.Writers.Muse (Alexander Krotov).

    + Output tables as grid tables if they have multi-line cells.
    + Indent simple tables only on the top level.
    + Output tables with one column as grid tables.
    + Add support for `--reference-location`.
    + Internal improvements.

  * Text.Pandoc.Writers.OpenDocument: Fix list indentation (Nils
    Carlson, #5095). This was a regression in pandoc 2.4.

  * Text.Pandoc.Writers.RTF: Fix warnings for skipped raw inlines.

  * Text.Pandoc.Writers.Texinfo: Add blank line before `@menu`
    section (#5055).

  * Text.Pandoc.XML: in `toHtml5Entities`, prefer shorter entities
    when there are several choices for a particular character.
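
The Docx bookmark rule described in the Text.Pandoc.Writers.Docx entry above can be illustrated with a small Python sketch. This is not pandoc's Haskell code: the function name `to_bookmark_name`, the `X` prefix, and the way the digest is trimmed to fit are assumptions made for illustration. Only the two constraints (at most 40 characters, first character a letter) and the use of a SHA1 hash come from the entry.

```python
import hashlib

MAX_BOOKMARK_LEN = 40  # Word's limit on bookmark names, per the entry above


def to_bookmark_name(identifier: str) -> str:
    """Return `identifier` if it is a legal bookmark name, else a hashed name.

    Hypothetical helper: the "X" prefix and the trimming of the digest are
    illustrative assumptions, not pandoc's confirmed behavior.
    """
    is_legal = (
        0 < len(identifier) <= MAX_BOOKMARK_LEN
        and identifier[0].isalpha()
    )
    if is_legal:
        return identifier
    # A SHA1 hex digest is 40 characters and may start with a digit, so
    # replace its first character with a letter to satisfy both constraints.
    digest = hashlib.sha1(identifier.encode("utf-8")).hexdigest()
    return "X" + digest[1:]
```

Short identifiers that already start with a letter pass through unchanged, so existing internal links keep their readable names; only identifiers that break one of the two constraints are replaced by a hash-based name.
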

  * data/abbreviations

    + Add additional abbreviations (Andrew Dunning). Many of these
      are borrowed from the Chicago Manual of Style 10.42, 'Scholarly
      abbreviations'.

  * Templates

    + Asciidoc template: add :lang: to title header if lang is set in
      metadata (#5088).

  * pandoc.cabal: Add cabal flag `derive_json_via_th` (Albert
    Krewinkel). Disabling the flag will cause derivation of ToJSON and
    FromJSON instances via GHC Generics instead of Template Haskell.
    The flag is enabled by default, as deriving via Generics can be
    slow (see #4083).

  * trypandoc:

    + Tweaked drop-down lists.
    + Put link to site in footer.
    + Preselect output format.
    + Update on change of in or out format.
    + Add man input format.

  * MANUAL.txt:

    + Fix outdated description of latex_macros extension.
    + Clarified placement of bibliography.
    + Added "A note on security."
    + Fix note on curly brace syntax for locators.
    + Document new explicit syntax for citeproc locators.
    + Remove confusing cross-links for some extensions.
    + Don't put pandoc in code ticks in heading.
    + Document that `--ascii` works for gfm and commonmark too.
    + Add `man` to `--from` options.

  * doc/customizing-pandoc.md: various improvements (Mauro Bieg).

pandoc (2.4)

  [new features]

  * New input format `man` (Yan Pashkovsky, John MacFarlane).

  [behavior changes]

  * `--ascii` is now implemented in the writers, not in
    Text.Pandoc.App, via the new `writerPreferAscii` field in
    `WriterOptions`. Now the `write*` functions for Docbook, HTML,
    ICML, JATS, LaTeX, Ms, Markdown, and OPML are sensitive to
    `writerPreferAscii`. Previously the to-ascii translation was done
    in Text.Pandoc.App, and thus not available to those using the
    writer functions directly.

  * `--ascii` now works with Markdown output. HTML5 character
    reference entities are used.

  * `--ascii` now works with LaTeX output. 100% ASCII output can't be
    guaranteed, but the writer will use commands like `\"{a}` and
    `\l` whenever possible, to avoid emitting a non-ASCII character.
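
To illustrate the difference between named and numerical character references that the `--ascii` entries refer to, here is a rough Python sketch. It uses Python's standard `html.entities.html5` table rather than pandoc's own; the helper names `build_entity_table` and `to_ascii` and the tie-breaking rule (shortest name, then alphabetical) are assumptions, not a description of pandoc's implementation.

```python
from html.entities import html5


def build_entity_table() -> dict:
    """Map each single non-ASCII character to one preferred named entity."""
    table = {}
    for name, value in html5.items():
        # Skip legacy names without ';', multi-character expansions,
        # and entities for plain ASCII characters.
        if not name.endswith(";") or len(value) != 1 or ord(value) < 128:
            continue
        best = table.get(value)
        # Prefer the shorter name; break ties alphabetically (an assumption).
        if best is None or (len(name), name) < (len(best), best):
            table[value] = name
    return table


ENTITY_TABLE = build_entity_table()


def to_ascii(text: str) -> str:
    """Replace non-ASCII characters with named or numeric references."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        elif ch in ENTITY_TABLE:
            out.append("&" + ENTITY_TABLE[ch])  # named reference
        else:
            out.append("&#%d;" % ord(ch))  # numeric fallback
    return "".join(out)
```

With this sketch, `to_ascii("café")` gives `caf&eacute;`, where a purely numerical scheme would give `caf&#233;`. Characters with no named entity still fall back to the numeric form.
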

  * For HTML5 output, `--ascii` now uses HTML5 character reference
    entities rather than numerical entities.

  * Improved detection of format based on extension (in
    Text.Pandoc.App). We now ensure that if someone tries to convert
    a file for a format that has a pandoc writer but not a reader, it
    won't just default to markdown.

  * Add viz. to abbreviations file (#5007, Nick Fleisher).

  * AsciiDoc writer: always use single-line section headers, instead
    of the old underline style (#5038). Previously the single-line
    style would be used if `--atx-headers` was specified, but now it
    is always used.

  * RST writer: Use simple tables when possible (#4750).

  * CommonMark (and gfm) writer: Add plain text fallbacks (#4528,
    quasicomputational). Previously, the writer would unconditionally
    emit HTML output for subscripts, superscripts, strikeouts (if the
    strikeout extension is disabled) and small caps, even with
    `raw_html` disabled. Now there are plain-text (and, where
    possible, fancy Unicode) fallbacks for all of these corresponding
    (mostly) to the Markdown fallbacks, and the HTML output is only
    used when `raw_html` is enabled.

  * Powerpoint writer: support raw openxml (Jesse Rosenthal, #4976).
    This allows raw openxml blocks and inlines to be used in the pptx
    writer. Caveats: (1) It's up to the user to write well-formed
    openxml. The chances for corruption, especially with such a
    brittle format as pptx, are high. (2) Because of the tricky way
    that blocks map onto shapes, if you are using a raw block, it
    should be the only block on a slide (otherwise other text might
    end up overlapping it). (3) The pptx ooxml namespace
    abbreviations are different from the docx ooxml namespaces.
    Again, it's up to the user to get it right. Unzipped document and
    ooxml specification should be consulted.

  * With `--katex` in HTML formats, do not use the autorenderer
    (#4946). We no longer surround formulas with `\(..\)` or
    `\[..\]`. Instead, we tell katex to convert the contents of span
    elements with class "math".
    Since math has already been identified, this avoids wasted time
    parsing for LaTeX delimiters. Note, however, that this may yield
    unexpected results if you have span elements with class "math"
    that don't contain LaTeX math. Also, use latest version of KaTeX
    by default (0.9.0).

  * The man writer now produces ASCII-only output, using groff
    escapes, for portability.

  * ODT writer:

    + Add title, author and date to metadata; any remaining metadata
      fields are added as `meta:user-defined` tags.
    + Implement table caption numbering (#4949, Nils Carlson).
      Captioned tables are numbered and labeled with format
      "Table 1: caption", where "Table" is replaced by a translation,
      depending on the value of `lang` in metadata. Uncaptioned tables
      are not enumerated.
    + OpenDocument writer: Implement figure numbering in captions
      (#4944, Nils Carlson). Figure captions are now numbered 1, 2, 3,
      ... The format in the caption is "Figure 1: caption" and so on
      (where "Figure" is replaced by a translation, depending on the
      value of `lang` in the metadata). Captioned figures are numbered
      consecutively and uncaptioned figures are not enumerated. This
      is necessary in order for LibreOffice to generate an
      Illustration Index (Table of Figures) for included figures.

  * RST reader: Pass through fields in unknown directives as div
    attributes (#4715). Support `class` and `name` attributes for all
    directives.

  * Org reader: Add partial support for `#+EXCLUDE_TAGS` option.
    (#4284, Brian Leung). Headers with the corresponding tags should
    not appear in the output.

  * Log warnings about missing title attributes now include a
    suggestion about how to fix the problem (#4909).

  * Lua filter changes (Albert Krewinkel):

    + Report traceback when an error occurs. A proper Lua traceback is
      added if either loading of a file or execution of a filter
      function fails. This should be of help to authors of Lua filters
      who need to debug their code.
    + Allow access to pandoc state (#5015).
      Lua filters and custom writers now have read-only access to most
      fields of pandoc's internal state via the global variable
      `PANDOC_STATE`.
    + Push ListAttributes via constructor (Albert Krewinkel). This
      ensures that ListAttributes, as present in OrderedList elements,
      have additional accessors (viz. `start`, `style`, and
      `delimiter`).
    + Rename ReaderOptions fields, use snake_case. Snake case is used
      in most variable names; using camelCase for these fields was an
      oversight. A metatable is added to ensure that the old field
      names remain functional.
    + Iterate over AST element fields when using `pairs`. This makes
      it possible to iterate over all field names of an AST element by
      using a generic `for` loop with `pairs`:

          for field_name, field_content in pairs(element) do
            ...
          end

      Raw table fields of AST elements should be considered an
      implementation detail and might change in the future. Accessing
      element properties should always happen through the fields
      listed in the Lua filter docs. Note that the iterator currently
      excludes the `t`/`tag` field.
    + Ensure that MetaList elements behave like Lists. Methods usable
      on Lists can also be used on MetaList objects.
    + Fix MetaList constructor (Albert Krewinkel). Passing a MetaList
      object to the constructor `pandoc.MetaList` now returns the
      passed list as a MetaList. This is consistent with the
      constructor behavior when passed an (untagged) list.

  * Custom writers: Custom writers have access to the global variable
    `PANDOC_DOCUMENT` (Albert Krewinkel, #4957). The variable contains
    a userdata wrapper around the full pandoc AST and exposes two
    fields, `meta` and `blocks`. The field content is only marshaled
    on demand; performance of scripts not accessing the fields remains
    unaffected.

  [API changes]

  * Text.Pandoc.Options: add `writerPreferAscii` to `WriterOptions`.

  * Text.Pandoc.Shared:

    + Export `splitSentences`. This was previously duplicated in the
      Man and Ms writers.
    + Add `ToString` typeclass (Alexander Krotov).

  * New exported module Text.Pandoc.Filter (Albert Krewinkel).

  * Text.Pandoc.Parsing

    + Generalize `gridTableWith` to any `Char` Stream (Alexander
      Krotov).
    + Generalize `readWithM` from `[Char]` to any `Char` Stream that
      is a `ToString` instance (Alexander Krotov).

  * Text.Pandoc.XML: add `toHtml5Entities`.

  * New exported module Text.Pandoc.Readers.Man (Yan Pashkovsky, John
    MacFarlane).

  * Text.Pandoc.Writers.Shared

    + Add exported functions `toSuperscript` and `toSubscript`
      (quasicomputational, #4528).
    + Remove exported functions `metaValueToInlines`,
      `metaValueToString`. Add new exported functions
      `lookupMetaBool`, `lookupMetaBlocks`, `lookupMetaInlines`,
      `lookupMetaString`. Use these whenever possible for uniformity
      in writers (Mauro Bieg, #4907). (Note that removed function
      `metaValueToInlines` was in previous released versions.)
    + Add `metaValueToString`.

  * Text.Pandoc.Lua

    + Expose more useful internals (Albert Krewinkel):

      - `runFilterFile` to run a Lua filter from file;
      - data type `Global` and its constructors; and
      - `setGlobals` to add globals to a Lua environment.

      This module also contains `Pushable` and `Peekable` instances
      required to get pandoc's data types to and from Lua. Low-level
      Lua operations remain hidden in Text.Pandoc.Lua.
    + Rename `runPandocLua` to `runLua` (Albert Krewinkel).
    + Remove `runLuaFilter`, merging this into
      Text.Pandoc.Filter.Lua's `apply` (Albert Krewinkel).

  [bug fixes and under-the-hood improvements]

  * Text.Pandoc.Parsing

    + Make `uri` accept any stream with Char tokens (Alexander
      Krotov).
    + Rewrite `uri` without `withRaw` (Alexander Krotov).
    + Generalize `parseFromString` and `parseFromString'` to any
      streams with Char token (Alexander Krotov).
    + Rewrite `nonspaceChar` using `noneOf` (Alexander Krotov).

  * Text.Pandoc.Shared: Reimplement `mapLeft` using `Bifunctor.first`
    (Alexander Krotov).

  * Text.Pandoc.Pretty: Simplify `Text.Pandoc.Pretty.offset`
    (Alexander Krotov).

  * Text.Pandoc.App

    + Work around HXT limitation for --syntax-definition with windows
      drive (#4836).
    + Always preserve tabs for man format. We need it for tables.
    + Split command line parsing code into a separate unexported
      module, Text.Pandoc.App.CommandLineOptions (Albert Krewinkel).

  * Text.Pandoc.Readers.Roff: new unexported module for tokenizing
    roff documents.

  * New unexported module Text.Pandoc.RoffChar, providing character
    escape tables for roff formats.

  * Text.Pandoc.Readers.HTML: Fix `htmlTag` and `isInlineTag` to
    accept processing instructions (#3123, regression since 2.0).

  * Text.Pandoc.Readers.JATS: Use `foldl'` instead of `maximum` to
    account for empty lists (Alexander Krotov).

  * Text.Pandoc.Readers.RST: Don't allow single-dash separator in
    headerless table (#4382).

  * Text.Pandoc.Readers.Org: Parse empty argument array in inline src
    blocks (Brian Leung).

  * Text.Pandoc.Readers.Vimwiki:

    + Get rid of `F`, `runF` and `stateMeta'` in favor of `stateMeta`
      (Alexander Krotov).
    + Parse `Text` without converting to `[Char]` (Alexander Krotov).

  * Text.Pandoc.Readers.Creole: Parse `Text` without converting to
    `[Char]` (Alexander Krotov).

  * Text.Pandoc.Readers.LaTeX

    + Allow space at end of math after `\` (#5010).
    + Add support for `nolinkurl` command (#4992, Brian Leung).
    + Simplified type on `doMacros'`.
    + Tokenize before pulling tokens, rather than after (#4408). This
      has some performance penalty but is more reliable.
    + Make macroDef polymorphic and allow in inline context.
      Otherwise we can't parse something like
      `\lowercase{\def\x{Foo}}`. I have actually seen tex like this in
      the wild.
    + Improved parsing of `\def`, `\let`. We now correctly parse:

      ```
      \def\bar{hello}
      \let\fooi\bar
      \def\fooii{\bar}
      \fooi +\fooii
      \def\bar{goodbye}
      \fooi +\fooii
      ```

    + Improve parsing of `\def` argspec.
    + Skip `\PackageError` commands (see #4408).
    + Fix bugs omitting raw tex (#4527). The default is `-raw_tex`,
      so no raw tex should result unless we explicitly say `+raw_tex`.
      Previously some raw commands did make it through.
    + Moved `isArgTok` to Text.Pandoc.Readers.LaTeX.Parsing.
    + Moved `babelLangToBCP`, `polyglossiaLangToBCP` to new module,
      Text.Pandoc.Readers.LaTeX.Lang (unexported).
    + Simplified accent code using unicode-transforms. New dependency
      on unicode-transforms package for normalization.
    + Allow verbatim blocks ending with blank lines (#4624).
    + Support `breq` math environments: `dmath`, `dgroup`, `darray`.
      This collects some of the general-purpose code from the LaTeX
      reader, with the aim of making the module smaller.

  * Text.Pandoc.Readers.Markdown

    + Fix awkward soft break movements before abbreviations (#4635).
    + Add updateStrPos in a couple places where needed.

  * Text.Pandoc.Readers.Docx: Trigger bold/italic with bCs, iCs
    (#4947). These are variants for "complex scripts" like Arabic and
    are now treated just like b, i (bold, italic).

  * Text.Pandoc.Readers.Muse (Alexander Krotov)

    + Try to parse lists before trying to parse table. This ensures
      that tables inside lists are parsed correctly.
    + Forbid whitespace after opening and before closing markup
      elements.
    + Parse page breaks.
    + Simplify `museToPandocTable` to get rid of partial functions.
    + Allow footnotes to start with empty line.
    + Make sure that the whole text is parsed.
    + Allow empty headers. Previously empty headers caused parser to
      terminate without parsing the rest of the document.
    + Allow examples to be indented with tabs.
    + Remove indentation from examples indicated by `{{{` and `}}}`.
    + Fix parsing of empty cells.
    + Various changes to internals.
    + Rewrite some parsers in applicative style.
    + Avoid tagsoup dependency.
    + Allow table caption to contain `+`.

  * Text.Pandoc.Writers.LaTeX

    + Add newline if math ends in a comment (#4880). This prevents the
      closing delimiter from being swallowed up in the comment.
    + With `--listings`, don't pass through org-babel attributes
      (#4889).
    + With `--biblatex`, use `\autocite` when possible (#4960).
      `\autocites{a1}{a2}{a3}` will not collapse the entries. So, if
      we don't have prefixes and suffixes, we use instead
      `\autocite{a1,a2,a3}`.
    + Fix description lists containing highlighted code (#4662).

  * Text.Pandoc.Writers.Man

    + Don't wrap `.SH` and `.SS` lines (#5019).
    + Avoid unnecessary `.RS`/`.RE` pair in definition lists with one
      paragraph definitions.
    + Moved common groff functions to Text.Pandoc.Writers.Groff.
    + Fix strong/code combination on man (should be `\f[CB]` not
      `\f[BC]`, see #4973).
    + Man writer: use `\f[R]` instead of `\f[]` to reset font
      (Alexander Krotov, #4973).
    + Move `splitSentences` to Text.Pandoc.Shared.

  * Text.Pandoc.Writers.Docx

    + Add framework for custom properties (#3034). So far, we don't
      actually write any custom properties, but we have the
      infrastructure to add this.
    + Handle tables in table cells (#4953). Although this is not
      documented in the spec, some versions of Word require a `w:p`
      element inside every table cell. Thus, we add one when the
      contents of a cell do not already include one (e.g. when a table
      cell contains a table).

  * Text.Pandoc.Writers.AsciiDoc: Prevent illegal nestings. Adjust
    header levels so that n+1 level headers are only found under n
    level headers, and the top level is 1.

  * Text.Pandoc.Writers.OpenDocument: Improve bullet/numbering
    alignment (#4385). This change eliminates the large gap we used
    to have between bullet and text, and also ensures that numbers in
    numbered lists will be right-aligned.

  * Text.Pandoc.Writers.ZimWiki

    + Number ordered list items sequentially, rather than always with
      1 (#4962).
    + Remove extra indentation on lists (#4963).

  * Text.Pandoc.Writers.EPUB: Use metadata field `css` instead of
    `stylesheet` (Mauro Bieg, #4990).

  * Text.Pandoc.Writers.Markdown: Ensure blank between raw block and
    normal content (#4629). Otherwise a raw block can prevent a
    paragraph from being recognized as such.

  * Text.Pandoc.Writers.Ms

    + Removed old `escapeBar`. We don't need this now that we use `@`
      for math delim.
    + Moved common code to Text.Pandoc.Writers.Roff and to
      Text.Pandoc.RoffChar.
    + Move `splitSentences` to Text.Pandoc.Shared (to avoid
      duplication with the man writer).

  * Text.Pandoc.Writers.Muse (Alexander Krotov).

    + Add support for grid tables.
    + Fix Muse writer style.
    + Use `length` instead of `realLength` to calculate definition
      indentation. Muse parsers don't take character width into
      account when calculating indentation.
    + Do not insert newline before lists.
    + Use lightweight markup after `` tag.

  * New unexported module Text.Pandoc.Writers.Roff, providing
    functions useful for all roff format writers (man, ms).

  * Text.Pandoc.Lua

    + Move globals handling to separate module Text.Pandoc.Lua.Global
      (Albert Krewinkel).
    + Lua filter internals: push Shared.Element as userdata (Albert
      Krewinkel). Hierarchical Elements were pushed to Lua as plain
      tables. This is simple, but has the disadvantage that
      marshaling is eager: all child elements will be marshaled as
      part of the object. Using a Lua userdata object instead allows
      lazy access to fields, causing content marshaling just (but
      also each time) when a field is accessed. Filters which do not
      traverse the full element contents tree become faster as a
      result.

  [default template changes]

  * LaTeX template:

    + Add variable `hyperrefoptions` (#4925, Mathias Walter).
    + Add variable `romanfont`, `romanfontoptions` (#4665,
      OvidiusCicero).

  * AsciiDoc template: use single-line style for title.

  * revealjs template: Fix typo in the socket.io javascript plugin
    (#5006, Yoan Blanc).

  * Text.Pandoc.Lua.Util: add missing docstring to `defineHowTo`
    (Albert Krewinkel).

  * data/pandoc.lua: add datatype ListAttributes (Albert Krewinkel)

  * data/sample.lua: replace custom pipe function with
    pandoc.utils.pipe (Albert Krewinkel).

  [documentation improvements]

  * INSTALL.md

    + Add chromeos install instructions (#4958) (Evan Pratten).
    + Add note about TinyTeX.

  * MANUAL.txt

    + Change `groff` -> `roff`.
    + Implement `--ascii` for Markdown writer.
    + Clarify LaTeX image dimensions output (Mauro Bieg).

  * doc/customizing-pandoc.md: added skeleton (Mauro Bieg, #3288).

  * doc/getting-started.md: Added title to test1.md to avoid warning.

  * doc/lua-filters.md: merge type references into main document, fix
    description of Code.text (Albert Krewinkel).

  [build infrastructure improvements]

  * Makefile

    + Makefile: added quick-cabal, full-cabal targets.
    + Make .msi download targets insensitive to order of appveyor
      builds.

  * Update benchmarks for ghc 8.6.1.

  * pandoc.cabal:

    + Enable more compiler warnings (Albert Krewinkel).
    + Make base lower bound 4.8.
    + Bump upper bound for QuickCheck.
    + Bump upper bound for binary.
    + Updated version bounds for containers and haddock-library
      (#4974).
    + Added docx/docProps/custom.xml to cabal data-files.
    + Require skylighting 0.7.4 (#4920).
    + New dependency on unicode-transforms package for normalization.

  * Improved .travis.yml testing and test with GHC 8.6.1 (Albert
    Krewinkel).

  * Added `tools/changelog-helper.sh`.

  * Added test/grofftest.sh for testing the man reader on real man
    pages.

pandoc (2.3.1)

  * RST reader:

    + Parse RST inlines containing newlines (#4912, Francesco
      Occhipinti). This eliminates a regression introduced after
      pandoc 2.1.1, which caused inline constructions containing
      newlines not to be recognized.
    + Fix bug with internal link targets (#4919). They were gobbling
      up indented content underneath.

  * Markdown reader: distinguish autolinks in the AST. With this
    change, autolinks are parsed as Links with the `uri` class. (The
    same is true for bare links, if the `autolink_bare_uris`
    extension is enabled.) Email autolinks are parsed as Links with
    the `email` class. This allows the distinction to be represented
    in the AST.

  * Org reader:

    + Force inline code blocks to honor export options (Brian Leung).
    + Parse empty argument array in inline src blocks (Brian Leung).

  * Muse reader (Alexander Krotov):

    + Added additional tests.
    + Do not allow code markup to be followed by digit.
    + Remove heading level limit.
    + Simplify `