diff options
-rw-r--r-- | changelog.md | 355 |
1 files changed, 355 insertions, 0 deletions
diff --git a/changelog.md b/changelog.md index 34a3002ce..b13bf1fd4 100644 --- a/changelog.md +++ b/changelog.md @@ -1,5 +1,360 @@ # Revision history for pandoc +## pandoc 2.12 (UNRELEASED -- PROVISIONAL) + + * Add new unexported module Text.Pandoc.XML.Light, as well + as Text.Pandoc.XML.Light.Types, Text.Pantoc.XML.Light.Proc, + Text.Pandoc.XML.Light.Output. (Closes #6001, #6565, #7091). + + This module exports definitions of `Element` and `Content` + that are isomorphic to xml-light's, but with Text + instead of String. This allows us to keep most of the code in existing + readers that use xml-light, but avoid lots of unnecessary allocation. + + We also add versions of the functions from xml-light's + Text.XML.Light.Output and Text.XML.Light.Proc that operate on our + modified XML types, and functions that convert xml-light types to our + types (since some of our dependencies, like texmath, use xml-light). + + We export functions that use xml-conduit's parser to produce an + `Element` or `[Content]`. This allows existing pandoc code to use + a better parser without much modification. + + The new parser is used in all places where xml-light's parser was + previously used. Benchmarks show a significant performance improvement + in parsing XML-based formats (with docbook, opml, jats, and docx + almost twice as fast, odt and fb2 more than twice as fast). + + In addition, the new parser gives us better error reporting than + xml-light. We report XML errors, when possible, using the new + `PandocXMLError` constructor in `PandocError`. + + These changes revealed the need for some changes in the tests. The + docbook-reader.docbook test lacked definitions for the entities it used; + these have been added. And the docx golden tests have been updated, + because the new parser does not preserve the order of attributes. + + * Text.Pandoc.App + + + Add `parseOptionsFromArgs` [API change, new exported function]. + + * Text.Pandoc.Citeproc.BibTeX + + + `Text.Pandoc.Citeproc.writeBibTeXString` now returns + `Doc Text` instead of `Text` (#7068). + + Correctly handle `pages` (= `page` in CSL) (#7067). + + Correctly handle BibLaTeX `langid` (= `language` in CSL, #7067). + + In BibTeX output, protect foreign titles since there's no language + field (#7067). + + Clean up BibTeX parsing (#7049). Previously there was a messy code + path that gave strange results in some cases, not passing through raw + tex but trying to extract a string content. This was an artefact of + trying to handle some special bibtex-specific commands in the BibTeX + reader. Now we just handle these in the LaTeX reader and simplify + parsing in the BibTeX reader. This does mean that more raw tex will + be passed through (and currently this is not sensitive to the + `raw_tex` extension; this should be fixed). + + * Text.Pandoc.Citeproc.MetaValue + + + Correctly parse "raw" date value in markdown references metadata. + (See jgm/citeproc#53.) + + * Text.Pandoc.Class + + + Add `getTimestamp` [API change]. This attempts to read the + `SOURCE_DATE_EPOCH` environment variable and parse a UTC time + from it (treating it as a unix date stamp, see + https://reproducible-builds.org/specs/source-date-epoch/). If the + variable is not set or can't be parsed as a unix date stamp, then the + function returns the current date. + + * Text.Pandoc.Error + + + Remove unused variables (Albert Krewinkel) + + Export `renderError` [API change]. + + Refactor `handleError` to use `renderError`. This allows us render + error messages without exiting. + + * Text.Pandoc.Extensions + + + `Ext_task_lists` is now supported by org (and turned + on by default) (Albert Krewinkel, #6336). + + Remove `Ext_fenced_code_attributes` from allowed commonmark attributes + (#7097). This attribute was listed as allowed, but it didn't actually + do anything. Use `attributes` for code attributes and more. + + * Lua subsystem: + + + Always load built-in Lua scripts from default data-dir (Albert + Krewinkel). The Lua modules `pandoc` and `pandoc.List` are now always + loaded from the system's default data directory. Loading from a + different directory by overriding the default path, e.g. via + `--data-dir`, is no longer supported to avoid unexpected behavior + and to address security concerns. + + Add module "pandoc.path" (Albert Krewinkel, #6001, #6565). + The module allows to work with file paths in a convenient and + platform-independent manner. + + * Text.Pandoc.PDF + + + Disable `smart` extension when building PDF via LaTeX. + This is to prevent accidental creation of ligatures like + `` ?` `` and `` !` `` (especially in languages with quotations like + German), and similar ligature issues. (See jgm/citeproc#54.) + + * DocBook reader: + + + Avoid expensive tree normalization step, as it is not necessary + with the new XML parser. + + Support `informalfigure` (#7079) (Nils Carlson). + + * Docx reader: + + + Use Map instead of list for Namespaces. This gives a speedup of + about 5-10%. With this and the XML parsing changes, the docx reader + is now about twice as fast as in the previous release. + + * HTML reader: + + + Small performance tweaks. + + Also, remove exported class `NamedTag(..)` [API change]. This was just + intended to smooth over the transition from String to Text and is no + longer needed. + + As a result, the functions `isInlineTag` and `isBlockTag` + are no longer polymorphic; they apply to a `Tag Text` [API change]. + + Do a lookahead to find the right parser to use. This takes + benchmarks from 34ms to 23ms, with less allocation. + + Fix bad handling of empty `src` attribute in `iframe` (#7099). + If `src` is empty, we simply skip the `iframe`. + If `src` is invalid or cannot be fetched, we issue a warning + nd skip instead of failing with an error. + + * JATS reader: + + + Avoid tree normalization, which is no longer necessary given the + new XML parser. + + * LaTeX reader: + + + Code cleanup, removing some unnecessary things. + + Rewrite `withRaw` so it doesn't rely on fragile assumptions + about token positions (which break when macros are expanded) + (#7092). This requires the addition of `sEnableWithRaw` and + `sRawTokens` in `LaTeXState`, and a new combinator `disablingWithRaw` + to disable collecting of raw tokens in certain contexts. + Add `parseFromToks` to Text.Pandoc.Readers.LaTeX.Parsing. + Fix parsing of single character tokens so it doesn't mess + up the new raw token collecting. These changes slightly increase + allocations and have a small performance impact. + + Handle some bibtex/biblatex-specific commands that used to be + dealt with in pandoc-citeproc (#7049). + + Optimize `satisfyTok`, avoiding unnecessary macro expansion steps. + Benchmarks after this change show 2/3 of the run time and 2/3 of the + allocation of the Feb. 10 benchmarks. + + Removed `sExpanded` in state. This isn't actually needed and checking + it doesn't change anything. + + Improve `braced'`. Remove the parameter, have it parse the + opening brace, and make it more efficient. + + * Markdown reader: + + + Improved handling of mmd link attributes in references (#7080). + Previously they only worked for links that had titles. + + * OPML reader: + + + Avoid tree normalization, which is no longer necessary with the + new XML parser. + + * ODT reader: + + + Finer-grained errors on parse failure (#7091). + + Give more information if the zip container can't be unpacked. + + * Org reader: + + + Support `task_lists` extension (Albert Krewinkel, #6336). + + Fix bug in org-ref citation parsing (Albert Krewinkel, #7101). + The org-ref syntax allows to list multiple citations separated by + comma. Previously commas were accepted as part of the citation id, + so all citation lists were parsed as one single citation. + + * RST reader: + + + Use `getTimestamp` instead of `getCurrentTime` to fetch timestamp. + Setting `SOURCE_DATE_EPOCH` will allow reproducible builds. + + RST reader: fix handling of header in CSV tables (#7064). + The interpretation of this line is not affected by the delim option. + + * Jira reader: + + + Modified the Doc parser to skip leading blank lines. This fixes + parsing of documents which start with multiple blank lines (#7095). + + Prevent URLs within link aliases to be treated as autolinks + (#6944). + + * Text.Pandoc.Shared + + + Remove formerly exported functions that are no longer used in the + code base: `splitByIndices`, `splitStringByIndicies`, `substitute`, + and `underlineSpan` (which had been deprecated in April 2020) + [API change]. + + Export `handleTaskListItem` (Albert Krewinkel) [API change]. + + * BibTeX writer: + + + BibTeX writer: use doclayout and doctemplate. This change allows + bibtex/biblatex output to wrap as other formats do, + depending on the settings of `--wrap` and `--columns` (#7068). + + * CSL JSON writer: + + + Output `[]` if no references in input, instead of raising a + PandocAppError as before. + + * Docx writer: + + + Use `getTimestamp` instead of `getCurrentTime` for timestamp. + Setting `SOURCE_DATE_EPOCH` will allow reproducible builds. + + * Text.Pandoc.Writers.EPUB + + + Use `getTimestamp` instead of `getCurrentTime` for timestamp. + Setting `SOURCE_DATE_EPOCH` will allow reproducible builds (#7093). + This does not suffice to fully enable reproducible in EPUB, since + a unique id is still being generated for each build. + + Support `belongs-to-collection` metadata (#7063) (Nick Berendsen). + + * JATS writer: + + + Escape special chars in reference elements (Albert Krewinkel). + Prevents the generation of invalid markup if a citation element + contains an ampersand or another character with a special meaning + in XML. + + * LaTeX writer: + + + Adjust hypertargets to beginnings of paragraphs (#7078). + Use `\vadjust pre` so that the hypertarget takes you to the beginning + of the paragraph rather than one line down. + This makes a particular difference for links to citations using + `--citeproc` and `link-citations: true`. + + Change BCP47 lang tag from `jp` to `ja` (Mauro Bieg, #7047). + + * Markdown writer: + + + Handle math right before digit. We insert an HTML comment to + avoid a `$` right before a digit, which pandoc will not recognize + as a math delimiter. + + * ODT writer: + + + Use `getTimestamp` instead of `getCurrentTime` for timestamp. + Setting `SOURCE_DATE_EPOCH` will allow reproducible builds. + + Update default ODT style (Lorenzo). Previously, the "First paragraph" + style inherited from "Standard" but not from "Text body." Now + it is adjusted to inherit from "Text body", to avoid some ugly + spacing issues. It may be necessary to update a custom `reference.odt` + in light of this change. + + * Org writer: + + + Support `task_lists` extension (Albert Krewinkel, #6336). + + * Pptx writer: + + + Use `getTimestamp` instead of `getCurrentTime` for timestamp. + Setting `SOURCE_DATE_EPOCH` will allow reproducible builds. + + * JATS templates: tag `author.name` as `string-name` (Albert Krewinkel). + The partitioning the components of a name into surname, given names, + etc. is not always possible or not available. Using `author.name` + allows to give the full name as a fallback to be used when + `author.surname` is not available. + + * Add default templates for bibtex and biblatex, so that + the variables `header-include`, `include-before`, `include-after` + (or alternatively the command line options + `--include-in-header`, `--include-before-body`, `--include-after-body`) + may be used. + + * LaTeX template: Update to iftex package (#7073) (Andrew Dunning) + + * revealjs template: Add 'center' option for vertical slide centering. + (maurerle, #7104). + + * Text.Pandoc.XML: Improve efficiency of `fromEntities`. + + * Test suite: a more robust way of testing the executable. + Many of our tests require running the pandoc executable. This is + problematic for a few different reasons. First, cabal-install will + sometimes run the test suite after building the library but before + building the executable, which means the executable isn't in place for + the tests. One can work around that by first building, then building and + running the tests, but that's fragile. Second, we have to find the + executable. So far, we've done that using a function `findPandoc` that + attempts to locate it relative to the test executable (which can be + located using findExecutablePath). But the logic here is delicate and + work with every combination of options. To solve both problems, we add + an `--emulate` option to the `test-pandoc` executable. When `--emulate` + occurs as the first argument passed to `test-pandoc`, the program simply + emulates the regular pandoc executable, using the rest of the arguments + (after `--emulate`). Thus, `test-pandoc --emulate -f markdown -t latex` + is just like `pandoc -f markdown -t latex`. + Since all the work is done by library functions, implementing this + emulation just takes a couple lines of code and should be entirely + reliable. With this change, we can test the pandoc executable by running + the test program itself (locatable using `findExecutablePath`) with the + `--emulate` option. This removes the need for the fragile `findPandoc` + step, and it means we can run our integration tests even when we're just + building the library, not the executable. [Note: part of this change + involved simplifying some complex handling to set environment variables + for dynamic library paths. I have tested a build with + `--enable-dynamic-executable`, and it works, but further testing may be + needed.] + + * MANUAL.txt + + + MANUAL: block-level formatting is not allowed in line blocks (#7107). + + Clarify `tex_math_dollars` extension. Note that no blank lines + are allowed between the delimiters in display math. + + Add MANUAL section on reproducible builds. + + Document no template fallback for absolute path (#7077, Nixon + Enraght-Moony.) + + Improve docs for cite-method. + + Update README and man page. + + * Makefile: in `make bench`, create CSV files for comparison and compare + against previous benchmark run. Add timestamp to CSV filenames. + + * doc/lua-filters.md: improve documentation for + `pandoc.mediabag.insert`, `pandoc.mediabag.fetch`, + `directory`, `normalize` (Albert Krewinkel). + + * Allow base64-bytestring-1.2.* (Dmitrii Kovanikov) + + * Require jira-wiki-markup 1.3.3 (Albert Krewinkel) + + * Require citeproc 0.3.0.7, which correctly titlecases when titles + contain non-ASCII characters. + + * Avoid unnecessary use of NoImplicitPrelude pragma (#7089) (Albert + Krewinkel) + + * Benchmarks + + + Use the lighter-weight tasty-bench instead of criterion. + + Run writer benchmarks for binary formats too. + + Alphabetize benchmarks. + + Don't run benchmarks for bibliography formats + (yet; we need a special input for them). + + Show allocation data + + Clean up benchmark code. + + Allow specifying patterns using `-p blah'. + + + ## pandoc 2.11.4 (2021-01-22) * Add `biblatex`, `bibtex` as output formats (closes #7040). |