pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2021-02-17	Docx reader: use Map instead of list for Namespaces.	John MacFarlane	2	-20/+20
	This gives a speedup of about 5-10%. The reader is now approximately twice as fast as in the last release.
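One plausible reading of where the speedup comes from: looking up a prefix in an association list walks the list (linear time), while `Data.Map` does a balanced-tree lookup (logarithmic time). The following is a minimal sketch under that assumption. The type and function names are hypothetical and do not mirror the Docx reader's actual definitions.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical representation: namespace prefix -> namespace URI.
type NamespacesList = [(String, String)]
type NamespacesMap  = M.Map String String

-- Before: each lookup walks the list, O(n).
lookupNsList :: String -> NamespacesList -> Maybe String
lookupNsList = lookup

-- After: each lookup is O(log n) in a size-balanced binary tree.
lookupNsMap :: String -> NamespacesMap -> Maybe String
lookupNsMap = M.lookup

-- Build the Map once from the list (for duplicate keys, M.fromList keeps the last value).
toNsMap :: NamespacesList -> NamespacesMap
toNsMap = M.fromList
```

The map is built once per document, so the conversion cost is paid once while every subsequent lookup is cheaper.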
2021-02-16	Revert "Add T.P.XML.Light.Cursor."	John MacFarlane	1	-346/+0
	This reverts commit d8fc4971868104274881570ce9bc3d9edf0d2506.
2021-02-16	Add T.P.XML.Light.Cursor.	John MacFarlane	1	-0/+346

2021-02-16	Add orig copyright/license info for code derived from xml-light.	John MacFarlane	3	-3/+12

2021-02-16	Split up T.P.XML.Light into submodules.	John MacFarlane	4	-504/+565

2021-02-16	Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...	John MacFarlane	24	-928/+1384
	..and add new definitions isomorphic to xml-light's, but with Text instead of String. This allows us to keep most of the code in existing readers that use xml-light, but avoid lots of unnecessary allocation.
	We also add versions of the functions from xml-light's Text.XML.Light.Output and Text.XML.Light.Proc that operate on our modified XML types, and functions that convert xml-light types to our types (since some of our dependencies, like texmath, use xml-light).
	Update golden tests for docx and pptx.
	OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.
	Docx: Do a manual traversal to unwrap sdt and smartTag. This is faster, and needed to pass the tests.
	Benchmarks:
	A = prior to 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8)
	B = as of 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8)
	C = this commit
	| Reader  | A     | B     | C     |
	| ------- | ----- | ----- | ----- |
	| docbook | 18 ms | 12 ms | 10 ms |
	| opml    | 65 ms | 62 ms | 35 ms |
	| jats    | 15 ms | 11 ms |  9 ms |
	| docx    | 72 ms | 69 ms | 44 ms |
	| odt     | 78 ms | 41 ms | 28 ms |
	| epub    | 64 ms | 61 ms | 56 ms |
	| fb2     | 14 ms |  5 ms |  4 ms |
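The manual traversal that unwraps sdt and smartTag can be sketched as a single bottom-up pass that replaces each wrapper element with its children. This sketch assumes a simplified, hypothetical XML type with `Text` names (not the actual `Text.Pandoc.XML.Light` types), ignores namespace prefixes, and assumes that unwrapping means splicing all children into the parent; the real reader may treat property children such as `w:sdtPr` differently.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

-- Hypothetical, simplified XML tree with Text names (not pandoc's actual types).
data Content
  = Elem Element
  | TextNode Text
  deriving (Eq, Show)

data Element = Element
  { elName     :: Text
  , elChildren :: [Content]
  } deriving (Eq, Show)

-- Names of wrapper elements to remove (namespace prefixes ignored for brevity).
isWrapper :: Text -> Bool
isWrapper n = n `elem` ["sdt", "smartTag"]

-- Single bottom-up pass: a wrapper element is replaced by its (already
-- unwrapped) children; every other element keeps its name and gets
-- unwrapped children.
unwrapContent :: Content -> [Content]
unwrapContent (TextNode t) = [TextNode t]
unwrapContent (Elem e)
  | isWrapper (elName e) = kids
  | otherwise            = [Elem e { elChildren = kids }]
  where
    kids = concatMap unwrapContent (elChildren e)
```

Because the function returns a list, a wrapper can vanish and leave several siblings in its place, which is what a generic element-to-element traversal cannot express directly.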
2021-02-14	T.P.Error: remove unused variables	Albert Krewinkel	1	-2/+2

2021-02-13	HTML reader: fix bad handling of empty src attribute in iframe.	John MacFarlane	1	-6/+12
	- If src is empty, we simply skip the iframe.
	- If src is invalid or cannot be fetched, we issue a warning and skip instead of failing with an error.
	- Closes #7099.
2021-02-13	T.P.Error: export `renderError`.	John MacFarlane	1	-33/+72
	Refactor `handleError` to use `renderError`. This allows us to render error messages without exiting.
2021-02-13	Org: support task_lists extension	Albert Krewinkel	3	-5/+54
	The task_lists extension is now supported by the Org reader and writer; the extension is turned on by default. Closes: #6336
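A task list item is a list item that begins with an ASCII checkbox. As a rough illustration, assuming the item text is available as a plain `String`, the checkbox can be mapped to the Unicode ballot boxes ☐ (unchecked) and ☒ (checked) that pandoc uses for task list items. The function name is hypothetical, and the real code (for example `handleTaskListItem`) works on pandoc's `Block` and `Inline` values rather than on strings.

```haskell
import Data.List (stripPrefix)

-- Hypothetical sketch: map an ASCII checkbox at the start of a list item's
-- text to a Unicode ballot box. Works on plain Strings, not pandoc's AST.
checkboxToUnicode :: String -> String
checkboxToUnicode s
  | Just rest <- stripPrefix "[ ] " s = "\x2610 " ++ rest  -- U+2610 BALLOT BOX (unchecked)
  | Just rest <- stripPrefix "[X] " s = "\x2612 " ++ rest  -- U+2612 BALLOT BOX WITH X (checked)
  | Just rest <- stripPrefix "[x] " s = "\x2612 " ++ rest
  | otherwise                         = s                 -- not a task list item
```

A writer would apply the inverse mapping, turning ☐ and ☒ back into `[ ]` and `[X]` when the target format supports task lists.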
2021-02-13	T.P.Shared: export `handleTaskListItem`. [API change]	Albert Krewinkel	1	-0/+1

2021-02-13	LaTeX reader: remove unnecessary line	John MacFarlane	1	-1/+0

2021-02-13	Remove Ext_fenced_code_attributes from allowed commonmark attributes.	John MacFarlane	1	-2/+0
	This extension was listed as allowed, but it didn't actually do anything. Use `attributes` for code attributes and more. Closes #7097.
2021-02-12	Avoid an unnecessary withRaw.	John MacFarlane	1	-1/+4

2021-02-12	LaTeX reader improvements.	John MacFarlane	2	-22/+68
	* Rewrote `withRaw` so it doesn't rely on fragile assumptions about token positions (which break when macros are expanded). This requires the addition of `sEnableWithRaw` and `sRawTokens` in `LaTeXState`, and a new combinator `disablingWithRaw` to disable collection of raw tokens in certain contexts.
	* Add `parseFromToks` to T.P.Readers.LaTeX.Parsing.
	* Fix parsing of single-character tokens so it doesn't interfere with the new raw-token collection.
	* These changes slightly increase allocations and have a minor performance impact.
	Closes #7092.
2021-02-11	Use getTimestamp instead of getCurrentTime in writers.	John MacFarlane	5	-7/+7
	Setting SOURCE_DATE_EPOCH will allow reproducible builds. Partially addresses #7093. This does not suffice to fully enable reproducible builds in EPUB, since a unique id is generated for each build.
2021-02-11	T.P.Class: Add getTimestamp [API change].	John MacFarlane	1	-2/+19
	This attempts to read the SOURCE_DATE_EPOCH environment variable and parse a UTC time from it (treating it as a unix date stamp, see https://reproducible-builds.org/specs/source-date-epoch/). If the variable is not set or can't be parsed as a unix date stamp, then the function returns the current date.
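A standalone sketch of the lookup described above, assuming the variable holds an integer count of seconds since the Unix epoch. The names `parseEpoch` and `getTimestampSketch` are hypothetical; pandoc's actual `getTimestamp` (in T.P.Class) runs in a pandoc monad rather than plain `IO` and may parse the value differently.

```haskell
import Data.Time.Clock (UTCTime, getCurrentTime)
import Data.Time.Clock.POSIX (posixSecondsToUTCTime)
import System.Environment (lookupEnv)
import Text.Read (readMaybe)

-- Parse SOURCE_DATE_EPOCH as seconds since 1970-01-01 00:00:00 UTC.
-- Returns Nothing if the value does not parse as an integer.
parseEpoch :: String -> Maybe UTCTime
parseEpoch s = posixSecondsToUTCTime . fromInteger <$> readMaybe s

-- Hypothetical IO version of the behaviour described above: use
-- SOURCE_DATE_EPOCH if it is set and parses, otherwise the current time.
getTimestampSketch :: IO UTCTime
getTimestampSketch = do
  mbEpoch <- lookupEnv "SOURCE_DATE_EPOCH"
  case mbEpoch >>= parseEpoch of
    Just t  -> pure t
    Nothing -> getCurrentTime
```

Keeping the parsing pure makes the fallback rule easy to test without touching the environment.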
2021-02-11	Correctly parse "raw" date value in markdown references metadata.	John MacFarlane	1	-3/+5
	See jgm/citeproc#53.
2021-02-10	Add new unexported module T.P.XMLParser.	John MacFarlane	16	-87/+224
	This exports functions that use xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2).
	Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module.
	The new parser also reports errors, which we pass on when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus.
	These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. The docx golden tests have also been updated, because the new parser does not preserve the order of attributes.
	Add entity defs to docbook-reader.docbook.
	Update golden tests for docx.
2021-02-08	ODT reader: finer-grained errors on parse failure.	John MacFarlane	1	-21/+18
	See #7091.
2021-02-08	ODT reader: give more information if zip can't be unpacked.	John MacFarlane	1	-1/+4

2021-02-08	DocBook reader: Support informalfigure (#7079)	Nils Carlson	1	-1/+3
	Add support for informalfigure.
2021-02-07	Avoid unnecessary use of NoImplicitPrelude pragma (#7089)	Albert Krewinkel	1	-1/+0

2021-02-06	Markdown reader: improved handling of mmd link attributes in references.	John MacFarlane	1	-0/+2
	Previously they only worked for links that had titles. Closes #7080.
2021-02-04	Lua filters: use same function names in Haskell and Lua	Albert Krewinkel	2	-28/+30

2021-02-03	ePub writer: `belongs-to-collection` metadata (#7063)	Nick Berendsen	1	-41/+58

2021-02-02	Lua: add module "pandoc.path"	Albert Krewinkel	1	-0/+2
	The module makes it possible to work with file paths in a convenient and platform-independent manner. Closes: #6001 Closes: #6565
2021-02-02	Add parseOptionsFromArgs [API change, addition].	John MacFarlane	2	-2/+9
	Exported by Text.Pandoc.App.
2021-02-01	BibTeX writer: use doclayout and doctemplate.	John MacFarlane	3	-23/+39
	This change allows bibtex/biblatex output to wrap as other formats do, depending on the settings of `--wrap` and `--columns`. It also introduces default templates for bibtex and biblatex, which allow for using the variables `header-includes`, `include-before` or `include-after` (or alternatively the command line options `--include-in-header`, `--include-before-body`, `--include-after-body`) to insert content into the generated bibtex/biblatex. This change requires a change in the return type of the unexported `T.P.Citeproc.writeBibTeXString` from `Text` to `Doc Text`. Closes #7068.
2021-02-01	BibTeX writer fixes. Closes #7067.	John MacFarlane	1	-6/+15
	+ Require citeproc 0.3.0.7, which correctly titlecases when titles contain non-ASCII characters.
	+ Correctly handle 'pages' (= 'page' in CSL).
	+ Correctly handle BibLaTeX 'langid' (= 'language' in CSL).
	+ In BibTeX output, protect foreign titles since there's no language field.
2021-01-31	RST reader: fix handling of header in CSV tables.	John MacFarlane	1	-4/+5
	The interpretation of the `:header:` line is not affected by the `delim` option. Closes #7064.
2021-01-31	CslJson writer: fix compiler warning	Albert Krewinkel	1	-1/+1

2021-01-30	CslJson writer: output `[]` if no references in input,	John MacFarlane	1	-5/+5
	instead of raising a PandocAppError as before.
2021-01-29	Markdown writer: handle math right before digit.	John MacFarlane	1	-1/+5
	We insert an HTML comment to avoid a closing `$` immediately followed by a digit, which pandoc would not recognize as a math delimiter.
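Pandoc's Markdown reader does not accept a closing `$` that is immediately followed by a digit, so output like `$x$2` would not round-trip as math. A minimal sketch of the fix, assuming the inline math and the text that follows it are available as strings; the function name and the exact comment text `<!-- -->` are assumptions for illustration.

```haskell
import Data.Char (isDigit)

-- Hypothetical sketch: render inline TeX math, then guard the closing `$`.
-- An empty HTML comment separates the delimiter from a following digit.
renderInlineMath :: String -> String -> String
renderInlineMath tex following =
  "$" ++ tex ++ "$" ++ separator ++ following
  where
    separator = case following of
      (c:_) | isDigit c -> "<!-- -->"
      _                 -> ""
```

An HTML comment is a convenient separator because it is invisible in rendered output and is passed through by the Markdown reader as raw HTML rather than as text.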
2021-01-29	JATS writer: escape special chars in reference elements.	Albert Krewinkel	1	-3/+6
	Prevents the generation of invalid markup if a citation element contains an ampersand or another character with a special meaning in XML.
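The escaping amounts to replacing each XML special character with its entity reference. A simplified sketch (not pandoc's own helper): because each character is processed exactly once, the `&` introduced by an earlier replacement is never escaped a second time, as can happen with a sequence of whole-string replacements.

```haskell
-- Escape the characters that have special meaning in XML text and
-- attribute values. A simplified sketch, not pandoc's own helper.
escapeXml :: String -> String
escapeXml = concatMap escapeChar
  where
    escapeChar '&' = "&amp;"
    escapeChar '<' = "&lt;"
    escapeChar '>' = "&gt;"
    escapeChar '"' = "&quot;"
    escapeChar c   = [c]
```

For example, a citation title such as `Smith & Wesson` becomes `Smith &amp; Wesson`, which is well-formed inside a JATS reference element.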
2021-01-26	Clean up BibTeX parsing.	John MacFarlane	2	-32/+19
	Previously there was a messy code path that gave strange results in some cases, not passing through raw tex but trying to extract a string content. This was an artefact of trying to handle some special bibtex-specific commands in the BibTeX reader. Now we just handle these in the LaTeX reader and simplify parsing in the BibTeX reader. This does mean that more raw tex will be passed through (and currently this is not sensitive to the `raw_tex` extension; this should be fixed). Closes #7049.
2021-01-26	LaTeX writer: change BCP47 lang tag from jp to ja	Mauro Bieg	1	-1/+1
	fixes #7047
2021-01-26	Lua: always load built-in Lua scripts from default data-dir	Albert Krewinkel	4	-46/+44
	The Lua modules `pandoc` and `pandoc.List` are now always loaded from the system's default data directory. Loading from a different directory by overriding the default path, e.g. via `--data-dir`, is no longer supported to avoid unexpected behavior and to address security concerns.
2021-01-22	ImageSize: use viewBox for svg if no length, width.	John MacFarlane	1	-2/+6
	This change allows pandoc to extract size information from more SVGs. Closes #7045.
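An SVG `viewBox` has the form `min-x min-y width height`, with values separated by whitespace and/or commas, so the last two numbers can serve as a fallback size when `width` and `height` are missing. A minimal sketch under that assumption; `viewBoxSize` is a hypothetical name, and how pandoc converts these user units into a final image size is not shown. Haskell's `read` rejects numbers such as `.5` that SVG allows, so a production parser would need a more permissive number reader.

```haskell
import Text.Read (readMaybe)

-- Hypothetical sketch: extract (width, height) from an SVG viewBox value
-- of the form "min-x min-y width height", separated by spaces and/or commas.
viewBoxSize :: String -> Maybe (Double, Double)
viewBoxSize vb =
  case traverse readNum (words (map commaToSpace vb)) of
    Just [_minX, _minY, w, h] | w > 0 && h > 0 -> Just (w, h)
    _                                         -> Nothing  -- malformed or non-positive
  where
    commaToSpace ',' = ' '
    commaToSpace c   = c
    readNum :: String -> Maybe Double
    readNum = readMaybe
```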
2021-01-22	Merge pull request #7042 from tarleb/jats-element-citations	John MacFarlane	5	-25/+213
	JATS writer: use element citations
2021-01-22	JATS writer: allow to use element-citation	Albert Krewinkel	4	-7/+192

2021-01-22	Add biblatex, bibtex as output formats (closes #7040).	John MacFarlane	4	-3/+306
	* `biblatex` and `bibtex` are now supported as output as well as input formats.
	* New module Text.Pandoc.Writers.BibTeX, exporting writeBibTeX and writeBibLaTeX. [API change]
	* New unexported function `writeBibtexString` in Text.Pandoc.Citeproc.BibTeX.
2021-01-21	Text.Pandoc.Citeproc: use finer grained imports	Albert Krewinkel	1	-18/+21
	This makes it possible to import the module in writers without causing a circular dependency.
2021-01-19	JATS writer: Ensure that disp-quote is always wrapped in p.	John MacFarlane	1	-1/+3
	Closes #7041.
2021-01-18	RST writer: fix #7039.	John MacFarlane	1	-2/+2
	We were losing content from inside spans with a class, due to logic that is meant to avoid nested inline structures that can't be represented in RST. The logic was a bit stricter than necessary. This commit fixes the issue.
2021-01-16	Revert "Markdown reader: support GitHub wiki's internal links (#2923) (#6458)"	John MacFarlane	2	-28/+0
	This reverts commit 6efd3460a776620fdb93812daa4f6831e6c332ce. Since this extension is designed to be used with GitHub markdown (gfm), we need to implement the parser as a commonmark extension (commonmark-extensions), rather than in pandoc's markdown reader. When that is done, we can add it here.
2021-01-16	Markdown reader: support GitHub wiki's internal links (#2923) (#6458)	Gautier DI FOLCO	2	-0/+28
	Changes overview:
	* Add a `Ext_markdown_github_wikilink` constructor to `Extension` [API change].
	* Add the parser `githubWikiLink` in `Text.Pandoc.Readers.Markdown`.
	* Add tests.
2021-01-16	Recognize more extensions as markdown by default.	John MacFarlane	1	-0/+5
	`mkdn`, `mkd`, `mdwn`, `mdown`, `Rmd`. Closes #7034.
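A sketch of extension-based format detection for the extensions listed above. The function name is hypothetical, the list adds only `.md` and `.markdown` to the newly recognized extensions (pandoc's actual table covers many more formats), and case-insensitive matching is an assumption made here so that `Rmd` matches.

```haskell
import Data.Char (toLower)
import System.FilePath (takeExtension)

-- Hypothetical sketch: decide whether a file is Markdown by its extension.
-- Matching is case-insensitive here (an assumption), so ".Rmd" -> ".rmd".
isMarkdownPath :: FilePath -> Bool
isMarkdownPath path = map toLower (takeExtension path) `elem` markdownExts
  where
    markdownExts =
      [".md", ".markdown", ".mkdn", ".mkd", ".mdwn", ".mdown", ".rmd"]
```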
2021-01-12	Markdown writer: cleaned up raw formats.	John MacFarlane	1	-34/+35
	We now react appropriately to gfm, commonmark, and commonmark_x as raw formats.
2021-01-12	Docx writer: handle table header using styles.	John MacFarlane	1	-17/+20
	Instead of hard-coding the border and header cell vertical alignment, we now let this be determined by the Table style, making use of Word's "conditional formatting" for the table's first row. For headerless tables, we use the tblLook element to tell Word not to apply conditional first-row formatting. Closes #7008.