pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2021-02-16	Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...	John MacFarlane	6	-513/+485
	..and add new definitions isomorphic to xml-light's, but with Text instead of String. This allows us to keep most of the code in existing readers that use xml-light, but avoid lots of unnecessary allocation.
	We also add versions of the functions from xml-light's Text.XML.Light.Output and Text.XML.Light.Proc that operate on our modified XML types, and functions that convert xml-light types to our types (since some of our dependencies, like texmath, use xml-light).
	Update golden tests for docx and pptx.
	OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.
	Docx: Do a manual traversal to unwrap sdt and smartTag. This is faster, and needed to pass the tests.
	Benchmarks:
	A = prior to 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8)
	B = as of 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8)
	C = this commit
	| Reader  | A     | B     | C     |
	| ------- | ----- | ----- | ----- |
	| docbook | 18 ms | 12 ms | 10 ms |
	| opml    | 65 ms | 62 ms | 35 ms |
	| jats    | 15 ms | 11 ms | 9 ms  |
	| docx    | 72 ms | 69 ms | 44 ms |
	| odt     | 78 ms | 41 ms | 28 ms |
	| epub    | 64 ms | 61 ms | 56 ms |
	| fb2     | 14 ms | 5 ms  | 4 ms  |
2021-02-13	Org: support task_lists extension	Albert Krewinkel	1	-3/+13
	The task lists extension is now supported by the org reader and writer; the extension is turned on by default. Closes: #6336
2021-02-11	Use getTimestamp instead of getCurrentTime in writers.	John MacFarlane	4	-5/+5
	Setting SOURCE_DATE_EPOCH will allow reproducible builds. Partially addresses #7093. This does not suffice to fully enable reproducible builds for EPUB, since a unique id is being generated for each build.
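The timestamp lookup described in this commit can be illustrated with a minimal Python sketch. It shows the general SOURCE_DATE_EPOCH convention only; it is not pandoc's Haskell code, and the function name `get_timestamp` and its `environ` parameter are hypothetical.

```python
import os
from datetime import datetime, timezone


def get_timestamp(environ=None):
    """Return SOURCE_DATE_EPOCH as a UTC datetime if it is set and valid,
    otherwise the current time (hypothetical helper, for illustration)."""
    env = os.environ if environ is None else environ
    raw = env.get("SOURCE_DATE_EPOCH")
    if raw is not None:
        try:
            # The variable holds seconds since the Unix epoch.
            return datetime.fromtimestamp(int(raw), tz=timezone.utc)
        except (ValueError, OverflowError, OSError):
            pass  # malformed value: fall back to the wall clock
    return datetime.now(timezone.utc)
```

With the variable fixed, two runs embed the same timestamp in their output; without it, each run records the time at which it was executed.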
2021-02-10	Add new unexported module T.P.XMLParser.	John MacFarlane	5	-24/+42
	This exports functions that use xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2). Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module. The new parser also reports errors, which we report when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus. These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes. Add entity defs to docbook-reader.docbook. Update golden tests for docx.
2021-02-03	ePub writer: `belongs-to-collection` metadata (#7063)	Nick Berendsen	1	-41/+58

2021-02-01	BibTeX writer: use doclayout and doctemplate.	John MacFarlane	1	-3/+16
	This change allows bibtex/biblatex output to wrap as other formats do, depending on the settings of `--wrap` and `--columns`. It also introduces default templates for bibtex and biblatex, which allow for using the variables `header-include`, `include-before` or `include-after` (or alternatively the command line options `--include-in-header`, `--include-before-body`, `--include-after-body`) to insert content into the generated bibtex/biblatex. This change requires a change in the return type of the unexported `T.P.Citeproc.writeBibTeXString` from `Text` to `Doc Text`. Closes #7068.
2021-01-31	CslJson writer: fix compiler warning	Albert Krewinkel	1	-1/+1

2021-01-30	CslJson writer: output `[]` if no references in input,	John MacFarlane	1	-5/+5
	instead of raising a PandocAppError as before.
2021-01-29	Markdown writer: handle math right before digit.	John MacFarlane	1	-1/+5
	We insert an HTML comment to avoid a `$` right before a digit, which pandoc will not recognize as a math delimiter.
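The workaround can be sketched in Python over a toy inline list. The `("math", …)` / `("str", …)` tuples, the name `render_inlines`, and the exact comment text `<!-- -->` are assumptions made for illustration; they are not pandoc's actual AST or code.

```python
def render_inlines(inlines):
    """Render a list of ("math", tex) / ("str", text) pairs as Markdown.

    An empty HTML comment is emitted after inline math when the next
    string starts with an ASCII digit, so that the closing `$` is not
    immediately followed by a digit.
    """
    out = []
    for i, (kind, content) in enumerate(inlines):
        if kind == "math":
            out.append(f"${content}$")
            nxt = inlines[i + 1] if i + 1 < len(inlines) else None
            if nxt is not None and nxt[0] == "str" and nxt[1] and nxt[1][0] in "0123456789":
                out.append("<!-- -->")  # separates the `$` from the digit
        else:
            out.append(content)
    return "".join(out)
```

For example, `render_inlines([("math", "x"), ("str", "2")])` yields `$x$<!-- -->2`, while math followed by non-digit text is left unchanged.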
2021-01-29	JATS writer: escape special chars in reference elements.	Albert Krewinkel	1	-3/+6
	Prevents the generation of invalid markup if a citation element contains an ampersand or another character with a special meaning in XML.
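As a rough illustration of the kind of escaping involved, the Python sketch below wraps text in an XML element after escaping it with the standard library. The helper name `citation_field` is hypothetical, and this is not the JATS writer's own code.

```python
from xml.sax.saxutils import escape


def citation_field(tag, text):
    """Wrap text in an XML element, escaping &, < and > first
    (hypothetical helper, for illustration only)."""
    return f"<{tag}>{escape(text)}</{tag}>"
```

Without the `escape` call, a title such as `Science & Nature` would produce a bare `&` and therefore markup that is not well-formed.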
2021-01-26	LaTeX writer: change BCP47 lang tag from jp to ja	Mauro Bieg	1	-1/+1
	fixes #7047
2021-01-22	Merge pull request #7042 from tarleb/jats-element-citations	John MacFarlane	3	-7/+178
	JATS writer: use element citations
2021-01-22	JATS writer: allow to use element-citation	Albert Krewinkel	3	-7/+178

2021-01-22	Add biblatex, bibtex as output formats (closes #7040).	John MacFarlane	1	-0/+48
	* `biblatex` and `bibtex` are now supported as output as well as input formats.
	* New module Text.Pandoc.Writers.BibTeX, exporting writeBibTeX and writeBibLaTeX. [API change]
	* New unexported function `writeBibtexString` in Text.Pandoc.Citeproc.BibTeX.
2021-01-19	JATS writer: Ensure that disp-quote is always wrapped in p.	John MacFarlane	1	-1/+3
	Closes #7041.
2021-01-18	RST writer: fix #7039.	John MacFarlane	1	-2/+2
	We were losing content from inside spans with a class, due to logic that is meant to avoid nested inline structures that can't be represented in RST. The logic was a bit stricter than necessary. This commit fixes the issue.
2021-01-12	Markdown writer: cleaned up raw formats.	John MacFarlane	1	-34/+35
	We now react appropriately to gfm, commonmark, and commonmark_x as raw formats.
2021-01-12	Docx writer: handle table header using styles.	John MacFarlane	1	-17/+20
	Instead of hard-coding the border and header cell vertical alignment, we now let this be determined by the Table style, making use of Word's "conditional formatting" for the table's first row. For headerless tables, we use the tblLook element to tell Word not to apply conditional first-row formatting. Closes #7008.
2021-01-10	JATS writer: fix citations (#7018)	Albert Krewinkel	1	-12/+21
	* JATS writer: keep code lines at 80 chars or below
	* JATS writer: fix citations
2021-01-10	Fix infinite HTTP requests when writing epubs from URL source.	John MacFarlane	1	-5/+9
	Due to a bug in code added to avoid overwriting the cover image if it had the form `fileX.YYY`, pandoc made an endless sequence of HTTP requests when writing epub with input from a URL. Closes #7013.
2021-01-08	Update copyright notices for 2021 (#7012)	Albert Krewinkel	38	-39/+39

2021-01-07	gfm/commonmark writer: implement start number on ordered lists.	John MacFarlane	1	-1/+4
	Previously they always started at 1, but according to the spec the start number is respected. Closes #7009.
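A minimal Python sketch of the behavior after this change, assuming a flat list of already rendered item strings; the function name `render_ordered_list` is hypothetical and the real writer handles far more (nesting, delimiters, indentation).

```python
def render_ordered_list(items, start=1):
    """Render items as a CommonMark ordered list beginning at `start`
    (hypothetical helper, for illustration only)."""
    return "\n".join(f"{start + i}. {item}" for i, item in enumerate(items))
```

Calling `render_ordered_list(["a", "b"], start=5)` gives a list whose first marker is `5.`, which a CommonMark parser reads back as a list starting at 5.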
2021-01-05	HTML writer: fix implicit_figure at end of footnotes.	John MacFarlane	1	-3/+7
	Closes #7006.
2021-01-04	EPUB writer: adjust internal links to identifiers...	John MacFarlane	1	-0/+20
	defined in raw HTML sections after splitting into chapters. Closes #7000.
2021-01-03	EPUB writer: recognize `Format "html4"`, `Format "html5"` as raw HTML.	John MacFarlane	1	-2/+8

2021-01-03	EPUB writer: adjust internal links to images, links, and tables...	John MacFarlane	1	-0/+6
	after splitting into chapters. Previously we only did this for Div and Span and Header elements. See #7000.
2021-01-02	LaTeX writer: revert table line height increase in 2.11.3.	John MacFarlane	1	-1/+1
	In 2.11.3 we started adding `\addlinespace`, which produced less dense tables. This wasn't an intentional change; I misunderstood a comment in the discussion leading up to the change. This commit restores the earlier default table appearance. Note that if you want a less dense table, you can use something like `\def\arraystretch{1.5}` in your header. Closes #6996.
2020-12-30	Undo the "Use fromRight" hlint hint.	John MacFarlane	1	-2/+1

2020-12-30	Hlint fixes	John MacFarlane	1	-1/+2

2020-12-30	Ms writer: don't justify inside table cells.	John MacFarlane	1	-1/+3

2020-12-29	Improve fix to #6983.	John MacFarlane	1	-1/+3
	If we have a paragraph then a bookmarkEnd, we don't need to insert the empty paragraph (and in fact it alters the spacing). Closes #6983.
2020-12-28	Docx writer: fix nested tables with captions.	John MacFarlane	1	-4/+6
	Previously we got unreadable content, because docx seems to want a `<w:p>` element (even an empty one) at the end of every table cell. Closes #6983.
2020-12-27	Use meta-description instead of description in templates.	John MacFarlane	1	-0/+3
	Since this is an attribute value, we need to prepare it in the writer.
2020-12-27	Add support for writing nested tables to asciidoc (#6972)	timo-a	1	-7/+32
	Added field to WriterState that denotes the current nesting level for traversing tables. Depending on the value of that field nested tables are recognized and written. Asciidoc supports one level of nesting. If deeper tables are to be written, they are omitted and a warning is issued.
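The nesting-level idea can be sketched in Python. Here a table is a list of rows, a cell is either a string or another table, and the `level` argument stands in for the WriterState field. All names are hypothetical, and the `a|` / `!===` output follows AsciiDoc's nested-table syntax as an assumption about what the writer emits.

```python
import warnings


def write_table(rows, level=0, warn=warnings.warn):
    """Write a table in AsciiDoc-like syntax, allowing one level of nesting.

    Tables nested more deeply are omitted and a warning is issued.
    """
    if level > 1:
        warn("table nested too deeply for AsciiDoc; omitted")
        return ""
    delim = "|===" if level == 0 else "!==="
    sep = "|" if level == 0 else "!"
    lines = [delim]
    for row in rows:
        for cell in row:
            if isinstance(cell, list):  # the cell holds a nested table
                inner = write_table(cell, level + 1, warn)
                lines.append("a" + sep + ("\n" + inner if inner else ""))
            else:
                lines.append(sep + cell)
    lines.append(delim)
    return "\n".join(lines)
```

Passing the level down explicitly keeps the sketch free of global state; the commit achieves the same effect by tracking the level in the writer's state.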
2020-12-27	Powerpoint writer: allow arbitrary OOXML in raw inline elements	Albert Krewinkel	1	-22/+27
	The raw text is now included verbatim in the output. Previously it was parsed into XML elements, which prevented the inclusion of partial XML snippets.
2020-12-20	HTML writer: don't include p tags in CSL bibliography entries.	John MacFarlane	1	-2/+7
	Fixes a regression in 2.11.3. Closes #6966
2020-12-20	LaTeX writer: support colspans and rowspans in tables. (#6950)	Albert Krewinkel	4	-96/+236
	Note that the multirow package is needed for rowspans. It is included in the latex template under a variable, so that it won't be used unless needed for a table.
2020-12-15	Properly handle boolean values in writing YAML metadata.	John MacFarlane	2	-2/+3
	(Markdown writer.) This requires doctemplates >= 0.9. Closes #6388.
2020-12-13	RST writer: better image handling.	John MacFarlane	1	-9/+21
	- An image alone in its paragraph (but not a figure) is now rendered as an independent image, with an `alt` attribute if a description is supplied.
	- An inline image that is not alone in its paragraph will be rendered, as before, using a substitution. Such an image cannot have a "center", "left", or "right" alignment, so the classes `align-center`, `align-left`, or `align-right` are ignored. However, `align-top`, `align-middle`, `align-bottom` will generate a corresponding `align` attribute.
	Closes #6948.
2020-12-13	Docx writer: keep raw openxml strings verbatim.	Albert Krewinkel	1	-2/+5
	Closes: #6933
2020-12-13	Docx writer: use Content instead of Element.	Albert Krewinkel	1	-59/+75

2020-12-12	Merge pull request #6946 from mb21/icml-image-fit	John MacFarlane	1	-1/+6
	ICML writer: fix image bounding box for custom widths/heights
2020-12-12	LaTeX writer: extract table handling into separate module.	Albert Krewinkel	5	-237/+355

2020-12-12	ICML writer: fix image bounding box for custom widths/heights	mb21	1	-1/+6
	fixes #6936
2020-12-07	Merge pull request #6922 from jtojnar/db-writer-admonitions	John MacFarlane	1	-19/+45
	Docbook writer: handle admonitions
2020-12-07	Docbook writer: Handle admonition titles from Markdown reader	Jan Tojnar	1	-0/+2
	Docbook reader produces a `Div` with `title` class for `<title>` element within an “admonition” element. Markdown writer then turns this into a fenced div with `title` class attribute. Since fenced divs are block elements, their content is recognized as a paragraph by the Markdown reader. This is an issue for Docbook writer because it would produce an invalid DocBook document from such AST – the `<title>` element can only contain “inline” elements. Let’s handle this invalid special case separately by unwrapping the paragraph before creating the `<title>` element.
2020-12-07	Docbook writer: Use correct id attribute consistently	Jan Tojnar	1	-10/+16
	DocBook5 should always use xml:id instead of id so let’s use it everywhere.
2020-12-07	Docbook writer: handle admonitions	Jan Tojnar	1	-12/+30
	Similarly to https://github.com/jgm/pandoc/commit/d6fdfe6f2bba2a8ed25d6c9f11861774001f7a91, we should handle admonitions.
2020-12-05	OpenDocument writer: Allow references for internal links (#6774)	Nils Carlson	1	-18/+73
	This commit adds two extensions to the OpenDocument writer, `xrefs_name` and `xrefs_number`. Links to headings, figures and tables inside the document are substituted with cross-references that will use the name or caption of the referenced item for `xrefs_name` or the number for `xrefs_number`. For `xrefs_number` to be useful, heading numbers must be enabled in the generated document, and table and figure captions must be enabled using, for example, the `native_numbering` extension. In order for numbers and reference text to be updated, the generated document must be refreshed. Co-authored-by: Nils Carlson <nils.carlson@ludd.ltu.se>
2020-12-04	Docbook writer: add XML namespaces to top-level elements (#6923)	Jan Tojnar	1	-8/+20
	Previously, we only added xmlns attributes to chapter elements, even when running with --top-level-division=section. Let’s add the namespaces to part and section elements too, when they are the selected top-level divisions. We do not need to add namespaces to documents produced with --standalone flag, since those will already have xmlns attribute on the root element in the template.