pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2021-02-16	Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...	John MacFarlane	1	-0/+0
	...and add new definitions isomorphic to xml-light's, but with Text instead of String. This allows us to keep most of the code in existing readers that use xml-light, but avoid lots of unnecessary allocation.

	We also add versions of the functions from xml-light's Text.XML.Light.Output and Text.XML.Light.Proc that operate on our modified XML types, and functions that convert xml-light types to our types (since some of our dependencies, like texmath, use xml-light).

	Update golden tests for docx and pptx.

	OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.

	Docx: Do a manual traversal to unwrap sdt and smartTag. This is faster, and needed to pass the tests.

	Benchmarks:

	A = prior to 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8)
	B = as of 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8)
	C = this commit

	| Reader  | A     | B     | C     |
	| ------- | ----- | ----- | ----- |
	| docbook | 18 ms | 12 ms | 10 ms |
	| opml    | 65 ms | 62 ms | 35 ms |
	| jats    | 15 ms | 11 ms | 9 ms  |
	| docx    | 72 ms | 69 ms | 44 ms |
	| odt     | 78 ms | 41 ms | 28 ms |
	| epub    | 64 ms | 61 ms | 56 ms |
	| fb2     | 14 ms | 5 ms  | 4 ms  |
2021-02-10	Add new unexported module T.P.XMLParser.	John MacFarlane	1	-0/+0
	This exports functions that use xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2).

	Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module.

	The new parser also reports errors, which we report when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus.

	These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes.

	Add entity defs to docbook-reader.docbook. Update golden tests for docx.
2021-01-12	Docx writer: handle table header using styles.	John MacFarlane	1	-0/+0
	Instead of hard-coding the border and header cell vertical alignment, we now let this be determined by the Table style, making use of Word's "conditional formatting" for the table's first row. For headerless tables, we use the tblLook element to tell Word not to apply conditional first-row formatting. Closes #7008.
2020-11-26	Docx writer: Fix bullets/lists indentation	cholonam	1	-0/+0
	Fix appearance of bullets/numbered lists (the first level is slightly indented to the right instead of right on the margin). New golden files have been tested using Word 2010 on Windows 10.
2020-07-22	Docx writer: support --number-sections.	John MacFarlane	1	-0/+0
	Closes #1413.
2020-05-16	Docx writer: enable column and row bands for tables.	John MacFarlane	1	-0/+0
	This change will not have any effect with the default style. However, it enables users to use a style (via a reference.docx) that turns on row and/or column bands. Closes #6371.
2019-11-16	Change styles in reference.docx.	John MacFarlane	1	-0/+0
	All headings now have a uniform color. Level-1 headings no longer set `w:themeShade="B5"`. Level-2 headings are now 14 point rather than 16 point. Level-3 headings are now 12 point rather than 14 point. Level-4 headings are italic rather than bold. Closes #5820.
2019-11-14	Change reference.docx to use more normal block quotes.	John MacFarlane	1	-0/+0
	Indented left and right, same font and size. Previously it was unindented, smaller font and different typeface. See #5820.
2019-09-20	Preserve built-in styles in DOCX with custom style (#5670)	Ben Steinberg	1	-0/+0
	This commit prevents custom styles on divs and spans from overriding styles on certain elements inside them, like headings, blockquotes, and links. On those elements, the "native" style is required for the element to display correctly. This change also allows nesting of custom styles; in order to do so, it removes the default "Compact" style applied to Plain blocks, except when inside a table.
2019-03-11	docx writer: avoid extra copy of abstractNum and num elements...	John MacFarlane	1	-0/+0
	...in numbering.xml. This caused pandoc-produced docx files to be uneditable using Word Online. The problem was that recent versions of reference.docx include samples of various kinds of text, including lists. The numbering elements for these were getting copied over to the new docx, where they clashed with the autogenerated elements produced by pandoc. This didn't confuse Desktop Word, but it did confuse Word Online. Closes #5358.
2019-01-26	Improve writing metadata for docx, pptx and odt (#5252)	Agustín Martín Barbero	1	-0/+0
	* docx writer: support custom properties. Solves the writer part of #3024. Also supports additional core properties: `subject`, `lang`, `category`, `description`.
	* odt writer: improve standard properties, including the following core properties: `generator` (Pandoc/VERSION), `description`, `subject`, `keywords`, `initial-creator` (from authors), `creation-date` (actual creation date). Also fix date.
	* pptx writer: support custom properties. Also supports additional core properties: `subject`, `category`, `description`.
	* Includes golden tests.
	* MANUAL: document metadata support for docx, odt, pptx writers