pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2021-02-16	Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...	John MacFarlane	1	-133/+130
	..and add new definitions isomorphic to xml-light's, but with Text instead of String. This allows us to keep most of the code in existing readers that use xml-light, but avoid lots of unnecessary allocation. We also add versions of the functions from xml-light's Text.XML.Light.Output and Text.XML.Light.Proc that operate on our modified XML types, and functions that convert xml-light types to our types (since some of our dependencies, like texmath, use xml-light). Update golden tests for docx and pptx. OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`. Docx: Do a manual traversal to unwrap sdt and smartTag. This is faster, and needed to pass the tests.
	Benchmarks: A = prior to 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8); B = as of 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8); C = this commit.
	| Reader  | A     | B     | C     |
	| ------- | ----- | ----- | ----- |
	| docbook | 18 ms | 12 ms | 10 ms |
	| opml    | 65 ms | 62 ms | 35 ms |
	| jats    | 15 ms | 11 ms | 9 ms  |
	| docx    | 72 ms | 69 ms | 44 ms |
	| odt     | 78 ms | 41 ms | 28 ms |
	| epub    | 64 ms | 61 ms | 56 ms |
	| fb2     | 14 ms | 5 ms  | 4 ms  |
2021-02-11	Use getTimestamp instead of getCurrentTime in writers.	John MacFarlane	1	-1/+1
	Setting SOURCE_DATE_EPOCH will allow reproducible builds. Partially addresses #7093. This does not suffice to fully enable reproducible builds in EPUB, since a unique id is being generated for each build.
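The idea behind this change can be sketched roughly as follows. This is an illustration in Python, not pandoc's Haskell code; the helper name `get_timestamp` and its `environ` parameter are hypothetical, and falling back to the current time on a malformed value is an assumption.

```python
import os
from datetime import datetime, timezone


def get_timestamp(environ=None):
    """Return a UTC timestamp, honouring SOURCE_DATE_EPOCH when it is set.

    Hypothetical helper for illustration; not pandoc's implementation.
    """
    env = os.environ if environ is None else environ
    raw = env.get("SOURCE_DATE_EPOCH", "").strip()
    if raw:
        try:
            # SOURCE_DATE_EPOCH holds seconds since the Unix epoch.
            return datetime.fromtimestamp(int(raw), tz=timezone.utc)
        except (ValueError, OverflowError, OSError):
            pass  # assumption: ignore a malformed value and use the clock
    return datetime.now(timezone.utc)
```

With the variable set, every run stamps its output with the same time, which removes one source of differences between builds.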
2021-01-12	Docx writer: handle table header using styles.	John MacFarlane	1	-17/+20
	Instead of hard-coding the border and header cell vertical alignment, we now let this be determined by the Table style, making use of Word's "conditional formatting" for the table's first row. For headerless tables, we use the tblLook element to tell Word not to apply conditional first-row formatting. Closes #7008.
2021-01-08	Update copyright notices for 2021 (#7012)	Albert Krewinkel	1	-1/+1

2020-12-30	Undo the "Use fromRight" hlint hint.	John MacFarlane	1	-2/+1

2020-12-30	Hlint fixes	John MacFarlane	1	-1/+2

2020-12-29	Improve fix to #6983.	John MacFarlane	1	-1/+3
	If we have a paragraph then a bookmarkEnd, we don't need to insert the empty paragraph (and in fact it alters the spacing). Closes #6983.
2020-12-28	Docx writer: fix nested tables with captions.	John MacFarlane	1	-4/+6
	Previously we got unreadable content, because docx seems to want a `<w:p>` element (even an empty one) at the end of every table cell. Closes #6983.
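A rough sketch of the rule, combined with the refinement described in the "Improve fix to #6983" entry. It is written in Python over a deliberately simplified model in which a cell is just a list of child tag names; `close_cell` is a hypothetical name, and skipping every trailing `w:bookmarkEnd` is a simplifying assumption rather than pandoc's actual logic.

```python
def close_cell(children):
    """Return the cell's child tags, appending an empty w:p when one is needed.

    `children` is a simplified model: a list of tag names such as
    "w:p", "w:tbl" or "w:bookmarkEnd".
    """
    # Ignore bookmarkEnd markers when deciding what the cell "ends" with:
    # a paragraph followed only by bookmarkEnd needs no extra paragraph.
    significant = [tag for tag in children if tag != "w:bookmarkEnd"]
    if significant and significant[-1] == "w:p":
        return list(children)
    # The cell is empty or ends with something else (for example a nested table).
    return list(children) + ["w:p"]
```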
2020-12-13	Docx writer: keep raw openxml strings verbatim.	Albert Krewinkel	1	-2/+5
	Closes: #6933
2020-12-13	Docx writer: use Content instead of Element.	Albert Krewinkel	1	-59/+75

2020-12-03	Docx writer: Support bold and italic in "complex script."	John MacFarlane	1	-2/+6
	Previously bold and italics didn't work properly in RTL text. This commit causes the w:bCs and w:iCs elements to be used, in addition to w:b and w:i, for bold and italics respectively. Closes #6911.
2020-11-26	Docx writer: Fix bullets/lists indentation	cholonam	1	-3/+3
	Fix appearance of bullets/numbered lists (the first level is slightly indented to the right instead of right on the margin). New golden files have been tested using Word 2010 on Windows 10.
2020-10-06	DOCX reader: Allow empty dates in comments and tracked changes (#6726)	Diego Balseiro	1	-11/+7
	For security reasons, some legal firms delete the date from comments and tracked changes. * Make date optional (Maybe) in tracked changes and comments datatypes * Add tests
2020-10-02	Docx writer: better handle list items whose contents are lists (#6522)	Michael Hoffmann	1	-3/+13
	If the first element of a bulleted or ordered list is another list, then that first item will disappear if the target format is docx. This changes the docx writer so that it prepends an empty string for those cases. With this, no items will disappear. Closes #5948.
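The fix can be pictured with a small Python sketch. Blocks are modelled as `(tag, payload)` tuples, and `pad_list_item` is a hypothetical name; representing the prepended empty string as an empty `Plain` block is an assumption made for the illustration.

```python
LIST_TAGS = {"BulletList", "OrderedList"}


def pad_list_item(blocks):
    """Prepend an empty text block when a list item starts with another list.

    `blocks` is a list of (tag, payload) tuples standing in for pandoc blocks.
    """
    if blocks and blocks[0][0] in LIST_TAGS:
        # Give the item some (empty) content of its own so it is not dropped.
        return [("Plain", "")] + list(blocks)
    return list(blocks)
```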
2020-09-21	Add built-in citation support using new citeproc library.	John MacFarlane	1	-0/+12
	This deprecates the use of the external pandoc-citeproc filter; citation processing is now built in to pandoc. * Add dependency on citeproc library. * Add Text.Pandoc.Citeproc module (and some associated unexported modules under Text.Pandoc.Citeproc). Exports `processCitations`. [API change] * Add data files needed for Text.Pandoc.Citeproc: default.csl in the data directory, and a citeproc directory that is just used at compile-time. Note that we've added file-embed as a mandatory rather than a conditional dependency, because of the biblatex localization files. We might eventually want to use readDataFile for this, but it would take some code reorganization. * Text.Pandoc.Logging: Add `CiteprocWarning` to `LogMessage` and use it in `processCitations`. [API change] * Add tests from the pandoc-citeproc package as command tests (including some tests pandoc-citeproc did not pass). * Remove instructions for building pandoc-citeproc from CI and release binary build instructions. We will no longer distribute pandoc-citeproc. * Markdown reader: tweak abbreviation support. Don't insert a nonbreaking space after a potential abbreviation if it comes right before a note or citation. This messes up several things, including citeproc's moving of note citations. * Add `csljson` as an input and output format. This allows pandoc to convert between `csljson` and other bibliography formats, and to generate formatted versions of CSL JSON bibliographies. * Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API change] * Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API change] * Added `bibtex`, `biblatex` as input formats. This allows pandoc to convert between BibLaTeX and BibTeX and other bibliography formats, and to generate formatted versions of BibTeX/BibLaTeX bibliographies. * Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and `readBibLaTeX`. [API change] * Make "standalone" implicit if output format is a bibliography format.
This is needed because pandoc readers for bibliography formats put the bibliographic information in the `references` field of metadata; and unless standalone is specified, metadata gets ignored. (TODO: This needs improvement. We should trigger standalone for the reader when the input format is bibliographic, and for the writer when the output format is markdown.) * Carry over `citationNoteNum` to `citationNoteNumber`. This was just ignored in pandoc-citeproc. * Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter. [API change] This runs the processCitations transformation. We need to treat it like a filter so it can be placed in the sequence of filter runs (after some, before others). In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`, so this special filter may be specified either way in a defaults file (or by `citeproc: true`, though this gives no control of positioning relative to other filters). TODO: we need to add something to the manual section on defaults files for this. * Add deprecation warning if `pandoc-citeproc` filter is used. * Add `--citeproc/-C` option to trigger citation processing. This behaves like a filter and will be positioned relative to filters as they appear on the command line. * Rewrote the manual on citations, adding a dedicated Citations section which also includes some information formerly found in the pandoc-citeproc man page. * Look for CSL styles in the `csl` subdirectory of the pandoc user data directory. This changes the old pandoc-citeproc behavior, which looked in `~/.csl`. Users can simply symlink `~/.csl` to the `csl` subdirectory of their pandoc user data directory if they want the old behavior. * Add support for CSL bibliography entry formatting to LaTeX, HTML, Ms writers. Added CSL-related CSS to styles.html.
2020-09-13	Fix hlint suggestions, update hlint.yaml (#6680)	Christian Despres	1	-1/+1
	* Fix hlint suggestions, update hlint.yaml Most suggestions were redundant brackets. Some required LambdaCase. The .hlint.yaml file had a small typo, and didn't ignore camelCase suggestions in certain modules.
2020-09-11	Use the original tail instead of deconstructing and reconstructing it (#6678)	Joseph C. Sible	1	-2/+2

2020-08-24	Docx writer: separate adjacent tables.	John MacFarlane	1	-1/+9
	Word combines adjacent tables, so to prevent this we insert an empty paragraph between two adjacent tables. Closes #4315.
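The separation step amounts to a single pass over the block list. The Python sketch below shows the idea under simplifying assumptions: blocks are `(tag, payload)` tuples, the empty paragraph is modelled as `("Para", [])`, and `separate_adjacent_tables` is a hypothetical name, not a function from pandoc.

```python
def separate_adjacent_tables(blocks):
    """Insert an empty paragraph between any two directly adjacent tables.

    `blocks` is a list of (tag, payload) tuples standing in for pandoc blocks.
    """
    result = []
    for block in blocks:
        previous_is_table = bool(result) and result[-1][0] == "Table"
        if previous_is_table and block[0] == "Table":
            # Without a separator, Word would merge the two tables into one.
            result.append(("Para", []))
        result.append(block)
    return result
```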
2020-07-22	Docx writer: support --number-sections.	John MacFarlane	1	-4/+17
	Closes #1413.
2020-05-16	Docx writer: enable column and row bands for tables.	John MacFarlane	1	-1/+6
	This change will not have any effect with the default style. However, it enables users to use a style (via a reference.docx) that turns on row and/or column bands. Closes #6371.
2020-04-28	Support new Underline element in readers and writers (#6277)	Vaibhav Sagar	1	-3/+3
	Deprecate `underlineSpan` in Shared in favor of `Text.Pandoc.Builder.underline`.
2020-04-15	Adapt to the newest Table type, fix some previous adaptation issues	despresc	1	-1/+1
	- Writers.Native is now adapted to the new Table type. - Inline captions should now be conditionally wrapped in a Plain, not a Para block. - The toLegacyTable function now lives in Writers.Shared.
2020-04-15	Implement the new Table type	despresc	1	-3/+4

2020-03-29	Clean up and simplify Text.Pandoc.Writers.Docx (#6229)	Joseph C. Sible	1	-56/+48
	* Use <|> to simplify the Semigroup instance * Use map instead of reimplementing it * Simplify isValidChar * Remove an unnecessary nested do block * Simplify pgContentWidth * Simplify addLang * Simplify newStyles * Avoid an unnecessary fmap in headerFooterEntries * Remove unnecessary monadicity from mkNumbering and mkAbstractNum * Use randomRs instead of constantly messing with the RNG state * Lift common functions out of ifs * Hoist not * Clarify withTextPropM and withParaPropM
2020-03-29	Clean up some fmaps (#6226)	Joseph C. Sible	1	-3/+3
	* Avoid fmapping when we're just binding right after anyway * Clean up unnecessary fmaps in the LaTeX reader
2020-03-22	Finer grained imports of Text.Pandoc.Class submodules (#6203)	Albert Krewinkel	1	-2/+2
	This should speed-up recompilation after changes in `Text.Pandoc.Class`, as the number of modules affected by a change will be smaller in general. It also offers faster insights into the parts of `T.P.Class` used within a module.
2020-03-15	Use implicit Prelude (#6187)	Albert Krewinkel	1	-2/+0
	* Use implicit Prelude The previous behavior was introduced as a fix for #4464. It seems that this change alone did not fix the issue, and `stack ghci` and `cabal repl` only work with GHC 8.4.1 or newer, as no custom Prelude is loaded for these versions. Given this, it seems cleaner to revert to the implicit Prelude. * PandocMonad: remove outdated check for base version Only base versions 4.9 and later are supported, the check for `MIN_VERSION_base(4,8,0)` is therefore unnecessary. * Always use custom prelude Previously, the custom prelude was used only with older GHC versions, as a workaround for problems with ghci. The ghci problems are resolved by replacing package `base` with `base-noprelude`, allowing for consistent use of the custom prelude across all GHC versions.
2020-03-13	Update copyright year (#6186)	Albert Krewinkel	1	-1/+1
	* Update copyright year * Copyright: add notes for Lua and Jira modules
2020-01-19	Docx writer: fix regression with Compact style on tight lists. (#6073)	John MacFarlane	1	-1/+9
	Starting in 2.8, the docx writer no longer distinguishes between tight and loose lists, since the Compact style is omitted. This is a side-effect of the fix to #5670, as explained in the changelog: + Preserve built-in styles in DOCX with custom style (Ben Steinberg, #5670). This change prevents custom styles on divs and spans from overriding styles on certain elements inside them, like headings, blockquotes, and links. On those elements, the "native" style is required for the element to display correctly. This change also allows nesting of custom styles; in order to do so, it removes the default "Compact" style applied to Plain blocks, except when inside a table. This patch fixes the problem by extending the exception currently offered to Plain blocks inside tables to Plain blocks inside list items. Closes #6072.
2019-11-12	Switch to new pandoc-types and use Text instead of String [API change].	despresc	1	-82/+87
	PR #5884. + Use pandoc-types 1.20 and texmath 0.12. + Text is now used instead of String, with a few exceptions. + In the MediaBag module, some of the types using Strings were switched to use FilePath instead (not Text). + In the Parsing module, new parsers `manyChar`, `many1Char`, `manyTillChar`, `many1TillChar`, `many1Till`, `manyUntil`, `manyUntilChar` have been added: these are like their unsuffixed counterparts but pack some or all of their output. + `glob` in Text.Pandoc.Class still takes String since it seems to be intended as an interface to Glob, which uses strings. It seems to be used only once in the package, in the EPUB writer, so that is not hard to change.
2019-11-11	Fix typos (#5896)	Brian Wignall	1	-1/+1

2019-09-28	More throwError in place of fail.	John MacFarlane	1	-0/+1

2019-09-28	Replace some more fails with throwErrors.	John MacFarlane	1	-2/+5

2019-09-28	Use Prelude.fail to avoid ambiguity with fail from GHC.Base.	John MacFarlane	1	-1/+1

2019-09-22	[Docx Writer] Re-use Readers.Docx.Parse for StyleMap (#5766)	Nikolay Yakimov	1	-31/+30
	* [Docx Parser] Move style-parsing-specific code to a new module * [Docx Writer] Re-use Readers.Docx.Parse.Styles for StyleMap * [Docx Writer] Move Readers.Docx.StyleMap to Writers.Docx.StyleMap It's never used outside of writer code, so it makes more sense to scope it under writers really.
2019-09-21	[Docx Writer] Consistently use style names, not style ids	Nikolay Yakimov	1	-27/+25
	Styles that this change affects: paragraph styles: Author, Abstract, Compact, Figure, Captioned Figure, Image Caption, First Paragraph, Source Code, Table Caption, Definition, Definition Term; character styles: Verbatim Char, token styles (those with names ending in Tok)
2019-09-21	[Docx Writer] Code clean-up	Nikolay Yakimov	1	-40/+37
	Reduce code duplication, remove redundant brackets
2019-09-20	Preserve built-in styles in DOCX with custom style (#5670)	Ben Steinberg	1	-22/+55
	This commit prevents custom styles on divs and spans from overriding styles on certain elements inside them, like headings, blockquotes, and links. On those elements, the "native" style is required for the element to display correctly. This change also allows nesting of custom styles; in order to do so, it removes the default "Compact" style applied to Plain blocks, except when inside a table.
2019-09-08	Replace Element and makeHierarchical with makeSections.	John MacFarlane	1	-1/+1
	Text.Pandoc.Shared: + Remove `Element` type [API change] + Remove `hierarchicalize` [API change] + Add `makeSections` [API change] + Export `deLink` [API change] Now that we have Divs, we can use them to represent the structure of sections, and we don't need a special Element type. `makeSections` reorganizes a block list, adding Divs with class `section` around sections, and adding numbering if needed. This change also fixes some longstanding issues recognizing section structure when the document contains Divs. Closes #3057, see also #997. All writers have been changed to use `makeSections`. Note that in the process we have reverted the change c1d058aeb1c6a331a2cc22786ffaab17f7118ccd made in response to #5168, which I'm not completely sure was a good idea. Lua modules have also been adjusted accordingly. Existing lua filters that use `hierarchicalize` will need to be rewritten to use `make_sections`.
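To make the reorganization concrete, here is a much-reduced Python sketch of the grouping step. It is not the `makeSections` implementation: numbering and the handling of pre-existing Divs are left out, blocks are modelled as plain tuples, and the names `make_sections` and `_collect` are hypothetical.

```python
def make_sections(blocks):
    """Wrap each header and the blocks it governs in a ("Div", "section", [...]).

    Headers are ("Header", level, text); any other tuple is ordinary content.
    """
    sections, _ = _collect(blocks, 0, 0)
    return sections


def _collect(blocks, start, parent_level):
    """Gather blocks from `start` until a header at or above `parent_level`."""
    out = []
    i = start
    while i < len(blocks):
        block = blocks[i]
        if block[0] == "Header":
            level = block[1]
            if level <= parent_level:
                break  # this header closes the enclosing section
            # Everything up to the next header of the same or a higher rank
            # belongs to this section, including deeper subsections.
            children, i = _collect(blocks, i + 1, level)
            out.append(("Div", "section", [block] + children))
        else:
            out.append(block)
            i += 1
    return out, i
```

A level-2 header that follows a level-1 header ends up as a nested section Div inside the level-1 Div, which is the structure writers can then walk directly.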
2019-08-23	add proofState to settingsList (#5703)	Krystof Beuermann	1	-0/+1

2019-07-19	Change order of ilvl and numId in document.xml (#5647)	Agustín Martín Barbero	1	-3/+3
	Workaround for Word Online shortcoming. Fixes #5645. Also, make list para properties go first. This reordering of properties shouldn't be necessary but it seems Word Online does not understand the docx correctly otherwise.
2019-03-21	Docx writer: Use w:br without attributes for line breaks.	John MacFarlane	1	-4/+1
	We previously added the attribute `type="textWrapping"`, but this causes problems on Word Online. Closes #5377.
2019-03-11	docx writer: avoid extra copy of abstractNum and num elements...	John MacFarlane	1	-1/+9
	...in numbering.xml. This caused pandoc-produced docx files to be uneditable using Word Online. The problem was that recent versions of reference.docx include samples of various kinds of text, including lists. The numbering elements for these were getting copied over to the new docx, where they clashed with the autogenerated elements produced by pandoc. This didn't confuse Desktop Word, but it did confuse Word Online. Closes #5358.
2019-03-01	Remove license boilerplate.	John MacFarlane	1	-18/+0
	The haddock module header contains essentially the same information, so the boilerplate is redundant and just one more thing to get out of sync.
2019-02-04	Add missing copyright notices and remove license boilerplate (#5112)	Albert Krewinkel	1	-2/+2
	Quite a few modules were missing copyright notices. This commit adds copyright notices everywhere via haddock module headers. The old license boilerplate comment is redundant with this and has been removed. Update copyright years to 2019. Closes #4592.
2019-01-26	Improve writing metadata for docx, pptx and odt (#5252)	Agustín Martín Barbero	1	-2/+17
	* docx writer: support custom properties. Solves the writer part of #3024. Also supports additional core properties: `subject`, `lang`, `category`, `description`. * odt writer: improve standard properties, including the following core properties: `generator` (Pandoc/VERSION), `description`, `subject`, `keywords`, `initial-creator` (from authors), `creation-date` (actual creation date). Also fix date. * pptx writer: support custom properties. Also supports additional core properties: `subject`, `category`, `description`. * Includes golden tests. * MANUAL: document metadata support for docx, odt, pptx writers
2018-12-31	Replace read with safeRead (#5186)	Mauro Bieg	1	-5/+5
	closes #5180
2018-11-20	Docx writer: Fix bookmarks to headers with long titles.	John MacFarlane	1	-4/+18
	Word has a 40 character limit for bookmark names. In addition, bookmarks must begin with a letter. Since pandoc's auto-generated identifiers may not respect these constraints, some internal links did not work. With this change, pandoc uses a bookmark name based on the SHA1 hash of the identifier when the identifier isn't a legal bookmark name. Closes #5091.
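The mapping from identifier to bookmark name can be sketched in Python as below. The function name `to_bookmark_name` is hypothetical, and two details are assumptions made for the illustration: the test for "begins with a letter" uses `str.isalpha`, and the hash-based name replaces the first hex digit of the digest with the letter `X` so that it both starts with a letter and stays within 40 characters.

```python
import hashlib

MAX_BOOKMARK_LENGTH = 40  # Word's limit on bookmark names


def to_bookmark_name(identifier):
    """Return the identifier if it is a legal bookmark name, else a hashed name."""
    is_legal = (
        bool(identifier)
        and identifier[0].isalpha()
        and len(identifier) <= MAX_BOOKMARK_LENGTH
    )
    if is_legal:
        return identifier
    # SHA1 hex digests are 40 characters and may start with a digit, so
    # swap the first character for a letter (assumed scheme, see lead-in).
    digest = hashlib.sha1(identifier.encode("utf-8")).hexdigest()
    return "X" + digest[1:]
```

Because the hash is deterministic, the bookmark and every internal link that targets the same identifier resolve to the same name.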
2018-11-19	Fix compiler warning.	John MacFarlane	1	-1/+1

2018-11-19	For bibliography match Div with id 'refs', not class 'references'.	John MacFarlane	1	-2/+2
	This was a mismatch between pandoc's docx, epub, latex, and markdown writers and the behavior of pandoc-citeproc, which actually looks for a div with id 'refs' rather than one with class 'references'.