pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2021-02-22	tests: print accurate location if a test fails	Albert Krewinkel	1	-1/+1
	Ensures that tasty-hunit reports the location of the failing test instead of the location of the helper `test` function.
2021-02-22	Text.Pandoc.UTF8: change IO functions to return Text, not String.	John MacFarlane	3	-4/+5
	[API change] This affects `readFile`, `getContents`, `writeFileWith`, `writeFile`, `putStrWith`, `putStr`, `putStrLnWith`, `putStrLn`, `hPutStrWith`, `hPutStr`, `hPutStrLnWith`, `hPutStrLn`, and `hGetContents`. This avoids needlessly creating a linked list of characters when emitting output.
2021-02-18	Org reader: fix bug in org-ref citation parsing.	Albert Krewinkel	1	-0/+40
	The org-ref syntax allows multiple citations to be listed, separated by commas. This fixes a bug that accepted commas as part of the citation ID, so that every citation list was parsed as a single citation. Fixes: #7101
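A minimal sketch of the intended behaviour, assuming a citation link of the form `cite:key1,key2`. The names `orgRefCiteKeys` and `splitOnComma` are hypothetical and are not the org reader's actual parser; the point is only that commas act as separators and never end up inside a key.

```haskell
-- Hypothetical helper (not pandoc's org reader): split the key list of an
-- org-ref citation link such as "cite:doe99,roe2001" into separate keys,
-- treating commas as separators rather than as part of a key.
orgRefCiteKeys :: String -> Maybe [String]
orgRefCiteKeys link =
  case break (== ':') link of
    (_command, ':' : keys) | not (null keys) ->
      Just (filter (not . null) (splitOnComma keys))
    _ -> Nothing

-- Split a string at every comma.
splitOnComma :: String -> [String]
splitOnComma str =
  case break (== ',') str of
    (key, ',' : rest) -> key : splitOnComma rest
    (key, _)          -> [key]
```

With the bug described above, `cite:doe99,roe2001` behaved like a single key `doe99,roe2001`; here it yields two keys.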
2021-02-16	Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...	John MacFarlane	1	-1/+2
	...and add new definitions isomorphic to xml-light's, but with Text instead of String. This allows us to keep most of the code in existing readers that use xml-light, but avoid lots of unnecessary allocation. We also add versions of the functions from xml-light's Text.XML.Light.Output and Text.XML.Light.Proc that operate on our modified XML types, and functions that convert xml-light types to our types (since some of our dependencies, like texmath, use xml-light). Update golden tests for docx and pptx. OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`. Docx: Do a manual traversal to unwrap sdt and smartTag. This is faster, and needed to pass the tests. Benchmarks, where A = prior to 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8), B = as of 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8), and C = this commit:
	
	| Reader  | A     | B     | C     |
	| ------- | ----- | ----- | ----- |
	| docbook | 18 ms | 12 ms | 10 ms |
	| opml    | 65 ms | 62 ms | 35 ms |
	| jats    | 15 ms | 11 ms | 9 ms  |
	| docx    | 72 ms | 69 ms | 44 ms |
	| odt     | 78 ms | 41 ms | 28 ms |
	| epub    | 64 ms | 61 ms | 56 ms |
	| fb2     | 14 ms | 5 ms  | 4 ms  |
2021-02-13	Org: support task_lists extension	Albert Krewinkel	2	-11/+59
	The `task_lists` extension is now supported by the org reader and writer; the extension is turned on by default. Closes: #6336
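A rough sketch of what reading a task-list item involves, assuming that, as with pandoc's Markdown task lists, an unchecked box becomes U+2610 (☐) and a checked box U+2612 (☒). `taskMarker` is a hypothetical name, and org's partial-completion box `[-]` is deliberately left untouched here.

```haskell
-- Hypothetical helper: replace an org checkbox at the start of a list
-- item with a ballot-box character; anything else is returned unchanged.
taskMarker :: String -> String
taskMarker ('[' : ' ' : ']' : rest) = '\9744' : rest  -- U+2610 BALLOT BOX
taskMarker ('[' : c : ']' : rest)
  | c `elem` "xX" = '\9746' : rest                      -- U+2612 BALLOT BOX WITH X
taskMarker item = item
```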
2021-02-12	Jira: require jira-wiki-markup 1.3.3	Albert Krewinkel	1	-0/+7
	* Modified the Doc parser to skip leading blank lines. This fixes parsing of documents that start with multiple blank lines. (#7095) * Prevent URLs within link aliases from being treated as autolinks. (#6944) Fixes: #7095 Fixes: #6944
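The first fix amounts to discarding blank lines before the first content line. A minimal sketch of that idea, using a hypothetical `skipLeadingBlankLines` that works on plain strings rather than inside jira-wiki-markup's parser:

```haskell
import Data.Char (isSpace)

-- Hypothetical helper: drop every whitespace-only line that precedes the
-- first line with content.  Note that `unlines` ends each remaining line,
-- including the last, with a newline.
skipLeadingBlankLines :: String -> String
skipLeadingBlankLines = unlines . dropWhile (all isSpace) . lines
```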
2021-02-10	Add new unexported module T.P.XMLParser.	John MacFarlane	1	-0/+1
	This exports functions that use xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2). Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module. The new parser also reports errors, which we report when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus. These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes. Add entity defs to docbook-reader.docbook. Update golden tests for docx.
2021-02-07	Avoid unnecessary use of NoImplicitPrelude pragma (#7089)	Albert Krewinkel	52	-100/+0

2021-02-02	Fixed some compiler warnings in tests.	John MacFarlane	3	-14/+3

2021-02-02	Lua: add module "pandoc.path"	Albert Krewinkel	1	-0/+2
	The module makes it possible to work with file paths in a convenient, platform-independent manner. Closes: #6001 Closes: #6565
2021-02-02	Test suite: a more robust way of testing the executable.	John MacFarlane	3	-66/+45
	Many of our tests require running the pandoc executable. This is problematic for a few different reasons. First, cabal-install will sometimes run the test suite after building the library but before building the executable, which means the executable isn't in place for the tests. One can work around that by first building, then building and running the tests, but that's fragile. Second, we have to find the executable. So far, we've done that using a function findPandoc that attempts to locate it relative to the test executable (which can be located using findExecutablePath). But the logic here is delicate and doesn't work with every combination of options. To solve both problems, we add an `--emulate` option to the `test-pandoc` executable. When `--emulate` occurs as the first argument passed to `test-pandoc`, the program simply emulates the regular pandoc executable, using the rest of the arguments (after `--emulate`). Thus, `test-pandoc --emulate -f markdown -t latex` is just like `pandoc -f markdown -t latex`. Since all the work is done by library functions, implementing this emulation takes just a couple of lines of code and should be entirely reliable. With this change, we can test the pandoc executable by running the test program itself (locatable using findExecutablePath) with the `--emulate` option. This removes the need for the fragile `findPandoc` step, and it means we can run our integration tests even when we're just building the library, not the executable. Part of this change involved simplifying some complex handling to set environment variables for dynamic library paths. I have tested a build with `--enable-dynamic-executable`, and it works, but further testing may be needed.
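The dispatch on the first argument can be sketched as a pure function. The `Mode` type and `selectMode` are hypothetical illustrations, not the actual code of `test-pandoc`; in the real program, the emulation branch would hand the remaining arguments to pandoc's library entry point and the other branch would start the test runner.

```haskell
-- Hypothetical: what test-pandoc should do with its command-line arguments.
data Mode
  = EmulatePandoc [String]  -- behave like `pandoc` with these arguments
  | RunTests [String]       -- run the test suite with these arguments
  deriving (Eq, Show)

-- Only a leading "--emulate" switches to emulation; it is consumed, and
-- everything after it is passed through unchanged.
selectMode :: [String] -> Mode
selectMode ("--emulate" : rest) = EmulatePandoc rest
selectMode args                 = RunTests args
```

Keeping the decision pure like this makes it easy to check that `--emulate` is honoured only in first position.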
2021-01-16	Revert "Markdown reader: support GitHub wiki's internal links (#2923) (#6458)"	John MacFarlane	1	-30/+0
	This reverts commit 6efd3460a776620fdb93812daa4f6831e6c332ce. Since this extension is designed to be used with GitHub markdown (gfm), we need to implement the parser as a commonmark extension (commonmark-extensions), rather than in pandoc's markdown reader. When that is done, we can add it here.
2021-01-16	Markdown reader: support GitHub wiki's internal links (#2923) (#6458)	Gautier DI FOLCO	1	-0/+30
	Changes overview: * Add a `Ext_markdown_github_wikilink` constructor to `Extension` [API change]. * Add the parser `githubWikiLink` in `Text.Pandoc.Readers.Markdown` * Add tests.
2021-01-09	Org reader: allow multiple pipe chars in todo sequences	Albert Krewinkel	1	-0/+10
	Additional pipe characters, used to separate "action" states from "no further action" states, are ignored. E.g., for the sequence `#+TODO: UNFINISHED | DONE | FINISHED`, both `DONE` and `FINISHED` are states with no further action required. Previously, parsing of the todo sequence failed if multiple pipe characters were included. Closes: #7014
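A minimal sketch of the rule, assuming the input is the keyword list after `#+TODO:`. `todoSequence` is a hypothetical name, not the org reader's function. Only the first `|` splits the list; later ones are dropped. When no `|` is present, this follows org's convention that the last keyword is the "done" state.

```haskell
-- Hypothetical: split TODO keywords into (action states, done states).
todoSequence :: String -> ([String], [String])
todoSequence keywords =
  case break (== "|") (words keywords) of
    -- The first "|" separates the two groups; further "|" tokens are ignored.
    (todo, _bar : done) -> (todo, filter (/= "|") done)
    -- No separator: the last keyword is the only done state.
    (todo, [])          -> case reverse todo of
      lastState : before -> (reverse before, [lastState])
      []                 -> ([], [])
```

For the example above, `todoSequence "UNFINISHED | DONE | FINISHED"` gives `(["UNFINISHED"], ["DONE", "FINISHED"])`.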
2021-01-08	Update copyright notices for 2021 (#7012)	Albert Krewinkel	30	-30/+30

2021-01-03	Org reader: mark verbatim code with class "verbatim". (#6998)	Dimitri Sabadie	1	-2/+2
	* Change org-mode’s verbatim from `code` to `codeWith`. This adds the `"verbatim"` class so that exporters can apply a specific style to it. For instance, HTML output can then use a CSS rule for `code` elements with the `verbatim` class. * Alter test for org-mode’s verbatim change. See previous commit for further detail on the new implementation.
2021-01-01	Org reader: restructure output of captioned code blocks	Albert Krewinkel	1	-3/+3
	The Div wrapper of code blocks with captions now has the class "captioned-content". The caption itself is added as a Plain block inside a Div of class "caption". This makes it easier to write filters which match on captioned code blocks. Existing filters will need to be updated. Closes: #6977
2020-12-20	LaTeX writer: support colspans and rowspans in tables. (#6950)	Albert Krewinkel	1	-1/+1
	Note that the multirow package is needed for rowspans. It is included in the latex template under a variable, so that it won't be used unless needed for a table.
2020-12-13	Docx writer: keep raw openxml strings verbatim.	Albert Krewinkel	1	-0/+10
	Closes: #6933
2020-12-07	Merge pull request #6922 from jtojnar/db-writer-admonitions	John MacFarlane	1	-0/+66
	Docbook writer: handle admonitions
2020-12-07	Docbook writer: Handle admonition titles from Markdown reader	Jan Tojnar	1	-0/+14
	The Docbook reader produces a `Div` with the `title` class for a `<title>` element within an “admonition” element. The Markdown writer then turns this into a fenced div with a `title` class attribute. Since fenced divs are block elements, their content is recognized as a paragraph by the Markdown reader. This is an issue for the Docbook writer because it would produce an invalid DocBook document from such an AST: the `<title>` element can only contain “inline” elements. Let’s handle this invalid special case separately by unwrapping the paragraph before creating the `<title>` element.
2020-12-07	Docbook writer: handle admonitions	Jan Tojnar	1	-0/+52
	Similarly to https://github.com/jgm/pandoc/commit/d6fdfe6f2bba2a8ed25d6c9f11861774001f7a91, we should handle admonitions.
2020-12-05	Org reader: preserve targets of spurious links	Albert Krewinkel	1	-2/+4
	Links with (internal) targets that the reader doesn't know about are converted into emphasized text. Information on the link target is now preserved by wrapping the text in a Span of class `spurious-link`, with an attribute `target` set to the link's original target. This makes it possible to recover and fix broken or unknown links with filters. See: #6916
2020-11-24	HTML reader tests: disable round-trip testing for tables	Albert Krewinkel	1	-11/+3
	Information for cell alignment in a column is not preserved during round-trips.
2020-11-22	Org reader: parse `#+LANGUAGE` into `lang` metadata field	Albert Krewinkel	1	-0/+4
	Fixes: #6845
2020-11-19	JATS writer: support advanced table features	Albert Krewinkel	1	-1/+1

2020-11-18	Replace org #+KEYWORDS with #+keywords	TEC	7	-92/+92
	About two years ago, lower-case keywords became the standard (though, as always, they are handled case-insensitively): https://code.orgmode.org/bzg/org-mode/commit/13424336a6f30c50952d291e7a82906c1210daf0 Upper-case keywords are exclusive to the manual: - https://orgmode.org/list/871s50zn6p.fsf@nicolasgoaziou.fr/ - https://orgmode.org/list/87tuuw3n15.fsf@nicolasgoaziou.fr/
2020-11-14	Markdown writer: default to using ATX headings.	Aner Lucero	2	-3/+7
	Previously we used Setext (underlined) headings by default. The default is now ATX (`##` style). * Add the `--markdown-headings=atx|setext` option. * Deprecate `--atx-headers`. * Add `ATXHeadingInLHS` constructor to `LogMessage` [API change]. * Support `markdown-headings` in defaults files. * Document new options in MANUAL. Closes #6662.
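To make the difference between the two styles concrete, here is a small sketch with a hypothetical `renderHeading`. It does not reflect how pandoc's Markdown writer is actually implemented. One point is a property of Markdown itself: Setext underlines exist only for levels 1 (`=`) and 2 (`-`), so deeper headings have to fall back to ATX in either style.

```haskell
data HeadingStyle = ATX | Setext
  deriving (Eq, Show)

-- Hypothetical: render a heading of the given level in the requested style.
renderHeading :: HeadingStyle -> Int -> String -> String
renderHeading Setext 1 txt = txt ++ "\n" ++ replicate (length txt) '='
renderHeading Setext 2 txt = txt ++ "\n" ++ replicate (length txt) '-'
-- ATX style, also used for Setext requests at level 3 and deeper.
renderHeading _ level txt = replicate level '#' ++ " " ++ txt
```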
2020-10-14	Fix remaining typos in tests	Albert Krewinkel	2	-2/+2
	See: #6738
2020-10-07	Use golden test framework for command tests.	John MacFarlane	1	-27/+59
	This means that `--accept` can be used to update expected output.
2020-10-06	DOCX reader: Allow empty dates in comments and tracked changes (#6726)	Diego Balseiro	2	-0/+9
	For security reasons, some law firms delete the date from comments and tracked changes. * Make date optional (Maybe) in tracked changes and comments datatypes * Add tests
2020-10-02	Docx writer: better handle list items whose contents are lists (#6522)	Michael Hoffmann	1	-0/+5
	If the first element of a bulleted or ordered list is another list, then that first item will disappear if the target format is docx. This changes the docx writer so that it prepends an empty string for those cases. With this, no items will disappear. Closes #5948.
2020-09-21	Markdown reader: Set citationNoteNum accurately in citations.	John MacFarlane	1	-4/+4
	This also changes stateLastNoteNumber -> stateNoteNumber.
2020-09-15	LaTeX reader: fix improper empty cell filtering (#6689)	Christian Despres	1	-6/+26

2020-09-13	HTML writer: support intermediate table headers	Albert Krewinkel	1	-1/+1
	Closes: #6314
2020-09-13	Fix hlint suggestions, update hlint.yaml (#6680)	Christian Despres	10	-29/+27
	* Fix hlint suggestions, update hlint.yaml Most suggestions were redundant brackets. Some required LambdaCase. The .hlint.yaml file had a small typo, and didn't ignore camelCase suggestions in certain modules.
2020-09-12	HTML writer: render table footers if present	Albert Krewinkel	1	-6/+7
	Part of: #6314
2020-09-12	[API change] Rename Writers.Tables and its contents (#6679)	Christian Despres	1	-64/+66
	Writers.Tables is now Writers.AnnotatedTable. All of the types and functions in it have had the "Ann" removed from them. Now it is expected that the module be imported qualified.
2020-09-10	Support colspans and rowspans in HTML tables (#6644)	Albert Krewinkel	1	-2/+19
	* HTML writer: add support for row headers, colspans, rowspans * Add planet table tests See #6312
2020-09-05	Add Writers.Tables helper functions and types, add tests for those (#6655)	Christian Despres	1	-0/+252
	Add Writers.Tables helper functions and types, add tests for those The Writers.Tables module contains an AnnTable type that is a pandoc Table with added inferred information that should be enough for writers (in particular the HTML writer) to operate on without having to lay out the table themselves. The toAnnTable and fromAnnTable functions in that module convert between AnnTable and Table. In addition to producing an AnnTable with coherent and well-formed annotations, the toAnnTable function also normalizes its input Table like the table builder does. Various tests ensure that toAnnTable normalizes tables exactly like the table builder, and that its annotations are coherent.
2020-08-15	[Latex Reader] Fixing issues with \multirow and \multicolumn table cells (#6608)	Laurent P. René de Cotret	1	-4/+13
	* Added test to replicate (#6596) * Table cell reader not consuming spaces correctly (#6596) * Prevented wrong nesting of \multicolumn and \multirow table cells (#6603) * Parse empty table cells (#6603) * Support full prototype for multirow macro (#6603) Closes #6603
2020-08-07	[Latex Reader] Table cell parser not consuming spaces correctly (#6597)	Laurent P. René de Cotret	1	-0/+7
	* Added test to replicate (#6596) * Table cell reader not consuming spaces correctly (#6596)
2020-07-23	Col-span and row-span in LaTeX reader (#6470)	Laurent P. René de Cotret	1	-3/+55
	Add multirow and multicolumn support in LaTeX reader. Partially addresses #6311.
2020-07-08	Escape starting periods in ms writer code blocks	Michael Hoffmann	1	-0/+37
	If a line of ms code block output starts with a period (.), it should be prefixed with `\&` so that it is not interpreted as a roff command. Fixes #6505
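A minimal sketch of the escaping step, using a hypothetical `escapeLeadingPeriods` over the whole code-block text, not the ms writer's actual function. In a Haskell string literal, the roff escape `\&` has to be written `"\\&"`, because `"\&"` on its own is Haskell's empty escape.

```haskell
-- Hypothetical: prefix every line that starts with '.' with roff's
-- zero-width escape \& so roff does not read the line as a request.
escapeLeadingPeriods :: String -> String
escapeLeadingPeriods = unlines . map escapeLine . lines
  where
    escapeLine l@('.' : _) = "\\&" ++ l
    escapeLine l           = l
```

For example, a code line `.TH foo` is emitted as `\&.TH foo`, while other lines pass through unchanged.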
2020-07-01	Org reader: respect tables-excluding export setting	Albert Krewinkel	1	-0/+8
	Tables can be removed from the final document with the `#+OPTIONS: |:nil` export setting.
2020-06-30	Org reader: respect export setting disabling footnotes	Albert Krewinkel	1	-0/+16
	Footnotes can be removed from the final document with the `#+OPTIONS: f:nil` export setting.
2020-06-30	Org reader: respect export setting which disables entities	Albert Krewinkel	1	-0/+6
	MathML-like entities, e.g., `\alpha`, can be disabled with the `#+OPTIONS: e:nil` export setting.
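The three export settings above share one shape: `key:value` pairs on an `#+OPTIONS:` line. The following sketch assumes a hypothetical `orgExportOptions` that reads such a line's value into (option, enabled) pairs. It simplifies by treating any value other than `nil` as "enabled", which ignores numeric options such as `H:3`.

```haskell
-- Hypothetical: parse "|:nil f:nil e:t" into [("|",False),("f",False),("e",True)].
orgExportOptions :: String -> [(String, Bool)]
orgExportOptions = concatMap option . words
  where
    option w = case break (== ':') w of
      (key, ':' : value) | not (null key) -> [(key, value /= "nil")]
      _                                   -> []

-- True only if the option is present and explicitly switched off.
isDisabled :: String -> [(String, Bool)] -> Bool
isDisabled key opts = lookup key opts == Just False
```

A reader would then, for example, drop tables when `isDisabled "|"` holds and drop footnotes when `isDisabled "f"` holds.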
2020-06-29	Org reader: keep unknown keyword lines as raw org	Albert Krewinkel	1	-2/+5
	The lines of unknown keywords, like `#+SOMEWORD: value`, are no longer read as metadata, but kept as raw `org` blocks. This ensures that more information is retained when round-tripping org-mode files; additionally, this change makes it possible to support non-standard org extensions via filters.
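A sketch of the classification this describes, with hypothetical names throughout. `knownKeywords` is an illustrative three-item subset, not the reader's real list. Known keywords become metadata with a lower-cased key, matching org's case-insensitive handling. Unknown ones are kept verbatim as raw org.

```haskell
import Data.Char (toLower)

data KeywordLine
  = Meta String String  -- known keyword: lower-cased key and its value
  | RawOrg String       -- unknown keyword: the original line, unchanged
  deriving (Eq, Show)

-- Illustrative subset only (assumption, not the reader's actual list).
knownKeywords :: [String]
knownKeywords = ["title", "author", "date"]

-- Classify a "#+KEY: value" line; other lines yield Nothing.
classifyKeywordLine :: String -> Maybe KeywordLine
classifyKeywordLine line@('#' : '+' : rest) =
  case break (== ':') rest of
    (key, ':' : value)
      | map toLower key `elem` knownKeywords ->
          Just (Meta (map toLower key) (dropWhile (== ' ') value))
      | otherwise -> Just (RawOrg line)
    _ -> Nothing
classifyKeywordLine _ = Nothing
```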
2020-06-29	Org reader: unify keyword handling	Albert Krewinkel	1	-48/+56
	Handling of export settings and other keywords (like `#+LINK`) has been combined and unified.
2020-06-29	Org reader: support LATEX_HEADER_EXTRA and HTML_HEAD_EXTRA settings	Albert Krewinkel	1	-29/+49
	These export settings are treated like their non-extra counterparts, i.e., the values are added to the `header-includes` metadata list.