pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2021-08-15	Multimarkdown sub- and superscripts (#5512) (#7188)	OCzarnecki	1	-0/+48
	Added an extension `short_subsuperscripts` which modifies the behavior of `subscript` and `superscript`, allowing subscripts or superscripts containing only alphanumerics to end with a space character (e.g. `x^2 = 4` or `H~2 is combustible`). This improves support for multimarkdown. Closes #5512. Add `Ext_short_subsuperscripts` constructor to `Extension` [API change]. This is enabled by default for `markdown_mmd`.
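The space-terminated form described above can be illustrated with a small sketch. This is not pandoc's parser: the regular expression, the function name `render_short_scripts`, and the HTML-style output are assumptions chosen for illustration, and the sketch deliberately handles only the short form, not the delimited `^...^` or `~...~` syntax.

```python
import re

# Hypothetical pattern: a marker (^ or ~), one or more alphanumerics,
# then whitespace or end of input acting as the terminator.
SHORT_SCRIPT = re.compile(r"([\^~])([A-Za-z0-9]+)(?=\s|$)")


def render_short_scripts(text: str) -> str:
    """Wrap space-terminated super/subscripts in <sup>/<sub> tags."""
    def repl(match: re.Match) -> str:
        tag = "sup" if match.group(1) == "^" else "sub"
        return f"<{tag}>{match.group(2)}</{tag}>"

    return SHORT_SCRIPT.sub(repl, text)


print(render_short_scripts("x^2 = 4"))
print(render_short_scripts("H~2 is combustible"))
```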
2021-08-10	Tests.Helpers: export testGolden and use it in RTF reader.	John MacFarlane	1	-12/+5
	This gives a diff output on failure.
2021-08-10	Add test for #7488.	John MacFarlane	1	-0/+1

2021-08-10	Add RTF reader.	John MacFarlane	2	-1/+49
	- `rtf` is now supported as an input format as well as output. - New module Text.Pandoc.Readers.RTF (exporting `readRTF`). [API change] Closes #3982.
2021-07-06	Recognize data-external when reading HTML img tags (#7429)	Michael Hoffmann	1	-0/+6
	Preserve all attributes in img tags. If attributes have a `data-` prefix, it will be stripped. In particular, this preserves a `data-external` attribute as an `external` attribute in the pandoc AST.
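The attribute handling described above amounts to a key rewrite. The following is a minimal sketch under that reading; the function name `normalize_img_attributes` and the use of a plain dictionary are assumptions, not pandoc's actual data structures.

```python
def normalize_img_attributes(attrs: dict) -> dict:
    """Return a copy of attrs with a leading 'data-' stripped from each key."""
    prefix = "data-"
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in attrs.items()
    }


# A data-external attribute survives as a plain external attribute.
print(normalize_img_attributes({"src": "a.png", "data-external": "1"}))
```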
2021-05-29	Reduce size of cover image in test epub.	John MacFarlane	1	-1/+1

2021-05-28	Docx reader: Support new table features.	Emily Bourke	1	-0/+16
	* Column spans * Row spans - The spec says that if the `val` attribute is omitted, its value should be assumed to be `continue`, and that its values are restricted to {`restart`, `continue`}. If the attribute has any other value, I think it seems reasonable to default it to `continue`. It might cause problems if the spec is extended in the future by adding a third possible value, in which case this would probably give incorrect behaviour, and wouldn't error. * Allow multiple header rows * Include table description in simple caption - The table description element is like alt text for a table (along with the table caption element). It seems like we should include this somewhere, but I’m not 100% sure how – I’m pairing it with the simple caption for the moment. (Should it maybe go in the block caption instead?) * Detect table captions - Check for caption paragraph style /and/ either the simple or complex table field. This means the caption detection fails for captions which don’t contain a field, as in an example doc I added as a test. However, I think it’s better to be too conservative: a missed table caption will still show up as a paragraph next to the table, whereas if I incorrectly classify something else as a table caption it could cause havoc by pairing it up with a table it’s not at all related to, or dropping it entirely. * Update tests and add new ones Partially fixes: #6316
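The defaulting rule for the row-span `val` attribute described above can be written down in a few lines. This is only a sketch of the stated rule; the function name `normalize_row_span_val` is hypothetical and the real reader works on parsed XML rather than bare strings.

```python
from typing import Optional


def normalize_row_span_val(val: Optional[str]) -> str:
    """Map a row-span `val` attribute to 'restart' or 'continue'.

    A missing attribute means 'continue' (per the spec, as the commit
    describes); any unrecognized value also falls back to 'continue'.
    """
    return "restart" if val == "restart" else "continue"


for raw in (None, "restart", "continue", "something-else"):
    print(repr(raw), "->", normalize_row_span_val(raw))
```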
2021-05-25	Jira: add support for "smart" links	Albert Krewinkel	1	-0/+8
	Support has been added for the new `[alias|https://example.com|smart-card]` syntax.
2021-05-24	MediaBag improvements.	John MacFarlane	1	-5/+5
	In the current dev version, we will sometimes add a version of an image with a hashed name, keeping the original version with the original name, which would lead to undesirable duplication. This change separates the media's filename from the media's canonical name (which is the path of the link in the document itself). Filenames are based on SHA1 hashes and assigned automatically. In Text.Pandoc.MediaBag: - Export MediaItem type [API change]. - Change MediaBag type to a map from Text to MediaItem [API change]. - `lookupMedia` now returns a `MediaItem` [API change]. - Change `insertMedia` so it sets the `mediaPath` to a filename based on the SHA1 hash of the contents. This will be used when contents are extracted. In Text.Pandoc.Class.PandocMonad: - Remove `fetchMediaResource` [API change]. Lua MediaBag module has been changed minimally. In the future it would be better, probably, to give Lua access to the full MediaItem type.
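The separation between canonical name and hash-based filename can be sketched as follows. This is a Python illustration of the idea, not the Haskell API: the field names `mime_type` and `contents`, the choice to keep the original file extension, and the dictionary-based bag are all assumptions.

```python
import hashlib
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class MediaItem:
    mime_type: str    # assumed field, for illustration
    media_path: str   # hash-based filename used when contents are extracted
    contents: bytes   # assumed field, for illustration


def insert_media(bag: dict, canonical_name: str, mime_type: str, contents: bytes) -> dict:
    """Return a new bag keyed by canonical name, with a SHA1-based media path."""
    extension = PurePosixPath(canonical_name).suffix  # assumption: keep the extension
    media_path = hashlib.sha1(contents).hexdigest() + extension
    new_bag = dict(bag)
    new_bag[canonical_name] = MediaItem(mime_type, media_path, contents)
    return new_bag


bag = insert_media({}, "images/cat.png", "image/png", b"abc")
print(bag["images/cat.png"].media_path)
```

Because the filename depends only on the contents, inserting the same bytes under the same canonical name twice yields one entry rather than a duplicate.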
2021-05-17	HTML reader: keep attributes from code nested below pre tag.	Albert Krewinkel	1	-0/+11
	If a code block is defined with `<pre><code class="language-x">…</code></pre>`, where the `<pre>` element has no attributes, then the attributes from the `<code>` element are used instead. Any leading `language-` prefix in the code's class attribute is dropped to improve syntax highlighting. Closes: #7221
2021-05-15	HTML reader: parse `<header>` as a Div	Albert Krewinkel	1	-5/+9
	HTML5 `<header>` elements are treated like `<div>` elements.
2021-05-09	Change reader types, allowing better tracking of source positions.	John MacFarlane	1	-2/+2
	Previously, when multiple file arguments were provided, pandoc simply concatenated them and passed the contents to the readers, which took a Text argument. As a result, the readers had no way of knowing which file was the source of any particular bit of text. This meant that we couldn't report accurate source positions on errors or include accurate source positions as attributes in the AST. More seriously, it meant that we couldn't resolve resource paths relative to the files containing them (see e.g. #5501, #6632, #6384, #3752). Add Text.Pandoc.Sources (exported module), with a `Sources` type and a `ToSources` class. A `Sources` wraps a list of `(SourcePos, Text)` pairs. [API change] A parsec `Stream` instance is provided for `Sources`. The module also exports versions of parsec's `satisfy` and other Char parsers that track source positions accurately from a `Sources` stream (or any instance of the new `UpdateSourcePos` class). Text.Pandoc.Parsing now exports these modified Char parsers instead of the ones parsec provides. Modified parsers to use a `Sources` as stream [API change]. The readers that previously took a `Text` argument have been modified to take any instance of `ToSources`. So, they may still be used with a `Text`, but they can also be used with a `Sources` object. In Text.Pandoc.Error, modified the constructor PandocParsecError to take a `Sources` rather than a `Text` as first argument, so parse error locations can be accurately reported. T.P.Error: showPos, do not print "-" as source name.
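The core idea, keeping each input paired with its starting position instead of concatenating everything into one text, can be sketched like this. The Python names `SourcePos`, `to_sources`, and `locate` only mirror the concepts in the commit message; they are assumptions and do not reproduce the Haskell API in Text.Pandoc.Sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcePos:
    name: str
    line: int
    column: int


def to_sources(files):
    """Pair each (name, text) input with the position where it starts."""
    return [(SourcePos(name, 1, 1), text) for name, text in files]


def locate(sources, offset):
    """Map an offset into the concatenated inputs to a per-file position."""
    for start, text in sources:
        if offset < len(text):
            before = text[:offset]
            line = start.line + before.count("\n")
            last_newline = before.rfind("\n")
            if last_newline >= 0:
                column = offset - last_newline
            else:
                column = start.column + offset
            return SourcePos(start.name, line, column)
        offset -= len(text)  # move on to the next input
    raise IndexError("offset past end of input")


sources = to_sources([("a.md", "one\ntwo\n"), ("b.md", "three\n")])
print(locate(sources, 4))
print(locate(sources, 8))
```

With plain concatenation, the second lookup could only be reported as line 3 of an anonymous input; here it resolves to the start of `b.md`.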
2021-04-29	Docx reader: add handling of vml image objects (jgm#4735) (#7257)	mbrackeantidot	1	-0/+4
	They represent images, the same way as other images in vml format.
2021-04-28	Smarter smart quotes.	John MacFarlane	1	-1/+1
	Treat a leading " with no closing " as a left curly quote. This supports the practice, in fiction, of continuing paragraphs quoting the same speaker without an end quote. It also helps with quotes that break over lines in line blocks. Closes #7216.
2021-03-31	Treat tabs as spaces in ODT Reader. (#7185)	niszet	1	-0/+1

2021-03-13	Jira reader: mark divs created from panels with class "panel".	Albert Krewinkel	1	-0/+6
	Closes: tarleb/jira-wiki-markup#2
2021-02-28	Remove superfluous imports.	John MacFarlane	1	-2/+0

2021-02-28	T.P.Readers.LaTeX: Don't export tokenize, untokenize.	John MacFarlane	1	-16/+1
	[API change] These were only exported for testing, which seems the wrong thing to do. They don't belong in the public API and are not really usable as they are, without access to the Tok type which is not exported. Removed the tokenize/untokenize roundtrip test. We put a quickcheck property in the comments which may be used when this code is touched (if it is).
2021-02-22	Text.Pandoc.UTF8: change IO functions to return Text, not String.	John MacFarlane	1	-1/+1
	[API change] This affects `readFile`, `getContents`, `writeFileWith`, `writeFile`, `putStrWith`, `putStr`, `putStrLnWith`, `putStrLn`, `hPutStrWith`, `hPutStr`, `hPutStrLnWith`, `hPutStrLn`, `hGetContents`. This avoids the need to uselessly create a linked list of characters when emitting output.
2021-02-18	Org reader: fix bug in org-ref citation parsing.	Albert Krewinkel	1	-0/+40
	The org-ref syntax allows listing multiple citations separated by commas. This fixes a bug that accepted commas as part of the citation id, so all citation lists were parsed as one single citation. Fixes: #7101
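The intended behavior, one citation per comma-separated key, can be illustrated with a short sketch. It assumes the basic `cite:key1,key2` link form and a hypothetical function name; other org-ref link types and pandoc's actual parser are not modeled.

```python
def parse_org_ref_citation(link: str) -> list:
    """Split an org-ref style 'cite:key1,key2' link into its citation keys."""
    prefix, _, keys = link.partition(":")
    if prefix != "cite" or not keys:
        raise ValueError(f"not an org-ref citation: {link!r}")
    # Commas separate citations; they are never part of a citation id.
    return [key.strip() for key in keys.split(",") if key.strip()]


print(parse_org_ref_citation("cite:doe2020,roe2021"))
```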
2021-02-13	Org: support task_lists extension	Albert Krewinkel	1	-0/+13
	The task lists extension is now supported by the org reader and writer; the extension is turned on by default. Closes: #6336
2021-02-12	Jira: require jira-wiki-markup 1.3.3	Albert Krewinkel	1	-0/+7
	* Modified the Doc parser to skip leading blank lines. This fixes parsing of documents which start with multiple blank lines. (#7095) * Prevent URLs within link aliases from being treated as autolinks. (#6944) Fixes: #7095 Fixes: #6944
2021-02-10	Add new unexported module T.P.XMLParser.	John MacFarlane	1	-0/+1
	This exports functions that use xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2). Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module. The new parser also reports errors, which we report when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus. These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes. Add entity defs to docbook-reader.docbook. Update golden tests for docx.
2021-02-07	Avoid unnecessary use of NoImplicitPrelude pragma (#7089)	Albert Krewinkel	28	-54/+0

2021-01-16	Revert "Markdown reader: support GitHub wiki's internal links (#2923) (#6458)"	John MacFarlane	1	-30/+0
	This reverts commit 6efd3460a776620fdb93812daa4f6831e6c332ce. Since this extension is designed to be used with GitHub markdown (gfm), we need to implement the parser as a commonmark extension (commonmark-extensions), rather than in pandoc's markdown reader. When that is done, we can add it here.
2021-01-16	Markdown reader: support GitHub wiki's internal links (#2923) (#6458)	Gautier DI FOLCO	1	-0/+30
	Changes overview: * Add a `Ext_markdown_github_wikilink` constructor to `Extension` [API change]. * Add the parser `githubWikiLink` in `Text.Pandoc.Readers.Markdown` * Add tests.
2021-01-09	Org reader: allow multiple pipe chars in todo sequences	Albert Krewinkel	1	-0/+10
	Additional pipe chars, used to separate "action" state from "no further action" states, are ignored. E.g., for the following sequence, both `DONE` and `FINISHED` are states with no further action required. #+TODO: UNFINISHED | DONE | FINISHED Previously, parsing of the todo sequence failed if multiple pipe chars were included. Closes: #7014
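The rule described above can be sketched as a small parser over the text after `#+TODO:`. The function name `parse_todo_sequence` and the tuple return shape are assumptions for illustration; the no-pipe branch follows Org's documented convention that the last keyword is then the only done state.

```python
def parse_todo_sequence(spec: str):
    """Split a TODO sequence into (action states, no-further-action states)."""
    segments = [segment.split() for segment in spec.split("|")]
    if len(segments) == 1:
        # No pipe at all: the last keyword is the only done state.
        words = segments[0]
        return words[:-1], words[-1:]
    action = segments[0]
    # Every keyword after the first pipe needs no further action;
    # additional pipes carry no extra meaning and are ignored.
    done = [word for segment in segments[1:] for word in segment]
    return action, done


print(parse_todo_sequence("UNFINISHED | DONE | FINISHED"))
```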
2021-01-08	Update copyright notices for 2021 (#7012)	Albert Krewinkel	24	-24/+24

2021-01-03	Org reader: mark verbatim code with class "verbatim". (#6998)	Dimitri Sabadie	1	-2/+2
	* Replace org-mode’s verbatim from code to codeWith. This adds the `"verbatim"` class so that exporters can apply a specific style on it. For instance, it will be possible for HTML to add a CSS rule for code + verbatim class. * Alter test for org-mode’s verbatim change. See previous commit for further detail on the new implementation.
2021-01-01	Org reader: restructure output of captioned code blocks	Albert Krewinkel	1	-3/+3
	The Div wrapper of code blocks with captions now has the class "captioned-content". The caption itself is added as a Plain block inside a Div of class "caption". This makes it easier to write filters which match on captioned code blocks. Existing filters will need to be updated. Closes: #6977
2020-12-05	Org reader: preserve targets of spurious links	Albert Krewinkel	1	-2/+4
	Links with (internal) targets that the reader doesn't know about are converted into emphasized text. Information on the link target is now preserved by wrapping the text in a Span of class `spurious-link`, with an attribute `target` set to the link's original target. This makes it possible to recover and fix broken or unknown links with filters. See: #6916
2020-11-24	HTML reader tests: disable round-trip testing for tables	Albert Krewinkel	1	-11/+3
	Information for cell alignment in a column is not preserved during round-trips.
2020-11-22	Org reader: parse `#+LANGUAGE` into `lang` metadata field	Albert Krewinkel	1	-0/+4
	Fixes: #6845
2020-11-18	Replace org #+KEYWORDS with #+keywords	TEC	7	-92/+92
	As of ~2 years ago, lower case keywords became the standard (though they are handled case-insensitively, as always): https://code.orgmode.org/bzg/org-mode/commit/13424336a6f30c50952d291e7a82906c1210daf0 Upper case keywords are exclusive to the manual: - https://orgmode.org/list/871s50zn6p.fsf@nicolasgoaziou.fr/ - https://orgmode.org/list/87tuuw3n15.fsf@nicolasgoaziou.fr/
2020-10-14	Fix remaining typos in tests	Albert Krewinkel	1	-1/+1
	See: #6738
2020-10-06	DOCX reader: Allow empty dates in comments and tracked changes (#6726)	Diego Balseiro	1	-0/+4
	For security reasons, some legal firms delete the date from comments and tracked changes. * Make date optional (Maybe) in tracked changes and comments datatypes * Add tests
2020-09-21	Markdown reader: Set citationNoteNum accurately in citations.	John MacFarlane	1	-4/+4
	This also changes stateLastNoteNumber -> stateNoteNumber.
2020-09-15	LaTeX reader: fix improper empty cell filtering (#6689)	Christian Despres	1	-6/+26

2020-09-13	Fix hlint suggestions, update hlint.yaml (#6680)	Christian Despres	6	-6/+6
	* Fix hlint suggestions, update hlint.yaml Most suggestions were redundant brackets. Some required LambdaCase. The .hlint.yaml file had a small typo, and didn't ignore camelCase suggestions in certain modules.
2020-08-15	[Latex Reader] Fixing issues with \multirow and \multicolumn table cells (#6608)	Laurent P. René de Cotret	1	-4/+13
	* Added test to replicate (#6596) * Table cell reader not consuming spaces correctly (#6596) * Prevented wrong nesting of \multicolumn and \multirow table cells (#6603) * Parse empty table cells (#6603) * Support full prototype for multirow macro (#6603) Closes #6603
2020-08-07	[Latex Reader] Table cell parser not consuming spaces correctly (#6597)	Laurent P. René de Cotret	1	-0/+7
	* Added test to replicate (#6596) * Table cell reader not consuming spaces correctly (#6596)
2020-07-23	Col-span and row-span in LaTeX reader (#6470)	Laurent P. René de Cotret	1	-3/+55
	Add multirow and multicolumn support in LaTeX reader. Partially addresses #6311.
2020-07-01	Org reader: respect tables-excluding export setting	Albert Krewinkel	1	-0/+8
	Tables can be removed from the final document with the `#+OPTIONS: |:nil` export setting.
2020-06-30	Org reader: respect export setting disabling footnotes	Albert Krewinkel	1	-0/+16
	Footnotes can be removed from the final document with the `#+OPTIONS: f:nil` export setting.
2020-06-30	Org reader: respect export setting which disables entities	Albert Krewinkel	1	-0/+6
	MathML-like entities, e.g., `\alpha`, can be disabled with the `#+OPTIONS: e:nil` export setting.
2020-06-29	Org reader: keep unknown keyword lines as raw org	Albert Krewinkel	1	-2/+5
	The lines of unknown keywords, like `#+SOMEWORD: value` are no longer read as metadata, but kept as raw `org` blocks. This ensures that more information is retained when round-tripping org-mode files; additionally, this change makes it possible to support non-standard org extensions via filters.
2020-06-29	Org reader: unify keyword handling	Albert Krewinkel	1	-48/+56
	Handling of export settings and other keywords (like `#+LINK`) has been combined and unified.
2020-06-29	Org reader: support LATEX_HEADER_EXTRA and HTML_HEAD_EXTRA settings	Albert Krewinkel	1	-29/+49
	These export settings are treated like their non-extra counterparts, i.e., the values are added to the `header-includes` metadata list.
2020-06-29	Org reader: allow multiple #+SUBTITLE export settings	Albert Krewinkel	1	-0/+7
	The values of all lines are read as inlines and collected in the `subtitle` metadata field.
2020-06-28	JATS reader: parse abstract element into metadata field of same name (#6482)	Albert Krewinkel	1	-0/+17
	Closes: #6480