pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2021-02-10	Add new unexported module T.P.XMLParser.	John MacFarlane	8	-62/+109
	This exports functions that use xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2). Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module. The new parser also reports errors, which we report when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus. These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes. Add entity defs to docbook-reader.docbook. Update golden tests for docx.
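The approach above (parse with a stricter parser, then convert into the older, simpler element type and surface parse errors) can be illustrated with a small Python analogue. This is not pandoc's Haskell code: `xml.etree.ElementTree` stands in for xml-conduit, the tuple-based "light" element stands in for xml-light's `Element`, and `to_light`, `parse_light` and `XMLParseError` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

# Hypothetical "light" element, loosely modelled on xml-light's Element:
# (name, [(attribute, value)], [content]); content is a nested element or text.
LightContent = Union["LightElement", str]
LightElement = Tuple[str, List[Tuple[str, str]], List[LightContent]]


class XMLParseError(Exception):
    """Stand-in for a dedicated error constructor such as PandocXMLError."""


def to_light(elem: ET.Element) -> LightElement:
    """Convert an ElementTree element into the light representation."""
    contents: List[LightContent] = []
    if elem.text:
        contents.append(elem.text)
    for child in elem:
        contents.append(to_light(child))
        if child.tail:  # text following the child belongs to the parent
            contents.append(child.tail)
    # Attributes are sorted so the result does not depend on source order.
    return (elem.tag, sorted(elem.attrib.items()), contents)


def parse_light(source: str) -> LightElement:
    """Parse with the stricter parser and report errors instead of guessing."""
    try:
        root = ET.fromstring(source)
    except ET.ParseError as err:
        raise XMLParseError(str(err)) from err
    return to_light(root)
```

As with the DocBook test fixture mentioned above, a strict parser rejects entities that have no definition: `parse_light("<a>&nbsp;</a>")` raises `XMLParseError` rather than silently dropping the entity.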
2021-02-08	ODT reader: finer-grained errors on parse failure.	John MacFarlane	1	-21/+18
	See #7091.
2021-02-08	ODT reader: give more information if zip can't be unpacked.	John MacFarlane	1	-1/+4

2021-02-08	DocBook reader: Support informalfigure (#7079)	Nils Carlson	1	-1/+3
	Add support for informalfigure.
2021-02-06	Markdown reader: improved handling of mmd link attributes in references.	John MacFarlane	1	-0/+2
	Previously they only worked for links that had titles. Closes #7080.
2021-01-31	RST reader: fix handling of header in CSV tables.	John MacFarlane	1	-4/+5
	The interpretation of this line is not affected by the delim option. Closes #7064.
2021-01-26	Clean up BibTeX parsing.	John MacFarlane	1	-0/+18
	Previously there was a messy code path that gave strange results in some cases, not passing through raw tex but trying to extract a string content. This was an artefact of trying to handle some special bibtex-specific commands in the BibTeX reader. Now we just handle these in the LaTeX reader and simplify parsing in the BibTeX reader. This does mean that more raw tex will be passed through (and currently this is not sensitive to the `raw_tex` extension; this should be fixed). Closes #7049.
2021-01-16	Revert "Markdown reader: support GitHub wiki's internal links (#2923) (#6458)"	John MacFarlane	1	-25/+0
	This reverts commit 6efd3460a776620fdb93812daa4f6831e6c332ce. Since this extension is designed to be used with GitHub markdown (gfm), we need to implement the parser as a commonmark extension (commonmark-extensions), rather than in pandoc's markdown reader. When that is done, we can add it here.
2021-01-16	Markdown reader: support GitHub wiki's internal links (#2923) (#6458)	Gautier DI FOLCO	1	-0/+25
	Changes overview: * Add an `Ext_markdown_github_wikilink` constructor to `Extension` [API change]. * Add the parser `githubWikiLink` in `Text.Pandoc.Readers.Markdown`. * Add tests.
2021-01-09	Org reader: allow multiple pipe chars in todo sequences	Albert Krewinkel	1	-4/+10
	Additional pipe chars, used to separate "action" states from "no further action" states, are ignored. E.g., for the following sequence, both `DONE` and `FINISHED` are states with no further action required: `#+TODO: UNFINISHED | DONE | FINISHED`. Previously, parsing of the todo sequence failed if multiple pipe chars were included. Closes: #7014
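The rule described above (the first pipe splits action states from done states, later pipes are ignored) can be sketched in Python. `parse_todo_sequence` is a hypothetical helper that takes the value after `#+TODO:`; the no-pipe fallback (last keyword is the done state) follows org-mode's documented convention and is an assumption about pandoc's behaviour.

```python
from typing import List, Tuple


def parse_todo_sequence(value: str) -> Tuple[List[str], List[str]]:
    """Split a #+TODO: value into (action states, done states)."""
    keywords = value.split()
    if "|" in keywords:
        cut = keywords.index("|")  # only the first pipe is significant
        todo = keywords[:cut]
        done = [k for k in keywords[cut + 1:] if k != "|"]  # ignore extra pipes
    else:
        # Assumed org-mode convention: without a pipe, the last keyword is "done".
        todo, done = keywords[:-1], keywords[-1:]
    return todo, done
```

For the sequence in the commit message, `parse_todo_sequence("UNFINISHED | DONE | FINISHED")` yields `(["UNFINISHED"], ["DONE", "FINISHED"])`.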
2021-01-08	Update copyright notices for 2021 (#7012)	Albert Krewinkel	35	-36/+36

2021-01-04	LaTeX reader: handle filecontents environment.	John MacFarlane	2	-6/+28
	Closes #7003.
2021-01-03	Org reader: mark verbatim code with class "verbatim". (#6998)	Dimitri Sabadie	1	-1/+1
	* Switch org-mode’s verbatim from `code` to `codeWith`. This adds the `"verbatim"` class so that exporters can apply a specific style to it. For instance, HTML output can use a CSS rule that targets code with the verbatim class. * Alter test for org-mode’s verbatim change. See previous commit for further detail on the new implementation.
2021-01-02	LaTeX reader: put contents of unknown environments in a Div...	John MacFarlane	1	-1/+1
	when `raw_tex` is not enabled. (When `raw_tex` is enabled, the whole environment is parsed as a raw block.) The class name is the name of the environment. Previously, we just included the contents without the surrounding Div, but having a record of the environment's boundaries and name can be useful. Closes #6997.
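Because the environment name becomes the Div's class, a filter can act on specific environments. The Python sketch below works on pandoc's JSON AST (as produced by `pandoc -t json`) and unwraps Divs for a chosen set of environment names, restoring the earlier "contents only" output for those; `unwrap_env_divs` and the `myenv` name are hypothetical.

```python
from typing import Any, List, Set


def unwrap_env_divs(blocks: List[Any], envs: Set[str]) -> List[Any]:
    """Replace Divs carrying one of the classes in `envs` with their contents."""
    out: List[Any] = []
    for block in blocks:
        if block.get("t") == "Div":
            (ident, classes, kvs), inner = block["c"]
            inner = unwrap_env_divs(inner, envs)  # handle nested environments
            if any(c in envs for c in classes):
                out.extend(inner)  # drop the wrapper, keep the contents
                continue
            block = {"t": "Div", "c": [[ident, classes, kvs], inner]}
        out.append(block)
    return out
```

In a JSON filter this would be applied to `doc["blocks"]` after `json.load`, with the result written back with `json.dump`.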
2021-01-01	Org reader: restructure output of captioned code blocks	Albert Krewinkel	1	-14/+12
	The Div wrapper of code blocks with captions now has the class "captioned-content". The caption itself is added as a Plain block inside a Div of class "caption". This makes it easier to write filters which match on captioned code blocks. Existing filters will need to be updated. Closes: #6977
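A filter matching the new structure might look like the Python sketch below, which works on pandoc's JSON AST and collects `(caption text, code)` pairs from top-level `captioned-content` Divs. The helper names are hypothetical, the stringification handles only `Str` and `Space`, and the order of the caption Div and the code block inside the wrapper is deliberately not assumed.

```python
from typing import Any, List, Optional, Tuple


def has_class(block: Any, cls: str) -> bool:
    """True if `block` is a Div whose classes include `cls`."""
    return block.get("t") == "Div" and cls in block["c"][0][1]


def inline_text(inlines: List[Any]) -> str:
    """Crude stringify: only Str and Space are handled (a simplification)."""
    parts = []
    for il in inlines:
        if il["t"] == "Str":
            parts.append(il["c"])
        elif il["t"] == "Space":
            parts.append(" ")
    return "".join(parts)


def captioned_code(blocks: List[Any]) -> List[Tuple[str, str]]:
    """Collect (caption, code) pairs from captioned-content Divs."""
    found = []
    for block in blocks:
        if not has_class(block, "captioned-content"):
            continue
        caption, code = "", None  # type: str, Optional[str]
        for b in block["c"][1]:
            if has_class(b, "caption"):
                caption = " ".join(
                    inline_text(p["c"]) for p in b["c"][1] if p["t"] == "Plain")
            elif b["t"] == "CodeBlock":
                code = b["c"][1]
        if code is not None:
            found.append((caption, code))
    return found
```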
2020-12-30	Mediawiki reader: allow space around strong/emph delimiters.	John MacFarlane	1	-6/+4
	Closes #6993.
2020-12-30	Hlint fixes	John MacFarlane	1	-1/+1

2020-12-28	HTML reader: use renderTags' from Text.Pandoc.Shared.	Albert Krewinkel	1	-25/+3
	The `renderTags'` function was duplicated when the reader used `Text` as its string type. The duplication is no longer necessary. A side effect of this change is that empty `<col>` elements are written as self-closing tags in raw HTML blocks.
2020-12-10	HTML reader: pay attention to lang attributes on body.	John MacFarlane	1	-3/+6
	These (as well as lang attributes on html) should update lang in metadata. See #6938.
2020-12-10	HTML reader: retain attribute prefixes and avoid duplicates.	John MacFarlane	2	-24/+24
	Previously we stripped attribute prefixes, reading `xml:lang` as `lang` for example. This resulted in two duplicate `lang` attributes when `xml:lang` and `lang` were both used. This commit causes the prefixes to be retained, and also avoids invalid duplicate attributes. Closes #6938.
2020-12-10	Add sourcepos extension for commonmark	John MacFarlane	1	-5/+9
	* Add `Ext_sourcepos` constructor for `Extension`. * Add `sourcepos` extension (only for commonmark). * Bump to 2.11.3. With the `sourcepos` extension set, `data-pos` attributes are added to the AST by the commonmark reader. No other readers are affected. The `data-pos` attributes are put on elements that accept attributes; for other elements, an enclosing Div or Span is added to hold the attributes. Closes #4565.
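To see where the `data-pos` attributes end up, a Python sketch can walk pandoc's JSON AST and collect them. `source_positions` is a hypothetical helper; it checks only a handful of element types whose attributes sit at a known position (first field, or second for `Header`) and makes no assumption about the format of the attribute value.

```python
from typing import Any, List, Tuple

# Element types whose Attr is the first field of "c" in the JSON AST.
ATTR_AT_0 = {"Div", "Span", "Code", "CodeBlock", "Link", "Image"}


def source_positions(node: Any) -> List[Tuple[str, str]]:
    """Collect (element type, data-pos value) pairs in document order."""
    found: List[Tuple[str, str]] = []
    if isinstance(node, dict):
        t = node.get("t")
        attr = None
        if t in ATTR_AT_0:
            attr = node["c"][0]
        elif t == "Header":  # Header is [level, attr, inlines]
            attr = node["c"][1]
        if attr is not None:
            for key, value in attr[2]:
                if key == "data-pos":
                    found.append((t, value))
        for value in node.values():  # recurse into children
            found.extend(source_positions(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(source_positions(item))
    return found
```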
2020-12-10	Commonmark reader: refactor specFor, set input name to "".	John MacFarlane	1	-2/+8

2020-12-07	Dokuwiki reader: handle unknown interwiki links better.	John MacFarlane	1	-1/+1
	DokuWiki lets the user define his own Interwiki links. Previously pandoc reacted to these by emitting a google search link, which is not helpful. Instead, we now just emit the full URL including the wikilink prefix, e.g. `faquk>FAQ-mathml`. This at least gives users the ability to modify the links using filters. Closes #6932.
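Since the prefixed target now survives into the output, a filter can resolve it with the wiki's own interwiki table. The Python sketch below rewrites `Link` targets of the form `prefix>page` in pandoc's JSON AST; `resolve_interwiki` is hypothetical, and the mapping for `faquk` used in the usage note is an illustrative placeholder URL, not a real DokuWiki configuration.

```python
from typing import Any, Dict


def resolve_interwiki(node: Any, table: Dict[str, str]) -> Any:
    """Rewrite Link targets of the form 'prefix>page' using `table`."""
    if isinstance(node, list):
        return [resolve_interwiki(n, table) for n in node]
    if not isinstance(node, dict):
        return node
    node = {k: resolve_interwiki(v, table) for k, v in node.items()}
    if node.get("t") == "Link":
        attr, inlines, (url, title) = node["c"]
        prefix, sep, page = url.partition(">")
        if sep and prefix in table:  # unknown prefixes are left untouched
            node["c"] = [attr, inlines, [table[prefix] + page, title]]
    return node
```

For example, with a hypothetical table `{"faquk": "https://faq.example.org/"}`, the target `faquk>FAQ-mathml` becomes `https://faq.example.org/FAQ-mathml`.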
2020-12-05	Org reader: preserve targets of spurious links	Albert Krewinkel	1	-5/+4
	Links with (internal) targets that the reader doesn't know about are converted into emphasized text. Information on the link target is now preserved by wrapping the text in a Span of class `spurious-link`, with an attribute `target` set to the link's original target. This allows users to recover and fix broken or unknown links with filters. See: #6916
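A filter of the kind described above could look like this Python sketch over pandoc's JSON AST. It turns a `spurious-link` Span back into a `Link` whenever a user-supplied `resolve` callback maps the stored `target` to a URL; `fix_spurious_links` and the callback are hypothetical, and the emphasized inner content is kept as-is.

```python
from typing import Any, Callable, Optional


def fix_spurious_links(node: Any, resolve: Callable[[str], Optional[str]]) -> Any:
    """Replace Span.spurious-link with a Link when `resolve` knows the target."""
    if isinstance(node, list):
        return [fix_spurious_links(n, resolve) for n in node]
    if not isinstance(node, dict):
        return node
    node = {k: fix_spurious_links(v, resolve) for k, v in node.items()}
    if node.get("t") == "Span":
        (ident, classes, kvs), inlines = node["c"]
        if "spurious-link" in classes:
            target = dict(kvs).get("target")
            url = resolve(target) if target is not None else None
            if url is not None:
                return {"t": "Link", "c": [["", [], []], inlines, [url, ""]]}
    return node  # unresolved spans stay as they are
```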
2020-12-05	LaTeX reader: don't apply theorem default styling to a figure inside.	John MacFarlane	1	-0/+1
	If we put an image in italics, then when rendering to Markdown we no longer get an implicit figure. Closes #6925.
2020-11-29	LaTeX reader: don't parse `\rule` with width 0 as horizontal rule.	John MacFarlane	1	-1/+11

2020-11-28	Fix a tiny Typo in the CSV reader module	Tassos Manganaris	1	-1/+1
	Header comment in the CSV reader module says "RST" instead of "CSV".
2020-11-27	HTML reader tests: improve test coverage of new features	Albert Krewinkel	1	-1/+2

2020-11-27	HTML reader: support body headers, row head columns	Albert Krewinkel	1	-41/+61
	Closes: #6312
2020-11-26	LaTeX reader: preserve center environment (#6852)	Igor Pashev	1	-1/+1
	The contents of the `center` environment are put in a `Div` with class `center`.
2020-11-26	HTML reader: improve support for table headers, footer, attributes	Albert Krewinkel	3	-102/+223
	- `<tfoot>` elements are no longer added to the table body but used as table footer. - Separate `<tbody>` elements are no longer combined into one. - Attributes on `<thead>`, `<tbody>`, `<th>`/`<td>`, and `<tfoot>` elements are preserved.
2020-11-26	HTML reader: allow finer grained options for tag omission	Albert Krewinkel	3	-13/+26

2020-11-25	HTML reader: simplify list attribute handling	Albert Krewinkel	1	-18/+9
	This removes the `foldOrElse` function from the internal Text.Pandoc.CSS module.
2020-11-24	HTML reader: support row or column-spanning table cells	Albert Krewinkel	2	-28/+26

2020-11-24	HTML reader: support blocks in caption	Albert Krewinkel	2	-6/+6

2020-11-24	HTML reader: extract table parsing into separate module	Albert Krewinkel	3	-95/+140

2020-11-23	HTML reader: extract submodules	Albert Krewinkel	4	-239/+342
	Reducing module size should reduce memory use during compilation. This is preparatory work to tackle support for more table features.
2020-11-22	Org reader: parse `#+LANGUAGE` into `lang` metadata field	Albert Krewinkel	1	-0/+2
	Fixes: #6845
2020-11-21	LaTeX reader: more robust parsing of bracketed options.	John MacFarlane	1	-3/+8
	Improves on 9a40976. Closes #6873.
2020-11-20	DocBook reader: Table text width support (#6791)	Nils Carlson	1	-2/+12
	Table width in relation to text width is not natively supported by DocBook, but it is supported by the DocBook FO stylesheets through an XML processing instruction, `<?dbfo table-width="50%"?>`. Implement support for this instruction in the DocBook reader.
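Reading the processing instruction can be sketched with Python's `xml.dom.minidom`, which exposes PIs as nodes with `target` and `data`. This is an illustration, not the reader's Haskell code: `dbfo_table_width` is hypothetical, only percentage values are handled, and the PI is assumed to be a direct child of the table element.

```python
import re
from typing import Optional
from xml.dom import minidom


def dbfo_table_width(table_xml: str) -> Optional[float]:
    """Return <?dbfo table-width="NN%"?> as a fraction of text width, if present."""
    doc = minidom.parseString(table_xml)
    for node in doc.documentElement.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "dbfo"):
            m = re.search(r'table-width="(\d+(?:\.\d+)?)%"', node.data)
            if m:
                return float(m.group(1)) / 100
    return None
```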
2020-11-20	Improve LaTeX option parsing...	John MacFarlane	1	-1/+3
	in cases where we run into trouble parsing inlines until the closing `]`, e.g. quotes, we return a plain string with the option contents. Previously we mistakenly included the brackets in this string. Closes #6869.
2020-11-19	DocBook reader: drop period in formalpara title...	John MacFarlane	1	-2/+2
	...and put it in a div with class `formalpara-title`, so that people can reformat with filters. Closes #6562. Thanks to rdmuller.
2020-11-18	Man reader: improve handling of .IP.	John MacFarlane	1	-5/+19
	We now better handle `.IP` when it is used with non-bullet, non-numbered lists, creating a definition list. We also skip blank lines like groff itself. Closes #6858.
2020-11-18	Replace org #+KEYWORDS with #+keywords	TEC	1	-11/+11
	As of ~2 years ago, lower case keywords became the standard (though they are handled case-insensitively, as always): https://code.orgmode.org/bzg/org-mode/commit/13424336a6f30c50952d291e7a82906c1210daf0 Upper case keywords are exclusive to the manual: - https://orgmode.org/list/871s50zn6p.fsf@nicolasgoaziou.fr/ - https://orgmode.org/list/87tuuw3n15.fsf@nicolasgoaziou.fr/
2020-11-17	Bibtex reader: fall back on en-US if locale for LANG not found.	John MacFarlane	1	-1/+4
	This reproduces earlier pandoc-citeproc behavior. Closes jgm/citeproc#26.
2020-11-17	Markdown reader: fix regression with example list references.	John MacFarlane	1	-1/+5
	This affects example list references followed by dashes. Introduced by commit b8d17f7. Closes #6855.
2020-11-16	Move getNextNumber from Readers.LaTeX to Readers.LaTeX.Parsing.	John MacFarlane	2	-26/+26

2020-11-16	Improve fix to siunitx numbers with minus.	John MacFarlane	1	-1/+1
	- use real minus sign - use tests contributed by Igor Pashev.
2020-11-16	LaTeX reader: Fix negative numbers in siunitx commands.	John MacFarlane	1	-2/+4
	Commit a157e1a broke negative numbers, e.g. `\SI{-33}{\celsius}` or `\num{-3}`. This fixes the regression.
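The intended output for negative numbers (a real minus sign, U+2212, as noted in the follow-up commit) can be sketched in Python. `format_si_number`, `render_si` and the `UNITS` table are hypothetical, and the no-break space between number and unit is an assumption rather than a statement about pandoc's exact output.

```python
# Hypothetical subset of siunitx unit macros.
UNITS = {r"\celsius": "\u00b0C", r"\metre": "m", r"\second": "s"}


def format_si_number(number: str) -> str:
    """Format a siunitx number, using U+2212 MINUS SIGN for a leading '-'."""
    value = number.strip()
    if value.startswith("-"):
        value = "\u2212" + value[1:].lstrip()
    return value


def render_si(number: str, unit: str) -> str:
    """Render \\SI{number}{unit}; the no-break space separator is assumed."""
    return format_si_number(number) + "\u00a0" + UNITS.get(unit.strip(), unit)
```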
2020-11-15	Markdown reader: fix detection of locators following in-text citations.	John MacFarlane	1	-27/+30
	Previously, if we had `@foo [p. 33; @bar]`, the `p. 33` would be incorrectly parsed as a prefix of `@bar` rather than a suffix of `@foo`.
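The corrected interpretation can be sketched in Python: in the bracket following an in-text citation, a leading segment that contains no citation key is a locator (suffix) for the in-text citation, and the remaining segments are ordinary citations. `split_intext_bracket` is hypothetical and deliberately simplified; for instance, it treats any `@` followed by a word character as a citation key.

```python
import re
from typing import List, Tuple


def split_intext_bracket(bracket: str) -> Tuple[str, List[str]]:
    """Split the bracket after an in-text citation into (suffix, other cites)."""
    items = [s.strip() for s in bracket.strip("[]").split(";")]
    suffix = ""
    if items and not re.search(r"@\w", items[0]):
        suffix = items.pop(0)  # locator belongs to the preceding @key
    return suffix, [s for s in items if s]
```

For `@foo [p. 33; @bar]`, the bracket yields `("p. 33", ["@bar"])`, so `p. 33` attaches to `@foo` rather than becoming a prefix of `@bar`.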