pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2021-07-11	Improved parsing of raw LaTeX from Text streams (rawLaTeXParser).	John MacFarlane	2	-11/+37
	We now use source positions from the token stream to tell us how much of the text stream to consume. Getting this to work required a few other changes to make token source positions accurate. Closes #7434.
2021-07-09	RST reader: fix regression with code includes.	John MacFarlane	1	-1/+5
	With the recent changes to include infrastructure, included code blocks were getting an extra newline. Closes #7436. Added regression test.
2021-07-06	Recognize data-external when reading HTML img tags (#7429)	Michael Hoffmann	1	-8/+3
	Preserve all attributes in img tags. If attributes have a `data-` prefix, it will be stripped. In particular, this preserves a `data-external` attribute as an `external` attribute in the pandoc AST.
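The attribute handling described above can be illustrated with a small sketch. This is Python rather than pandoc's Haskell, and `normalize_img_attributes` is a hypothetical helper, not a pandoc function; it only shows the stated rule that every attribute is kept and a leading `data-` is stripped.

```python
def normalize_img_attributes(attrs):
    """Keep every (name, value) pair, dropping a leading 'data-' from the name.

    Hypothetical illustration of the rule in the commit message,
    not pandoc's actual code.
    """
    prefix = "data-"
    result = []
    for name, value in attrs:
        if name.startswith(prefix):
            name = name[len(prefix):]
        result.append((name, value))
    return result
```

Under this sketch, `("data-external", "1")` becomes `("external", "1")`, while attributes without the prefix pass through unchanged.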
2021-07-06	Markdown reader: don't try to read contents in self-closing HTML tag.	John MacFarlane	1	-1/+4
	Previously we had problems parsing raw HTML with self-closing tags like `<col/>`. The problem was that pandoc would look for a closing tag to close the markdown contents, but the closing tag had, in effect, already been parsed by `htmlTag`. This fixes the issue described in <https://groups.google.com/d/msgid/pandoc-discuss/297bc662-7841-4423-bcbb-534e99bbba09n%40googlegroups.com>.
2021-07-06	HTML reader: add col, colgroup to 'closes' definitions	John MacFarlane	1	-1/+3

2021-06-22	Fix regression with comment-only YAML metadata blocks.	John MacFarlane	1	-0/+3
	Closes #7400.
2021-06-21	Improve emailAddress in Text.Pandoc.Parsing.	John MacFarlane	1	-1/+21
	Previously the parser would accept characters that are illegal in domain names, and this sometimes caused it to gobble bits of the following text. Closes #7398. Note that this change, by itself, caused some txt2tags reader tests to fail. txt2tags allows bare email addresses with a following form query. So, in addition to the change to emailAddress, we modify the txt2tags parser so it can still handle these cases.
2021-06-12	Docx reader: handle absolute URIs in Relationship Target.	John MacFarlane	1	-5/+11
	Closes #7374.
2021-06-05	DocBook reader: Add support for danger element	Jan Tojnar	1	-1/+2
	Added in DocBook 5.2: - https://github.com/docbook/docbook/pull/64 - https://tdg.docbook.org/tdg/5.2/danger.html
2021-06-01	Markdown reader: fix pipe table regression in 2.11.4.	John MacFarlane	1	-1/+1
	Previously pipe tables with empty headers (that is, a header line with all empty cells) would be rendered as headerless tables. This broke in 2.11.4. The fix here is to produce an AST with an empty table head when a pipe table has all empty header cells. Closes #7343.
2021-06-01	LaTeX reader: don't allow optional * on symbol control sequences.	John MacFarlane	1	-2/+4
	Generally we allow optional starred variants of LaTeX commands (since many allow them, and if we don't accept these explicitly, ignoring the star usually gives acceptable results). But we don't want to do this for `\(*\)` and similar cases. Closes #7340.
2021-05-31	Fix regression with commonmark/gfm yaml metadata block parsing.	John MacFarlane	1	-5/+5
	A regression in 2.14 led to the document body being omitted after YAML metadata in some cases. This is now fixed. Closes #7339.
2021-05-30	HTML reader: fix column width regression.	John MacFarlane	1	-1/+1
	Column widths specified with a style attribute were off by a factor of 100 in 2.14. Closes #7334.
2021-05-29	Markdown reader: in rebasePaths, check for both Windows and Posix	John MacFarlane	1	-4/+5
	absolute paths. Previously Windows pandoc was treating `/foo/bar.jpg` as non-absolute.
2021-05-29	In rebasePath, check for absolute paths two ways.	John MacFarlane	1	-1/+4
	isAbsolute from FilePath doesn't return True on Windows for paths beginning with `/`, so we check that separately.
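The "check two ways" idea can be sketched in Python, whose standard library ships both path flavours regardless of host platform. The function name `is_absolute_either` is made up for this illustration; pandoc's own check is written in Haskell and may differ in detail.

```python
import ntpath
import posixpath


def is_absolute_either(path):
    """Treat a path as absolute if either the POSIX or the Windows rules say so.

    Illustrative only: the point is that a leading '/' should count as
    absolute even when the program runs on Windows.
    """
    return posixpath.isabs(path) or ntpath.isabs(path)
```

With this check, `/foo/bar.jpg` and `C:\foo\bar.jpg` are both treated as absolute, while `images/bar.jpg` is not.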
2021-05-28	Support `rebase_relative_paths` for commonmark based formats.	John MacFarlane	1	-1/+3
	(Including `gfm`.)
2021-05-28	Docx reader: Support new table features.	Emily Bourke	3	-49/+163
	* Column spans
	* Row spans
	  - The spec says that if the `val` attribute is omitted, its value should be assumed to be `continue`, and that its values are restricted to {`restart`, `continue`}. If the attribute has any other value, I think it seems reasonable to default it to `continue`. It might cause problems if the spec is extended in the future by adding a third possible value, in which case this would probably give incorrect behaviour, and wouldn't error.
	* Allow multiple header rows
	* Include table description in simple caption
	  - The table description element is like alt text for a table (along with the table caption element). It seems like we should include this somewhere, but I’m not 100% sure how – I’m pairing it with the simple caption for the moment. (Should it maybe go in the block caption instead?)
	* Detect table captions
	  - Check for caption paragraph style /and/ either the simple or complex table field. This means the caption detection fails for captions which don’t contain a field, as in an example doc I added as a test. However, I think it’s better to be too conservative: a missed table caption will still show up as a paragraph next to the table, whereas if I incorrectly classify something else as a table caption it could cause havoc by pairing it up with a table it’s not at all related to, or dropping it entirely.
	* Update tests and add new ones
	Partially fixes: #6316
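The row-span defaulting rule in this message is small enough to state as code. The following Python sketch uses a hypothetical `normalize_vmerge` helper (not a name from the Docx reader) and assumes a missing attribute is represented as `None`.

```python
def normalize_vmerge(val):
    """Map a row-span `val` attribute to 'restart' or 'continue'.

    Sketch of the rule in the commit message: an omitted value (None here,
    by assumption) means 'continue', and any unrecognised value also falls
    back to 'continue' instead of raising an error.
    """
    return "restart" if val == "restart" else "continue"
```

The trade-off noted by the author shows up directly: a future third value would silently be treated as `continue`.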
2021-05-28	Docx reader: Read table column widths.	Emily Bourke	2	-3/+4

2021-05-27	rebase_relative_paths: leave empty paths unchanged.	John MacFarlane	1	-1/+1

2021-05-27	rebase_relative_paths extension: don't change fragment paths.	John MacFarlane	1	-1/+2
	We don't want a pure fragment path to be rewritten, since these are used for cross-referencing.
2021-05-27	Modify rebase_relative_paths treatment of reference links/images.	John MacFarlane	1	-5/+4
	The directory is based on the file containing the link reference, not the file containing the link, if these differ.
2021-05-27	Add `rebase_relative_paths` extension.	John MacFarlane	1	-7/+29
	- Add manual entry for (non-default) extension `rebase_relative_paths`.
	- Add constructor `Ext_rebase_relative_paths` to `Extensions` in Text.Pandoc.Extensions [API change]. When enabled, this extension rewrites relative image and link paths by prepending the (relative) directory of the containing file.
	- Make Markdown reader sensitive to the new extension.
	- Add tests for #3752.
	Closes #3752. NB. currently the extension applies to markdown and associated readers but not commonmark/gfm.
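A rough Python sketch of the path rewriting follows, folding in the exceptions from the neighbouring commits in this log (empty paths, pure fragment paths, and absolute paths are left alone). The name `rebase_path` is hypothetical, and leaving URLs with a scheme untouched is an assumption of this sketch rather than something stated in the log.

```python
import posixpath
from urllib.parse import urlsplit


def rebase_path(containing_file, target):
    """Prepend the directory of `containing_file` to a relative `target`.

    Illustrative only; not pandoc's implementation.
    """
    # Empty paths and pure fragments (used for cross-references) stay as-is.
    if target == "" or target.startswith("#"):
        return target
    # Assumption: anything with a URL scheme, or a leading "/", is not relative.
    if urlsplit(target).scheme or target.startswith("/"):
        return target
    directory = posixpath.dirname(containing_file)
    if not directory:
        return target
    return posixpath.join(directory, target)
```

For example, an image `images/fig.png` referenced from `chap1/text.md` would be rewritten to `chap1/images/fig.png`, while `#intro` is returned unchanged.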
2021-05-27	LaTeX reader: improve `\def` and implement `\newif`.	John MacFarlane	2	-15/+63
	- Improve parsing of `\def` macros. We previously set "verbatim mode" even for parsing the initial `\def`; this caused problems for things like
	  ```
	  \def\foo{\def\bar{BAR}}
	  \foo
	  \bar
	  ```
	- Implement `\newif`.
	- Add tests.
2021-05-25	Allow compilation with base 4.15	Albert Krewinkel	2	-9/+8

2021-05-25	Use haddock-library-1.10.0	Albert Krewinkel	1	-1/+2

2021-05-25	Jira: add support for "smart" links	Albert Krewinkel	1	-0/+2
	Support has been added for the new `[alias\|https://example.com\|smart-card]` syntax.
2021-05-22	Handle relative lengths (e.g. `2*`) in HTML column widths.	John MacFarlane	1	-14/+33
	See <https://www.w3.org/TR/html4/types.html#h-6.6>. "A relative length has the form "i*", where "i" is an integer. When allotting space among elements competing for that space, user agents allot pixel and percentage lengths first, then divide up remaining available space among relative lengths. Each relative length receives a portion of the available space that is proportional to the integer preceding the "*". The value "*" is equivalent to "1*". Thus, if 60 pixels of space are available after the user agent allots pixel and percentage space, and the competing relative lengths are 1*, 2*, and 3*, the 1* will be alloted 10 pixels, the 2* will be alloted 20 pixels, and the 3* will be alloted 30 pixels." Closes #4063.
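The arithmetic in the quoted passage can be checked with a short Python sketch. `parse_relative` and `allot_relative` are names invented for this example; they model only the proportional split of the remaining space, not how the HTML reader turns the result into column widths.

```python
from fractions import Fraction


def parse_relative(length):
    """Parse a relative length such as '2*' into its integer weight.

    A bare '*' is treated as '1*', as the HTML 4 text says.
    """
    if not length.endswith("*"):
        raise ValueError("not a relative length: %r" % length)
    digits = length[:-1]
    return int(digits) if digits else 1


def allot_relative(available, weights):
    """Divide `available` units among the weights, proportionally."""
    total = sum(weights)
    if total == 0:
        return [Fraction(0) for _ in weights]
    return [Fraction(available * w, total) for w in weights]
```

Calling `allot_relative(60, [parse_relative(s) for s in ["1*", "2*", "3*"]])` reproduces the 10, 20 and 30 pixel shares from the specification's example.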
2021-05-22	Revert "HTML reader: simplify col width parsing"	John MacFarlane	1	-9/+13
	This reverts commit f76fe2ab56606528d4710cc6c40bceb5788c3906.
2021-05-22	HTML reader: simplify col width parsing	Albert Krewinkel	1	-13/+9

2021-05-20	DocBook reader: ensure that first and last names are separated.	John MacFarlane	1	-6/+14
	Closes #6541.
2021-05-20	LaTeX reader: More siunitx improvements. Closes #6658.	John MacFarlane	2	-46/+95
	There's still one slight divergence from the siunitx behavior: we get 'kg m/A/s' instead of 'kg m/(A s)'. At the moment I'm not going to worry about that.
2021-05-20	LaTeX/siunitx: fix parsing of `\cubic` etc. See #6658.	John MacFarlane	1	-35/+50

2021-05-20	LaTeX reader siunitx: fix + sign on ang.	John MacFarlane	1	-3/+6

2021-05-20	LaTeX reader siunitx: add leading 0 to numbers starting with .	John MacFarlane	1	-2/+5

2021-05-20	LaTeX reader: Fix parsing of `+-` in siunitx numbers.	John MacFarlane	1	-4/+7
	See #6658.
2021-05-20	LaTeX reader: support `\pm` in `SI{..}`.	John MacFarlane	1	-1/+3
	Closes #6620.
2021-05-19	LaTeX reader: better support for `\xspace`.	John MacFarlane	2	-14/+19
	Previously we only supported it in inline contexts; now we support it in all contexts, including math. Partially addresses #7299.
2021-05-17	HTML reader: keep attributes from code nested below pre tag.	Albert Krewinkel	1	-1/+12
	If a code block is defined with `<pre><code class="language-x">…</code></pre>`, where the `<pre>` element has no attributes, then the attributes from the `<code>` element are used instead. Any leading `language-` prefix in the code's class attribute is dropped to improve syntax highlighting. Closes: #7221
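As a sketch of the rule for classes only (other attributes are ignored here for brevity), the following Python function picks the classes for a code block. `code_block_classes` is a hypothetical name, and the logic is a simplified reading of the commit message, not the reader's actual code.

```python
def code_block_classes(pre_classes, code_classes):
    """Choose classes for a <pre><code> block.

    If <pre> carries classes, use them; otherwise fall back to the classes
    on <code>, with any leading 'language-' prefix removed.
    """
    if pre_classes:
        return list(pre_classes)
    prefix = "language-"
    return [c[len(prefix):] if c.startswith(prefix) else c
            for c in code_classes]
```

So `<pre><code class="language-haskell">` would yield the class `haskell`, which is the form a syntax highlighter typically expects.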
2021-05-15	HTML reader: parse `<header>` as a Div	Albert Krewinkel	1	-0/+2
	HTML5 `<header>` elements are treated like `<div>` elements.
2021-05-14	HTML reader: keep h1 tags as normal headers (#7274)	Albert Krewinkel	1	-5/+1
	The tags `<title>` and `<h1 class="title">` often contain the same information, so the latter was dropped from the document. However, as this can lead to loss of information, the heading is now always retained. Use `--shift-heading-level-by=-1` to turn the `<h1>` into the document title, or a filter to restore the previous behavior. Closes: #2293
2021-05-14	HTML reader: don't fail on unmatched closing "script" tag.	Albert Krewinkel	1	-7/+9
	Prevent the reader from crashing if the HTML input contains an unmatched closing `</script>` tag. Fixes: #7282
2021-05-13	Implement curly-brace syntax for Markdown citation keys.	John MacFarlane	2	-7/+7
	The change provides a way to use citation keys that contain special characters not usable with the standard citation key syntax. Example: `@{foo_bar{x}'}` for the key `foo_bar{x}'`. Closes #6026. The change requires adding a new parameter to the `citeKey` parser from Text.Pandoc.Parsing [API change]. Markdown reader: recognize @{..} syntax for citations. Markdown writer: use @{..} syntax for citations when needed. Update manual with curly-brace syntax for citations.
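To make the curly-brace form concrete, here is a minimal Python sketch that extracts a key from text beginning with `@{`. It assumes that braces inside the key must be balanced, which is an assumption of this example and not something the commit message states; `parse_braced_cite_key` is a made-up name unrelated to pandoc's `citeKey` parser.

```python
def parse_braced_cite_key(text):
    """Return (key, rest) if `text` starts with '@{...}', else None.

    Assumes braces inside the key are balanced (an assumption of this sketch).
    """
    if not text.startswith("@{"):
        return None
    depth = 0
    # Start at index 1, the opening brace, and track nesting depth.
    for i in range(1, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[2:i], text[i + 1:]
    return None  # no matching closing brace
```

Applied to the example from the commit message, the text between the outer braces of `@{foo_bar{x}'}` is returned as the key.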
2021-05-12	Fix source position reporting for YAML bibliographies.	John MacFarlane	2	-4/+6
	Closes #7273.
2021-05-09	RST reader: seek include files in the directory...	John MacFarlane	1	-1/+3
	...of the file containing the include directive, as RST requires. Closes #6632.
2021-05-09	Org reader: Resolve org includes relative to ...	John MacFarlane	2	-2/+5
	...the directory containing the file containing the INCLUDE directive. Closes #5501.
2021-05-09	RST reader: use `insertIncludedFile` from T.P.Parsing...	John MacFarlane	1	-58/+36
	instead of reproducing much of its code.
2021-05-09	T.P.Parsing: improve include file functions.	John MacFarlane	2	-3/+3
	Remove old `insertIncludedFileF`. [API change] Give `insertIncludedFile` a more general type, allowing it to be used where `insertIncludedFileF` was.
2021-05-09	Change reader types, allowing better tracking of source positions.	John MacFarlane	34	-355/+443
	Previously, when multiple file arguments were provided, pandoc simply concatenated them and passed the contents to the readers, which took a Text argument. As a result, the readers had no way of knowing which file was the source of any particular bit of text. This meant that we couldn't report accurate source positions on errors or include accurate source positions as attributes in the AST. More seriously, it meant that we couldn't resolve resource paths relative to the files containing them (see e.g. #5501, #6632, #6384, #3752).
	Add Text.Pandoc.Sources (exported module), with a `Sources` type and a `ToSources` class. A `Sources` wraps a list of `(SourcePos, Text)` pairs. [API change] A parsec `Stream` instance is provided for `Sources`. The module also exports versions of parsec's `satisfy` and other Char parsers that track source positions accurately from a `Sources` stream (or any instance of the new `UpdateSourcePos` class).
	Text.Pandoc.Parsing now exports these modified Char parsers instead of the ones parsec provides. Modified parsers to use a `Sources` as stream [API change].
	The readers that previously took a `Text` argument have been modified to take any instance of `ToSources`. So, they may still be used with a `Text`, but they can also be used with a `Sources` object.
	In Text.Pandoc.Error, modified the constructor PandocParsecError to take a `Sources` rather than a `Text` as first argument, so parse error locations can be accurately reported.
	T.P.Error: showPos, do not print "-" as source name.
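The idea of a list of `(SourcePos, Text)` pairs can be modelled in a few lines of Python. This is a loose analogy only: the `SourcePos` dataclass, `to_sources`, and `locate` below are invented for illustration and do not mirror the Haskell API of Text.Pandoc.Sources.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SourcePos:
    name: str    # file the text came from
    line: int    # 1-based line number
    column: int  # 1-based column number


Sources = List[Tuple[SourcePos, str]]


def to_sources(files):
    """Pair each file's text with the position at which it starts."""
    return [(SourcePos(name, 1, 1), text) for name, text in files]


def locate(sources, offset):
    """Map an offset into the concatenated input back to a file position."""
    for start, text in sources:
        if offset < len(text):
            line, column = start.line, start.column
            for ch in text[:offset]:
                if ch == "\n":
                    line, column = line + 1, 1
                else:
                    column += 1
            return SourcePos(start.name, line, column)
        offset -= len(text)
    raise IndexError("offset beyond end of input")
```

Because each chunk remembers where it came from, an error at some point in the combined input can be reported against the right file and line, which plain concatenation cannot do.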
2021-04-29	Docx reader: add handling of vml image objects (jgm#4735) (#7257)	mbrackeantidot	1	-2/+9
	They represent images, the same way as other images in vml format.
2021-04-28	Smarter smart quotes.	John MacFarlane	3	-47/+12
	Treat a leading " with no closing " as a left curly quote. This supports the practice, in fiction, of continuing paragraphs quoting the same speaker without an end quote. It also helps with quotes that break over lines in line blocks. Closes #7216.