pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2021-09-21	Use pretty-simple to format native output.	John MacFarlane	1	-311/+1227
	Previously we used our own homespun formatting. But this produces over-long lines that aren't ideal for diffs in tests. Easier to use something off-the-shelf and standard. Closes #7580. Performance is slower by about a factor of 10, but this isn't really a problem because native isn't suitable as a serialization format. (For serialization you should use json, because the reader is so much faster than native.)
2021-01-07	T.P.Parsing: modify gridTableWith' for headerless tables.	John MacFarlane	1	-33/+5
	If the table lacks a header, the header row should be an empty list. Previously we got a list of empty cells, which caused an empty header to be emitted instead of no header. In LaTeX/PDF output that meant we got a double top line with space between. @tarleb @despres - please let me know if this is problematic for some reason I'm not grasping.
2020-11-14	Markdown reader: don't increment stateNoteNumber for example refs.	John MacFarlane	1	-1/+1
	Background: syntactically, references to example list items can't be distinguished from citations; we only know which they are after we've parsed the whole document (and this is resolved in the `runF` stage). This means that pandoc's calculation of `citationNoteNum` can sometimes be wrong when there are example list references. This commit partially addresses #6836, but only for the case where the example list references refer to list items defined previously in the document.
2020-09-21	Markdown reader: Set citationNoteNum accurately in citations.	John MacFarlane	1	-1/+1
	This also changes stateLastNoteNumber -> stateNoteNumber.
2020-04-15	Adapt to the removal of the RowSpan, ColSpan, RowHeadColumns accessors	despresc	1	-65/+65

2020-04-15	Adapt to the newest Table type, fix some previous adaptation issues	despresc	1	-137/+185
	- Writers.Native is now adapted to the new Table type.
	- Inline captions should now be conditionally wrapped in a Plain, not a Para block.
	- The toLegacyTable function now lives in Writers.Shared.
2020-04-15	Implement the new Table type	despresc	1	-65/+157

2018-09-19	Markdown reader: distinguish autolinks in the AST.	John MacFarlane	1	-4/+4
	With this change, autolinks are parsed as Links with the `uri` class. (The same is true for bare links, if the `autolink_bare_uris` extension is enabled.) Email autolinks are parsed as Links with the `email` class. This allows the distinction to be represented in the AST. Formerly the `uri` class was added to autolinks by the HTML writer, but it had to guess what was an autolink and could not distinguish `[http://example.com](http://example.com)` from `<http://example.com>`. It also incorrectly recognized `[pandoc](pandoc)` as an autolink. Now the HTML writer simply passes through the `uri` class if it is present, but does not add anything. The Textile writer has been modified so that the `uri` class is not explicitly added for autolinks, even if it is present. Closes #4913.
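A filter that consumes pandoc's JSON output could use these classes to tell autolinks from ordinary links. The following is a minimal sketch, assuming the usual JSON encoding of a `Link` node as `[attr, inlines, target]` with `attr = [id, classes, key-values]`; the function name `autolink_kind` is hypothetical.

```python
def autolink_kind(node):
    """Return "uri" or "email" for an autolink Link node, else None.

    `node` is one element of a pandoc JSON AST (a dict with "t" and "c").
    """
    if not isinstance(node, dict) or node.get("t") != "Link":
        return None
    attr, _inlines, _target = node["c"]
    _ident, classes, _keyvals = attr
    if "uri" in classes:
        return "uri"
    if "email" in classes:
        return "email"
    # An ordinary link such as [pandoc](pandoc) carries neither class.
    return None
```

A writer or filter built this way no longer needs to guess from the link text whether something was an autolink.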
2018-08-15	Markdown reader: Use "tex" instead of "latex" for raw tex-ish content.	John MacFarlane	1	-4/+5
	We can't always tell if it's LaTeX, ConTeXt, or plain TeX. Better just to use "tex" always.

	Also changed:

	ConTeXt writer: now outputs raw "tex" blocks as well as "context". (Closes #969).

	RST writer: uses ".. raw:: latex" for "tex" content. (RST doesn't support raw context anyway.)

	Note that if "context" or "latex" specifically is desired, you can still force that in a markdown document by using the raw attribute (see MANUAL.txt):

	```{=latex}
	\foo
	```

	Note that this change may affect some filters, if they assume that raw tex parsed by the Markdown reader will be RawBlock (Format "latex"). In most cases it should be trivial to modify the filters to accept "tex" as well.
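For a filter working on the JSON AST, accepting "tex" as well as "latex" can be as small as widening a set of format names. This is a rough sketch, assuming raw nodes are encoded as `{"t": "RawBlock", "c": [format, text]}`; the names `TEX_FORMATS`, `is_raw_tex`, and `collect_raw_tex` are hypothetical.

```python
# Formats this filter treats as TeX-like raw content.
# Before the change, a filter might have listed only "latex".
TEX_FORMATS = {"tex", "latex"}

def is_raw_tex(node):
    """True for a RawBlock or RawInline node whose format is TeX-like."""
    if not isinstance(node, dict):
        return False
    if node.get("t") not in ("RawBlock", "RawInline"):
        return False
    fmt, _text = node["c"]
    return fmt in TEX_FORMATS

def collect_raw_tex(node):
    """Walk a JSON AST fragment and return the text of every raw TeX node."""
    found = []
    if is_raw_tex(node):
        found.append(node["c"][1])
    elif isinstance(node, dict):
        for value in node.values():
            found.extend(collect_raw_tex(value))
    elif isinstance(node, list):
        for value in node:
            found.extend(collect_raw_tex(value))
    return found
```

Raw blocks in other formats, such as "html", are left alone by this check.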
2018-01-17	Markdown reader: don't coalesce adjacent raw LaTeX blocks...	John MacFarlane	1	-1/+2
	if they are separated by a blank line. See lierdakil/pandoc-crossref#160 for motivation.
2018-01-13	LaTeX reader: pass through macro defs in rawLaTeXBlock...	John MacFarlane	1	-0/+1
	even if the `latex_macros` extension is set. This reverts to earlier behavior and is probably safer on the whole, since some macros only modify things in included packages, which pandoc's macro expansion can't modify. Closes #4246.
2017-12-22	`latex_macros` extension changes.	John MacFarlane	1	-1/+0
	Don't pass through macro definitions themselves when `latex_macros` is set. The macros have already been applied. If `latex_macros` is enabled, then `rawLaTeXBlock` in Text.Pandoc.Readers.LaTeX will succeed in parsing a macro definition, and will update pandoc's internal macro map accordingly, but the empty string will be returned. Together with earlier changes, this closes #4179.
2017-12-22	Markdown reader: improved raw tex parsing.	John MacFarlane	1	-3/+1
	+ Preserve original whitespace between blocks.
	+ Recognize `\placeformula` as context.
2017-08-19	Markdown reader: use CommonMark rules for list item nesting.	John MacFarlane	1	-7/+7
	Closes #3511.

	Previously pandoc used the four-space rule: continuation paragraphs, sublists, and other block-level content had to be indented 4 spaces. Now the indentation required is determined by the first line of the list item: to be included in the list item, blocks must be indented to the level of the first non-space content after the list marker. Exception: if there are 5 or more spaces after the list marker, then the content is interpreted as an indented code block, and continuation paragraphs must be indented two spaces beyond the end of the list marker. See the CommonMark spec for more details and examples.

	Documents that adhere to the four-space rule should, in most cases, be parsed the same way by the new rules. Here are some examples of texts that will be parsed differently:

	    - a
	      - b

	will be parsed as a list item with a sublist; under the four-space rule, it would be a list with two items.

	    - a

	          code

	Here we have an indented code block under the list item, even though it is only indented six spaces from the margin, because it is four spaces past the point where a continuation paragraph could begin. With the four-space rule, this would be a regular paragraph rather than a code block.

	    - a

	            code

	Here the code block will start with two spaces, whereas under the four-space rule, it would start with `code`. With the four-space rule, indented code under a list item always must be indented eight spaces from the margin, while the new rules require only that it be indented four spaces from the beginning of the first non-space text after the list marker (here, `a`).

	This change was motivated by a slew of bug reports from people who expected lists to work differently (#3125, #2367, #2575, #2210, #1990, #1137, #744, #172, #137, #128) and by the growing prevalence of CommonMark (now used by GitHub, for example).

	Users who want to use the old rules can select the `four_space_rule` extension.

	* Added `four_space_rule` extension.
	* Added `Ext_four_space_rule` to `Extensions`.
	* `Parsing` now exports `gobbleAtMostSpaces`, and the type of `gobbleSpaces` has been changed so that a `ReaderOptions` parameter is not needed.
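The main indentation rule from the commit above (blocks must be indented to the first non-space content after the list marker) can be illustrated with a small standalone sketch. This is not pandoc's parser: the helper names are hypothetical, only bullet markers and simple `1.` / `1)` markers are recognized, and the case of 5 or more spaces after the marker is deliberately left unhandled.

```python
import re

# Hypothetical marker pattern: optional leading spaces, a bullet or a
# simple ordered marker, then the spaces before the first content.
_MARKER = re.compile(r"^( *)([-+*]|\d+[.)])( +)(?=\S)")

def continuation_indent(first_line):
    """Column where continuation blocks must start, or None if unhandled."""
    m = _MARKER.match(first_line)
    if m is None:
        return None
    lead, marker, gap = m.groups()
    if len(gap) >= 5:
        # Exception case (content is an indented code block): not modeled.
        return None
    return len(lead) + len(marker) + len(gap)

def belongs_to_item(first_line, later_line):
    """True if a non-blank later line is indented enough to stay in the item."""
    need = continuation_indent(first_line)
    if need is None:
        return False
    indent = len(later_line) - len(later_line.lstrip(" "))
    return indent >= need
```

With `- a` as the first line the required indentation is two columns, so a following `  - b` is treated as a sublist inside the item rather than a second top-level item. Blank lines and lazy continuation are outside the scope of this sketch.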
2017-07-24	LaTeX reader: some improvements in macro parsing.	John MacFarlane	1	-0/+1
	Fixed applyMacros so that it operates on the whole string, not just the first token! Don't remove macro definitions from the output, even if Ext_latex_macros is set, so that macros will be applied. Since they're only applied to math in Markdown, removing the macros can have bad effects. Even for math macros, keeping them should be harmless.
2017-07-07	Rewrote LaTeX reader with proper tokenization.	John MacFarlane	1	-2/+2
	This rewrite is primarily motivated by the need to get macros working properly. A side benefit is that the reader is significantly faster (27s -> 19s in one benchmark, and there is a lot of room for further optimization).

	We now tokenize the input text, then parse the token stream.

	Macros modify the token stream, so they should now be effective in any context, including math. Thus, we no longer need the clunky macro processing capacities of texmath.

	A custom state LaTeXState is used instead of ParserState. This, plus the tokenization, will require some rewriting of the exported functions rawLaTeXInline, inlineCommand, rawLaTeXBlock.

	* Added Text.Pandoc.Readers.LaTeX.Types (new exported module). Exports Macro, Tok, TokType, Line, Column. [API change]
	* Text.Pandoc.Parsing: adjusted type of `insertIncludedFile` so it can be used with token parser.
	* Removed old texmath macro stuff from Parsing. Use Macro from Text.Pandoc.Readers.LaTeX.Types instead.
	* Removed texmath macro material from Markdown reader.
	* Changed types for Text.Pandoc.Readers.LaTeX's rawLaTeXInline and rawLaTeXBlock. (Both now return a String, and they are polymorphic in state.)
	* Added orgMacros field to OrgState. [API change]
	* Removed readerApplyMacros from ReaderOptions. Now we just check the `latex_macros` reader extension.
	* Allow `\newcommand\foo{blah}` without braces.

	Fixes #1390. Fixes #2118. Fixes #3236. Fixes #3779. Fixes #934. Fixes #982.
2017-05-11	Combine grid table parsers	Albert Krewinkel	1	-34/+34
	The grid table parsers for markdown and rst were combined into a single parser, slightly changing the parsing behavior of both:

	- The markdown parser now compactifies block content cell-wise: pure text blocks in cells are now treated as paragraphs only if the cell contains multiple paragraphs, and as plain blocks otherwise. Before, this was true only for single-column tables.
	- The rst parser now accepts newlines and multiple blocks in header cells.

	Closes: #3638
2017-02-04	Moved tests/ -> test/.	John MacFarlane	1	-0/+198