pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2018-01-02	Docx reader: Allow for insertion/deletion of paragraphs.	Jesse Rosenthal	1	-4/+44
	If the paragraph has a deleted or inserted paragraph break (depending on the track-changes setting), we hold onto it until the next paragraph. This handles both the accept and reject settings. For this we introduce a new state which holds the inlines (`ils`) from the previous paragraph, if necessary. For `--track-changes=all`, we add an empty span with class `paragraph-insertion`/`paragraph-deletion` at the end of the paragraph prior to the break to be inserted or deleted. Closes #3927.
2018-01-02	Docx reader: Parse track changes info into paragraph props.	Jesse Rosenthal	1	-15/+27
	This will tell us whether a paragraph break was inserted or deleted. We add a generalized track-changes parsing function, and use it in `elemToParPart` as well.
2018-01-02	Docx reader: Extract tracked changes type from parpart.	Jesse Rosenthal	2	-6/+19
	We're going to want to use it elsewhere as well, in upcoming tracking of paragraph insertion/deletion.
2018-01-01	Markdown reader: rewrite inlinesInBalancedBrackets.	John MacFarlane	1	-19/+13
	The rewrite is much more direct, avoiding parseFromString. And it performs significantly better; unfortunately, parsing time still increases exponentially. See #1735.
2017-12-31	Docx reader: minor cleanup.	Jesse Rosenthal	1	-1/+2

2017-12-31	Docx Reader: Combine adjacent anchors.	Jesse Rosenthal	1	-20/+47
	There isn't any reason to have numerous anchors in the same place, since we can't preserve docx's non-nested, overlapping anchors. So we reduce them to a single anchor, and have all links pointing to any of the overlapping anchors point to that one. This changes the behavior from commit e90c714c7 slightly (we use the first anchor instead of the last), so we change the expected test result. Note that because this produces a state that has to be set after every invocation of `parPartToInlines`, we make the main function into a primed subfunction `parPartToInlines'`, and make `parPartToInlines` a wrapper around it.
2017-12-30	Markdown reader: Avoid parsing raw tex unless \ + letter seen.	John MacFarlane	1	-1/+2
	This seems to help with the performance problem, #4216.
2017-12-30	LaTeX reader: Simplified a check for raw tex command.	John MacFarlane	1	-2/+2

2017-12-30	Docx reader: Remove unused anchors.	Jesse Rosenthal	1	-5/+27
	Docx produces a lot of anchors with nothing pointing to them -- we now remove these to produce cleaner output. Note that this has to occur at the end of the process because it has to follow link/anchor rewriting. Closes #3679.
2017-12-31	Muse reader: automatically translate #cover into #cover-image	Alexander Krotov	1	-1/+3
	Amusewiki uses the #cover directive to specify the cover image.
2017-12-30	Docx reader: Read multiple children of `w:sdtContents`	Jesse Rosenthal	1	-5/+9
	Previously we had only read the first child of an sdtContents tag. Now we replace sdt with all children of the sdtContents tag. This changes the expected test result of our nested_anchors test, since now we read docx's generated TOCs.
2017-12-28	LaTeX reader: be more tolerant of `&` character.	John MacFarlane	1	-1/+1
	This allows us to parse unknown tabular environments as raw LaTeX. Closes #4208.
2017-12-28	Org reader: support minlevel option for includes	Albert Krewinkel	1	-14/+37
	The level of headers in included files can be shifted to a higher level by specifying a minimum header level via the `:minlevel` parameter. E.g. `#+include: "tour.org" :minlevel 1` will shift the headers in tour.org such that the topmost headers become level 1 headers. Fixes: #4154
2017-12-27	Fix warning.	John MacFarlane	1	-2/+1

2017-12-27	Small improvement to figcaption parsing. #4184.	John MacFarlane	1	-2/+0

2017-12-27	Merge pull request #4184 from mb21/html-reader-figcaption	John MacFarlane	1	-4/+7
	HTML Reader: be more forgiving about figcaption
2017-12-27	HTML reader: parse div with class `line-block` as LineBlock.	John MacFarlane	1	-1/+13
	See #4162.
2017-12-27	Docx Reader: preprocess Document body to unwrap "w:sdt" elements	Jesse Rosenthal	1	-1/+31
	We walk through the document (using the zipper in Text.XML.Light.Cursor) to unwrap the sdt tags before doing the rest of the parsing of the document. Note that the function is generically named `walkDocument` in case we need to do any further preprocessing in the future. Closes #4190
2017-12-26	LaTeX reader: support `\foreignlanguage` from babel.	John MacFarlane	1	-0/+30

2017-12-24	RST reader: allow empty list items (as docutils does).	John MacFarlane	1	-2/+2
	Closes #4193.
2017-12-23	JATS reader: handle author-notes.	John MacFarlane	1	-5/+6

2017-12-23	JATS reader: code refactoring.	John MacFarlane	1	-63/+48

2017-12-23	JATS reader: include institute metadata.	John MacFarlane	1	-2/+11

2017-12-23	JATS reader: process author metadata.	John MacFarlane	1	-5/+27

2017-12-23	JATS reader: better citation handling.	John MacFarlane	1	-3/+79
	We now convert a ref-list element into a list of citations in metadata, suitable for use with pandoc-citeproc. We also convert references to pandoc citation elements. Thus a JATS article with embedded bibliographic information can be processed with pandoc and pandoc-citeproc to produce a formatted bibliography.
2017-12-23	HTML Reader: be more forgiving about figcaption	mb21	1	-4/+7
	fixes #4183
2017-12-22	Merge pull request #4189 from mb21/export-blocksToInlines	John MacFarlane	2	-3/+3
	API change: export blocksToInlines' from Text.Pandoc.Shared
2017-12-22	`latex_macros` extension changes.	John MacFarlane	2	-5/+11
	Don't pass through macro definitions themselves when `latex_macros` is set. The macros have already been applied. If `latex_macros` is enabled, then `rawLaTeXBlock` in Text.Pandoc.Readers.LaTeX will succeed in parsing a macro definition, and will update pandoc's internal macro map accordingly, but the empty string will be returned. Together with earlier changes, this closes #4179.
2017-12-22	Markdown reader: improved raw tex parsing.	John MacFarlane	1	-6/+9
	Preserve original whitespace between blocks; recognize `\placeformula` as ConTeXt.
2017-12-22	LaTeX reader: use applyMacros in rawLaTeXBlock, rawLaTeXInline.	John MacFarlane	1	-2/+5

2017-12-22	LaTeX reader: Refactored inlineCommand.	John MacFarlane	1	-24/+11

2017-12-22	API change: export blocksToInlines' from Text.Pandoc.Shared	mb21	2	-3/+3

2017-12-21	Merge pull request #4177 from stencila/jats-xml-reader	John MacFarlane	1	-0/+404
	Add Basic JATS reader based on DocBook reader
2017-12-22	Improve support for code language in JATS	Hamish Mackenzie	1	-2/+19

2017-12-21	LaTeX reader: Fixed subtle bug in tokenizer.	John MacFarlane	1	-2/+3
	Material following `^^` was dropped if it wasn't a character escape. This only affected invalid LaTeX, so we didn't see it in the wild, but it appeared in a QuickCheck test failure https://travis-ci.org/jgm/pandoc/jobs/319812224
2017-12-21	Muse reader: parse anchors immediately after headings as IDs	Alexander Krotov	1	-5/+9

2017-12-20	Org reader: fix asterisks-related parsing error	Albert Krewinkel	1	-1/+1
	A parsing error was fixed which caused the org reader to fail when parsing a paragraph starting with two or more asterisks. Fixes: #4180
2017-12-20	Muse reader: require that note references do not start with 0	Alexander Krotov	1	-1/+3

2017-12-20	Add Basic JATS reader based on DocBook reader	Hamish Mackenzie	1	-0/+387

2017-12-19	Muse reader: parse empty comments correctly	Alexander Krotov	1	-2/+1

2017-12-17	OPML reader: enable raw HTML and other extensions by default for notes.	John MacFarlane	1	-9/+14
	This fixes a regression in 2.0. Note that extensions can now be individually disabled, e.g. `-f opml-smart-raw_html`. Closes #4164.
2017-12-15	LaTeX reader: export tokenize, untokenize.	John MacFarlane	1	-1/+3
	Mainly so they can be tested.
2017-12-15	Fixed regression in LaTeX tokenization.	John MacFarlane	1	-2/+2
	This mainly affects the Markdown reader when parsing raw LaTeX with escaped spaces. Closes #4159.
2017-12-14	RST reader: more accurate parsing of references.	John MacFarlane	1	-36/+24
	Previously we erroneously included the enclosing backticks in a reference ID (closes #4156). This change also disables interpretation of syntax inside references, as in docutils. So, there is no emphasis in `my link`_
2017-12-14	Markdown reader: be pickier about table captions.	John MacFarlane	1	-1/+1
	A caption starts with a `:` which can't be followed by punctuation. Otherwise we can falsely interpret the start of a fenced div, or even a table header line like `:--:|:--:`, as a caption.
2017-12-13	Docx writer: Continue lists after interruption.	Jesse Rosenthal	1	-15/+22
	Docx expects that lists will continue where they left off after an interruption and introduces a new id if a list is starting again. So we keep track of the state of lists and use them to define a "start" attribute, if necessary. Closes #4025
2017-12-13	Markdown reader: always use four space rule for example lists.	John MacFarlane	1	-9/+16
	It would be awkward to indent example list contents to the first non-space character after the label, since example list labels are often long. Thanks to Bernhard Fisseni for the suggestion.
2017-12-12	Markdown: Improved computation of relative cell widths in pipe tables.	John MacFarlane	1	-1/+1

2017-12-12	Pipe tables: use full text width for tables with wrapping cells.	John MacFarlane	1	-2/+2
	Previously we computed the column sizes based on the ratio between the header lines and the text width (as set by `--columns`). This meant that tables with very short header lines would be very narrow. With this change, pipe tables with wrapping cells will always take up the whole text width. The relative column widths will still be determined by the ratio of header lines, but they will be normalized to add up to 1.0.
2017-12-08	LaTeX reader: fix \ before newline.	John MacFarlane	1	-3/+14
	This should be a nonbreaking space, as long as it's not followed by a blank line. This has been fixed at the tokenizer level. Closes #4134.