path: root/src/Text/Pandoc/Readers
Age | Commit message | Author | Files | Lines
2018-01-02 | Docx reader: Allow for insertion/deletion of paragraphs. | Jesse Rosenthal | 1 | -4/+44
If the paragraph has a deleted or inserted paragraph break (depending on the track-changes setting) we hold onto it until the next paragraph. This takes care of accept and reject. For this we introduce a new state which holds the ils from the previous para if necessary. For `--track-changes=all`, we add an empty span with class `paragraph-insertion`/`paragraph-deletion` at the end of the paragraph prior to the break to be inserted or deleted. Closes #3927.
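The hold-until-next-paragraph idea can be sketched in isolation. This is a minimal illustration, not the reader's code: `Para`, `BreakChange`, `breakRemoved`, and `mergeParas` are hypothetical names, `TrackChanges` is a local stand-in type, paragraphs are reduced to plain strings, and the `--track-changes=all` case with its marker spans is left out.

```haskell
-- Local stand-ins for the accept/reject track-changes settings (hypothetical).
data TrackChanges = AcceptChanges | RejectChanges deriving (Eq, Show)

-- A tracked change attached to the paragraph break that ends a paragraph.
data BreakChange = BreakInserted | BreakDeleted deriving (Eq, Show)

-- A paragraph reduced to its text plus the optional change on its trailing break.
data Para = Para String (Maybe BreakChange) deriving (Eq, Show)

-- The break disappears when a deletion is accepted or an insertion is rejected.
breakRemoved :: TrackChanges -> Maybe BreakChange -> Bool
breakRemoved AcceptChanges (Just BreakDeleted)  = True
breakRemoved RejectChanges (Just BreakInserted) = True
breakRemoved _ _                                = False

-- Walk the paragraphs, holding text whose trailing break is removed
-- and prepending it to the next paragraph.
mergeParas :: TrackChanges -> [Para] -> [String]
mergeParas mode = go ""
  where
    go held [] = [held | not (null held)]
    go held (Para txt brk : ps)
      | breakRemoved mode brk = go (held ++ txt) ps
      | otherwise             = (held ++ txt) : go "" ps
```

Under these assumptions, a deleted break joins two paragraphs when changes are accepted and leaves them separate when changes are rejected; an inserted break behaves the other way around.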
2018-01-02 | Docx reader: Parse track changes info into paragraph props. | Jesse Rosenthal | 1 | -15/+27
This will tell us whether a paragraph break was inserted or deleted. We add a generalized track-changes parsing function, and use it in `elemToParPart` as well.
2018-01-02 | Docx reader: Extract tracked changes type from parpart. | Jesse Rosenthal | 2 | -6/+19
We're going to want to use it elsewhere as well, in upcoming tracking of paragraph insertion/deletion.
2018-01-01 | Markdown reader: rewrite inlinesInBalancedBrackets. | John MacFarlane | 1 | -19/+13
The rewrite is much more direct, avoiding parseFromString. And it performs significantly better; unfortunately, parsing time still increases exponentially. See #1735.
2017-12-31 | Docx reader: minor cleanup. | Jesse Rosenthal | 1 | -1/+2
2017-12-31 | Docx Reader: Combine adjacent anchors. | Jesse Rosenthal | 1 | -20/+47
There isn't any reason to have numerous anchors in the same place, since we can't maintain docx's non-nesting overlapping. So we reduce to a single anchor, and have all links pointing to one of the overlapping anchors point to that one. This changes the behavior from commit e90c714c7 slightly (use the first anchor instead of the last) so we change the expected test result. Note that because this produces a state that has to be set after every invocation of `parPartToInlines`, we make the main function into a primed subfunction `parPartToInlines'`, and make `parPartToInlines` a wrapper around that.
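The anchor reduction can be pictured as a rewrite table from every redundant anchor to the first anchor at the same spot. The sketch below is only an illustration of that idea; `canonicalMap` and `rewriteLink` are hypothetical names, and anchors are plain strings rather than the reader's own types.

```haskell
import Data.Maybe (fromMaybe)

-- Each inner list holds the anchors found at one location, in document order.
-- Every anchor after the first is mapped to the first one.
canonicalMap :: [[String]] -> [(String, String)]
canonicalMap groups = [ (a, first) | (first : rest) <- groups, a <- rest ]

-- Point a link target at the surviving anchor; unknown targets are unchanged.
rewriteLink :: [(String, String)] -> String -> String
rewriteLink table target = fromMaybe target (lookup target table)
```

For example, with anchors `["a", "b", "c"]` at one location, a link to `c` is redirected to `a`, while a link to `a` or to an unrelated target stays as it is.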
2017-12-30 | Markdown reader: Avoid parsing raw tex unless \ + letter seen. | John MacFarlane | 1 | -1/+2
This seems to help with the performance problem, #4216.
2017-12-30 | LaTeX reader: Simplified a check for raw tex command. | John MacFarlane | 1 | -2/+2
2017-12-30 | Docx reader: Remove unused anchors. | Jesse Rosenthal | 1 | -5/+27
Docx produces a lot of anchors with nothing pointing to them -- we now remove these to produce cleaner output. Note that this has to occur at the end of the process because it has to follow link/anchor rewriting. Closes #3679.
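A rough sketch of the pruning step, assuming a toy inline type rather than pandoc's AST: `Inline`, `Anchor`, `Link`, and `removeUnusedAnchors` here are hypothetical and exist only for illustration. The function is meant to run after any link/anchor rewriting, which is why it looks at the final link targets.

```haskell
-- A toy inline type: an anchor with an id, a link to an anchor id, or text.
data Inline = Anchor String | Link String | Str String
  deriving (Eq, Show)

-- Keep an anchor only if some link in the same list points at it.
removeUnusedAnchors :: [Inline] -> [Inline]
removeUnusedAnchors ils = filter keep ils
  where
    targets = [ t | Link t <- ils ]
    keep (Anchor a) = a `elem` targets
    keep _          = True
```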
2017-12-31 | Muse reader: automatically translate #cover into #cover-image | Alexander Krotov | 1 | -1/+3
Amusewiki uses #cover directive to specify cover image.
2017-12-30 | Docx reader: Read multiple children of `w:sdtContents` | Jesse Rosenthal | 1 | -5/+9
Previously we had only read the first child of an sdtContents tag. Now we replace sdt with all children of the sdtContents tag. This changes the expected test result of our nested_anchors test, since now we read docx's generated TOCs.
2017-12-28 | LaTeX reader: be more tolerant of `&` character. | John MacFarlane | 1 | -1/+1
This allows us to parse unknown tabular environments as raw LaTeX. Closes #4208.
2017-12-28 | Org reader: support minlevel option for includes | Albert Krewinkel | 1 | -14/+37
The level of headers in included files can be shifted to a higher level by specifying a minimum header level via the `:minlevel` parameter. E.g. `#+include: "tour.org" :minlevel 1` will shift the headers in tour.org such that the topmost headers become level 1 headers. Fixes: #4154
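The shift amounts to adding a constant offset so that the topmost header of the included file lands on the requested level. A minimal sketch, assuming headers are represented only by their integer levels; `shiftLevels` is a hypothetical name, not the Org reader's function.

```haskell
-- Shift all header levels so that the smallest level equals minLevel.
-- An included file without headers is returned unchanged.
shiftLevels :: Int -> [Int] -> [Int]
shiftLevels _ [] = []
shiftLevels minLevel levels = map (+ offset) levels
  where
    offset = minLevel - minimum levels
```

With `:minlevel 1`, a file whose headers sit at levels 2, 3, 2, 4 would come out as 1, 2, 1, 3 under this sketch; the relative nesting is preserved.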
2017-12-27 | Fix warning. | John MacFarlane | 1 | -2/+1
2017-12-27 | Small improvement to figcaption parsing. #4184. | John MacFarlane | 1 | -2/+0
2017-12-27 | Merge pull request #4184 from mb21/html-reader-figcaption | John MacFarlane | 1 | -4/+7
HTML Reader: be more forgiving about figcaption
2017-12-27 | HTML reader: parse div with class `line-block` as LineBlock. | John MacFarlane | 1 | -1/+13
See #4162.
2017-12-27 | Docx Reader: preprocess Document body to unwrap "w:sdt" elements | Jesse Rosenthal | 1 | -1/+31
We walk through the document (using the zipper in Text.XML.Light.Cursor) to unwrap the sdt tags before doing the rest of the parsing of the document. Note that the function is generically named `walkDocument` in case we need to do any further preprocessing in the future. Closes #4190
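The unwrapping can be illustrated on a toy element tree. This sketch does not use the `Text.XML.Light.Cursor` zipper mentioned above; `Node` and `unwrapSdt` are hypothetical, and the content wrapper is assumed to be named `w:sdtContent`.

```haskell
-- A toy XML element: a tag name and child elements (no attributes or text).
data Node = Node String [Node] deriving (Eq, Show)

-- Replace every w:sdt element with the (recursively unwrapped) children of
-- its w:sdtContent child; other children of w:sdt, such as properties,
-- are dropped. All other elements are kept, with their children processed.
unwrapSdt :: Node -> [Node]
unwrapSdt (Node "w:sdt" cs) =
  concat [ concatMap unwrapSdt gs | Node "w:sdtContent" gs <- cs ]
unwrapSdt (Node t cs) = [Node t (concatMap unwrapSdt cs)]
```

Because the function returns a list, an `sdt` element with several content children expands into all of them in place, and nested `sdt` elements are unwrapped as well.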
2017-12-26 | LaTeX reader: support `\foreignlanguage` from babel. | John MacFarlane | 1 | -0/+30
2017-12-24 | RST reader: allow empty list items (as docutils does). | John MacFarlane | 1 | -2/+2
Closes #4193.
2017-12-23 | JATS reader: handle author-notes. | John MacFarlane | 1 | -5/+6
2017-12-23 | JATS reader: code refactoring. | John MacFarlane | 1 | -63/+48
2017-12-23 | JATS reader: include institute metadata. | John MacFarlane | 1 | -2/+11
2017-12-23 | JATS reader: process author metadata. | John MacFarlane | 1 | -5/+27
2017-12-23 | JATS reader: better citation handling. | John MacFarlane | 1 | -3/+79
We now convert a ref-list element into a list of citations in metadata, suitable for use with pandoc-citeproc. We also convert references to pandoc citation elements. Thus a JATS article with embedded bibliographic information can be processed with pandoc and pandoc-citeproc to produce a formatted bibliography.
2017-12-23 | HTML Reader: be more forgiving about figcaption | mb21 | 1 | -4/+7
fixes #4183
2017-12-22 | Merge pull request #4189 from mb21/export-blocksToInlines | John MacFarlane | 2 | -3/+3
API change: export blocksToInlines' from Text.Pandoc.Shared
2017-12-22 | `latex_macros` extension changes. | John MacFarlane | 2 | -5/+11
Don't pass through macro definitions themselves when `latex_macros` is set. The macros have already been applied. If `latex_macros` is enabled, then `rawLaTeXBlock` in Text.Pandoc.Readers.LaTeX will succeed in parsing a macro definition, and will update pandoc's internal macro map accordingly, but the empty string will be returned. Together with earlier changes, this closes #4179.
2017-12-22 | Markdown reader: improved raw tex parsing. | John MacFarlane | 1 | -6/+9
+ Preserve original whitespace between blocks.
+ Recognize `\placeformula` as ConTeXt.
2017-12-22 | LaTeX reader: use applyMacros in rawLaTeXBlock, rawLaTeXInline. | John MacFarlane | 1 | -2/+5
2017-12-22 | LaTeX reader: Refactored inlineCommand. | John MacFarlane | 1 | -24/+11
2017-12-22 | API change: export blocksToInlines' from Text.Pandoc.Shared | mb21 | 2 | -3/+3
2017-12-21 | Merge pull request #4177 from stencila/jats-xml-reader | John MacFarlane | 1 | -0/+404
Add Basic JATS reader based on DocBook reader
2017-12-22 | Improve support for code language in JATS | Hamish Mackenzie | 1 | -2/+19
2017-12-21 | LaTeX reader: Fixed subtle bug in tokenizer. | John MacFarlane | 1 | -2/+3
Material following `^^` was dropped if it wasn't a character escape. This only affected invalid LaTeX, so we didn't see it in the wild, but it appeared in a QuickCheck test failure https://travis-ci.org/jgm/pandoc/jobs/319812224
2017-12-21 | Muse reader: parse anchors immediately after headings as IDs | Alexander Krotov | 1 | -5/+9
2017-12-20 | Org reader: fix asterisks-related parsing error | Albert Krewinkel | 1 | -1/+1
A parsing error was fixed which caused the org reader to fail when parsing a paragraph starting with two or more asterisks. Fixes: #4180
2017-12-20 | Muse reader: require that note references do not start with 0 | Alexander Krotov | 1 | -1/+3
2017-12-20 | Add Basic JATS reader based on DocBook reader | Hamish Mackenzie | 1 | -0/+387
2017-12-19 | Muse reader: parse empty comments correctly | Alexander Krotov | 1 | -2/+1
2017-12-17 | OPML reader: enable raw HTML and other extensions by default for notes. | John MacFarlane | 1 | -9/+14
This fixes a regression in 2.0. Note that extensions can now be individually disabled, e.g. `-f opml-smart-raw_html`. Closes #4164.
2017-12-15 | LaTeX reader: export tokenize, untokenize. | John MacFarlane | 1 | -1/+3
Mainly so they can be tested.
2017-12-15 | Fixed regression in LaTeX tokenization. | John MacFarlane | 1 | -2/+2
This mainly affects the Markdown reader when parsing raw LaTeX with escaped spaces. Closes #4159.
2017-12-14 | RST reader: more accurate parsing of references. | John MacFarlane | 1 | -36/+24
Previously we erroneously included the enclosing backticks in a reference ID (closes #4156). This change also disables interpretation of syntax inside references, as in docutils. So, there is no emphasis in `my *link*`_
2017-12-14 | Markdown reader: be pickier about table captions. | John MacFarlane | 1 | -1/+1
A caption starts with a `:` which can't be followed by punctuation. Otherwise we can falsely interpret the start of a fenced div, or even a table header line like `:--:|:--:`, as a caption.
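The rule can be expressed as a small predicate on the start of a line. This is a sketch under stated assumptions, not the Markdown reader's parser: `looksLikeCaption` is a hypothetical name, only the `:` form of captions is considered, and a bare `:` with nothing after it is treated as not a caption.

```haskell
import Data.Char (isPunctuation)

-- A line can start a caption if it begins with ':' and the next
-- character is not punctuation.
looksLikeCaption :: String -> Bool
looksLikeCaption (':' : c : _) = not (isPunctuation c)
looksLikeCaption _             = False
```

Under this check, `: My caption` qualifies, while `:::` (the start of a fenced div) and `:--:|:--:` (a table header line) do not, since `:` and `-` both count as punctuation.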
2017-12-13 | Docx writer: Continue lists after interruption. | Jesse Rosenthal | 1 | -15/+22
Docx expects that lists will continue where they left off after an interruption and introduces a new id if a list is starting again. So we keep track of the state of lists and use them to define a "start" attribute, if necessary. Closes #4025
2017-12-13 | Markdown reader: always use four space rule for example lists. | John MacFarlane | 1 | -9/+16
It would be awkward to indent example list contents to the first non-space character after the label, since example list labels are often long. Thanks to Bernhard Fisseni for the suggestion.
2017-12-12 | Markdown: Improved computation of relative cell widths in pipe tables. | John MacFarlane | 1 | -1/+1
2017-12-12 | Pipe tables: use full text width for tables with wrapping cells. | John MacFarlane | 1 | -2/+2
Previously we computed the column sizes based on the ratio between the header lines and the text width (as set by `--columns`). This meant that tables with very short header lines would be very narrow. With this change, pipe tables with wrapping cells will always take up the whole text width. The relative column widths will still be determined by the ratio of header lines, but they will be normalized to add up to 1.0.
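The normalization described here is a simple proportion. As a sketch, assuming the input is the character length of each column's header line; `relativeWidths` is a hypothetical name and the all-zero case is handled by an arbitrary choice.

```haskell
-- Turn per-column header line lengths into relative widths that sum to 1.0.
-- If every length is zero, all widths are zero (an arbitrary fallback).
relativeWidths :: [Int] -> [Double]
relativeWidths lens
  | total == 0 = map (const 0) lens
  | otherwise  = [ fromIntegral n / fromIntegral total | n <- lens ]
  where
    total = sum lens
```

Header lines of lengths 2 and 6 give widths 0.25 and 0.75, so the table spans the full text width regardless of how short the header lines are.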
2017-12-08 | LaTeX reader: fix \ before newline. | John MacFarlane | 1 | -3/+14
This should be a nonbreaking space, as long as it's not followed by a blank line. This has been fixed at the tokenizer level. Closes #4134.