path: root/src/Text/Pandoc/Readers
Age | Commit message | Author | Files | Lines
2018-01-16 | Docx reader: Parse hyperlinks in instrText tags | Jesse Rosenthal | 1 | -2/+4
This was a form of hyperlink found in older versions of Word. The changes introduced for this, though, create a framework for parsing further fields in MS Word (see the spec, ECMA-376-1:2016, §17.16.5, for more on these fields). Closes #3389 and #4266.
2018-01-16 | Docx reader: Parse instrText info in fldChar tags. | Jesse Rosenthal | 2 | -5/+102
We introduce a new module, Text.Pandoc.Readers.Docx.Fields which contains a simple parsec parser. At the moment, only simple hyperlink fields are accepted, but that can be extended in the future.
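As a rough illustration of what accepting only simple hyperlink fields can look like, here is a Python sketch. It is not pandoc's Haskell parsec code. The helper name `parse_hyperlink_field` and the `(url, anchor)` return shape are assumptions made for this example; the `\l` switch for an in-document location is taken from the HYPERLINK field description in ECMA-376.

```python
import re

# Hypothetical helper: recognise a simple HYPERLINK field instruction
# such as ' HYPERLINK "https://example.com" \l "intro" '.
_FIELD = re.compile(
    r'^\s*HYPERLINK'            # field name
    r'(?:\s+"([^"]*)")?'        # optional quoted target URL
    r'(?:\s+\\l\s+"([^"]*)")?'  # optional \l switch carrying an anchor name
    r'\s*$'
)

def parse_hyperlink_field(instr):
    """Return (url, anchor), or None if this is not a simple hyperlink field."""
    m = _FIELD.match(instr)
    if m is None:
        return None  # any other field type is rejected in this sketch
    return (m.group(1) or "", m.group(2) or "")
```

Any instruction that does not match this narrow shape is rejected outright, which mirrors the "only simple hyperlink fields are accepted" scope described above.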
2018-01-16 | Docx reader: Parse fldChar tags | Jesse Rosenthal | 2 | -5/+84
This will allow us to parse instrText inside fldChar tags.
2018-01-15 | HTML reader: Fix col width parsing for percentages < 10% (#4262) | n3fariox | 1 | -3/+6
Rather than take user input, and place a "0." in front, actually calculate the percentage to catch cases where small column sizes (e.g. `2%`) are needed.
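To see why prefixing "0." goes wrong for single-digit percentages, compare the two approaches in a small Python sketch. Both function names are hypothetical and this is not the reader's Haskell code.

```python
def naive_col_width(value):
    """The old approach as described: put "0." in front of the digits."""
    return float("0." + value.strip().rstrip("%"))

def parse_col_width(value):
    """Compute the fraction arithmetically, so '2%' becomes 0.02."""
    text = value.strip()
    if not text.endswith("%"):
        return None  # only percentage widths are handled in this sketch
    try:
        percent = float(text[:-1])
    except ValueError:
        return None
    return percent / 100.0
```

With these definitions, `naive_col_width("25%")` and `parse_col_width("25%")` agree, but for `"2%"` the naive version yields 0.2 (twenty percent) while the arithmetic version yields 0.02.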
2018-01-14 | RST reader: add aligned environment when needed in math. | John MacFarlane | 1 | -2/+7
rst2latex.py uses an align* environment for math in `.. math::` blocks, so this math may contain line breaks. If it does, we put the math in an `aligned` environment to simulate rst2latex.py's behavior. Closes #4254.
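The wrapping rule can be sketched in a few lines of Python. This is an illustration only: the function name `wrap_math` is hypothetical, and detecting a line break by looking for the TeX sequence `\\` is an assumption about how "contains line breaks" is decided.

```python
def wrap_math(tex):
    """Wrap math in an aligned environment only if it contains a TeX line break."""
    if "\\\\" in tex:  # the two-character sequence \\ marks a line break
        return "\\begin{aligned}\n" + tex + "\n\\end{aligned}"
    return tex  # single-line math is passed through unchanged
```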
2018-01-14 | Markdown reader: Improved inlinesInBalancedBrackets. | John MacFarlane | 1 | -13/+21
The change both improves performance and fixes a regression whereby normal citations inside inline notes were not parsed correctly. Closes jgm/pandoc-citeproc#315.
2018-01-14 | LaTeX reader: Advance source position at end of stream. | John MacFarlane | 1 | -1/+1
2018-01-13 | LaTeX reader: pass through macro defs in rawLaTeXBlock... | John MacFarlane | 1 | -4/+2
even if the `latex_macros` extension is set. This reverts to earlier behavior and is probably safer on the whole, since some macros only modify things in included packages, which pandoc's macro expansion can't modify. Closes #4246.
2018-01-13 | LaTeX reader: fixed pos calculation in tokenizing escaped space. | John MacFarlane | 1 | -3/+6
2018-01-13 | LaTeX reader: allow macro definitions inside macros. | John MacFarlane | 1 | -6/+9
Previously we went into an infinite loop with

```
\newcommand{\noop}[1]{#1}
\noop{\newcommand{\foo}[1]{#1}}
\foo{hi}
```

See #4253.
2018-01-10 | RST reader: better handling for headers with an anchor. | John MacFarlane | 1 | -2/+12
Instead of creating a div containing the header, we put the id directly on the header. This way header promotion will work properly. Closes #4240.
2018-01-05 | Update copyright notices to include 2018 | Albert Krewinkel | 24 | -48/+48
2018-01-02 | Docx reader: remove MultiWayIf | Jesse Rosenthal | 1 | -38/+39
Different formatting rules across 7.X and 8.X. Use empty case expression instead.
2018-01-02 | Docx reader: Allow for insertion/deletion of paragraphs. | Jesse Rosenthal | 1 | -4/+44
If the paragraph has a deleted or inserted paragraph break (depending on the track-changes setting) we hold onto it until the next paragraph. This takes care of accept and reject. For this we introduce a new state which holds the ils from the previous para if necessary. For `--track-changes=all`, we add an empty span with class `paragraph-insertion`/`paragraph-deletion` at the end of the paragraph prior to the break to be inserted or deleted. Closes #3927.
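The hold-until-next-paragraph idea can be modelled in Python as follows. This is a simplified sketch under stated assumptions, not the reader's Haskell implementation: paragraphs are modelled as `(inlines, change)` pairs where `change` is `None`, `"insertion"`, or `"deletion"` and describes the paragraph break at the end of that paragraph; the function name and the `("span", class)` tuple are hypothetical.

```python
def apply_paragraph_changes(paragraphs, track_changes="accept"):
    """Merge or mark paragraphs according to tracked paragraph-break changes."""
    out = []
    held = []  # inlines held over from a paragraph whose break was dropped
    for inlines, change in paragraphs:
        current = held + list(inlines)
        held = []
        # Accepting a deleted break, or rejecting an inserted one, removes the break.
        drop_break = (
            (track_changes == "accept" and change == "deletion")
            or (track_changes == "reject" and change == "insertion")
        )
        if drop_break:
            held = current  # keep the inlines until the next paragraph arrives
            continue
        if track_changes == "all" and change is not None:
            # Mark the break with an empty span at the end of the paragraph.
            current.append(("span", "paragraph-" + change))
        out.append(current)
    if held:
        out.append(held)  # nothing followed, so emit what was held
    return out
```

For two paragraphs `a` and `b` with a deleted break between them, accepting changes yields one merged paragraph, rejecting yields both paragraphs unchanged, and `all` keeps both with a `paragraph-deletion` marker at the end of the first.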
2018-01-02 | Docx reader: Parse track changes info into paragraph props. | Jesse Rosenthal | 1 | -15/+27
This will tell us whether a paragraph break was inserted or deleted. We add a generalized track-changes parsing function, and use it in `elemToParPart` as well.
2018-01-02 | Docx reader: Extract tracked changes type from parpart. | Jesse Rosenthal | 2 | -6/+19
We're going to want to use it elsewhere as well, in upcoming tracking of paragraph insertion/deletion.
2018-01-01 | Markdown reader: rewrite inlinesInBalancedBrackets. | John MacFarlane | 1 | -19/+13
The rewrite is much more direct, avoiding parseFromString. And it performs significantly better; unfortunately, parsing time still increases exponentially. See #1735.
2017-12-31 | Docx reader: minor cleanup. | Jesse Rosenthal | 1 | -1/+2
2017-12-31 | Docx Reader: Combine adjacent anchors. | Jesse Rosenthal | 1 | -20/+47
There isn't any reason to have numerous anchors in the same place, since we can't maintain docx's non-nesting overlapping. So we reduce to a single anchor, and have all links pointing to one of the overlapping anchors point to that one. This changes the behavior from commit e90c714c7 slightly (use the first anchor instead of the last) so we change the expected test result. Note that because this produces a state that has to be set after every invocation of `parPartToInlines`, we make the main function into a primed subfunction `parPartToInlines'`, and make `parPartToInlines` a wrapper around that.
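A minimal Python sketch of the anchor-merging idea, assuming adjacent anchors have already been grouped into lists of names. The function names and the `#name` link-target convention are assumptions for illustration, not the reader's actual data structures.

```python
def combine_anchors(anchor_groups):
    """Map every anchor in a group of adjacent anchors to the group's first anchor."""
    canonical = {}
    for group in anchor_groups:
        if not group:
            continue
        first = group[0]  # the first anchor wins, as the commit describes
        for name in group:
            canonical[name] = first
    return canonical

def rewrite_links(targets, canonical):
    """Point internal links ('#name') at the surviving anchor; leave others alone."""
    rewritten = []
    for target in targets:
        if target.startswith("#"):
            name = target[1:]
            rewritten.append("#" + canonical.get(name, name))
        else:
            rewritten.append(target)
    return rewritten
```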
2017-12-30 | Markdown reader: Avoid parsing raw tex unless \ + letter seen. | John MacFarlane | 1 | -1/+2
This seems to help with the performance problem, #4216.
2017-12-30 | LaTeX reader: Simplified a check for raw tex command. | John MacFarlane | 1 | -2/+2
2017-12-30 | Docx reader: Remove unused anchors. | Jesse Rosenthal | 1 | -5/+27
Docx produces a lot of anchors with nothing pointing to them -- we now remove these to produce cleaner output. Note that this has to occur at the end of the process because it has to follow link/anchor rewriting. Closes #3679.
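The pruning step can be illustrated with a small Python sketch. It assumes a flat list of `(kind, value)` inlines where `kind` is `"anchor"`, `"link"`, or `"text"`; that representation and the function name are hypothetical. It runs as a final pass, after any link or anchor rewriting, which matches the ordering constraint noted above.

```python
def remove_unused_anchors(inlines):
    """Drop anchors that no internal link points to."""
    # First pass: collect the names that internal links ('#name') refer to.
    used = {
        value[1:]
        for kind, value in inlines
        if kind == "link" and value.startswith("#")
    }
    # Second pass: keep everything except anchors that are never targeted.
    return [
        (kind, value)
        for kind, value in inlines
        if kind != "anchor" or value in used
    ]
```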
2017-12-31 | Muse reader: automatically translate #cover into #cover-image | Alexander Krotov | 1 | -1/+3
Amusewiki uses #cover directive to specify cover image.
2017-12-30 | Docx reader: Read multiple children of w:sdtContents | Jesse Rosenthal | 1 | -5/+9
Previously we had only read the first child of an sdtContents tag. Now we replace sdt with all children of the sdtContents tag. This changes the expected test result of our nested_anchors test, since now we read docx's generated TOCs.
2017-12-28 | LaTeX reader: be more tolerant of `&` character. | John MacFarlane | 1 | -1/+1
This allows us to parse unknown tabular environments as raw LaTeX. Closes #4208.
2017-12-28 | Org reader: support minlevel option for includes | Albert Krewinkel | 1 | -14/+37
The level of headers in included files can be shifted to a higher level by specifying a minimum header level via the `:minlevel` parameter. E.g. `#+include: "tour.org" :minlevel 1` will shift the headers in tour.org such that the topmost headers become level 1 headers. Fixes: #4154
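The shift described here amounts to adding one constant offset to every header level in the included file. A Python sketch, with the hypothetical name `shift_headers` and header levels modelled as plain integers:

```python
def shift_headers(levels, minlevel):
    """Shift header levels so the topmost (smallest) level becomes minlevel."""
    if not levels:
        return []
    offset = minlevel - min(levels)  # may be negative, zero, or positive
    return [level + offset for level in levels]
```

For an included file whose headers sit at levels 2, 3, 2, 4, a `:minlevel 1` setting would give 1, 2, 1, 3 under this model: relative nesting is preserved and only the base level moves.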
2017-12-27 | Fix warning. | John MacFarlane | 1 | -2/+1
2017-12-27 | Small improvement to figcaption parsing. #4184. | John MacFarlane | 1 | -2/+0
2017-12-27 | Merge pull request #4184 from mb21/html-reader-figcaption | John MacFarlane | 1 | -4/+7
HTML Reader: be more forgiving about figcaption
2017-12-27 | HTML reader: parse div with class `line-block` as LineBlock. | John MacFarlane | 1 | -1/+13
See #4162.
2017-12-27 | Docx Reader: preprocess Document body to unwrap "w:sdt" elements | Jesse Rosenthal | 1 | -1/+31
We walk through the document (using the zipper in Text.XML.Light.Cursor) to unwrap the sdt tags before doing the rest of the parsing of the document. Note that the function is generically named `walkDocument` in case we need to do any further preprocessing in the future. Closes #4190
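The unwrapping pass can be sketched with Python's standard `xml.etree.ElementTree` instead of the Haskell zipper. This is an illustration only: the function name is hypothetical, and the plain `sdt` and `sdtContent` tag names without the `w:` namespace are a simplification. This version splices in all children of the content element.

```python
import xml.etree.ElementTree as ET

def unwrap_sdt(parent):
    """Recursively replace each <sdt> child with the children of its <sdtContent>."""
    new_children = []
    for child in list(parent):
        unwrap_sdt(child)  # handle nested sdt elements first
        if child.tag == "sdt":
            content = child.find("sdtContent")
            if content is not None:
                new_children.extend(list(content))
            # an sdt without a content element contributes nothing
        else:
            new_children.append(child)
    parent[:] = new_children  # replace the child list in place
    return parent
```

Running the pass before the main parse means the rest of the reader never has to know that sdt wrappers existed.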
2017-12-26 | LaTeX reader: support `\foreignlanguage` from babel. | John MacFarlane | 1 | -0/+30
2017-12-24 | RST reader: allow empty list items (as docutils does). | John MacFarlane | 1 | -2/+2
Closes #4193.
2017-12-23 | JATS reader: handle author-notes. | John MacFarlane | 1 | -5/+6
2017-12-23 | JATS reader: code refactoring. | John MacFarlane | 1 | -63/+48
2017-12-23 | JATS reader: include institute metadata. | John MacFarlane | 1 | -2/+11
2017-12-23 | JATS reader: process author metadata. | John MacFarlane | 1 | -5/+27
2017-12-23 | JATS reader: better citation handling. | John MacFarlane | 1 | -3/+79
We now convert a ref-list element into a list of citations in metadata, suitable for use with pandoc-citeproc. We also convert references to pandoc citation elements. Thus a JATS article with embedded bibliographic information can be processed with pandoc and pandoc-citeproc to produce a formatted bibliography.
2017-12-23 | HTML Reader: be more forgiving about figcaption | mb21 | 1 | -4/+7
fixes #4183
2017-12-22 | Merge pull request #4189 from mb21/export-blocksToInlines | John MacFarlane | 2 | -3/+3
API change: export blocksToInlines' from Text.Pandoc.Shared
2017-12-22 | `latex_macros` extension changes. | John MacFarlane | 2 | -5/+11
Don't pass through macro definitions themselves when `latex_macros` is set. The macros have already been applied. If `latex_macros` is enabled, then `rawLaTeXBlock` in Text.Pandoc.Readers.LaTeX will succeed in parsing a macro definition, and will update pandoc's internal macro map accordingly, but the empty string will be returned. Together with earlier changes, this closes #4179.
2017-12-22 | Markdown reader: improved raw tex parsing. | John MacFarlane | 1 | -6/+9
+ Preserve original whitespace between blocks.
+ Recognize `\placeformula` as ConTeXt.
2017-12-22 | LaTeX reader: use applyMacros in rawLaTeXBlock, rawLaTeXInline. | John MacFarlane | 1 | -2/+5
2017-12-22 | LaTeX reader: Refactored inlineCommand. | John MacFarlane | 1 | -24/+11
2017-12-22 | API change: export blocksToInlines' from Text.Pandoc.Shared | mb21 | 2 | -3/+3
2017-12-21 | Merge pull request #4177 from stencila/jats-xml-reader | John MacFarlane | 1 | -0/+404
Add Basic JATS reader based on DocBook reader
2017-12-22 | Improve support for code language in JATS | Hamish Mackenzie | 1 | -2/+19
2017-12-21 | LaTeX reader: Fixed subtle bug in tokenizer. | John MacFarlane | 1 | -2/+3
Material following `^^` was dropped if it wasn't a character escape. This only affected invalid LaTeX, so we didn't see it in the wild, but it appeared in a QuickCheck test failure https://travis-ci.org/jgm/pandoc/jobs/319812224
2017-12-21 | Muse reader: parse anchors immediately after headings as IDs | Alexander Krotov | 1 | -5/+9
2017-12-20 | Org reader: fix asterisks-related parsing error | Albert Krewinkel | 1 | -1/+1
A parsing error was fixed which caused the org reader to fail when parsing a paragraph starting with two or more asterisks. Fixes: #4180