path: root/src/Text/Pandoc/Readers
Age | Commit message | Author | Files | Lines
2019-03-25 | HTML reader: read `data-foo` attribute into `foo`. | John MacFarlane | 1 | -1/+2
The HTML writer adds the `data-` prefix for HTML5 for nonstandard attributes. But the attributes are represented in the AST without the `data-` prefix, so we should strip this when reading HTML. Closes #5392.
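The prefix stripping described above can be sketched as follows. This is a minimal illustration, not the reader's actual code; `stripDataPrefix` and `normalizeAttrs` are hypothetical names, and attributes are assumed to be plain key/value string pairs.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

-- | Drop a leading "data-" from an attribute name, if present.
-- (Hypothetical helper, for illustration only.)
stripDataPrefix :: String -> String
stripDataPrefix name = fromMaybe name (stripPrefix "data-" name)

-- | Apply the normalisation to every key in a list of attributes,
-- leaving the values untouched.
normalizeAttrs :: [(String, String)] -> [(String, String)]
normalizeAttrs = map (\(k, v) -> (stripDataPrefix k, v))
```

Names without the prefix, such as `class`, pass through unchanged.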
2019-03-14 | Markdown writer: be sure implicit figures work in list contexts. | John MacFarlane | 1 | -11/+13
Previously they would sometimes not work: e.g., when they occurred in final paragraphs in lists that were originally parsed as Plain and converted later using PlainToPara. Closes #5368.
2019-03-10 | LaTeX reader: support `\underline`, `\ul`, `\uline` (#5359) | Paul Tilley | 1 | -0/+5
These are parsed as a Span with class `underline`, as with other readers.
2019-03-10 | ipynb reader: removed vestigial ReaderOptions param. | John MacFarlane | 1 | -18/+16
2019-03-09 | ipynb reader: remove sensitivity to `raw_html`, `raw_tex` extensions. | John MacFarlane | 1 | -6/+2
We now include every output format. Pruning is handled by `--ipynb-output=`.
2019-03-09 | Ipynb reader/writer: better handling of cell metadata. | John MacFarlane | 1 | -7/+10
We now handle even complex cell metadata in the Div's attributes. Simple metadata fields are rendered as a plain string, and complex ones as JSON.
2019-03-07 | Add inNote to Footcite and Footcites | John MacFarlane | 1 | -2/+2
2019-03-02 | JATS reader: Support fig-group block element (#5317). | John MacFarlane | 1 | -1/+4
2019-03-01 | Remove license boilerplate. | John MacFarlane | 51 | -940/+0
The haddock module header contains essentially the same information, so the boilerplate is redundant and just one more thing to get out of sync.
2019-02-28 | Markdown Reader: yamlToMeta respects extensions (#5276) | Mauro Bieg | 1 | -3/+2
Add ReaderOptions parameter to yamlToMeta [API change]. Fixes #5272.
2019-02-23 | JATS reader: fix parsing of figures. | John MacFarlane | 1 | -18/+27
This ensures that a figure containing a single image is parsed as a pandoc "implicit figure" (i.e., a Para with a single Image whose title attribute begins with `fig:`). More complex figures will still be parsed as divs. Closes #5321.
2019-02-21 | Docx reader: Start adding comment to combine module | Jesse Rosenthal | 1 | -0/+40
This module is one of the most opaque parts of the docx reader: it deals with the fact that runs have non-nesting formatting, so we have to figure out the nesting on the fly as we combine them. We start adding comments so that new developers can understand and, if necessary, modify this module. Comments on specific functions will be added in the future, but this offers a global description of the purpose of the module.
2019-02-18 | Docx reader: Trim space inside the last inline. | Jesse Rosenthal | 1 | -1/+2
We have to add one final mempty when we're combining in order to trim inlines appropriately. (We need to use our own trimming routines here due to the way that formatted inlines are smushed together when converting from docx.) Closes #5273
2019-02-18 | hlint Muse | Alexander Krotov | 1 | -1/+1
2019-02-18 | Muse reader: add secondary note support | Alexander Krotov | 1 | -5/+11
2019-02-15 | Markdown reader: fix bug parsing fenced code blocks. | John MacFarlane | 1 | -2/+3
Previously parsing would break if the code block contained a string of backticks of sufficient length followed by something other than end of line. Closes #5304.
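The rule implied above is that a backtick run only closes the block when nothing but whitespace follows it on the line. A minimal sketch of that check, assuming a hypothetical `isClosingFence` helper that receives the length of the opening fence and a line with no leading indentation (the real parser is combinator-based and handles more cases):

```haskell
import Data.Char (isSpace)

-- | True when the line is a backtick run at least as long as the
-- opening fence, followed only by whitespace.
-- (Hypothetical helper; indentation is deliberately ignored here.)
isClosingFence :: Int -> String -> Bool
isClosingFence openLen line =
  let (ticks, rest) = span (== '`') line
  in length ticks >= openLen && all isSpace rest
```

A line such as "```` foo" inside the block is therefore treated as content rather than as a closing fence.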
2019-02-15 | JATS reader: handle citations with multiple references. | John MacFarlane | 1 | -7/+10
The rid attribute can have a space-separated list of ids. Closes #5310.
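Splitting such an attribute value is a one-liner in Haskell. A small sketch, where `splitRids` is a hypothetical name and the value is assumed to arrive as a plain `String`:

```haskell
-- | Split a space-separated @rid@ attribute value into individual ids.
-- 'words' also tolerates repeated or surrounding whitespace.
-- (Hypothetical helper, for illustration only.)
splitRids :: String -> [String]
splitRids = words
```

Each resulting id can then be turned into its own citation entry.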
2019-02-12 | Docx reader: unwrap sdt elements in footnotes and comments. | Jesse Rosenthal | 1 | -3/+3
We had previously walked the document to unwrap sdt/sdtContent and smartTag tags in `word/document.xml`, but not in the `word/{foot/end}note.xml` and `word/comments.xml`. Closes #5302
2019-02-11 | Remove redundant import. | John MacFarlane | 1 | -1/+0
2019-02-10 | ipynb writer: keep plain text fallbacks in output... | John MacFarlane | 1 | -26/+14
even if a richer format is included. We don't know what output format will be needed. The fallback can always be weeded out using a filter. Closes #5293.
2019-02-08 | Make --metadata-file use pandoc-markdown (#5279) | Mauro Bieg | 1 | -1/+2
see #5272
2019-02-08 | Docx reader: fix paths in archive to prevent Windows failure | Jesse Rosenthal | 1 | -1/+6
Some paths in archives are absolute (have an opening slash) which, for reasons unknown, produces a failure in the test suite on MS Windows. This fixes that by removing the leading slash if it exists. Closes #5277 (previously closed with 4cce0ef but reopened due to this bug).
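Removing the leading slash can be sketched with a simple pattern match. `stripLeadingSlash` is a hypothetical name; the sketch assumes archive paths are handled as ordinary `FilePath` strings:

```haskell
-- | Turn an absolute archive path into a relative one by dropping a
-- single leading slash; other paths are returned unchanged.
-- (Hypothetical helper, for illustration only.)
stripLeadingSlash :: FilePath -> FilePath
stripLeadingSlash ('/' : rest) = rest
stripLeadingSlash path         = path
```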
2019-02-07 | Revert "Docx reader: Fix windows error" | Jesse Rosenthal | 1 | -2/+1
This reverts commit 2142bbe572cea00b7bb5ad3e10a3afb26845a1f7.
2019-02-07 | Docx reader: Fix windows error | Jesse Rosenthal | 1 | -1/+2
Try fixing a parsing error on Windows by insisting that the parser use a Posix filepath library for splitting doc paths in a zipfile. (On Windows it might default to using a backslash as a separator, while the separator is always a forward slash in zip archives.)
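The idea of pinning path handling to Posix rules can be illustrated with the `filepath` package, which exposes a platform-independent `System.FilePath.Posix` module. This is a sketch only (the commit was later reverted), and `zipPathParts` is a hypothetical name:

```haskell
import qualified System.FilePath.Posix as Posix

-- | Split a path taken from a zip archive into its components,
-- always treating '/' as the separator regardless of the host OS.
-- (Hypothetical helper, for illustration only.)
zipPathParts :: FilePath -> [FilePath]
zipPathParts = Posix.splitDirectories
```

Importing `System.FilePath` instead would pick the separator rules of the platform the program is compiled for.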
2019-02-07 | Docx reader: Some code cleanup | Jesse Rosenthal | 1 | -15/+25
* Clarify function name. We had previously used `getDocumentPath`, but `Document` is an overdetermined term here. Use `getDocumentXmlPath` to make clear what we're doing.
* Use field notation for setting ReaderEnv. As we've added (and continue to add) fields, the assignment by position has gotten harder to read.
* Figure out document.xml path once at the beginning of parsing, and add it to the environment, so we can avoid repeated lookups.
2019-02-07 | Docx reader: Extend dynamic xml location to detecting relationships | Jesse Rosenthal | 1 | -12/+19
Getting the location used to depend on a hard-coded .rels file based on "word/document.xml". We now dynamically detect that file based on the document.xml file specified in "_rels/.rels"
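In the Office Open XML packaging convention, the relationships file for a part lives in a `_rels` directory next to it and carries the part's file name plus `.rels`. A sketch of deriving that location from the main document path, assuming a hypothetical `relsPathFor` helper and Posix-style archive paths (this is not the reader's actual code):

```haskell
import System.FilePath.Posix (takeDirectory, takeFileName, (</>), (<.>))

-- | Derive the relationships file for a part inside the archive,
-- e.g. the main document part named in "_rels/.rels".
-- (Hypothetical helper, for illustration only.)
relsPathFor :: FilePath -> FilePath
relsPathFor docPath =
  takeDirectory docPath </> "_rels" </> (takeFileName docPath <.> "rels")
```

With this, a document stored as `word/document2.xml` maps to `word/_rels/document2.xml.rels` without any hard-coded file name.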
2019-02-06 | Docx reader: Dynamically determine document.xml path. | Jesse Rosenthal | 1 | -3/+12
The desktop Word program places the main document file in "word/document.xml", but Word Online places it in "word/document2.xml". This file path is actually stated in the root "_rels/.rels" file, in the "Relationship" element with an "http://../officedocument" type. Closes #5277
2019-02-06 | Handle Word files generated by Microsoft Word Online. | John MacFarlane | 1 | -0/+2
For some reason, Word in Office 365 Online uses `document2.xml` for the content, instead of `document.xml`. This causes pandoc not to be able to parse docx. This quick fix has the parser check for both `document.xml` and `document2.xml`. Addresses #5277, but a more robust solution would be to get the name of the main document dynamically (who knows whether it might change again?).
2019-02-04 | Add missing copyright notices and remove license boilerplate (#5112) | Albert Krewinkel | 38 | -70/+107
Quite a few modules were missing copyright notices. This commit adds copyright notices everywhere via haddock module headers. The old license boilerplate comment is redundant with this and has been removed. Update copyright years to 2019. Closes #4592.
2019-02-04 | Markdown reader: add newline when parsing blocks in YAML. | John MacFarlane | 1 | -9/+10
Otherwise last block gets parsed as a Plain rather than a Para. This is a regression in pandoc 2.x. This patch restores pandoc 1.19 behavior. Closes #5271.
2019-02-02 | ipynb reader: handle images referring to attachments. | John MacFarlane | 1 | -1/+9
Previously we didn't strip off the attachment: prefix, so even though the attachment is available in the mediabag, pandoc couldn't find it.
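A minimal sketch of recognising and stripping the prefix, assuming image sources are plain strings and using a hypothetical `attachmentName` helper (the media bag lookup itself is left out):

```haskell
import Data.List (stripPrefix)

-- | Return the attachment's file name when an image source uses the
-- "attachment:" scheme, and Nothing for any other source.
-- (Hypothetical helper, for illustration only.)
attachmentName :: String -> Maybe String
attachmentName = stripPrefix "attachment:"
```

The returned name is what would be looked up among the stored attachments; sources that yield `Nothing` are ordinary paths or URLs.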
2019-01-31 | LaTeX reader: don't let `\egroup` match `{`. | John MacFarlane | 1 | -3/+3
`braced` now actually requires nested braces. Otherwise some legitimate command and environment definitions can break (see test/command/tex-group.md).
2019-01-30 | Org reader: add support for #+SELECT_TAGS. | leungbk | 4 | -23/+78
2019-01-30 | Org reader: separate filtering logic from conversion function. | leungbk | 2 | -8/+11
2019-01-25 | MediaWiki reader: use `_` instead of `-` in auto-identifiers. | John MacFarlane | 1 | -1/+6
Partially addresses #4731. We may still not be exactly matching mediawiki's algorithm for identifiers.
2019-01-24 | Ipynb: Put all jupyter metadata under 'jupyter' key. | John MacFarlane | 1 | -1/+1
2019-01-24 | Revert "Prepend `jupyter_` to jupyter metadata keys." | John MacFarlane | 1 | -6/+0
This reverts commit 5eaff399d5d6dc30b0d453eff42c4101674d75ab.
2019-01-24 | Prepend `jupyter_` to jupyter metadata keys. | John MacFarlane | 1 | -0/+6
This avoids conflicts with things like 'toc'.
2019-01-22 | Support ipynb (Jupyter notebook) as input and output format. | John MacFarlane | 1 | -0/+249
[API change]
* Depend on ipynb library.
* Add `ipynb` as input and output format.
* Added Text.Pandoc.Readers.Ipynb (supports both nbformat v3 and v4).
* Added Text.Pandoc.Writers.Ipynb (supports nbformat v4).
* Added ipynb readers and writers to T.P.Readers, T.P.Writers, and T.P.Extensions. Register the file extension .ipynb for this format.
* Add `PandocIpynbDecodingError` constructor to Text.Pandoc.Error.Error.
* Note: there is no template for ipynb.
2019-01-22 | LaTeX reader: support `\endinput`. Closes #5233. | John MacFarlane | 1 | -0/+1
2019-01-22 | Man reader: fix typo. (#5245) | Brian Leung | 1 | -3/+3
2019-01-21 | HTML and markdown: treat textarea as a verbatim environment. | John MacFarlane | 2 | -8/+10
We don't want to parse its contents as Markdown or HTML. Closes #5241.
2019-01-20 | LaTeX reader: allow includes with dots like cc_by_4.0. | John MacFarlane | 1 | -3/+5
Previously the `.0` was interpreted as a file extension, leading pandoc not to add `.tex` (and thus not to find the file). The new behavior matches tex more closely.
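One way to get this behaviour is to append the default extension unless the name already ends in a known TeX extension, rather than whenever it has no extension at all. The sketch below makes that assumption explicit; `resolveInclude` is a hypothetical name and the set of recognised extensions (`.tex`, `.sty`) is an assumption, not taken from the commit:

```haskell
import System.FilePath (addExtension, takeExtension)

-- | Decide which file name to look for when resolving an include.
-- Names that already end in a recognised TeX extension are kept;
-- everything else, including "cc_by_4.0", gets ".tex" appended.
-- (Hypothetical helper; the extension list is an assumption.)
resolveInclude :: FilePath -> FilePath
resolveInclude name
  | takeExtension name `elem` [".tex", ".sty"] = name
  | otherwise                                  = addExtension name ".tex"
```

Here `takeExtension "cc_by_4.0"` is `".0"`, which is not in the list, so the name is extended to `cc_by_4.0.tex`.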
2019-01-20 | LaTeX reader: cleaned up 'input' code. | John MacFarlane | 1 | -10/+5
2019-01-09 | RST reader: change treatment of `number-lines` directives. (#5207) | Brian Leung | 1 | -15/+15
Directives of this type without numeric inputs should not have a `startFrom` attribute; with a blank value, the writers can produce extra whitespace.
2019-01-08 | Removed superfluous sourceCode class on code blocks. | John MacFarlane | 3 | -11/+7
* These were added by the RST reader and, for literate Haskell, by the Markdown and LaTeX readers. There is no point to this class, and it is not applied consistently by all readers. See #5047.
* Reverse order of `literate` and `haskell` classes on code blocks when parsing literate Haskell. Better if `haskell` comes first.
2019-01-08 | RST reader: handle sourcecode directive as synonym for code. | John MacFarlane | 1 | -1/+1
Closes #5204.
2019-01-07 | Org reader: allow for case of :minlevel == 0. | John MacFarlane | 1 | -1/+3
See #5190.
2019-01-07 | Org reader: handle `minlevel` option differently. (#5190) | Brian Leung | 1 | -3/+1
When `minlevel` exceeds the original minimum level observed in the file to be included, every heading should be shifted rightward.
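The shift described above can be sketched on a bare list of heading levels. This is an illustration under stated assumptions, not the reader's code: `shiftHeadings` is a hypothetical name, headings are reduced to their integer levels, and leaving the levels unchanged when `minlevel` does not exceed the observed minimum is a simplification made here.

```haskell
-- | Shift every heading level by the amount needed to raise the
-- shallowest heading to the requested minimum level.
-- (Hypothetical helper; the "no shift otherwise" case is an assumption.)
shiftHeadings :: Int -> [Int] -> [Int]
shiftHeadings _ [] = []
shiftHeadings minLevel levels = map (+ offset) levels
  where
    -- Distance between the requested and the observed minimum level,
    -- clamped at zero so headings are never moved leftward here.
    offset = max 0 (minLevel - minimum levels)
```

Because the same offset is applied to every heading, the relative nesting of the included file is preserved.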
2019-01-07 | TWiki reader: fix performance issue with underscores. | John MacFarlane | 1 | -1/+3
Underscore emphasis can't cross table cell boundaries, but the parser wasn't respecting this, leading to exponential behavior in documents with table cells containing underscores. This fixes the original sample; it's possible that there are other performance issues involving underscores. Closes #3921.