path: root/src/Text/Pandoc/Readers
Age | Commit message | Author | Files | Lines
2021-09-17 | Fix linter warning. | John MacFarlane | 1 | -4/+3
2021-09-16 | Fix code blocks using `--preserve-tabs`. | John MacFarlane | 1 | -1/+7
Previously they did not behave as the equivalent input with spaces would. Closes #7573.
2021-09-13 | RST reader: handle escaped colons in reference definitions. | John MacFarlane | 1 | -1/+2
Closes #7568.
2021-09-10 | feat(ipynb reader): get cell output mime from raw_mimetype too | Kolen Cheung | 1 | -1/+2
While the spec defined format, in practice raw_mimetype is used. See jupyter/nbformat#229
2021-09-10 | feat(ipynb reader): add more Jupyter's "Raw NBConvert Format" | Kolen Cheung | 1 | -6/+10
This adds most of the available formats selectable from Jupyter's interface "Raw NBConvert Format".
2021-09-10 | fix!: rst mime type | Kolen Cheung | 1 | -1/+1
BREAKING CHANGE: fix rst mime type according to https://docutils.sourceforge.io/FAQ.html
2021-09-10 | Remove redundant import. | John MacFarlane | 1 | -1/+1
2021-09-10 | Org reader: don't parse a list as first item in a list item. | John MacFarlane | 1 | -1/+4
Closes #7557.
2021-09-10 | Ipynb reader handleData: support text/markdown (#7561) | Kolen Cheung | 1 | -0/+3
`text/markdown` is now a supported mime type for raw output.
2021-09-08 | RTF reader: support `\binN` for binary image data. | John MacFarlane | 1 | -11/+22
2021-09-04 | RTF reader: better handling of `\*` and bookmarks. | John MacFarlane | 1 | -8/+8
We now ensure that groups starting with `\*` never cause text to be added to the document. In addition, bookmarks now create a span between the start and end of the bookmark, rather than an empty span.
2021-09-04 | Minor renaming to avoid shadowing. | John MacFarlane | 1 | -2/+2
2021-09-03 | RTF reader: if doc begins with {\rtf1 ... } only parse its contents. | John MacFarlane | 1 | -1/+7
Some documents seem to have non-RTF (e.g. XML) material after the `{\rtf1 ... }` group.
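The idea of parsing only the leading group can be sketched as a brace-matching scan. This is an illustrative Python sketch, not the reader's Haskell code; the function name `leading_group` is hypothetical, and it assumes that a backslash escapes the next character and that the input contains no raw `\binN` binary data (which could hold unbalanced braces).

```python
def leading_group(text):
    """Return the first balanced {...} group if text starts with an RTF header.

    Input that does not start with the header, or whose braces never
    balance, is returned unchanged.
    """
    if not text.startswith("{\\rtf1"):
        return text
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # Skip the backslash and the character it escapes (e.g. \{ or \}).
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # The group opened at index 0 closes here; drop what follows.
                return text[: i + 1]
        i += 1
    return text
```

Anything after the closing brace of the first group, such as trailing XML, is simply dropped before parsing.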
2021-09-03 | RTF reader: Ignore `\pgdsc` group. | John MacFarlane | 1 | -0/+1
Otherwise we get style names treated as text.
2021-08-23 | Markdown reader: fix interaction of --strip-comments and list parsing. | John MacFarlane | 1 | -1/+1
Use of `--strip-comments` was causing tight lists to be rendered as loose (as if the comment were a blank line). Closes #7521.
2021-08-21 | LaTeX-parser: restrict \endinput to current file | Simon Schuster | 2 | -1/+9
2021-08-20 | RST reader: Fix `:literal:` includes. | John MacFarlane | 1 | -5/+2
These should create code blocks, not insert raw RST. Closes #7513.
2021-08-19 | Improve docx reader's robustness in extracting images. | John MacFarlane | 1 | -5/+6
The docx reader made a couple assumptions about how docx containers were laid out that were not always true, with the result that some images in documents did not get found/extracted. Closes #7511.
2021-08-16 | Fix bug in last commit due to removal of take1WhileP. | John MacFarlane | 1 | -2/+2
2021-08-15 | Multimarkdown sub- and superscripts (#5512) (#7188) | OCzarnecki | 1 | -8/+16
Added an extension `short_subsuperscripts` which modifies the behavior of `subscript` and `superscript`, allowing subscripts or superscripts containing only alphanumerics to end with a space character (eg. `x^2 = 4` or `H~2 is combustible`). This improves support for multimarkdown. Closes #5512. Add `Ext_short_subsuperscripts` constructor to `Extension` [API change]. This is enabled by default for `markdown_mmd`.
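The rule described above (a sub- or superscript made only of alphanumerics may end at a space) can be illustrated with two regular expressions. This is a rough Python sketch built only from that description, not the Markdown reader's parser; the names `SHORT_SUPERSCRIPT`, `SHORT_SUBSCRIPT` and `find_short_scripts` are hypothetical.

```python
import re

# An opening ^ or ~, a run of ASCII alphanumerics, then whitespace or end of input.
SHORT_SUPERSCRIPT = re.compile(r"\^([A-Za-z0-9]+)(?=\s|$)")
SHORT_SUBSCRIPT = re.compile(r"~([A-Za-z0-9]+)(?=\s|$)")


def find_short_scripts(text):
    """Return (superscripts, subscripts) found in text as two lists of strings."""
    return (SHORT_SUPERSCRIPT.findall(text), SHORT_SUBSCRIPT.findall(text))
```

With the two examples from the commit message, `x^2 = 4` yields the superscript `2` and `H~2 is combustible` yields the subscript `2`.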
2021-08-13 | LaTeX reader: proper implicit grouping around environment macros. | John MacFarlane | 1 | -1/+2
2021-08-12 | Use Prelude from base-compat for ghc 8.4 too. | John MacFarlane | 1 | -5/+1
We were having trouble building on ghc 8.4 because of the lack of a Foldable instance for (Alt Maybe) in base < 4.12. Mystery: for some reason our builds were failing for gitit but not in the pandoc CI.
2021-08-11 | Try fixing compile error on older ghcs. | John MacFarlane | 1 | -1/+5
See https://github.com/jgm/gitit/runs/3308381697
2021-08-11 | Fix some lint issues. | John MacFarlane | 2 | -6/+5
2021-08-11 | LaTeX reader: Support `\global` before `\def`, `\let`, etc. | John MacFarlane | 1 | -2/+10
See #7494.
2021-08-11 | Fix scope for LaTeX macros. | John MacFarlane | 3 | -55/+100
They should by default scope over the group in which they are defined (except `\gdef` and `\xdef`, which are global). In addition, environments must be treated as groups. We handle this by making sMacros in the LaTeX parser state a STACK of macro tables. Opening a group adds a table to the stack, closing one removes one. Only the top of the stack is queried. This commit adds a parameter for scope to the Macro constructor (not exported). Closes #7494.
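The stack-of-tables approach can be modelled in a few lines. This is an illustrative Python sketch, not pandoc's Haskell code; the class `MacroScopes` and its method names are hypothetical. Because only the top table is queried, the sketch assumes that a new group starts from a copy of the enclosing table and that global definitions are written into every table on the stack.

```python
class MacroScopes:
    """Toy model of group-scoped macro definitions (hypothetical names)."""

    def __init__(self):
        # The stack always holds at least one table: the outermost level.
        self.stack = [{}]

    def open_group(self):
        # Assumption: a new group starts from a copy of the enclosing table,
        # so that querying only the top table still sees outer macros.
        self.stack.append(dict(self.stack[-1]))

    def close_group(self):
        if len(self.stack) == 1:
            raise ValueError("no open group to close")
        self.stack.pop()

    def define(self, name, body, is_global=False):
        if is_global:
            # \gdef / \xdef style: the definition survives every enclosing group.
            for table in self.stack:
                table[name] = body
        else:
            # \def style: the definition disappears when the current group closes.
            self.stack[-1][name] = body

    def lookup(self, name):
        # Only the top of the stack is queried.
        return self.stack[-1].get(name)
```

A local definition made inside a group is gone after `close_group()`, while one made with `is_global=True` remains visible.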
2021-08-11 | LaTeX reader: improve handling of plain TeX macro primitives. | John MacFarlane | 2 | -6/+29
- Fixed semantics for `\let`.
- Implement `\edef`, `\gdef`, and `\xdef`.
- Add comment noting that currently `\def` and `\edef` set global macros (so are equivalent to `\gdef` and `\xdef`). This should be fixed by scoping macro definitions to groups, in a future commit.
Closes #7474.
2021-08-10 | HTML reader: treat comments as blank when parsing. | John MacFarlane | 1 | -5/+7
This modifies pBlank. Previously comments could sometimes flummox the parser. Closes #7482.
2021-08-10 | Fix RTF table parsing bug that created undesired nested tables. | John MacFarlane | 1 | -1/+1
Closes #7488.
2021-08-10 | Add RTF reader. | John MacFarlane | 1 | -0/+1333
- `rtf` is now supported as an input format as well as output.
- New module Text.Pandoc.Readers.RTF (exporting `readRTF`). [API change]
Closes #3982.
2021-08-03 | Stop using the HTTP package. (#7456) | mt_caret | 1 | -2/+2
We only depend on the urlEncode function in the package, which is also provided by http-types. The HTTP package also depends on the network package, which has difficulty building on ghcjs. Add internal module Text.Pandoc.Network.HTTP, exporting `urlEncode`.
2021-07-17 | LaTeX reader: avoid trailing hyphen in translating languages. | John MacFarlane | 1 | -2/+2
Previously `\foreignlanguage{english}` turned into `<span lang="en-">`. The same issue affected Arabic. Closes #7447.
2021-07-16 | DocBook reader: handle images with imageobjectco elements. | John MacFarlane | 1 | -3/+3
Closes #7440.
2021-07-16 | LaTeX reader: Support `\cline` in LaTeX tables. | John MacFarlane | 1 | -0/+1
Closes #7442.
2021-07-11 | DocBook reader: add support for citerefentry (#7437) | Jan Tojnar | 1 | -1/+5
Originally intended for referring to UNIX manual pages, either part of the same DocBook document as a refentry element, or external, hence the manvolnum element. These days, refentry is more general; for example, the element documentation pages linked below are each a refentry.
As per the *Processing expectations* section of citerefentry, the element is supposed to be a hyperlink to a refentry (when in the same document), but pandoc does not support the refentry tag at the moment, so that is moot.
https://tdg.docbook.org/tdg/5.1/citerefentry.html
https://tdg.docbook.org/tdg/5.1/manvolnum.html
https://tdg.docbook.org/tdg/5.1/refentry.html
This roughly corresponds to a `manpage` role in rST syntax, which produces a `Code` AST node with attributes `.interpreted-text role=manpage`, but that does not fit the DocBook parser.
https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#role-manpage
2021-07-11 | Improved parsing of raw LaTeX from Text streams (rawLaTeXParser). | John MacFarlane | 2 | -11/+37
We now use source positions from the token stream to tell us how much of the text stream to consume. Getting this to work required a few other changes to make token source positions accurate. Closes #7434.
2021-07-09 | RST reader: fix regression with code includes. | John MacFarlane | 1 | -1/+5
With the recent changes to include infrastructure, included code blocks were getting an extra newline. Closes #7436. Added regression test.
2021-07-06 | Recognize data-external when reading HTML img tags (#7429) | Michael Hoffmann | 1 | -8/+3
Preserve all attributes in img tags. If attributes have a `data-` prefix, it will be stripped. In particular, this preserves a `data-external` attribute as an `external` attribute in the pandoc AST.
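The prefix handling described above amounts to a small renaming step over the attribute list. This is a minimal Python sketch of that step only, not the HTML reader's code; the function name `strip_data_prefix` is hypothetical, and attributes are assumed to arrive as (name, value) pairs.

```python
def strip_data_prefix(attrs):
    """Return (name, value) pairs with a leading `data-` removed from each name."""
    prefix = "data-"
    return [
        (name[len(prefix):] if name.startswith(prefix) else name, value)
        for name, value in attrs
    ]
```

Under this sketch, `data-external="1"` on an img tag is kept as `external="1"`, and attributes without the prefix pass through unchanged.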
2021-07-06 | Markdown reader: don't try to read contents in self-closing HTML tag. | John MacFarlane | 1 | -1/+4
Previously we had problems parsing raw HTML with self-closing tags like `<col/>`. The problem was that pandoc would look for a closing tag to close the markdown contents, but the closing tag had, in effect, already been parsed by `htmlTag`. This fixes the issue described in <https://groups.google.com/d/msgid/pandoc-discuss/297bc662-7841-4423-bcbb-534e99bbba09n%40googlegroups.com>.
2021-07-06 | HTML reader: add col, colgroup to 'closes' definitions | John MacFarlane | 1 | -1/+3
2021-06-22 | Fix regression with comment-only YAML metadata blocks. | John MacFarlane | 1 | -0/+3
Closes #7400.
2021-06-21 | Improve emailAddress in Text.Pandoc.Parsing. | John MacFarlane | 1 | -1/+21
Previously the parser would accept characters that are illegal in domains, and this sometimes caused it to gobble bits of the following text. Closes #7398. Note that this change, by itself, caused some txt2tags reader tests to fail. txt2tags allows bare email addresses with a following form query. So, in addition to the change to emailAddress, we modify the txt2tags parser so it can still handle these cases.
2021-06-12 | Docx reader: handle absolute URIs in Relationship Target. | John MacFarlane | 1 | -5/+11
Closes #7374.
2021-06-05 | DocBook reader: Add support for danger element | Jan Tojnar | 1 | -1/+2
Added in DocBook 5.2:
- https://github.com/docbook/docbook/pull/64
- https://tdg.docbook.org/tdg/5.2/danger.html
2021-06-01 | Markdown reader: fix pipe table regression in 2.11.4. | John MacFarlane | 1 | -1/+1
Previously pipe tables with empty headers (that is, a header line with all empty cells) would be rendered as headerless tables. This broke in 2.11.4. The fix here is to produce an AST with an empty table head when a pipe table has all empty header cells. Closes #7343.
2021-06-01 | LaTeX reader: don't allow optional * on symbol control sequences. | John MacFarlane | 1 | -2/+4
Generally we allow optional starred variants of LaTeX commands (since many allow them, and if we don't accept these explicitly, ignoring the star usually gives acceptable results). But we don't want to do this for `\(*\)` and similar cases. Closes #7340.
2021-05-31 | Fix regression with commonmark/gfm yaml metadata block parsing. | John MacFarlane | 1 | -5/+5
A regression in 2.14 led to the document body being omitted after YAML metadata in some cases. This is now fixed. Closes #7339.
2021-05-30 | HTML reader: fix column width regression. | John MacFarlane | 1 | -1/+1
Column widths specified with a style attribute were off by a factor of 100 in 2.14. Closes #7334.
2021-05-29 | Markdown reader: in rebasePaths, check for both Windows and Posix absolute paths. | John MacFarlane | 1 | -4/+5
Previously Windows pandoc was treating `/foo/bar.jpg` as non-absolute.
2021-05-29 | In rebasePath, check for absolute paths two ways. | John MacFarlane | 1 | -1/+4
isAbsolute from FilePath doesn't return True on Windows for paths beginning with `/`, so we check that separately.
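Checking a path under both conventions can be sketched with Python's `pathlib`, whose `PureWindowsPath` likewise does not treat a leading `/` without a drive as absolute. This is an analogy in Python, not the Markdown reader's Haskell code; the function name `is_absolute_either` is hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_absolute_either(path: str) -> bool:
    """True if `path` is absolute under either Windows or POSIX rules."""
    return (
        PureWindowsPath(path).is_absolute()
        or PurePosixPath(path).is_absolute()
    )
```

With this check, `/foo/bar.jpg` and `C:\foo\bar.jpg` both count as absolute on any platform, while `foo/bar.jpg` does not.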