aboutsummaryrefslogtreecommitdiff
path: root/src/Text/Pandoc/Readers
AgeCommit message (Collapse)AuthorFilesLines
2021-11-05Add interface for custom readers written in Lua. (#7671)John MacFarlane1-0/+55
New module Text.Pandoc.Readers.Custom, exporting readCustom [API change]. Users can now do `-f myreader.lua` and pandoc will treat the script myreader.lua as a custom reader, which parses an input string to a pandoc AST, using the pandoc module defined for Lua filters. A sample custom reader can be found in data/reader.lua. Closes #7669.
2021-11-05Support for <indexterm>s when reading DocBook (#7607)Rowan Rodrik van der Molen1-4/+37
* Support for <indexterm>s when reading DocBook * Update implementation status of `<n-ary>` tags * Remove non-idiomatic parentheses * More complete `<indexterm>` support, with tests Co-authored-by: Rowan Rodrik van der Molen <rowan@ytec.nl>
2021-11-02Markdown reader: Improve inlinesInBalancedBrackets.John MacFarlane1-20/+12
This is just a small improvement in terms of performance, but it's simpler and more direct code. Also, we avoid parsing interparagraph spaces in balanced brackets, as the original did.
2021-11-02Docx reader: don't let first line indents trigger block quotes.John MacFarlane1-3/+2
This fixes a regression introduced in pandoc 2.15 by PR #7606. Closes #7655.
2021-10-27Switch back from HsYAML to yaml.John MacFarlane2-116/+62
Reasons: - Performance: HsYAML is around 20 times slower in parsing large YAML bibliographies (#6084). - An issue was submitted to HsYAML, but it hasn't gotten any attention. HsYAML seems borderline unmaintained; it hasn't had a commit in over a year. - Unfortunately this goes back on our attempts to free ourselves from C dependencies (#4535). But I don't see a better alternative until a better pure Haskell parser is available. Closes #6084. Notes: - We've removed the FromYAML instances for all types that had them, since this is a HsYAML-specific typeclass [API change]. (The yaml package just uses From/ToJSON.) - Unlike HsYAML (in the configuration we were using), yaml parses 'Y', 'N', 'Yes', 'No', 'On', 'Off' as boolean values. Users may need to quote these when they are meant to be interpreted as strings. Similarly, 'null' is parsed as a YAML null value (and will be treated as an empty string by pandoc rather than the string 'null'). Quoting it will force it to be interpreted as a string. - Some tests had to be adjusted accordingly. - Pandoc now behaves better when the YAML metadata contains escaping errors: instead of just falling back on treating the section as a table, it raises a YAML parsing error.
2021-10-22Org reader: allow an initial :PROPERTIES: drawer to add to metadata.John MacFarlane1-2/+10
Closes #7520.
2021-10-22Use simpleFigure in Readers.Aner Lucero6-44/+50
2021-10-20Markdown reader: don't parse links or bracketed spans as citations.John MacFarlane1-2/+4
Previously pandoc would parse [link to (@a)](url) as a citation; similarly [(@a)]{#ident} This is undesirable. One should be able to use example references in citations, and even if `@a` is not defined as an example reference, `[@a](url)` should be a link containing an author-in-text citation rather than a normal citation followed by literal `(url)`. Closes #7632.
2021-10-18Docx reader: fix handling of empty fieldsMilan Bracke1-0/+4
Some fields only have an instrText and no content, Pandoc didn't understand these, causing other fields to be misunderstood because it seemed like a field was still open when it wasn't.
2021-10-18Docx parser: implement PAGEREF fieldsMilan Bracke2-0/+26
These fields, often used in tables of contents, can be a hyperlink.
2021-10-18Docx reader: fix handling of nested fieldsMilan Bracke2-115/+150
Fields delimited by fldChar elements can contain other fields. Before, the nested fields would be ignored, except for the end, which would be considered the end of the parent field. To fix this issue, fields needed to be considered containing ParParts instead of Runs, since a Run can't represent complex enough structures. This also impacted Hyperlinks since they can originate from a field.
2021-10-14DocBook reader: honor linenumbering attributeSamuel Tardieu1-0/+1
The attribute DocBook linenumbering="numbered" attribute on code blocks maps to "numberLines" internally.
2021-10-13Fix markdown parsing bug for math in bracketed spans and links.John MacFarlane1-0/+1
This affects math with unbalanced brackets (e.g. `$(0,1]$`) inside links, images, bracketed spans. Closes #7623.
2021-10-11LaTeX reader: Implement siunitx v3 commands.John MacFarlane1-1/+5
We support `\unit`, `\qty`, `\qtyrange`, and `\qtylist` as synonynms of `\si`, `\SI`, `\SIrange`, and `\SIlist`. Closes #7614.
2021-10-10Avoid blockquote when parent style has more indentMilan Bracke3-53/+66
When a paragraph has an indentation different from the parent (named) style, it used to be considered a blockquote. But this only makes sense when the paragraph has more indentation. So this commit adds a check for the indentation of the parent style.
2021-10-10LaTeX reader: Properly handle `\^` followed by group closing.John MacFarlane1-3/+3
Closes #7615.
2021-09-30Docx reader: Add placeholder for word diagramEzwal2-0/+17
2021-09-23HTML reader: handle empty tbody element in table.John MacFarlane1-5/+8
Closes #7589.
2021-09-19LaTeX reader: Recognize that `\vadjust` sometimes takes "pre".John MacFarlane1-0/+7
Closes #7531.
2021-09-19Ignore (and gobble parameters of) CSLReferences environment.John MacFarlane1-0/+1
Otherwise we get the parameters as numbers in the output. Closes #7531.
2021-09-17Fix linter warning.John MacFarlane1-4/+3
2021-09-16Fix code blocks using `--preserve-tabs`.John MacFarlane1-1/+7
Previously they did not behave as the equivalent input with spaces would. Closes #7573.
2021-09-13RST reader: handle escaped colons in reference definitions.John MacFarlane1-1/+2
Cloess #7568.
2021-09-10feat(ipynb reader): get cell output mime from raw_mimetype tooKolen Cheung1-1/+2
While the spec defined format, in practice raw_mimetype is used. See jupyter/nbformat#229
2021-09-10feat(ipynb reader): add more Jupyter's "Raw NBConvert Format"Kolen Cheung1-6/+10
This adds most of the available formats selectable from Jupyter's interface "Raw NBConvert Format".
2021-09-10fix!: rst mime typeKolen Cheung1-1/+1
BREAKING CHANGE: fix rst mime type according to https://docutils.sourceforge.io/FAQ.html
2021-09-10Remove redundant import.John MacFarlane1-1/+1
2021-09-10Org reader: don't parse a list as first item in a list item.John MacFarlane1-1/+4
Closes #7557.
2021-09-10Ipynb reader handleData: support text/markdown (#7561)Kolen Cheung1-0/+3
`text/markdown` is now a supported mime type for raw output.
2021-09-08RTF reader: support `\binN` for binary image data.John MacFarlane1-11/+22
2021-09-04RTF reader: better handling of `\*` and bookmarks.John MacFarlane1-8/+8
We now ensure that groups starting with `\*` never cause text to be added to the document. In addition, bookmarks now create a span between the start and end of the bookmark, rather than an empty span.
2021-09-04Minor renaming to avoid shadowing.John MacFarlane1-2/+2
2021-09-03RTF reader: if doc begins with {\rtf1 ... } only parse its contents.John MacFarlane1-1/+7
Some documents seem to have non-RTF (e.g. XML) material after the `{\rtf1 ... }` group.
2021-09-03RTF reader: Ignore `\pgdsc` group.John MacFarlane1-0/+1
Otherwise we get style names treated as test.
2021-08-23Markdown reader: fix interaction of --strip-comments and listJohn MacFarlane1-1/+1
parsing. Use of `--strip-comments` was causing tight lists to be rendered as loose (as if the comment were a blank line). Closes #7521.
2021-08-21LaTeX-parser: restrict \endinput to current fileSimon Schuster2-1/+9
2021-08-20RST reader: Fix `:literal:` includes.John MacFarlane1-5/+2
These should create code blocks, not insert raw RST. Closes #7513.
2021-08-19Improve docx reader's robustness in extracting images.John MacFarlane1-5/+6
The docx reader made a couple assumptions about how docx containers were laid out that were not always true, with the result that some images in documents did not get found/extracted. Closes #7511.
2021-08-16Fix bug in last commit due to removal of take1WhileP.John MacFarlane1-2/+2
2021-08-15Multimarkdown sub- and superscripts (#5512) (#7188)OCzarnecki1-8/+16
Added an extension `short_subsuperscripts` which modifies the behavior of `subscript` and `superscript`, allowing subscripts or superscripts containing only alphanumerics to end with a space character (eg. `x^2 = 4` or `H~2 is combustible`). This improves support for multimarkdown. Closes #5512. Add `Ext_short_subsuperscripts` constructor to `Extension` [API change]. This is enabled by default for `markdown_mmd`.
2021-08-13LaTeX reader: proper implicit grouping around environment macros.John MacFarlane1-1/+2
2021-08-12Use Prelude from base-compat for ghc 8.4 too.John MacFarlane1-5/+1
We were having trouble building on ghc 8.4 because of the lack of a Foldable instance for (Alt Maybe) in base < 4.12. Mystery: for some reason our builds were failing for gitit but not in the pandoc CI.
2021-08-11Try fixing compile error on older ghcs.John MacFarlane1-1/+5
See https://github.com/jgm/gitit/runs/3308381697
2021-08-11Fix some lint issues.John MacFarlane2-6/+5
2021-08-11LaTeX reader: Support `\global` before `\def`, `\let`, etc.John MacFarlane1-2/+10
See #7494.
2021-08-11Fix scope for LaTeX macros.John MacFarlane3-55/+100
They should by default scope over the group in which they are defined (except `\gdef` and `\xdef`, which are global). In addition, environments must be treated as groups. We handle this by making sMacros in the LaTeX parser state a STACK of macro tables. Opening a group adds a table to the stack, closing one removes one. Only the top of the stack is queried. This commit adds a parameter for scope to the Macro constructor (not exported). Closes #7494.
2021-08-11LaTeX reader: improve handling of plain TeX macro primitives.John MacFarlane2-6/+29
- Fixed semantics for `\let`. - Implement `\edef`, `\gdef`, and `\xdef`. - Add comment noting that currently `\def` and `\edef` set global macros (so are equivalent to `\gdef` and `\xdef`). This should be fixed by scoping macro definitions to groups, in a future commit. Closes #7474.
2021-08-10HTML reader: treat commments as blank when parsing.John MacFarlane1-5/+7
This modifies pBlank. Previously comments could sometimes flummox the parser. Cloes #7482.
2021-08-10Fix RTF table parsing bug that created undesired nested tables.John MacFarlane1-1/+1
Closes #7488.
2021-08-10Add RTF reader.John MacFarlane1-0/+1333
- `rtf` is now supported as an input format as well as output. - New module Text.Pandoc.Readers.RTF (exporting `readRTF`). [API change] Closes #3982.