path: root/src/Text/Pandoc/Readers
Age | Commit message | Author | Files | Lines
2021-11-24LaTeX reader: omit visible content for `\label{...}`.John MacFarlane1-2/+1
Previously we included the text of the label in square brackets, but this is undesirable in many cases. See discussion in <https://github.com/jgm/pandoc/issues/813#issuecomment-978232426>.
2021-11-24HTML reader: parse attributes on links and images.John MacFarlane2-11/+10
Closes #6970.
2021-11-23Improve detection of pipe table line widths.John MacFarlane1-14/+18
Fixed calculation of maximum column widths in pipe tables. It is now based on the length of the markdown line, rather than a "stringified" version of the parsed line. This should be more predictable for users. In addition, we take into account double-wide characters such as emojis. Closes #7713.
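The width rule described above can be sketched as follows. This is a minimal illustration, not pandoc's implementation: the helper names `display_width` and `widest_line` are hypothetical, and treating East Asian "wide" and "fullwidth" characters as two columns is an assumption about what "double-wide" means here.

```python
import unicodedata


def display_width(line):
    """Approximate the number of terminal columns a raw line occupies."""
    width = 0
    for ch in line:
        if unicodedata.combining(ch):
            # Combining marks are drawn on the previous character.
            continue
        # Wide ('W') and fullwidth ('F') characters take two columns.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


def widest_line(lines):
    """Return the largest display width among the raw table lines."""
    return max((display_width(line) for line in lines), default=0)
```

Measuring the raw source line, rather than a stringified parse result, means the number a user sees in their editor is the number that decides whether the table is treated as too wide.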
2021-11-21yamlBsToRefs: allow multiple YAML documents.John MacFarlane1-2/+2
Some people use `---` as the end delimiter in YAML bibliography files, which causes the `yaml` library to emit an error unless we explicitly allow multiple YAML documents (and just consider the first). In T.P.Readers.Metadata
2021-11-20Capture `alt-text` in JATS figures (#7703)Albert Krewinkel1-2/+13
Co-authored-by: Aner Lucero <4rgento@gmail.com>
2021-11-18RST reader: handle class attribute for custom roles (#7700)willj-dev1-8/+16
Previously the class attribute was ignored, and the name of the role used as the class. Closes #7699.
2021-11-15LaTeX reader: add rudimentary support for `\autoref` (#7693)Albert Krewinkel1-0/+1
2021-11-09Accept empty `--metadata-file`.John MacFarlane1-0/+1
Closes #7675. This is a regression from 2.15 behavior.
2021-11-08Add `<titleabbr>` support to DocBook readerRowan Rodrik van der Molen1-5/+12
2021-11-07LaTeX reader: add 'uri' class when parsing `\url`.John MacFarlane1-2/+2
Closes #7672.
2021-11-06Pass ReaderOptions to custom readers as second parameter.John MacFarlane1-4/+3
2021-11-05Add interface for custom readers written in Lua. (#7671)John MacFarlane1-0/+55
New module Text.Pandoc.Readers.Custom, exporting readCustom [API change]. Users can now do `-f myreader.lua` and pandoc will treat the script myreader.lua as a custom reader, which parses an input string to a pandoc AST, using the pandoc module defined for Lua filters. A sample custom reader can be found in data/reader.lua. Closes #7669.
2021-11-05Support for <indexterm>s when reading DocBook (#7607)Rowan Rodrik van der Molen1-4/+37
* Support for <indexterm>s when reading DocBook
* Update implementation status of `<n-ary>` tags
* Remove non-idiomatic parentheses
* More complete `<indexterm>` support, with tests
Co-authored-by: Rowan Rodrik van der Molen <rowan@ytec.nl>
2021-11-02Markdown reader: Improve inlinesInBalancedBrackets.John MacFarlane1-20/+12
This is just a small improvement in terms of performance, but it's simpler and more direct code. Also, we avoid parsing interparagraph spaces in balanced brackets, as the original did.
2021-11-02Docx reader: don't let first line indents trigger block quotes.John MacFarlane1-3/+2
This fixes a regression introduced in pandoc 2.15 by PR #7606. Closes #7655.
2021-10-27Switch back from HsYAML to yaml.John MacFarlane2-116/+62
Reasons:
- Performance: HsYAML is around 20 times slower in parsing large YAML bibliographies (#6084).
- An issue was submitted to HsYAML, but it hasn't gotten any attention. HsYAML seems borderline unmaintained; it hasn't had a commit in over a year.
- Unfortunately this goes back on our attempts to free ourselves from C dependencies (#4535). But I don't see a better alternative until a better pure Haskell parser is available.
Closes #6084.
Notes:
- We've removed the FromYAML instances for all types that had them, since this is a HsYAML-specific typeclass [API change]. (The yaml package just uses From/ToJSON.)
- Unlike HsYAML (in the configuration we were using), yaml parses 'Y', 'N', 'Yes', 'No', 'On', 'Off' as boolean values. Users may need to quote these when they are meant to be interpreted as strings. Similarly, 'null' is parsed as a YAML null value (and will be treated as an empty string by pandoc rather than the string 'null'). Quoting it will force it to be interpreted as a string.
- Some tests had to be adjusted accordingly.
- Pandoc now behaves better when the YAML metadata contains escaping errors: instead of just falling back on treating the section as a table, it raises a YAML parsing error.
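To see which unquoted metadata values are affected by the boolean and null behavior noted above, here is a small sketch. It models the YAML 1.1 boolean and null spellings and is not the `yaml` library's code. The function names are hypothetical, and the full spelling sets are an assumption drawn from the YAML 1.1 type definitions; the commit message itself names only 'Y', 'N', 'Yes', 'No', 'On', 'Off', and 'null'.

```python
# Spellings resolved as booleans or null for plain (unquoted) scalars
# under YAML 1.1 rules. Assumed here for illustration.
YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE",
               "on", "On", "ON"}
YAML11_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE",
                "off", "Off", "OFF"}
YAML11_NULL = {"null", "Null", "NULL", "~", ""}


def resolve_plain_scalar(text):
    """Return True, False, None, or the original string for a plain scalar."""
    if text in YAML11_TRUE:
        return True
    if text in YAML11_FALSE:
        return False
    if text in YAML11_NULL:
        return None
    return text


def needs_quoting(text):
    """True if an unquoted value would not survive as a string."""
    return not isinstance(resolve_plain_scalar(text), str)
```

Under this model, a metadata line such as `answer: No` yields a boolean, while `answer: "No"` keeps the string.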
2021-10-22Org reader: allow an initial :PROPERTIES: drawer to add to metadata.John MacFarlane1-2/+10
Closes #7520.
2021-10-22Use simpleFigure in Readers.Aner Lucero6-44/+50
2021-10-20Markdown reader: don't parse links or bracketed spans as citations.John MacFarlane1-2/+4
Previously pandoc would parse `[link to (@a)](url)` as a citation; similarly `[(@a)]{#ident}`. This is undesirable. One should be able to use example references in citations, and even if `@a` is not defined as an example reference, `[@a](url)` should be a link containing an author-in-text citation rather than a normal citation followed by literal `(url)`. Closes #7632.
2021-10-18Docx reader: fix handling of empty fieldsMilan Bracke1-0/+4
Some fields have only an instrText and no content. Pandoc didn't understand these, which caused other fields to be misunderstood because it seemed that a field was still open when it wasn't.
2021-10-18Docx parser: implement PAGEREF fieldsMilan Bracke2-0/+26
These fields, often used in tables of contents, can be a hyperlink.
2021-10-18Docx reader: fix handling of nested fieldsMilan Bracke2-115/+150
Fields delimited by fldChar elements can contain other fields. Before, the nested fields would be ignored, except for the end, which would be considered the end of the parent field. To fix this issue, fields needed to be considered containing ParParts instead of Runs, since a Run can't represent complex enough structures. This also impacted Hyperlinks since they can originate from a field.
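The nesting problem can be illustrated with a stack. The sketch below is not the Docx reader's code: it assumes the document has already been reduced to a flat list of hypothetical `(kind, payload)` events, where `begin`, `separate`, and `end` stand for the three fldChar types, `instr` carries instrText, and `text` carries ordinary run content.

```python
def parse_fields(events):
    """Build a tree of fields from a flat stream of (kind, payload) events."""
    root = []   # top-level content
    stack = []  # currently open fields, innermost last
    for kind, payload in events:
        if kind == "begin":
            stack.append({"instr": "", "content": []})
        elif kind == "instr":
            if not stack:
                raise ValueError("instruction text outside a field")
            stack[-1]["instr"] += payload
        elif kind == "separate":
            if not stack:
                raise ValueError("separator outside a field")
        elif kind == "text":
            target = stack[-1]["content"] if stack else root
            target.append(payload)
        elif kind == "end":
            if not stack:
                raise ValueError("end without matching begin")
            field = stack.pop()
            # An end closes only the innermost field, which then becomes
            # part of its parent's content.
            target = stack[-1]["content"] if stack else root
            target.append(field)
        else:
            raise ValueError("unknown event kind: %r" % (kind,))
    if stack:
        raise ValueError("unclosed field")
    return root
```

Because each `end` pops exactly one entry, an inner field can no longer terminate its parent, and a field with an instruction but no content simply ends up with an empty `content` list.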
2021-10-14DocBook reader: honor linenumbering attributeSamuel Tardieu1-0/+1
The DocBook attribute linenumbering="numbered" on code blocks maps to "numberLines" internally.
2021-10-13Fix markdown parsing bug for math in bracketed spans and links.John MacFarlane1-0/+1
This affects math with unbalanced brackets (e.g. `$(0,1]$`) inside links, images, bracketed spans. Closes #7623.
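The kind of scan involved can be sketched as a bracket matcher that ignores brackets inside `$...$`. This is an illustration under simplifying assumptions, not pandoc's Markdown parser: the function name is hypothetical, and every unescaped `$` is treated as a math delimiter, whereas pandoc's actual rules for inline math are stricter.

```python
def find_closing_bracket(text, start=0):
    """Return the index of the `]` matching the `[` at `start`, or -1."""
    if text[start] != "[":
        raise ValueError("start must point at an opening bracket")
    depth = 0
    in_math = False
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if ch == "$":
            in_math = not in_math
        elif not in_math:
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
                if depth == 0:
                    return i
        i += 1
    return -1
```

With `[interval $(0,1]$ here](url)`, the `]` inside the math is skipped, so the match is the bracket immediately before `(url)`.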
2021-10-11LaTeX reader: Implement siunitx v3 commands.John MacFarlane1-1/+5
We support `\unit`, `\qty`, `\qtyrange`, and `\qtylist` as synonyms of `\si`, `\SI`, `\SIrange`, and `\SIlist`. Closes #7614.
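The synonym relationship amounts to a lookup table. In the sketch below, the four command pairs come from the commit message; the table and function names are hypothetical and say nothing about how the LaTeX reader dispatches commands internally.

```python
# siunitx v3 command name -> the v2 command it is treated as a synonym of.
SIUNITX_V3_SYNONYMS = {
    "unit": "si",
    "qty": "SI",
    "qtyrange": "SIrange",
    "qtylist": "SIlist",
}


def canonical_command(name):
    """Map a v3 command name to its v2 equivalent; leave others unchanged."""
    return SIUNITX_V3_SYNONYMS.get(name, name)
```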
2021-10-10Avoid blockquote when parent style has more indentMilan Bracke3-53/+66
When a paragraph has an indentation different from the parent (named) style, it used to be considered a blockquote. But this only makes sense when the paragraph has more indentation. So this commit adds a check for the indentation of the parent style.
2021-10-10LaTeX reader: Properly handle `\^` followed by group closing.John MacFarlane1-3/+3
Closes #7615.
2021-09-30Docx reader: Add placeholder for word diagramEzwal2-0/+17
2021-09-23HTML reader: handle empty tbody element in table.John MacFarlane1-5/+8
Closes #7589.
2021-09-19LaTeX reader: Recognize that `\vadjust` sometimes takes "pre".John MacFarlane1-0/+7
Closes #7531.
2021-09-19Ignore (and gobble parameters of) CSLReferences environment.John MacFarlane1-0/+1
Otherwise we get the parameters as numbers in the output. Closes #7531.
2021-09-17Fix linter warning.John MacFarlane1-4/+3
2021-09-16Fix code blocks using `--preserve-tabs`.John MacFarlane1-1/+7
Previously they did not behave as the equivalent input with spaces would. Closes #7573.
2021-09-13RST reader: handle escaped colons in reference definitions.John MacFarlane1-1/+2
Closes #7568.
2021-09-10feat(ipynb reader): get cell output mime from raw_mimetype tooKolen Cheung1-1/+2
While the spec defines `format`, in practice `raw_mimetype` is used. See jupyter/nbformat#229
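A minimal sketch of the lookup, assuming a raw cell is represented as a plain dictionary with a `metadata` mapping. The function name is hypothetical, and checking `format` before `raw_mimetype` is an assumption; the commit message does not say which key takes precedence.

```python
def raw_cell_mime(cell):
    """Return the MIME type declared for a raw cell, or None if absent."""
    metadata = cell.get("metadata", {})
    # Assumed order: the key defined by the spec first, then the key
    # that is used in practice.
    return metadata.get("format") or metadata.get("raw_mimetype")
```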
2021-09-10feat(ipynb reader): add more Jupyter's "Raw NBConvert Format"Kolen Cheung1-6/+10
This adds most of the available formats selectable from Jupyter's interface "Raw NBConvert Format".
2021-09-10fix!: rst mime typeKolen Cheung1-1/+1
BREAKING CHANGE: fix rst mime type according to https://docutils.sourceforge.io/FAQ.html
2021-09-10Remove redundant import.John MacFarlane1-1/+1
2021-09-10Org reader: don't parse a list as first item in a list item.John MacFarlane1-1/+4
Closes #7557.
2021-09-10Ipynb reader handleData: support text/markdown (#7561)Kolen Cheung1-0/+3
`text/markdown` is now a supported mime type for raw output.
2021-09-08RTF reader: support `\binN` for binary image data.John MacFarlane1-11/+22
2021-09-04RTF reader: better handling of `\*` and bookmarks.John MacFarlane1-8/+8
We now ensure that groups starting with `\*` never cause text to be added to the document. In addition, bookmarks now create a span between the start and end of the bookmark, rather than an empty span.
2021-09-04Minor renaming to avoid shadowing.John MacFarlane1-2/+2
2021-09-03RTF reader: if doc begins with {\rtf1 ... } only parse its contents.John MacFarlane1-1/+7
Some documents seem to have non-RTF (e.g. XML) material after the `{\rtf1 ... }` group.
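One way to isolate the leading group is to track brace depth and stop when it returns to zero. The sketch below is an illustration, not the RTF reader's code: the function name is hypothetical, and it assumes the input is already a decoded string and ignores the possibility of raw braces inside `\binN` binary data.

```python
def leading_rtf_group(data):
    """Return the initial {\\rtf1 ... } group, or the input unchanged."""
    if not data.startswith("{\\rtf1"):
        return data
    depth = 0
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == "\\" and i + 1 < len(data):
            # Skip the backslash and the next character, so that
            # escaped braces (\{ and \}) do not affect the depth.
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return data[: i + 1]
        i += 1
    return data  # unbalanced input: leave it for the parser to report
```

Anything after the returned slice, such as trailing XML, is never handed to the parser.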
2021-09-03RTF reader: Ignore `\pgdsc` group.John MacFarlane1-0/+1
Otherwise we get style names treated as text.
2021-08-23Markdown reader: fix interaction of --strip-comments and list parsing.John MacFarlane1-1/+1
Use of `--strip-comments` was causing tight lists to be rendered as loose (as if the comment were a blank line). Closes #7521.
2021-08-21LaTeX-parser: restrict \endinput to current fileSimon Schuster2-1/+9
2021-08-20RST reader: Fix `:literal:` includes.John MacFarlane1-5/+2
These should create code blocks, not insert raw RST. Closes #7513.
2021-08-19Improve docx reader's robustness in extracting images.John MacFarlane1-5/+6
The docx reader made a couple assumptions about how docx containers were laid out that were not always true, with the result that some images in documents did not get found/extracted. Closes #7511.
2021-08-16Fix bug in last commit due to removal of take1WhileP.John MacFarlane1-2/+2