path: root/src/Text/Pandoc/Readers
Age | Commit message | Author | Files | Lines
2021-12-29 | Merge https://github.com/jgm/pandoc | Igor Pashev | 28 | -631/+2351
2021-12-28 | Use `splitDirectories` instead of `splitPath`. | John MacFarlane | 1 | -1/+1
We were using `splitPath` in two places in the code where `splitDirectories` should have been used. This led to a test for `..` in paths in `extractMedia` failing, so that images with `..` in the path name could be extracted outside the directory specified by `extractMedia`. It also led a test for `media` in resource paths to fail in the docx reader.
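The difference between the two `System.FilePath` functions shows up on any path that contains `..`. Below is a minimal sketch that assumes the safety check tests whether `".."` is one of the path's components. The two helper names are hypothetical, not pandoc's:

```haskell
import System.FilePath (splitDirectories, splitPath)

-- Buggy check: splitPath keeps the trailing separator on each
-- component, so "a/../b" yields ["a/", "../", "b"] and the string ".."
-- never matches exactly.
escapesWithSplitPath :: FilePath -> Bool
escapesWithSplitPath p = ".." `elem` splitPath p

-- Fixed check: splitDirectories drops the separators and yields
-- ["a", "..", "b"], so the ".." component is detected.
escapesWithSplitDirectories :: FilePath -> Bool
escapesWithSplitDirectories p = ".." `elem` splitDirectories p
```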
2021-12-14 | Org reader: parse official org-cite citations. | John MacFarlane | 1 | -39/+160
We also support the older org-ref style as a fallback. We no longer support the "markdown-style" citations. See #7329.
2021-12-14 | Org reader: remove support for "Berkeley style" citations. | John MacFarlane | 1 | -145/+42
See #7329.
2021-12-13 | Markdown reader: fix parsing of "bare locators"... | John MacFarlane | 1 | -1/+1
...after author-in-text citations. Previously `@item [p. 12; @item2]` was incorrectly parsed as three citations rather than two. This is now fixed by ensuring that `prefix` doesn't gobble any semicolons.
2021-12-11 | Custom reader: ensure old Readers continue to work | Albert Krewinkel | 1 | -16/+47
Retry conversion by passing a string instead of sources when the `Reader` fails with a message that hints at an outdated function. A deprecation notice is reported in that case.
2021-12-11 | Custom reader: pass list of sources instead of concatenated text | Albert Krewinkel | 1 | -6/+4
The first argument passed to Lua `Reader` functions is no longer a plain string but a richer data structure. The structure can easily be converted to a string by applying `tostring`, but it is also a list whose elements each hold the *text* and *name* of one input source, as properties with those names. A small example is added to the custom reader documentation, showcasing its use in a reader that creates a syntax-highlighted code block for each source code file passed as input. Existing readers must be updated.
2021-12-07 | Revert "Markdown reader: Improve inlinesInBalancedBrackets." | John MacFarlane | 1 | -12/+20
This reverts commit fa83246d7de8527bbf59dfac9636a42ede185194.
2021-12-06 | Ipynb reader & writer: properly handle cell "id". | John MacFarlane | 1 | -9/+13
This is passed through if it exists (in Nb4); otherwise the writer will add a random one so that cells all have an "id". Closes #7728.
2021-11-30 | Markdown reader: don't allow `^` at beginning of link or image label. | John MacFarlane | 1 | -2/+1
This is reserved for footnotes. Fixes a regression introduced by 0a93acf. Closes #7723.
2021-11-24 | LaTeX reader: Fix semantics of `\ref`. | John MacFarlane | 1 | -5/+3
We were including the ams environment type in addition to the number. This is proper behavior for `\cref` but not for `\ref`. To support `\cref` we need to store the environment label separately.
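If the environment name is stored separately from the number, as the commit proposes, both commands become simple to support. The sketch below is hypothetical: `LabelInfo`, `renderRef`, and `renderCref` are illustrative names and do not come from pandoc.

```haskell
-- Hypothetical record of what a reader remembers about a label: the
-- environment name is kept separate from its number.
data LabelInfo = LabelInfo
  { envName :: String   -- e.g. "theorem"
  , envNumber :: String -- e.g. "2.1"
  }

-- \ref produces only the number.
renderRef :: LabelInfo -> String
renderRef = envNumber

-- \cref-style output prefixes the environment name. (cleveref's real
-- naming, abbreviation, and capitalization rules are more involved.)
renderCref :: LabelInfo -> String
renderCref info = envName info ++ " " ++ envNumber info
```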
2021-11-24 | LaTeX reader: improve references. | John MacFarlane | 4 | -5/+27
- Resolve references to theorem environments.
- Remove Span caused by "label" in figure, table, and theorem environments; this had an id that duplicated the environments' id.

See #813.
2021-11-24 | LaTeX reader: omit visible content for `\label{...}`. | John MacFarlane | 1 | -2/+1
Previously we included the text of the label in square brackets, but this is undesirable in many cases. See discussion in <https://github.com/jgm/pandoc/issues/813#issuecomment-978232426>.
2021-11-24 | HTML reader: parse attributes on links and images. | John MacFarlane | 2 | -11/+10
Closes #6970.
2021-11-23 | Improve detection of pipe table line widths. | John MacFarlane | 1 | -14/+18
Fixed calculation of maximum column widths in pipe tables. It is now based on the length of the markdown line, rather than a "stringified" version of the parsed line. This should be more predictable for users. In addition, we take into account double-wide characters such as emojis. Closes #7713.
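The sketch below measures width on the raw source line and counts wide characters as two columns. The Unicode ranges are an illustrative subset chosen for this example and are not the table pandoc uses. `charWidth` and `lineWidth` are hypothetical names.

```haskell
import Data.Char (ord)

-- Hypothetical approximation of a character's display width: a few
-- East Asian wide and emoji ranges count as two columns, everything
-- else as one. Real implementations use the full Unicode East Asian
-- Width data.
charWidth :: Char -> Int
charWidth c
  | inRange 0x1100 0x115F = 2   -- Hangul Jamo initial consonants
  | inRange 0x2E80 0xA4CF = 2   -- CJK radicals through Yi
  | inRange 0xAC00 0xD7A3 = 2   -- Hangul syllables
  | inRange 0xF900 0xFAFF = 2   -- CJK compatibility ideographs
  | inRange 0xFF00 0xFF60 = 2   -- fullwidth forms
  | inRange 0x1F300 0x1FAFF = 2 -- most pictographic emoji
  | otherwise = 1
  where
    n = ord c
    inRange lo hi = n >= lo && n <= hi

-- Width of a raw markdown line, measured on the source text rather
-- than on a stringified version of the parsed inlines.
lineWidth :: String -> Int
lineWidth = sum . map charWidth
```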
2021-11-21 | yamlBsToRefs: allow multiple YAML documents. | John MacFarlane | 1 | -2/+2
Some people use `---` as the end delimiter in YAML bibliography files, which causes the `yaml` library to emit an error unless we explicitly allow multiple YAML documents (and just consider the first). This change is in T.P.Readers.Metadata.
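A deliberately naive, line-based sketch shows the effect of the fix. A trailing `---` opens a second, empty document, and only the first document is kept. `splitYamlDocuments` and `firstDocument` are hypothetical. According to the commit, the actual change makes the `yaml` library accept multiple documents; this sketch does not use that library.

```haskell
-- Hypothetical sketch: split a YAML stream into documents at document
-- start ("---") and end ("...") marker lines. A real YAML library
-- handles the full stream grammar.
splitYamlDocuments :: String -> [[String]]
splitYamlDocuments = go [] . lines
  where
    go acc [] = [reverse acc]
    go acc (l : ls)
      | isMarker l = reverse acc : go [] ls
      | otherwise = go (l : acc) ls
    isMarker l = l == "---" || l == "..."

-- Keep only the first document that has a non-blank line.
firstDocument :: String -> Maybe String
firstDocument s =
  case filter (any (not . all (== ' '))) (splitYamlDocuments s) of
    (d : _) -> Just (unlines d)
    [] -> Nothing
```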
2021-11-20 | Capture `alt-text` in JATS figures (#7703) | Albert Krewinkel | 1 | -2/+13
Co-authored-by: Aner Lucero <4rgento@gmail.com>
2021-11-18 | RST reader: handle class attribute for custom roles (#7700) | willj-dev | 1 | -8/+16
Previously the class attribute was ignored, and the name of the role used as the class. Closes #7699.
2021-11-15 | LaTeX reader: add rudimentary support for `\autoref` (#7693) | Albert Krewinkel | 1 | -0/+1
2021-11-09 | Accept empty `--metadata-file`. | John MacFarlane | 1 | -0/+1
Closes #7675. This is a regression from 2.15 behavior.
2021-11-08 | Add `<titleabbr>` support to DocBook reader | Rowan Rodrik van der Molen | 1 | -5/+12
2021-11-07 | LaTeX reader: add 'uri' class when parsing `\url`. | John MacFarlane | 1 | -2/+2
Closes #7672.
2021-11-06 | Pass ReaderOptions to custom readers as second parameter. | John MacFarlane | 1 | -4/+3
2021-11-05 | Add interface for custom readers written in Lua. (#7671) | John MacFarlane | 1 | -0/+55
New module Text.Pandoc.Readers.Custom, exporting readCustom [API change]. Users can now do `-f myreader.lua` and pandoc will treat the script myreader.lua as a custom reader, which parses an input string to a pandoc AST, using the pandoc module defined for Lua filters. A sample custom reader can be found in data/reader.lua. Closes #7669.
2021-11-05 | Support for <indexterm>s when reading DocBook (#7607) | Rowan Rodrik van der Molen | 1 | -4/+37
* Support for <indexterm>s when reading DocBook
* Update implementation status of `<n-ary>` tags
* Remove non-idiomatic parentheses
* More complete `<indexterm>` support, with tests

Co-authored-by: Rowan Rodrik van der Molen <rowan@ytec.nl>
2021-11-02 | Markdown reader: Improve inlinesInBalancedBrackets. | John MacFarlane | 1 | -20/+12
This is just a small improvement in terms of performance, but it's simpler and more direct code. Also, we avoid parsing interparagraph spaces in balanced brackets, as the original did.
2021-11-02 | Docx reader: don't let first line indents trigger block quotes. | John MacFarlane | 1 | -3/+2
This fixes a regression introduced in pandoc 2.15 by PR #7606. Closes #7655.
2021-10-27 | Switch back from HsYAML to yaml. | John MacFarlane | 2 | -116/+62
Reasons:

- Performance: HsYAML is around 20 times slower in parsing large YAML bibliographies (#6084).
- An issue was submitted to HsYAML, but it hasn't gotten any attention. HsYAML seems borderline unmaintained; it hasn't had a commit in over a year.
- Unfortunately this goes back on our attempts to free ourselves from C dependencies (#4535). But I don't see a better alternative until a better pure Haskell parser is available.

Closes #6084.

Notes:

- We've removed the FromYAML instances for all types that had them, since this is a HsYAML-specific typeclass [API change]. (The yaml package just uses From/ToJSON.)
- Unlike HsYAML (in the configuration we were using), yaml parses 'Y', 'N', 'Yes', 'No', 'On', 'Off' as boolean values. Users may need to quote these when they are meant to be interpreted as strings. Similarly, 'null' is parsed as a YAML null value (and will be treated as an empty string by pandoc rather than the string 'null'). Quoting it will force it to be interpreted as a string.
- Some tests had to be adjusted accordingly.
- Pandoc now behaves better when the YAML metadata contains escaping errors: instead of just falling back on treating the section as a table, it raises a YAML parsing error.
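The practical consequence of the note about booleans and `null` can be shown with a small classifier for unquoted scalars. This is only an illustration of the behavior the commit describes. The exact spellings that the `yaml` library accepts are an assumption here, and `Scalar`, `classifyPlain`, and `classifyQuoted` are hypothetical.

```haskell
-- Illustrative classification of YAML scalars, following the commit
-- message: 'Y', 'N', 'Yes', 'No', 'On', 'Off' (plus true/false) become
-- booleans and 'null' becomes null. The full accepted set of spellings
-- is an assumption of this sketch.
data Scalar = SBool Bool | SNull | SString String
  deriving (Eq, Show)

-- An unquoted (plain) scalar may be reinterpreted.
classifyPlain :: String -> Scalar
classifyPlain s
  | s `elem` ["Y", "Yes", "On", "true", "True"] = SBool True
  | s `elem` ["N", "No", "Off", "false", "False"] = SBool False
  | s `elem` ["null", "~", ""] = SNull
  | otherwise = SString s

-- A quoted scalar always stays a string, which is why quoting
-- 'Yes' or 'null' preserves the literal text.
classifyQuoted :: String -> Scalar
classifyQuoted = SString
```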
2021-10-22 | Org reader: allow an initial :PROPERTIES: drawer to add to metadata. | John MacFarlane | 1 | -2/+10
Closes #7520.
2021-10-22 | Use simpleFigure in Readers. | Aner Lucero | 6 | -44/+50
2021-10-20 | Markdown reader: don't parse links or bracketed spans as citations. | John MacFarlane | 1 | -2/+4
Previously pandoc would parse `[link to (@a)](url)` as a citation; similarly `[(@a)]{#ident}`. This is undesirable. One should be able to use example references in citations, and even if `@a` is not defined as an example reference, `[@a](url)` should be a link containing an author-in-text citation rather than a normal citation followed by literal `(url)`. Closes #7632.
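The rule in this commit is that the character after the closing bracket is checked before any `@` inside the label. The small classifier below sketches that order. The types and functions are hypothetical and ignore everything else that pandoc's Markdown parser handles, such as escapes, code spans, and math.

```haskell
-- Hypothetical classification of a bracketed chunk of Markdown.
data Bracketed = Link | BracketedSpan | Citation | Plain
  deriving (Eq, Show)

-- Split "[label]rest" into (label, rest), matching nested brackets.
splitBracket :: String -> Maybe (String, String)
splitBracket ('[' : s) = go (0 :: Int) [] s
  where
    go _ _ [] = Nothing
    go 0 acc (']' : rest) = Just (reverse acc, rest)
    go d acc (c : rest)
      | c == '[' = go (d + 1) (c : acc) rest
      | c == ']' = go (d - 1) (c : acc) rest
      | otherwise = go d (c : acc) rest
splitBracket _ = Nothing

-- What follows the label decides first: "(" makes a link and "{" a
-- bracketed span, even when the label contains "@". Only otherwise
-- does "@" signal a citation.
classify :: String -> Maybe Bracketed
classify s = do
  (label, rest) <- splitBracket s
  pure $ case rest of
    '(' : _ -> Link
    '{' : _ -> BracketedSpan
    _ | '@' `elem` label -> Citation
      | otherwise -> Plain
```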
2021-10-18 | Docx reader: fix handling of empty fields | Milan Bracke | 1 | -0/+4
Some fields have only an instrText and no content. Pandoc didn't understand these, which caused other fields to be misunderstood, because it seemed as if a field was still open when it wasn't.
2021-10-18 | Docx parser: implement PAGEREF fields | Milan Bracke | 2 | -0/+26
These fields, often used in tables of contents, can be a hyperlink.
2021-10-18 | Docx reader: fix handling of nested fields | Milan Bracke | 2 | -115/+150
Fields delimited by fldChar elements can contain other fields. Previously, nested fields were ignored, except for their end, which was taken to be the end of the parent field. To fix this, fields now contain ParParts instead of Runs, since a Run can't represent sufficiently complex structures. This also affects Hyperlinks, since they can originate from a field.
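A recursive-descent parse covers both Docx field fixes above. A field's end marker closes only the innermost open field, and a field may consist of instruction text alone, with no separator and no result. This is a hypothetical model meant only to show the nesting logic. The `Token` and `Part` types are not pandoc's; pandoc's reader works on `ParPart`s.

```haskell
-- Hypothetical model of Word field markup: fldChar begin/separate/end
-- markers interleaved with instruction text and ordinary runs.
data Token = Begin | Instr String | Separate | End | Run String
  deriving (Eq, Show)

-- A part is plain text or a field whose result may contain further
-- parts, including nested fields.
data Part = Text String | Field String [Part] -- instruction, result
  deriving (Eq, Show)

-- Parse parts until input ends or an End (left for the caller) appears.
parseParts :: [Token] -> ([Part], [Token])
parseParts [] = ([], [])
parseParts (End : rest) = ([], End : rest)
parseParts (Run t : rest) = let (ps, r) = parseParts rest in (Text t : ps, r)
parseParts (Begin : rest) =
  let (field, r1) = parseField rest
      (ps, r2) = parseParts r1
   in (field : ps, r2)
parseParts (_ : rest) = parseParts rest -- stray Instr/Separate: skip

-- Parse the body of a field after its Begin marker.
parseField :: [Token] -> (Part, [Token])
parseField ts =
  let (instrs, afterInstr) = span isInstr ts
      instr = concat [s | Instr s <- instrs]
      -- A field may lack a Separate marker (instruction only, no result).
      body = case afterInstr of
        Separate : r -> r
        r -> r
      (result, r2) = parseParts body
      -- Consume this field's End; an unterminated field just stops.
      rest = case r2 of
        End : r -> r
        r -> r
   in (Field instr result, rest)
  where
    isInstr (Instr _) = True
    isInstr _ = False

-- Top level: an unmatched End is dropped instead of ending the parse.
parseDocument :: [Token] -> [Part]
parseDocument ts = case parseParts ts of
  (ps, End : rest) -> ps ++ parseDocument rest
  (ps, _) -> ps
```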
2021-10-14 | DocBook reader: honor linenumbering attribute | Samuel Tardieu | 1 | -0/+1
The DocBook attribute `linenumbering="numbered"` on code blocks maps to "numberLines" internally.
2021-10-13 | Fix markdown parsing bug for math in bracketed spans and links. | John MacFarlane | 1 | -0/+1
This affects math with unbalanced brackets (e.g. `$(0,1]$`) inside links, images, bracketed spans. Closes #7623.
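The underlying issue is that a bracket inside math, such as the `]` in `$(0,1]$`, must not close the enclosing label. The minimal sketch below handles only single-`$` inline math and ignores escapes and code spans. `closingBracket` is hypothetical. It returns an index into the text that follows the opening `[`.

```haskell
-- Hypothetical scanner: find the index of the "]" that closes a label,
-- treating "$...$" as opaque so brackets inside inline math do not
-- change the nesting depth.
closingBracket :: String -> Maybe Int
closingBracket = go 0 0
  where
    go :: Int -> Int -> String -> Maybe Int
    go _ _ [] = Nothing
    go i depth (c : cs)
      | c == '$' = skipMath (i + 1) depth cs
      | c == '[' = go (i + 1) (depth + 1) cs
      | c == ']' = if depth == 0 then Just i else go (i + 1) (depth - 1) cs
      | otherwise = go (i + 1) depth cs
    -- Jump past the closing "$"; an unterminated "$" is treated as text.
    skipMath i depth cs =
      case break (== '$') cs of
        (m, '$' : rest) -> go (i + length m + 1) depth rest
        _ -> go i depth cs
```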
2021-10-11 | LaTeX reader: Implement siunitx v3 commands. | John MacFarlane | 1 | -1/+5
We support `\unit`, `\qty`, `\qtyrange`, and `\qtylist` as synonyms of `\si`, `\SI`, `\SIrange`, and `\SIlist`. Closes #7614.
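One way to support such synonyms is a lookup table that rewrites the new command names before dispatch, so that a single handler processes both spellings. This is a hypothetical sketch, and pandoc's actual implementation may differ. `normalizeCommand` is an illustrative name.

```haskell
import qualified Data.Map as M

-- The siunitx v3 command names listed in the commit, mapped to the
-- older commands they are treated as synonyms of.
siunitxSynonyms :: M.Map String String
siunitxSynonyms = M.fromList
  [ ("unit", "si")
  , ("qty", "SI")
  , ("qtyrange", "SIrange")
  , ("qtylist", "SIlist")
  ]

-- Hypothetical normalization step: v3 names become their older
-- equivalents; any other command name passes through unchanged.
normalizeCommand :: String -> String
normalizeCommand name = M.findWithDefault name name siunitxSynonyms
```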
2021-10-10 | Avoid blockquote when parent style has more indent | Milan Bracke | 3 | -53/+66
When a paragraph has an indentation different from the parent (named) style, it used to be considered a blockquote. But this only makes sense when the paragraph has more indentation. So this commit adds a check for the indentation of the parent style.
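The check described above compares the paragraph's own left indentation with the indentation it inherits from its style. The minimal sketch below assumes left-indentation values in twips, with `Nothing` meaning that no value is set. The function name and the handling of unset values are assumptions.

```haskell
-- Hypothetical check: a paragraph counts as a block quote only when it
-- is indented MORE than its parent (named) style. Indentations are in
-- twips (1/20 pt); Nothing means "not set".
isBlockQuote :: Maybe Int -> Maybe Int -> Bool
isBlockQuote styleIndent parIndent =
  case (parIndent, styleIndent) of
    (Just own, Just inherited) -> own > inherited
    (Just own, Nothing) -> own > 0
    (Nothing, _) -> False
```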
2021-10-10 | LaTeX reader: Properly handle `\^` followed by group closing. | John MacFarlane | 1 | -3/+3
Closes #7615.
2021-09-30 | Docx reader: Add placeholder for word diagram | Ezwal | 2 | -0/+17
2021-09-23 | HTML reader: handle empty tbody element in table. | John MacFarlane | 1 | -5/+8
Closes #7589.
2021-09-19 | LaTeX reader: Recognize that `\vadjust` sometimes takes "pre". | John MacFarlane | 1 | -0/+7
Closes #7531.
2021-09-19 | Ignore (and gobble parameters of) CSLReferences environment. | John MacFarlane | 1 | -0/+1
Otherwise we get the parameters as numbers in the output. Closes #7531.
2021-09-17 | Fix linter warning. | John MacFarlane | 1 | -4/+3
2021-09-16 | Fix code blocks using `--preserve-tabs`. | John MacFarlane | 1 | -1/+7
Previously they did not behave as the equivalent input with spaces would. Closes #7573.
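For tab-indented input to behave like the equivalent space-indented input, a tab has to advance to the next tab stop when indentation is measured, even if the tab itself is preserved in the output. The hypothetical sketch below uses a 4-column tab stop. It shows only the measurement and is not pandoc's code.

```haskell
-- Expand tabs to the next multiple of the tab stop, resetting the
-- column at each newline.
expandTabs :: Int -> String -> String
expandTabs tabStop = go 0
  where
    go _ [] = []
    go col ('\t' : cs) =
      let n = tabStop - col `mod` tabStop
       in replicate n ' ' ++ go (col + n) cs
    go _ ('\n' : cs) = '\n' : go 0 cs
    go col (c : cs) = c : go (col + 1) cs

-- Indentation of a line in columns, so "\tcode" and "    code" measure
-- the same and both qualify as an indented code block line.
indentWidth :: Int -> String -> Int
indentWidth tabStop = length . takeWhile (== ' ') . expandTabs tabStop
```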
2021-09-13 | RST reader: handle escaped colons in reference definitions. | John MacFarlane | 1 | -1/+2
Closes #7568.
2021-09-10 | feat(ipynb reader): get cell output mime from raw_mimetype too | Kolen Cheung | 1 | -1/+2
While the spec defines `format`, in practice `raw_mimetype` is used. See jupyter/nbformat#229.
2021-09-10 | feat(ipynb reader): add more Jupyter's "Raw NBConvert Format" | Kolen Cheung | 1 | -6/+10
This adds most of the available formats selectable from Jupyter's interface "Raw NBConvert Format".
2021-09-10 | fix!: rst mime type | Kolen Cheung | 1 | -1/+1
BREAKING CHANGE: fix rst mime type according to https://docutils.sourceforge.io/FAQ.html
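Together, the three ipynb commits above amount to a metadata lookup with a fallback, followed by a mapping from MIME type to format. The sketch below is hypothetical. The flat metadata map and the mapping table are assumptions, except for `text/x-rst`, which is the reStructuredText type given in the docutils FAQ.

```haskell
import Control.Applicative ((<|>))
import qualified Data.Map as M

-- Hypothetical raw-cell metadata as a flat string map.
type Metadata = M.Map String String

-- Prefer the spec's "format" key and fall back to "raw_mimetype",
-- which is what appears in practice.
rawCellMime :: Metadata -> Maybe String
rawCellMime md = M.lookup "format" md <|> M.lookup "raw_mimetype" md

-- Illustrative MIME-to-format table; only the reST entry is taken from
-- the docutils FAQ, the rest are assumptions of this sketch.
mimeToFormat :: String -> Maybe String
mimeToFormat mime =
  lookup mime
    [ ("text/html", "html")
    , ("text/latex", "latex")
    , ("text/markdown", "markdown")
    , ("text/x-rst", "rst")
    ]
```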
2021-09-10 | Remove redundant import. | John MacFarlane | 1 | -1/+1