path: root/src/Text/Pandoc/Readers
Age  Commit message  Author  Files  Lines
2020-11-26HTML reader: allow finer grained options for tag omissionAlbert Krewinkel3-13/+26
2020-11-25HTML reader: simplify list attribute handlingAlbert Krewinkel1-18/+9
This removes the `foldOrElse` function from the internal Text.Pandoc.CSS module.
2020-11-24HTML reader: support row or column-spanning table cellsAlbert Krewinkel2-28/+26
2020-11-24HTML reader: support blocks in captionAlbert Krewinkel2-6/+6
2020-11-24HTML reader: extract table parsing into separate moduleAlbert Krewinkel3-95/+140
2020-11-23HTML reader: extract submodulesAlbert Krewinkel4-239/+342
Reducing module size should reduce memory use during compilation. This is preparatory work to tackle support for more table features.
2020-11-22Org reader: parse `#+LANGUAGE` into `lang` metadata fieldAlbert Krewinkel1-0/+2
Fixes: #6845
2020-11-21LaTeX reader: more robust parsing of bracketed options.John MacFarlane1-3/+8
Improves on 9a40976. Closes #6873.
2020-11-20DocBook reader: Table text width support (#6791)Nils Carlson1-2/+12
Table width relative to text width is not natively supported by DocBook, but the DocBook FO stylesheets support it through an XML processing instruction, `<?dbfo table-width="50%"?>`. Implement support for this instruction in the DocBook reader.
2020-11-20Improve LaTeX option parsing...John MacFarlane1-1/+3
...in cases where we run into trouble parsing inlines up to the closing `]`, e.g. quotes. In those cases we return a plain string with the option contents. Previously we mistakenly included the brackets in this string. Closes #6869.
2020-11-19DocBook reader: drop period in formalpara title...John MacFarlane1-2/+2
...and put it in a div with class `formalpara-title`, so that people can reformat with filters. Closes #6562. Thanks to rdmuller.
2020-11-18Man reader: improve handling of .IP.John MacFarlane1-5/+19
We now better handle `.IP` when it is used with non-bullet, non-numbered lists, creating a definition list. We also skip blank lines like groff itself. Closes #6858.
2020-11-18Replace org #+KEYWORDS with #+keywordsTEC1-11/+11
As of ~2 years ago, lower-case keywords became the standard (though they are handled case-insensitively, as always):
https://code.orgmode.org/bzg/org-mode/commit/13424336a6f30c50952d291e7a82906c1210daf0
Upper-case keywords are exclusive to the manual:
- https://orgmode.org/list/871s50zn6p.fsf@nicolasgoaziou.fr/
- https://orgmode.org/list/87tuuw3n15.fsf@nicolasgoaziou.fr/
2020-11-17Bibtex reader: fall back on en-US if locale for LANG not found.John MacFarlane1-1/+4
This reproduces earlier pandoc-citeproc behavior. Closes jgm/citeproc#26.
2020-11-17Markdown reader: fix regression with example list references.John MacFarlane1-1/+5
This affects example list references followed by dashes. Introduced by commit b8d17f7. Closes #6855.
2020-11-16Move getNextNumber from Readers.LaTeX to Readers.LaTeX.Parsing.John MacFarlane2-26/+26
2020-11-16Improve fix to siunitx numbers with minus.John MacFarlane1-1/+1
- use real minus sign
- use tests contributed by Igor Pashev.
2020-11-16LaTeX reader: Fix negative numbers in siunitx commands.John MacFarlane1-2/+4
Commit a157e1a broke negative numbers, e.g. `\SI{-33}{\celsius}` or `\num{-3}`. This fixes the regression.
2020-11-15Markdown reader: fix detection of locators following in-text citations.John MacFarlane1-27/+30
Previously, if we had `@foo [p. 33; @bar]`, the `p. 33` would be incorrectly parsed as a prefix of `@bar` rather than a suffix of `@foo`.
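The intended reading of `@foo [p. 33; @bar]` can be illustrated with a small sketch. This is not pandoc's parser (which is written in Haskell); the function name `parse_intext` and both regular expressions are hypothetical and handle only this simple shape of input.

```python
import re

# Hypothetical patterns, for illustration only; pandoc's real grammar is richer.
INTEXT = re.compile(r"@(?P<key>[\w:.-]*\w)(?:\s*\[(?P<bracket>[^\]]*)\])?")
CITE = re.compile(r"^(?P<prefix>.*?)@(?P<key>[\w:.-]*\w)(?P<suffix>.*)$")

def parse_intext(text):
    """Parse an in-text citation such as '@foo [p. 33; @bar]'."""
    m = INTEXT.match(text)
    if m is None:
        return []
    cites = [{"key": m["key"], "prefix": "", "suffix": ""}]
    if m["bracket"] is None:
        return cites
    for chunk in m["bracket"].split(";"):
        chunk = chunk.strip()
        c = CITE.match(chunk)
        if c is None:
            # No '@key' in this chunk: it is a locator/suffix of the
            # citation before it, not a prefix of the one after it.
            last = cites[-1]
            last["suffix"] = f'{last["suffix"]}; {chunk}' if last["suffix"] else chunk
        else:
            cites.append({"key": c["key"],
                          "prefix": c["prefix"].strip(),
                          "suffix": c["suffix"].strip()})
    return cites
```

With this sketch, `parse_intext("@foo [p. 33; @bar]")` attaches `p. 33` to `foo` as a suffix and leaves `bar` with an empty prefix.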
2020-11-14Markdown reader: don't increment stateNoteNumber for example refs.John MacFarlane1-0/+12
Background: syntactically, references to example list items can't be distinguished from citations; we only know which they are after we've parsed the whole document (and this is resolved in the `runF` stage). This means that pandoc's calculation of `citationNoteNum` can sometimes be wrong when there are example list references. This commit partially addresses #6836, but only for the case where the example list references refer to list items defined previously in the document.
2020-11-10Remove redundant bracket.John MacFarlane1-1/+1
2020-11-10Fix corner case in YAML metadata parsing.John MacFarlane1-1/+4
Previously YAML metadata would sometimes not get recognized if a field ended with a newline followed by spaces. Closes #6823.
2020-11-07Lint code in PRs and when committing to master (#6790)Albert Krewinkel10-20/+15
* Remove unused LANGUAGE pragmata
* Apply HLint suggestions
* Configure HLint to ignore some warnings
* Lint code when committing to master
2020-11-05LaTeX reader: better handling of `\\` inside math in table cells.John MacFarlane1-0/+2
Previously this confused the table parser. Closes #6811.
2020-11-03Properly support optional cite argument for `\blockquote`.John MacFarlane1-7/+8
(LaTeX reader) Closes #6802.
2020-11-02LaTeX reader: fix bug parsing macro arguments.John MacFarlane1-1/+5
If `\cL` is defined as `\mathcal{L}`, and `\til` as `\tilde{#1}`, then `\til\cL` should expand to `\tilde{\mathcal{L}}`, but pandoc was expanding it to `\tilde\mathcal{L}`. This is fixed by parsing the arguments in "verbatim mode" when the macro expands arguments at the point of use. Closes #6796.
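A toy expander shows why reading the argument verbatim matters. This is only a sketch under simplifying assumptions, not the LaTeX reader's code: the `MACROS` table, `read_arg`, and `expand` are hypothetical names, and the tokenizer ignores comments, catcodes, and optional arguments.

```python
import re

# Hypothetical macro table: name -> (number of arguments, body).
MACROS = {
    "cL": (0, r"\mathcal{L}"),
    "til": (1, r"\tilde{#1}"),
}

# A control word, a control symbol, or any single character.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|.", re.DOTALL)

def read_arg(tokens, i):
    """Read one argument verbatim (unexpanded); return (text, next index)."""
    if i >= len(tokens):
        raise ValueError("missing macro argument")
    if tokens[i] != "{":
        return tokens[i], i + 1          # a single token is the argument
    depth = 0
    for j in range(i, len(tokens)):
        if tokens[j] == "{":
            depth += 1
        elif tokens[j] == "}":
            depth -= 1
            if depth == 0:
                return "".join(tokens[i + 1:j]), j + 1
    raise ValueError("unbalanced braces")

def expand(src):
    """Expand macros; arguments are substituted first, then rescanned."""
    tokens = TOKEN.findall(src)
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        name = tok[1:] if tok.startswith("\\") else None
        if name in MACROS:
            nargs, body = MACROS[name]
            i += 1
            for n in range(1, nargs + 1):
                arg, i = read_arg(tokens, i)
                body = body.replace(f"#{n}", arg)
            out.append(expand(body))     # expand only after substitution
        else:
            out.append(tok)
            i += 1
    return "".join(out)
```

Because `\cL` is captured as a single unexpanded token and only expanded after it has been placed inside the braces of `\tilde{#1}`, `expand(r"\til\cL")` yields `\tilde{\mathcal{L}}`.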
2020-10-26DocBook Reader: fix duplicate bibliography bug (#6773)Nils Carlson1-5/+4
Also add unit test to ensure the behavior stays consistent.
2020-10-23HTML reader: Parse contents of iframes.John MacFarlane1-4/+17
See #6770.
2020-10-23HTML reader: parse inline svg as image...John MacFarlane1-0/+17
...unless `raw_html` is set in the reader (in which case the svg is passed through as raw HTML). Closes #6770.
2020-10-16DocBook reader: bibliomisc and anchor support (#6754)Nils Carlson1-3/+11
Also do some minor refactoring - bibliodiv without a title no longer results in an empty Header.
2020-10-14Fix typos in comments, doc strings, error messages, and testsAlbert Krewinkel3-3/+3
Typos reported by https://fossies.org/linux/test/pandoc-master.tar.gz/codespell.html
See: #6738
2020-10-13LaTeX reader: support more acronym commands.John MacFarlane1-0/+10
`\acl`, `\aclp`, and capitalized versions of already supported commands. Closes #6746.
2020-10-12Commonmark reader: add pipe_table extension after defaults.John MacFarlane1-21/+22
Otherwise we get bad results for non-table, non-paragraph lines containing pipe characters. Closes #6739. See also jgm/commonmark-hs#52.
2020-10-10LaTeX reader: allow blank lines inside `\author`.John MacFarlane1-6/+3
2020-10-08LaTeX reader: Fix parsing of "show name" in newtheorem.John MacFarlane2-6/+7
Previously we were just treating it as a string and ignoring accents and formatting. See #6734.
2020-10-08Extend fix to #6719 to JATS readerJohn MacFarlane1-13/+13
2020-10-08DocBook reader: don't squelch space at end of emphasis element.John MacFarlane1-16/+16
Instead, include it after the emphasis. Closes #6719. Same fix was made for other inline elements, e.g. strikethrough.
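The idea of moving the trailing space outside the inline element, rather than dropping it, can be sketched as follows. The helper name `emphasize` and the Markdown-style `*...*` output are assumptions made for illustration; the reader itself builds pandoc AST nodes.

```python
def emphasize(content):
    """Wrap content in *...*, keeping trailing whitespace but placing it
    after the closing marker instead of discarding it."""
    core = content.rstrip()
    trailing = content[len(core):]
    if not core:
        return trailing                  # nothing to emphasize
    return f"*{core}*{trailing}"
```

For example, `emphasize("very ") + "important"` keeps the word boundary: the space ends up between the emphasized word and the following text.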
2020-10-08Qualify some uses of fail to avoid ambiguity.John MacFarlane1-6/+6
2020-10-07Remove redundant import.John MacFarlane1-1/+0
2020-10-07Remove redundant import.John MacFarlane1-1/+1
2020-10-07Raise informative errors when YAML metadata parsing fails.John MacFarlane2-28/+28
Closes #6730. Previously the command would succeed, returning empty metadata, with no errors or warnings.
API changes:
- Remove now unused CouldNotParseYamlMetadata constructor for LogMessage (T.P.Logging).
- Add 'Maybe FilePath' parameter to yamlToMeta in T.P.Readers.Markdown.
2020-10-06DOCX reader: Allow empty dates in comments and tracked changes (#6726)Diego Balseiro2-12/+17
For security reasons, some legal firms delete the date from comments and tracked changes.
* Make date optional (Maybe) in tracked changes and comments datatypes
* Add tests
2020-10-05Fixed regression in last commit.John MacFarlane2-22/+43
Parsing of YAML bibliographies was broken; this fixes it.
2020-10-05Add yamlToRefs, yamlBsToRefs.John MacFarlane2-3/+69
T.P.Readers.Markdown now exports yamlToRefs. [API change]
T.P.Readers.Metadata exports yamlBsToRefs. [API change]
These allow specifying an id filter so we parse only references that are used in the document. Improves timing with a 3M YAML references file from 36s to 17s.
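The benefit of an id filter is that the costly conversion step runs only for entries the document actually cites. A minimal sketch of that idea, assuming entries are already available as dictionaries with an `id` field; `select_refs` and its parameters are hypothetical and do not mirror the signatures of `yamlToRefs` or `yamlBsToRefs`.

```python
def select_refs(raw_entries, id_filter, convert):
    """Convert only the entries whose id passes id_filter.

    raw_entries: iterable of dicts with an "id" field (assumed shape)
    id_filter:   predicate on the id string
    convert:     the expensive per-entry conversion
    """
    return [convert(entry)
            for entry in raw_entries
            if id_filter(entry.get("id", ""))]
```

A caller would typically pass membership in the set of cited keys as the filter, e.g. `select_refs(entries, cited.__contains__, convert)`, so uncited entries are never converted.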
2020-09-25RST reader: apply `.. class::` directly to following Header.John MacFarlane1-1/+6
rather than creating a surrounding Div. Closes #6699.
2020-09-25Org reader: fix HLint warningsAlbert Krewinkel1-2/+2
2020-09-24DocBook reader: Implement table cell alignment (#6698)Nils Carlson1-6/+11
2020-09-21Add built-in citation support using new citeproc library.John MacFarlane3-0/+124
This deprecates the use of the external pandoc-citeproc filter; citation processing is now built in to pandoc.
* Add dependency on citeproc library.
* Add Text.Pandoc.Citeproc module (and some associated unexported modules under Text.Pandoc.Citeproc). Exports `processCitations`. [API change]
* Add data files needed for Text.Pandoc.Citeproc: default.csl in the data directory, and a citeproc directory that is just used at compile-time. Note that we've added file-embed as a mandatory rather than a conditional dependency, because of the biblatex localization files. We might eventually want to use readDataFile for this, but it would take some code reorganization.
* Text.Pandoc.Logging: Add `CiteprocWarning` to `LogMessage` and use it in `processCitations`. [API change]
* Add tests from the pandoc-citeproc package as command tests (including some tests pandoc-citeproc did not pass).
* Remove instructions for building pandoc-citeproc from CI and release binary build instructions. We will no longer distribute pandoc-citeproc.
* Markdown reader: tweak abbreviation support. Don't insert a nonbreaking space after a potential abbreviation if it comes right before a note or citation. Inserting one there messes up several things, including citeproc's moving of note citations.
* Add `csljson` as an input and output format. This allows pandoc to convert between `csljson` and other bibliography formats, and to generate formatted versions of CSL JSON bibliographies.
* Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API change]
* Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API change]
* Added `bibtex`, `biblatex` as input formats. This allows pandoc to convert between BibLaTeX and BibTeX and other bibliography formats, and to generate formatted versions of BibTeX/BibLaTeX bibliographies.
* Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and `readBibLaTeX`. [API change]
* Make "standalone" implicit if output format is a bibliography format. This is needed because pandoc readers for bibliography formats put the bibliographic information in the `references` field of metadata; and unless standalone is specified, metadata gets ignored. (TODO: This needs improvement. We should trigger standalone for the reader when the input format is bibliographic, and for the writer when the output format is markdown.)
* Carry over `citationNoteNum` to `citationNoteNumber`. This was just ignored in pandoc-citeproc.
* Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter. [API change] This runs the processCitations transformation. We need to treat it like a filter so it can be placed in the sequence of filter runs (after some, before others). In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`, so this special filter may be specified either way in a defaults file (or by `citeproc: true`, though this gives no control of positioning relative to other filters). TODO: we need to add something to the manual section on defaults files for this.
* Add deprecation warning if `pandoc-citeproc` filter is used.
* Add `--citeproc/-C` option to trigger citation processing. This behaves like a filter and will be positioned relative to filters as they appear on the command line.
* Rewrote the manual on citations, adding a dedicated Citations section which also includes some information formerly found in the pandoc-citeproc man page.
* Look for CSL styles in the `csl` subdirectory of the pandoc user data directory. This changes the old pandoc-citeproc behavior, which looked in `~/.csl`. Users can simply symlink `~/.csl` to the `csl` subdirectory of their pandoc user data directory if they want the old behavior.
* Add support for CSL bibliography entry formatting to LaTeX, HTML, Ms writers. Added CSL-related CSS to styles.html.
2020-09-21Markdown reader: Set citationNoteNum accurately in citations.John MacFarlane1-5/+26
This also changes stateLastNoteNumber -> stateNoteNumber.
2020-09-19Change deprecated Builder.isNull to null.John MacFarlane6-9/+9