aboutsummaryrefslogtreecommitdiff
path: root/src/Text/Pandoc
AgeCommit message (Collapse)AuthorFilesLines
2021-08-20RST reader: Fix `:literal:` includes.John MacFarlane1-5/+2
These should create code blocks, not insert raw RST. Closes #7513.
2021-08-19Improve docx reader's robustness in extracting images.John MacFarlane1-5/+6
The docx reader made a couple assumptions about how docx containers were laid out that were not always true, with the result that some images in documents did not get found/extracted. Closes #7511.
2021-08-18pptx: Include image title in descriptionEmily Bourke2-12/+19
The image title (i.e. `![alt text](link "title")`) was previously ignored when writing to pptx. This commit includes it in PowerPoint's description of the image, along with the link (which was already included). Fixes 7352.
2021-08-17Revise citeproc code to fit new citeproc 0.5 API.John MacFarlane3-43/+13
Linkification of URLs in the bibliography is now done in the citeproc library, depending on the setting of an option. We set that option depending on the value of the metadata field `link-bibliography` (defaulting to true, for consistency with earlier behavior, though the new behavior includes the CSL draft recommendation of hyperlinking the title or the whole entry if a DOI, PMID, PMCID, or URL field is present but not explicitly rendered). These changes implement the following recommendations from the draft CSL v1.0.2 spec (Appendix VI): > The CSL syntax does not have support for configuration of links. > However, processors should include links on bibliographic references, > using the following rules: > If the bibliography entry for an item renders any of the following > identifiers, the identifier should be anchored as a link, with the > target of the link as follows: > - url: output as is > - doi: prepend with "`https://doi.org/`" > - pmid: prepend with "`https://www.ncbi.nlm.nih.gov/pubmed/`" > - pmcid: prepend with "`https://www.ncbi.nlm.nih.gov/pmc/articles/`" > If the identifier is rendered as a URI, include rendered URI components > (e.g. "`https://doi.org/`") in the link anchor. Do not include any other > affix text in the link anchor (e.g. "Available from: ", "doi: ", "PMID: "). > If the bibliography entry for an item does not render any of > the above identifiers, then set the anchor of the link as the item > title. If title is not rendered, then set the anchor of the link as the > full bibliography entry for the item. Set the target of the link as one > of the following, in order of priority: > > - doi: prepend with "`https://doi.org/`" > - pmcid: prepend with "`https://www.ncbi.nlm.nih.gov/pmc/articles/`" > - pmid: prepend with "`https://www.ncbi.nlm.nih.gov/pubmed/`" > - url: output as is > > If the item data does not include any of the above identifiers, do not > include a link. > > Citation processors should include an option flag for calling > applications to disable bibliography linking behavior. Thanks to Benjamin Bray for getting this all working.
2021-08-17Rename TemplateWarning -> PowerpointTemplateWarning.John MacFarlane2-6/+7
@undergroundquizscene - I think TemplateWarning is apt to be confusing, since this actually doesn't have anything to do with what we call 'templates' in pandoc. Hence the change to a powerpoint-specific name.
2021-08-17pptx: Select layouts from reference doc by nameEmily Bourke1-19/+206
Until now, users had to make sure that their reference doc contains layouts in a specific order: the first four layouts in the file had to have a specific structure, or else pandoc would error (or sometimes successfully produce a pptx file, which PowerPoint would then fail to open). This commit changes the layout selection to use the layout names rather than order: users must make sure their reference doc contains four layouts with specific names, and if a layout with the right name isn’t found pandoc will output a warning and use the corresponding layout from the default reference doc as a fallback. I believe the use of names rather than order will be clearer to users, and the clearer errors will help them troubleshoot when things go wrong. - Add tests for moved layouts - Add tests for deleted layouts - Add newly included layouts to slideMaster1.xml to fix tests
2021-08-17Add TemplateWarning log message type [API change]Emily Bourke1-0/+6
This is a general warning to use for messages about templates.
2021-08-17Escape backslashes in haddock comments (#7505)Emily Bourke1-4/+4
Any literal backslash needs to be escaped: these are currently showing up as “‘r’” instead of “‘\r’”. Co-authored-by: Emily Bourke <undergroundquizscene@protonmail.com>
2021-08-16Fix bug in last commit due to removal of take1WhileP.John MacFarlane1-2/+2
2021-08-15Multimarkdown sub- and superscripts (#5512) (#7188)OCzarnecki2-15/+20
Added an extension `short_subsuperscripts` which modifies the behavior of `subscript` and `superscript`, allowing subscripts or superscripts containing only alphanumerics to end with a space character (eg. `x^2 = 4` or `H~2 is combustible`). This improves support for multimarkdown. Closes #5512. Add `Ext_short_subsuperscripts` constructor to `Extension` [API change]. This is enabled by default for `markdown_mmd`.
2021-08-15Make docx writer sensitive to `native_numbering` extension.John MacFarlane3-12/+22
Figure and table numbers are now only included if `native_numbering` is enabled. (By default it is disabled.) This is a behavior change with respect to 2.14.1, but the behavior is that of previous versions. The change was necessary to avoid incompatibilities between pandoc's native numbering and third-party cross reference filters like pandoc-crossref. Closes #7499.
2021-08-13Convert Quoted in bib entries to special Spans...John MacFarlane1-1/+3
before passing them off to citeproc. This ensures that we get proper localization and flipflopping if, e.g., quotes are used in titles. Closes jgm/citeproc#87.
2021-08-13Citeproc: avoid odd handling of quotes.John MacFarlane1-1/+6
citeproc changes allow us to ignore Quoted elements; citeproc now uses its own method for represented quoted things, and only localizes and flipflops quotes it adds itself. See #87. The one thing left to do is to convert Quoted elements in bibliography databases (esp. titles) to `Span ("",["csl-quoted"],[])` before passing them to citeproc, IF the localized quotes for the quote type match the standard inverted commas.
2021-08-13Removed quote localization from citeproc processing.John MacFarlane1-20/+1
This is now done in citeproc itself.
2021-08-13Fix raw LaTeX injection issue (LaTeX writer).John MacFarlane1-5/+10
Using a code block containing `\end{verbatim}`, one could inject raw TeX into a LaTeX document even when `raw_tex` is disabled. Thanks to Augustin Laville for noticing the bug. Closes #7497.
2021-08-13LaTeX reader: proper implicit grouping around environment macros.John MacFarlane1-1/+2
2021-08-12Use Prelude from base-compat for ghc 8.4 too.John MacFarlane1-5/+1
We were having trouble building on ghc 8.4 because of the lack of a Foldable instance for (Alt Maybe) in base < 4.12. Mystery: for some reason our builds were failing for gitit but not in the pandoc CI.
2021-08-11Try fixing compile error on older ghcs.John MacFarlane1-1/+5
See https://github.com/jgm/gitit/runs/3308381697
2021-08-11Fix some lint issues.John MacFarlane2-6/+5
2021-08-11LaTeX reader: Support `\global` before `\def`, `\let`, etc.John MacFarlane1-2/+10
See #7494.
2021-08-11Fix scope for LaTeX macros.John MacFarlane3-55/+100
They should by default scope over the group in which they are defined (except `\gdef` and `\xdef`, which are global). In addition, environments must be treated as groups. We handle this by making sMacros in the LaTeX parser state a STACK of macro tables. Opening a group adds a table to the stack, closing one removes one. Only the top of the stack is queried. This commit adds a parameter for scope to the Macro constructor (not exported). Closes #7494.
2021-08-11LaTeX reader: improve handling of plain TeX macro primitives.John MacFarlane2-6/+29
- Fixed semantics for `\let`. - Implement `\edef`, `\gdef`, and `\xdef`. - Add comment noting that currently `\def` and `\edef` set global macros (so are equivalent to `\gdef` and `\xdef`). This should be fixed by scoping macro definitions to groups, in a future commit. Closes #7474.
2021-08-10HTML reader: treat commments as blank when parsing.John MacFarlane1-5/+7
This modifies pBlank. Previously comments could sometimes flummox the parser. Cloes #7482.
2021-08-10Fix RTF table parsing bug that created undesired nested tables.John MacFarlane1-1/+1
Closes #7488.
2021-08-10Add RTF reader.John MacFarlane2-0/+1336
- `rtf` is now supported as an input format as well as output. - New module Text.Pandoc.Readers.RTF (exporting `readRTF`). [API change] Closes #3982.
2021-08-08Allow `--slide-level=0`.John MacFarlane1-2/+2
When the slide level is set to 0, headings won't be used at all in splitting the document into slides. Horizontal rules must be used to separate slides. Closes #7476.
2021-08-04RTF writer: emit \outlinelevel for section headings.John MacFarlane1-1/+2
2021-08-03Stop using the HTTP package. (#7456)mt_caret6-11/+29
We only depend on the urlEncode function in the package, which is also provided by http-types. The HTTP package also depends on the network package, which has difficulty building on ghcjs. Add internal module Text.Pandoc.Network.HTTP, exporting `urlEncode`.
2021-08-03LaTeX table writer: Increase column width precision (#7466)Peter Fabinski1-1/+1
In some cases, the rounding performed by the LaTeX table writer would introduce visible overrun outside the text area. This adds two more decimal places to the width values.
2021-08-01RTF writer: omit `\bin` in `\pict`.John MacFarlane1-1/+1
According to the spec, this is not needed or wanted when the data is in hexadecimal format, as it is here.
2021-07-29parseFromString: preserve at least the source directory.John MacFarlane1-1/+1
Previously we just set the source name to "chunk" when parsing from strings, to avoid misleading source positions. This had the side effect that `rebase_relative_paths` would break inside sections that were parsed as strings. So, now we use "ORIGINAL_SOURCE_PATH_chunk" instead of just "chunk". Closes #7464.
2021-07-22LaTeX writer: Use ulem for underline.John MacFarlane1-1/+3
ulem is conditionally included already when the `strikeout` variable is set, so we set this when there is underlined text, and use `\uline` instead of `\underline`. This fixes wrapping for underlined text. Closes #7351.
2021-07-22MIME: use image/x-xcf instead of application/x-xcf.John MacFarlane1-1/+1
Closes #7454.
2021-07-17LaTeX reader: avoid trailing hyphen in translating languages.John MacFarlane1-2/+2
Previously `\foreignlanguage{english}` turned into `<span lang="en-">`. The same issue affected Arabic. Closes #7447.
2021-07-16DocBook reader: handle images with imageobjectco elements.John MacFarlane1-3/+3
Closes #7440.
2021-07-16LaTeX reader: Support `\cline` in LaTeX tables.John MacFarlane1-0/+1
Closes #7442.
2021-07-16PDF: Fix svgIn path error.John MacFarlane1-1/+1
We were duplicating the temp directory; this didn't show up on macOS or linux because there we use absolute paths for the temp directory. Closes #7431.
2021-07-11DocBook reader: add support for citerefentry (#7437)Jan Tojnar1-1/+5
Originally intended for referring to UNIX manual pages, either part of the same DocBook document as refentry element, or external – hence the manvolnum element. These days, refentry is more general, for example the element documentation pages linked below are each a refentry. As per the *Processing expectations* section of citerefentry, the element is supposed to be a hyperlink to a refentry (when in the same document) but pandoc does not support refentry tag at the moment so that is moot. https://tdg.docbook.org/tdg/5.1/citerefentry.html https://tdg.docbook.org/tdg/5.1/manvolnum.html https://tdg.docbook.org/tdg/5.1/refentry.html This roughly corresponds to a `manpage` role in rST syntax, which produces a `Code` AST node with attributes `.interpreted-text role=manpage` but that does not fit DocBook parser. https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#role-manpage
2021-07-11Improved parsing of raw LaTeX from Text streams (rawLaTeXParser).John MacFarlane2-11/+37
We now use source positions from the token stream to tell us how much of the text stream to consume. Getting this to work required a few other changes to make token source positions accurate. Closes #7434.
2021-07-09Always use / when adding directory to image path with extractMedia.John MacFarlane1-1/+1
Even on Windows. May help with #7431.
2021-07-09RST reader: fix regression with code includes.John MacFarlane1-1/+5
With the recent changes to include infrastructure, included code blocks were getting an extra newline. Closes #7436. Added regression test.
2021-07-07Don't incorporate externally linked images in EPUB documents (#7430)Michael Hoffmann1-1/+2
Just like it is possible to avoid incorporating an image in EPUB by passing `data-external="1"` to a raw HTML snippet, this makes the same possible for native Images, by looking for an associated `external` attribute.
2021-07-06Recognize data-external when reading HTML img tags (#7429)Michael Hoffmann1-8/+3
Preserve all attributes in img tags. If attributes have a `data-` prefix, it will be stripped. In particular, this preserves a `data-external` attribute as an `external` attribute in the pandoc AST.
2021-07-06T.P.PDF, convertImage: normalize paths.John MacFarlane1-3/+3
This will avoid paths on Windows with mixed path separators, which may cause problems with SVG conversion. See #7431.
2021-07-06Markdown reader: don't try to read contents in self-closing HTML tag.John MacFarlane1-1/+4
Previously we had problems parsing raw HTML with self-closing tags like `<col/>`. The problem was that pandoc would look for a closing tag to close the markdown contents, but the closing tag had, in effect, already been parsed by `htmlTag`. This fixes the issue described in <https://groups.google.com/d/msgid/pandoc-discuss/297bc662-7841-4423-bcbb-534e99bbba09n%40googlegroups.com>.
2021-07-06HTML reader: add col, colgroup to 'closes' definitionsJohn MacFarlane1-1/+3
2021-07-05Add command test for #7394.John MacFarlane1-0/+1
And fix a small bug in handling of citations in notes, which led to commas at the end of sentences in some cases.
2021-07-05Citeproc: cleanup and efficiency improvement in deNote.John MacFarlane1-15/+21
2021-07-05Revamp note citation handling.John MacFarlane1-14/+30
Use latest citeproc, which uses a Span with a class rather than a Note for notes. This helps us distinguish between user notes and citation notes. Don't put citations at the beginning of a note in parentheses. (Closes #7394.)
2021-07-02HTML5 writer, remove aria-hidden when explicit atl text is provided.Aner Lucero1-4/+7