aboutsummaryrefslogtreecommitdiff
path: root/test/Tests
AgeCommit message (Collapse)AuthorFilesLines
2021-07-06Recognize data-external when reading HTML img tags (#7429)Michael Hoffmann1-0/+6
Preserve all attributes in img tags. If attributes have a `data-` prefix, it will be stripped. In particular, this preserves a `data-external` attribute as an `external` attribute in the pandoc AST.
2021-06-29Docx writer: Add table numbering for captioned tables.John MacFarlane1-1/+3
The numbers are added using fields, so that Word can create a list of tables that will update automatically.
2021-06-05DocBook writer: Remove non-existent admonitionsJan Tojnar1-6/+6
attention, error and hint are actually just reStructuredText specific. danger was too until introduced in DocBook 5.2: https://github.com/docbook/docbook/issues/55
2021-05-29Reduce size of cover image in test epub.John MacFarlane1-1/+1
2021-05-28Docx reader: Support new table features.Emily Bourke1-0/+16
* Column spans * Row spans - The spec says that if the `val` attribute is ommitted, its value should be assumed to be `continue`, and that its values are restricted to {`restart`, `continue`}. If the value has any other value, I think it seems reasonable to default it to `continue`. It might cause problems if the spec is extended in the future by adding a third possible value, in which case this would probably give incorrect behaviour, and wouldn't error. * Allow multiple header rows * Include table description in simple caption - The table description element is like alt text for a table (along with the table caption element). It seems like we should include this somewhere, but I’m not 100% sure how – I’m pairing it with the simple caption for the moment. (Should it maybe go in the block caption instead?) * Detect table captions - Check for caption paragraph style /and/ either the simple or complex table field. This means the caption detection fails for captions which don’t contain a field, as in an example doc I added as a test. However, I think it’s better to be too conservative: a missed table caption will still show up as a paragraph next to the table, whereas if I incorrectly classify something else as a table caption it could cause havoc by pairing it up with a table it’s not at all related to, or dropping it entirely. * Update tests and add new ones Partially fixes: #6316
2021-05-28Docx reader: Read table column widths.Emily Bourke1-0/+5
2021-05-26Command tests: fail if a file contains no tests.John MacFarlane1-1/+4
And fix a test that failed in that way!
2021-05-25Jira: add support for "smart" linksAlbert Krewinkel2-0/+16
Support has been added for the new `[alias|https://example.com|smart-card]` syntax.
2021-05-24MediaBag improvements.John MacFarlane1-5/+5
In the current dev version, we will sometimes add a version of an image with a hashed name, keeping the original version with the original name, which would leave to undesirable duplication. This change separates the media's filename from the media's canonical name (which is the path of the link in the document itself). Filenames are based on SHA1 hashes and assigned automatically. In Text.Pandoc.MediaBag: - Export MediaItem type [API change]. - Change MediaBag type to a map from Text to MediaItem [API change]. - `lookupMedia` now returns a `MediaItem` [API change]. - Change `insertMedia` so it sets the `mediaPath` to a filename based on the SHA1 hash of the contents. This will be used when contents are extracted. In Text.Pandoc.Class.PandocMonad: - Remove `fetchMediaResource` [API change]. Lua MediaBag module has been changed minimally. In the future it would be better, probably, to give Lua access to the full MediaItem type.
2021-05-24Jira writer: use `{color}` when span has a color attributeAlbert Krewinkel1-0/+4
Closes: tarleb/jira-wiki-markup#10
2021-05-17HTML writer: keep attributes from code nested below pre tag.Albert Krewinkel1-0/+11
If a code block is defined with `<pre><code class="language-x">…</code></pre>`, where the `<pre>` element has no attributes, then the attributes from the `<code>` element are used instead. Any leading `language-` prefix is dropped in the code's *class* attribute are dropped to improve syntax highlighting. Closes: #7221
2021-05-17HTML writer: ensure headings only have valid attribs in HTML4Albert Krewinkel1-52/+57
Fixes: #5944
2021-05-17ConTeXt writer: use span identifiers as reference anchors.Albert Krewinkel1-0/+3
Closes: #7246
2021-05-17ConTeXt writer tests: keep code lines below 80 chars.Albert Krewinkel1-113/+119
2021-05-15HTML writer: parse `<header>` as a DivAlbert Krewinkel1-5/+9
HTML5 `<header>` elements are treated like `<div>` elements.
2021-05-09Change reader types, allowing better tracking of source positions.John MacFarlane1-2/+2
Previously, when multiple file arguments were provided, pandoc simply concatenated them and passed the contents to the readers, which took a Text argument. As a result, the readers had no way of knowing which file was the source of any particular bit of text. This meant that we couldn't report accurate source positions on errors or include accurate source positions as attributes in the AST. More seriously, it meant that we couldn't resolve resource paths relative to the files containing them (see e.g. #5501, #6632, #6384, #3752). Add Text.Pandoc.Sources (exported module), with a `Sources` type and a `ToSources` class. A `Sources` wraps a list of `(SourcePos, Text)` pairs. [API change] A parsec `Stream` instance is provided for `Sources`. The module also exports versions of parsec's `satisfy` and other Char parsers that track source positions accurately from a `Sources` stream (or any instance of the new `UpdateSourcePos` class). Text.Pandoc.Parsing now exports these modified Char parsers instead of the ones parsec provides. Modified parsers to use a `Sources` as stream [API change]. The readers that previously took a `Text` argument have been modified to take any instance of `ToSources`. So, they may still be used with a `Text`, but they can also be used with a `Sources` object. In Text.Pandoc.Error, modified the constructor PandocParsecError to take a `Sources` rather than a `Text` as first argument, so parse error locations can be accurately reported. T.P.Error: showPos, do not print "-" as source name.
2021-04-29Docx reader: add handling of vml image objects (jgm#4735) (#7257)mbrackeantidot1-0/+4
They represent images, the same way as other images in vml format.
2021-04-28Smarter smart quotes.John MacFarlane1-1/+1
Treat a leading " with no closing " as a left curly quote. This supports the practice, in fiction, of continuing paragraphs quoting the same speaker without an end quote. It also helps with quotes that break over lines in line blocks. Closes #7216.
2021-04-28JATS writer: use either styled-content or named-content for spans.Albert Krewinkel1-5/+9
If the element has a content-type attribute, or at least one class, then that value is used as `content-type` and the span is put inside a `<named-content>` element. Otherwise a `<styled-content>` element is used instead. Closes: #7211
2021-04-10JATS writer: convert spans to <named-content> elementsAlbert Krewinkel1-0/+15
Spans with attributes are converted to `<named-content>` elements instead of being wrapped with `<milestone-start/>` and `<milestone-end>` elements. Milestone elements are not allowed in documents using the articleauthoring tag set, so this change ensures the creation of valid documents. Closes: #7211
2021-04-05JATS writer: escape disallows chars in identifiersAlbert Krewinkel1-90/+115
XML identifiers must start with an underscore or letter, and can contain only a limited set of punctuation characters. Any IDs not adhering to these rules are rewritten by writing the offending characters as Uxxxx, where `xxxx` is the character's hex code.
2021-03-31Treat tabs as spaces in ODT Reader. (#7185)niszet1-0/+1
2021-03-20Include Header.Attr.attributes as XML attributes on sectionErik Rask1-0/+37
Add key-value pairs found in the attributes list of Header.Attr as XML attributes on the corresponding section element. Any key name not allowed as an XML attribute name is dropped, as are keys with invalid values where they are defined as enums in DocBook, and xml:id (for DocBook 5)/id (for DocBook 4) to not intervene with computed identifiers.
2021-03-19Tests: Use getExecutablePath from base...John MacFarlane2-2/+2
avoiding the need to depend on the executable-path package.
2021-03-19Tests: factor out setupEnvironment in Test.Helpers.John MacFarlane3-25/+26
This avoids code duplication between Command and Old.
2021-03-19Fix finding of data files from test programs.John MacFarlane2-2/+5
Apparently Cabal sets a `pandoc_datadir` environment variable so that the data files will be sought in the source directory rather than in the final destination (where they aren't yet installed). So we no longer need to set `--data-dir` in the tests. We just need to make sure `pandoc_datadir` is set in the environment when we call the program in the test suite. This will fix the issue with loading of pandoc.lua when pandoc is built with `-embed_data_files`, reported in #7163. Closes #7163.
2021-03-13Jira reader: mark divs created from panels with class "panel".Albert Krewinkel1-0/+6
Closes: tarleb/jira-wiki-markup#2
2021-03-13Jira writer: improve div/panel handlingAlbert Krewinkel1-0/+30
Include div attributes in panels, always render divs with class `panel` as panels, and avoid nesting of panels.
2021-03-08Jira writer: use noformat instead of code for unknown languages.Albert Krewinkel1-0/+10
Code blocks that are not marked as a language supported by Jira are rendered as preformatted text with `{noformat}` blocks. Fixes: tarleb/jira-wiki-markup#4
2021-03-01Jira writer: use Span identifiers as anchorsAlbert Krewinkel1-1/+8
Closes: tarleb/jira-wiki-markup#3.
2021-02-28Remove superfluous imports.John MacFarlane1-2/+0
2021-02-28T.P.Readers.LaTeX: Don't export tokenize, untokenize.John MacFarlane1-16/+1
[API change] These were only exported for testing, which seems the wrong thing to do. They don't belong in the public API and are not really usable as they are, without access to the Tok type which is not exported. Removed the tokenize/untokenize roundtrip test. We put a quickcheck property in the comments which may be used when this code is touched (if it is).
2021-02-22tests: print accurate location if a test failsAlbert Krewinkel1-1/+1
Ensures that tasty-hunit reports the location of the failing test instead of the location of the helper `test` function.
2021-02-22Text.Pandoc.UTF8: change IO functions to return Text, not String.John MacFarlane3-4/+5
[API change] This affects `readFile`, `getContents`, `writeFileWith`, `writeFile`, `putStrWith`, `putStr`, `putStrLnWith`, `putStrLn`. `hPutStrWith`, `hPutStr`, `hPutStrLnWith`, `hPutStrLn`, `hGetContents`. This avoids the need to uselessly create a linked list of characters when emiting output.
2021-02-18Org reader: fix bug in org-ref citation parsing.Albert Krewinkel1-0/+40
The org-ref syntax allows to list multiple citations separated by comma. This fixes a bug that accepted commas as part of the citation id, so all citation lists were parsed as one single citation. Fixes: #7101
2021-02-16Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...John MacFarlane1-1/+2
..and add new definitions isomorphic to xml-light's, but with Text instead of String. This allows us to keep most of the code in existing readers that use xml-light, but avoid lots of unnecessary allocation. We also add versions of the functions from xml-light's Text.XML.Light.Output and Text.XML.Light.Proc that operate on our modified XML types, and functions that convert xml-light types to our types (since some of our dependencies, like texmath, use xml-light). Update golden tests for docx and pptx. OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`. Docx: Do a manual traversal to unwrap sdt and smartTag. This is faster, and needed to pass the tests. Benchmarks: A = prior to 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8) B = as of 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8) C = this commit | Reader | A | B | C | | ------- | ----- | ------ | ----- | | docbook | 18 ms | 12 ms | 10 ms | | opml | 65 ms | 62 ms | 35 ms | | jats | 15 ms | 11 ms | 9 ms | | docx | 72 ms | 69 ms | 44 ms | | odt | 78 ms | 41 ms | 28 ms | | epub | 64 ms | 61 ms | 56 ms | | fb2 | 14 ms | 5 ms | 4 ms |
2021-02-13Org: support task_lists extensionAlbert Krewinkel2-11/+59
The tasks lists extension is now supported by the org reader and writer; the extension is turned on by default. Closes: #6336
2021-02-12Jira: require jira-wiki-markup 1.3.3Albert Krewinkel1-0/+7
* Modified the Doc parser to skip leading blank lines. This fixes parsing of documents which start with multiple blank lines. (#7095) * Prevent URLs within link aliases to be treated as autolinks. (#6944) Fixes: #7095 Fixes: #6944
2021-02-10Add new unexported module T.P.XMLParser.John MacFarlane1-0/+1
This exports functions that uses xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2). Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module. The new parser also reports errors, which we report when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus. These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes. Add entity defs to docbook-reader.docbook. Update golden tests for docx.
2021-02-07Avoid unnecessary use of NoImplicitPrelude pragma (#7089)Albert Krewinkel52-100/+0
2021-02-02Fixed some compiler warnings in tests.John MacFarlane3-14/+3
2021-02-02Lua: add module "pandoc.path"Albert Krewinkel1-0/+2
The module allows to work with file paths in a convenient and platform-independent manner. Closes: #6001 Closes: #6565
2021-02-02Test suite: a more robust way of testing the executable.John MacFarlane3-66/+45
Mmny of our tests require running the pandoc executable. This is problematic for a few different reasons. First, cabal-install will sometimes run the test suite after building the library but before building the executable, which means the executable isn't in place for the tests. One can work around that by first building, then building and running the tests, but that's fragile. Second, we have to find the executable. So far, we've done that using a function findPandoc that attempts to locate it relative to the test executable (which can be located using findExecutablePath). But the logic here is delicate and work with every combination of options. To solve both problems, we add an `--emulate` option to the `test-pandoc` executable. When `--emulate` occurs as the first argument passed to `test-pandoc`, the program simply emulates the regular pandoc executable, using the rest of the arguments (after `--emulate`). Thus, test-pandoc --emulate -f markdown -t latex is just like pandoc -f markdown -t latex Since all the work is done by library functions, implementing this emulation just takes a couple lines of code and should be entirely reliable. With this change, we can test the pandoc executable by running the test program itself (locatable using findExecutablePath) with the `--emulate` option. This removes the need for the fragile `findPandoc` step, and it means we can run our integration tests even when we're just building the library, not the executable. Part of this change involved simplifying some complex handling to set environment variables for dynamic library paths. I have tested a build with `--enable-dynamic-executable`, and it works, but further testing may be needed.
2021-01-16Revert "Markdown reader: support GitHub wiki's internal links (#2923) (#6458)"John MacFarlane1-30/+0
This reverts commit 6efd3460a776620fdb93812daa4f6831e6c332ce. Since this extension is designed to be used with GitHub markdown (gfm), we need to implement the parser as a commonmark extension (commonmark-extensions), rather than in pandoc's markdown reader. When that is done, we can add it here.
2021-01-16Markdown reader: support GitHub wiki's internal links (#2923) (#6458)Gautier DI FOLCO1-0/+30
Canges overview: * Add a `Ext_markdown_github_wikilink` constructor to `Extension` [API change]. * Add the parser `githubWikiLink` in `Text.Pandoc.Readers.Markdown` * Add tests.
2021-01-09Org reader: allow multiple pipe chars in todo sequencesAlbert Krewinkel1-0/+10
Additional pipe chars, used to separate "action" state from "no further action" states, are ignored. E.g., for the following sequence, both `DONE` and `FINISHED` are states with no further action required. #+TODO: UNFINISHED | DONE | FINISHED Previously, parsing of the todo sequence failed if multiple pipe chars were included. Closes: #7014
2021-01-08Update copyright notices for 2021 (#7012)Albert Krewinkel30-30/+30
2021-01-03Org reader: mark verbatim code with class "verbatim". (#6998)Dimitri Sabadie1-2/+2
* Replace org-mode’s verbatim from code to codeWith. This adds the `"verbatim"` class so that exporters can apply a specific style on it. For instance, it will be possible for HTML to add a CSS rule for code + verbatim class. * Alter test for org-mode’s verbatim change. See previous commit for further detail on the new implementation.
2021-01-01Org reader: restructure output of captioned code blocksAlbert Krewinkel1-3/+3
The Div wrapper of code blocks with captions now has the class "captioned-content". The caption itself is added as a Plain block inside a Div of class "caption". This makes it easier to write filters which match on captioned code blocks. Existing filters will need to be updated. Closes: #6977
2020-12-20LaTeX writer: support colspans and rowspans in tables. (#6950)Albert Krewinkel1-1/+1
Note that the multirow package is needed for rowspans. It is included in the latex template under a variable, so that it won't be used unless needed for a table.