Age | Commit message (Collapse) | Author | Files | Lines |
|
..and add new definitions isomorphic to xml-light's, but with
Text instead of String. This allows us to keep most of the code in
existing readers that use xml-light, but avoid lots of unnecessary
allocation.
We also add versions of the functions from xml-light's
Text.XML.Light.Output and Text.XML.Light.Proc that operate
on our modified XML types, and functions that convert
xml-light types to our types (since some of our dependencies,
like texmath, use xml-light).
Update golden tests for docx and pptx.
OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.
Docx: Do a manual traversal to unwrap sdt and smartTag.
This is faster, and needed to pass the tests.
Benchmarks:
A = prior to 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8)
B = as of 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8)
C = this commit
| Reader | A | B | C |
| ------- | ----- | ------ | ----- |
| docbook | 18 ms | 12 ms | 10 ms |
| opml | 65 ms | 62 ms | 35 ms |
| jats | 15 ms | 11 ms | 9 ms |
| docx | 72 ms | 69 ms | 44 ms |
| odt | 78 ms | 41 ms | 28 ms |
| epub | 64 ms | 61 ms | 56 ms |
| fb2 | 14 ms | 5 ms | 4 ms |
|
|
The tasks lists extension is now supported by the org reader and writer;
the extension is turned on by default.
Closes: #6336
|
|
* Modified the Doc parser to skip leading blank lines. This fixes
parsing of documents which start with multiple blank lines.
(#7095)
* Prevent URLs within link aliases to be treated as autolinks.
(#6944)
Fixes: #7095
Fixes: #6944
|
|
This exports functions that uses xml-conduit's parser to
produce an xml-light Element or [Content]. This allows
existing pandoc code to use a better parser without
much modification.
The new parser is used in all places where xml-light's
parser was previously used. Benchmarks show a significant
performance improvement in parsing XML-based formats
(especially ODT and FB2).
Note that the xml-light types use String, so the
conversion from xml-conduit types involves a lot
of extra allocation. It would be desirable to
avoid that in the future by gradually switching
to using xml-conduit directly. This can be done
module by module.
The new parser also reports errors, which we report
when possible.
A new constructor PandocXMLError has been added to
PandocError in T.P.Error [API change].
Closes #7091, which was the main stimulus.
These changes revealed the need for some changes
in the tests. The docbook-reader.docbook test
lacked definitions for the entities it used; these
have been added. And the docx golden tests have been
updated, because the new parser does not preserve
the order of attributes.
Add entity defs to docbook-reader.docbook.
Update golden tests for docx.
|
|
|
|
|
|
The module allows to work with file paths in a convenient and
platform-independent manner.
Closes: #6001
Closes: #6565
|
|
Mmny of our tests require running the pandoc
executable. This is problematic for a few different reasons.
First, cabal-install will sometimes run the test suite
after building the library but before building the executable,
which means the executable isn't in place for the tests.
One can work around that by first building, then building
and running the tests, but that's fragile. Second,
we have to find the executable. So far, we've done that
using a function findPandoc that attempts to locate it
relative to the test executable (which can be located
using findExecutablePath). But the logic here is delicate
and work with every combination of options.
To solve both problems, we add an `--emulate` option to
the `test-pandoc` executable. When `--emulate` occurs
as the first argument passed to `test-pandoc`, the
program simply emulates the regular pandoc executable,
using the rest of the arguments (after `--emulate`).
Thus,
test-pandoc --emulate -f markdown -t latex
is just like
pandoc -f markdown -t latex
Since all the work is done by library functions,
implementing this emulation just takes a couple lines
of code and should be entirely reliable.
With this change, we can test the pandoc executable
by running the test program itself (locatable using
findExecutablePath) with the `--emulate` option.
This removes the need for the fragile `findPandoc`
step, and it means we can run our integration tests
even when we're just building the library, not the
executable.
Part of this change involved simplifying some complex
handling to set environment variables for dynamic
library paths. I have tested a build with
`--enable-dynamic-executable`, and it works, but
further testing may be needed.
|
|
This reverts commit 6efd3460a776620fdb93812daa4f6831e6c332ce.
Since this extension is designed to be used with
GitHub markdown (gfm), we need to implement the parser
as a commonmark extension (commonmark-extensions),
rather than in pandoc's markdown reader. When that is
done, we can add it here.
|
|
Canges overview:
* Add a `Ext_markdown_github_wikilink` constructor to `Extension` [API change].
* Add the parser `githubWikiLink` in `Text.Pandoc.Readers.Markdown`
* Add tests.
|
|
Additional pipe chars, used to separate "action" state from "no further
action" states, are ignored. E.g., for the following sequence, both
`DONE` and `FINISHED` are states with no further action required.
#+TODO: UNFINISHED | DONE | FINISHED
Previously, parsing of the todo sequence failed if multiple pipe chars
were included.
Closes: #7014
|
|
|
|
* Replace org-mode’s verbatim from code to codeWith.
This adds the `"verbatim"` class so that exporters can apply a specific
style on it. For instance, it will be possible for HTML to add a CSS
rule for code + verbatim class.
* Alter test for org-mode’s verbatim change.
See previous commit for further detail on the new implementation.
|
|
The Div wrapper of code blocks with captions now has the class
"captioned-content". The caption itself is added as a Plain block
inside a Div of class "caption". This makes it easier to write filters
which match on captioned code blocks. Existing filters will need to be
updated.
Closes: #6977
|
|
Note that the multirow package is needed for rowspans.
It is included in the latex template under a variable,
so that it won't be used unless needed for a table.
|
|
Closes: #6933
|
|
Docbook writer: handle admonitions
|
|
Docbook reader produces a `Div` with `title` class for `<title>` element
within an “admonition” element. Markdown writer then turns this
into a fenced div with `title` class attribute. Since fenced divs
are block elements, their content is recognized as a paragraph
by the Markdown reader. This is an issue for Docbook writer because
it would produce an invalid DocBook document from such AST –
the `<title>` element can only contain “inline” elements.
Let’s handle this invalid special case separately by unwrapping
the paragraph before creating the `<title>` element.
|
|
Similarly to https://github.com/jgm/pandoc/commit/d6fdfe6f2bba2a8ed25d6c9f11861774001f7a91,
we should handle admonitions.
|
|
Links with (internal) targets that the reader doesn't know about are
converted into emphasized text. Information on the link target is now
preserved by wrapping the text in a Span of class `spurious-link`, with
an attribute `target` set to the link's original target. This allows to
recover and fix broken or unknown links with filters.
See: #6916
|
|
Information for cell alignment in a column is not preserved during
round-trips.
|
|
Fixes: #6845
|
|
|
|
As of ~2 years ago, lower case keywords became the standard (though they
are handled case insensitive, as always):
https://code.orgmode.org/bzg/org-mode/commit/13424336a6f30c50952d291e7a82906c1210daf0
Upper case keywords are exclusive to the manual:
- https://orgmode.org/list/871s50zn6p.fsf@nicolasgoaziou.fr/
- https://orgmode.org/list/87tuuw3n15.fsf@nicolasgoaziou.fr/
|
|
Previously we used Setext (underlined) headings by default.
The default is now ATX (`##` style).
* Add the `--markdown-headings=atx|setext` option.
* Deprecate `--atx-headers`.
* Add constructor 'ATXHeadingInLHS` constructor to `LogMessage` [API change].
* Support `markdown-headings` in defaults files.
* Document new options in MANUAL.
Closes #6662.
|
|
See: #6738
|
|
This means that `--accept` can be used to update expected output.
|
|
For security reasons, some legal firms delete the date from comments and
tracked changes.
* Make date optional (Maybe) in tracked changes and comments datatypes
* Add tests
|
|
If the first element of a bulleted or ordered list is another list,
then that first item will disappear if the target format is docx. This
changes the docx writer so that it prepends an empty string for those
cases. With this, no items will disappear.
Closes #5948.
|
|
This also changes stateLastNoteNumber -> stateNoteNumber.
|
|
|
|
Closes: #6314
|
|
* Fix hlint suggestions, update hlint.yaml
Most suggestions were redundant brackets. Some required
LambdaCase.
The .hlint.yaml file had a small typo, and didn't ignore camelCase
suggestions in certain modules.
|
|
Part of: #6314
|
|
Writers.Tables is now Writers.AnnotatedTable. All of the types and
functions in it have had the "Ann" removed from them. Now it is
expected that the module be imported qualified.
|
|
* HTML writer: add support for row headers, colspans, rowspans
* Add planet table tests
See #6312
|
|
Add Writers.Tables helper functions and types, add tests for those
The Writers.Tables module contains an AnnTable type that is a pandoc
Table with added inferred information that should be enough for
writers (in particular the HTML writer) to operate on without having
to lay out the table themselves.
The toAnnTable and fromAnnTable functions in that module convert
between AnnTable and Table. In addition to producing an AnnTable with
coherent and well-formed annotations, the toAnnTable function also
normalizes its input Table like the table builder does.
Various tests ensure that toAnnTable normalizes tables exactly like
the table builder, and that its annotations are coherent.
|
|
* Added test to replicate (#6596)
* Table cell reader not consuming spaces correctly (#6596)
* Prevented wrong nesting of \multicolumn and \multirow table cells (#6603)
* Parse empty table cells (#6603)
* Support full prototype for multirow macro (#6603)
Closes #6603
|
|
* Added test to replicate (#6596)
* Table cell reader not consuming spaces correctly (#6596)
|
|
Add multirow and multicolumn support in LaTex reader.
Partially addresses #6311.
|
|
If a line of ms code block output starts with a period (.), it should
be prepended by '\&' so that it is not interpreted as a roff command.
Fixes #6505
|
|
Tables can be removed from the final document with the `#+OPTION:
|:nil` export setting.
|
|
Footnotes can be removed from the final document with the `#+OPTION:
f:nil` export setting.
|
|
MathML-like entities, e.g., `\alpha`, can be disabled with the
`#+OPTION: e:nil` export setting.
|
|
The lines of unknown keywords, like `#+SOMEWORD: value` are no longer
read as metadata, but kept as raw `org` blocks. This ensures that more
information is retained when round-tripping org-mode files;
additionally, this change makes it possible to support non-standard org
extensions via filters.
|
|
Handling of export settings and other keywords (like `#+LINK`) has been
combined and unified.
|
|
These export settings are treated like their non-extra counterparts,
i.e., the values are added to the `header-includes` metadata list.
|
|
The values of all lines are read as inlines and collected in the
`subtitle` metadata field.
|
|
Closes: #6480
|
|
The value is stored in the `institute` metadata field and used in the
default beamer presentation template.
|