Age | Commit message | Author | Files | Lines |
|
The HTML writer adds the `data-` prefix for HTML5
for nonstandard attributes. But the attributes are
represented in the AST without the `data-` prefix,
so we should strip this when reading HTML.
Closes #5392.
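
A minimal sketch of the reader-side normalization described above, using only `base`. The names `stripDataPrefix` and `normalizeAttrs` are hypothetical and are not taken from pandoc's source.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

-- Hypothetical helper: drop a leading "data-" from an attribute key,
-- leaving every other key untouched.
stripDataPrefix :: String -> String
stripDataPrefix key = fromMaybe key (stripPrefix "data-" key)

-- Apply the normalization to a list of key/value attribute pairs.
normalizeAttrs :: [(String, String)] -> [(String, String)]
normalizeAttrs attrs = [(stripDataPrefix k, v) | (k, v) <- attrs]
```

With this, an HTML attribute `data-foo="bar"` would reach the AST as `("foo", "bar")`, which is the form the HTML writer expects before it adds the prefix back.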
|
|
Previously they would sometimes not work: e.g., when they
occurred in final paragraphs in lists that were originally
parsed as Plain and converted later using PlainToPara.
Closes #5368.
|
|
These are parsed as a Span with class `underline`, as with other readers.
|
|
|
|
We now include every output format. Pruning is handled by
`--ipynb-output=`.
|
|
We now handle even complex cell metadata in the Div's attributes.
Simple metadata fields are rendered as a plain string, and complex ones
as JSON.
|
|
|
|
|
|
The haddock module header contains essentially the
same information, so the boilerplate is redundant and
just one more thing to get out of sync.
|
|
Add ReaderOptions parameter to yamlToMeta [API change].
fixes #5272
|
|
This ensures that a figure containing a single image
is parsed as a pandoc "implicit figure" (i.e., a
Para with a single Image whose title attribute begins
with `fig:`). More complex figures will still be parsed
as divs.
Closes #5321.
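
The "implicit figure" shape can be sketched as a predicate over a simplified AST. The `Inline` and `Block` types below are cut-down stand-ins (an assumption for illustration); the real types in pandoc-types carry more fields.

```haskell
import Data.List (isPrefixOf)

-- Simplified stand-ins for the pandoc AST (assumption: the real types
-- also carry attributes, alt text, and many more constructors).
data Inline = Str String | Image String String  -- Image target title
  deriving (Eq, Show)

data Block = Para [Inline] | Div [Block]
  deriving (Eq, Show)

-- True only for a Para holding exactly one Image whose title
-- starts with "fig:".
isImplicitFigure :: Block -> Bool
isImplicitFigure (Para [Image _ title]) = "fig:" `isPrefixOf` title
isImplicitFigure _                      = False
```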
|
|
This module is one of the most opaque parts of the docx reader: it
deals with the fact that runs have non-nesting formatting, so we have
to figure out the nesting on the fly as we combine them.
We start adding comments, so new developers can understand and, if
necessary, modify this module. Specific function comments will be
added in the future, but this offers a global description of the
purpose of the module.
|
|
We have to add one final mempty when we're combining in order to trim
inlines appropriately. (We need to use our own trimming routines here
due to the way that formatted inlines are smushed together when
converting from docx.)
Closes #5273
|
|
|
|
|
|
Previously parsing would break if the code block
contained a string of backticks of sufficient length
followed by something other than end of line.
Closes #5304.
|
|
The rid attribute can have a space-separated list of ids.
Closes #5310.
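
As a rough illustration, splitting such an attribute value needs nothing beyond `words` from the Prelude. The name `splitRids` is hypothetical.

```haskell
-- Hypothetical helper: split a `rid` attribute value into its ids.
-- `words` splits on runs of whitespace and drops empty pieces.
splitRids :: String -> [String]
splitRids = words
```

Each resulting id can then be resolved separately, rather than looking up the whole attribute value as one id.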
|
|
We had previously walked the document to unwrap sdt/sdtContent and
smartTag tags in `word/document.xml`, but not in the
`word/{foot/end}note.xml` and `word/comments.xml`.
Closes #5302
|
|
|
|
even if a richer format is included.
We don't know what output format will be needed.
The fallback can always be weeded out using a filter.
Closes #5293.
|
|
see #5272
|
|
Some paths in archives are absolute (have an opening slash) which, for
reasons unknown, produces a failure in the test suite on MS
Windows. This fixes that by removing the leading slash if it exists.
Closes #5277 (previously closed with 4cce0ef but reopened due to this bug).
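
A minimal sketch of the normalization, assuming archive paths are plain strings. The name `stripLeadingSlash` is hypothetical.

```haskell
-- Hypothetical helper: make an archive path relative by removing
-- a single leading slash, if there is one.
stripLeadingSlash :: FilePath -> FilePath
stripLeadingSlash ('/' : rest) = rest
stripLeadingSlash path         = path
```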
|
|
This reverts commit 2142bbe572cea00b7bb5ad3e10a3afb26845a1f7.
|
|
Try fixing a parsing error on Windows by insisting that the parser use
a Posix filepath library for splitting doc paths in a zipfile. (It
might default on Windows to using a backslash as a separator, while
it's always a forward-slash in zip archives.)
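
The separator difference can be illustrated with the `filepath` package, which exposes both flavours under explicit module names. This is only a sketch of the general issue, not the code from the commit; the function names are hypothetical.

```haskell
import qualified System.FilePath.Posix as Posix
import qualified System.FilePath.Windows as Windows

-- Joining with the Posix module always yields a forward slash,
-- which is what zip archive entries use.
zipEntryPath :: FilePath -> FilePath -> FilePath
zipEntryPath dir name = dir Posix.</> name

-- The Windows module joins with a backslash, so the result would
-- not match an entry name inside the archive.
windowsStylePath :: FilePath -> FilePath -> FilePath
windowsStylePath dir name = dir Windows.</> name
```

Importing plain `System.FilePath` selects one of these two modules according to the platform, which is why naming `System.FilePath.Posix` explicitly gives the same result everywhere.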
|
|
* Clarify function name. We had previously used `getDocumentPath`,
but `Document` is an overdetermined term here. Use
`getDocumentXmlPath` to make clear what we're doing.
* Use field notation for setting ReaderEnv. As we've added (and
continue to add) fields, the assignment by position has gotten
harder to read.
* Figure out document.xml path once at the beginning of parsing, and
add it to the environment, so we can avoid repeated lookups.
|
|
Getting the location used to depend on a hard-coded .rels file based
on "word/document.xml". We now dynamically detect that file based on
the document.xml file specified in "_rels/.rels"
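
A sketch of how the relationships file might be derived from the main document path. It assumes the usual package layout, in which the `.rels` file sits in a `_rels` directory beside the part and is named after it; `relsPathFor` is a hypothetical name.

```haskell
import qualified System.FilePath.Posix as Posix

-- Hypothetical helper. Assumption: the relationships file for a part
-- lives in a "_rels" directory next to it, with ".rels" appended to
-- the part's file name.
relsPathFor :: FilePath -> FilePath
relsPathFor docPath =
  let (dir, name) = Posix.splitFileName docPath
  in dir Posix.</> "_rels" Posix.</> (name ++ ".rels")
```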
|
|
The desktop Word program places the main document file in
"word/document.xml", but the online version of Word places it in
"word/document2.xml". This file path is actually stated in the root
"_rels/.rels" file, in the "Relationship" element with an
"http://../officedocument" type.
Closes #5277
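
The lookup can be sketched over an already-parsed list of relationships. The `Relationship` record and `mainDocumentPath` are hypothetical names, and matching on a type that ends in `/officedocument` (ignoring case) is an assumption made here so that the sketch does not depend on the full type URI.

```haskell
import Data.Char (toLower)
import Data.List (find, isSuffixOf)

-- Hypothetical record for one Relationship element of "_rels/.rels".
data Relationship = Relationship
  { relType   :: String
  , relTarget :: FilePath
  } deriving (Eq, Show)

-- Assumption: the main document is the first relationship whose type
-- ends in "/officedocument", compared case-insensitively.
isMainDocument :: Relationship -> Bool
isMainDocument r = "/officedocument" `isSuffixOf` map toLower (relType r)

-- Return the target of that relationship, falling back to the
-- location used by desktop Word when none is found.
mainDocumentPath :: [Relationship] -> FilePath
mainDocumentPath rels =
  maybe "word/document.xml" relTarget (find isMainDocument rels)
```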
|
|
For some reason, Word in Office 365 Online uses `document2.xml`
for the content, instead of `document.xml`. This causes pandoc
not to be able to parse docx.
This quick fix has the parser check for both `document.xml`
and `document2.xml`.
Addresses #5277, but a more robust solution would be to
get the name of the main document dynamically (who knows
whether it might change again?).
|
|
Quite a few modules were missing copyright notices.
This commit adds copyright notices everywhere via haddock module
headers. The old license boilerplate comment is redundant with this and has
been removed.
Update copyright years to 2019.
Closes #4592.
|
|
Otherwise last block gets parsed as a Plain rather than
a Para.
This is a regression in pandoc 2.x. This patch restores
pandoc 1.19 behavior.
Closes #5271.
|
|
Previously we didn't strip off the attachment: prefix,
so even though the attachment is available in the mediabag,
pandoc couldn't find it.
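
A small sketch of why the lookup failed and how stripping the prefix fixes it. The `MediaBag` type here is a simplified association list standing in for pandoc's real media bag, and `lookupAttachment` is a hypothetical name.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

-- Hypothetical stand-in for the media bag: names mapped to contents.
type MediaBag = [(FilePath, String)]

-- Strip a leading "attachment:" from the image source, if present,
-- before looking the name up in the bag.
lookupAttachment :: MediaBag -> String -> Maybe String
lookupAttachment bag src =
  lookup (fromMaybe src (stripPrefix "attachment:" src)) bag
```

Looking up the unstripped source `attachment:plot.png` in a bag keyed by `plot.png` finds nothing, which corresponds to the earlier behavior.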
|
|
`braced` now actually requires nested braces.
Otherwise some legitimate command and environment
definitions can break (see test/command/tex-group.md).
|
|
|
|
|
|
Partially addresses #4731.
We may still not be exactly matching mediawiki's algorithm
for identifiers.
|
|
|
|
This reverts commit 5eaff399d5d6dc30b0d453eff42c4101674d75ab.
|
|
This avoids conflicts with things like 'toc'.
|
|
[API change]
* Depend on ipynb library.
* Add `ipynb` as input and output format.
* Added Text.Pandoc.Readers.Ipynb (supports both nbformat v3 and v4).
* Added Text.Pandoc.Writers.Ipynb (supports nbformat v4).
* Added ipynb readers and writers to T.P.Readers,
T.P.Writers, and T.P.Extensions. Register the
file extension .ipynb for this format.
* Add `PandocIpynbDecodingError` constructor to Text.Pandoc.Error.Error.
* Note: there is no template for ipynb.
|
|
|
|
|
|
We don't want to parse its contents as Markdown or HTML.
Closes #5241.
|
|
Previously the `.0` was interpreted as a file extension,
leading pandoc not to add `.tex` (and thus not to find the
file).
The new behavior matches tex more closely.
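
One way to picture the new behavior, as a sketch only: treat `.tex` as the sole extension that suppresses the default, so that a trailing `.0` no longer counts. The name `texFileName` and the exact rule are assumptions; the actual list of recognized extensions in pandoc may differ.

```haskell
import System.FilePath (addExtension, takeExtension)

-- Assumption: only ".tex" counts as an extension that suppresses
-- the default; anything else (including ".0") gets ".tex" appended.
texFileName :: FilePath -> FilePath
texFileName f
  | takeExtension f == ".tex" = f
  | otherwise                 = addExtension f "tex"
```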
|
|
|
|
Directives of this type without numeric inputs should not have a
`startFrom` attribute; with a blank value, the writers can produce
extra whitespace.
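
A minimal sketch of the intended rule, assuming the directive's numeric input has already been parsed into a `Maybe Int`. The name `lineNumberAttrs` is hypothetical.

```haskell
-- Hypothetical helper: emit a `startFrom` attribute only when a
-- number was actually given; otherwise emit no attribute at all
-- (rather than one with a blank value).
lineNumberAttrs :: Maybe Int -> [(String, String)]
lineNumberAttrs = maybe [] (\n -> [("startFrom", show n)])
```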
|
|
* These were added by the RST reader and, for literate Haskell,
by the Markdown and LaTeX readers. There is no point to
this class, and it is not applied consistently by all readers.
See #5047.
* Reverse order of `literate` and `haskell` classes on code blocks
when parsing literate Haskell. Better if `haskell` comes first.
|
|
Closes #5204.
|
|
See #5190.
|
|
When `minlevel` exceeds the original minimum level observed in the
file to be included, every heading should be shifted rightward.
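
The shift can be sketched over a list of heading levels. The name `shiftHeadings` is hypothetical, and leaving the levels unchanged when `minlevel` does not exceed the observed minimum is an assumption, since the message only covers the other case.

```haskell
-- Hypothetical helper: shift every heading level by the same offset
-- so that the shallowest heading lands on `minlevel`.
-- Assumption: when `minlevel` is not greater than the observed
-- minimum, the levels are left as they are.
shiftHeadings :: Int -> [Int] -> [Int]
shiftHeadings _ [] = []
shiftHeadings minlevel levels = map (+ offset) levels
  where
    offset = max 0 (minlevel - minimum levels)
```

For example, with `minlevel` 3 and headings at levels 1, 2, 2, 1, the offset is 2 and every heading moves two levels deeper, so the relative structure of the included file is preserved.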
|
|
Underscore emphasis can't cross table cell boundaries,
but the parser wasn't respecting this, leading to exponential
behavior in documents with table cells containing underscores.
This fixes the original sample; it's possible that there
are other performance issues involving underscores.
Closes #3921.
|