Additional state changes need to be made after a newline is parsed,
otherwise markup may not be recognized correctly.
This fixes a bug where markup after certain block types would not be
recognized. E.g. `/emph/` in the following snippet was not parsed as
emphasized.
foo
# comment
/emph/
|
|
A parser state attribute was used to keep track of block attributes
defined in meta-lines. Global state is undesirable, so block attributes
are no longer saved as part of the parser state. The old functions and
the corresponding part of the parser state have been removed.
|
|
This should fix #2924.
Testing on the epub that caused the problem originally
would be welcome.
|
|
Add class option for code block in RST reader
|
|
All known export options are parsed but ignored.
|
|
Org mode allows export settings to be specified via `#+OPTIONS` lines.
One of these export options disables simple sub- and superscripts;
this option is now supported.
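A minimal sketch of how an `#+OPTIONS` line could be split into key/value pairs, with only the `^` (simple sub/superscript) option acted on. The function names are hypothetical and this is not pandoc's code. `toc:t` stands in for an option that is parsed but ignored, and the keyword is matched case-sensitively for simplicity.

```haskell
import Data.List (stripPrefix)

-- Split an "#+OPTIONS:" line into (key, value) pairs, e.g. "^:nil" -> ("^", "nil").
-- Sketch only: keyword matching is case-sensitive and quoting is not handled.
parseOptionPairs :: String -> [(String, String)]
parseOptionPairs line = case stripPrefix "#+OPTIONS:" line of
  Nothing   -> []
  Just rest ->
    [ (key, drop 1 val)          -- drop the ':' separator
    | word <- words rest
    , let (key, val) = break (== ':') word
    , not (null val)             -- skip words without a ':'
    ]

-- Simple sub- and superscripts stay enabled unless "^:nil" is given.
simpleSubSupEnabled :: [(String, String)] -> Bool
simpleSubSupEnabled opts = lookup "^" opts /= Just "nil"

main :: IO ()
main = do
  let opts = parseOptionPairs "#+OPTIONS: ^:nil toc:t"
  print opts                        -- [("^","nil"),("toc","t")]
  print (simpleSubSupEnabled opts)  -- False
```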
|
|
The org reader code has become large and confusing. Extracting smaller
parts into submodules should help to clean things up.
|
|
Org fixes (reader and writer)
|
|
The last fix for whitespace handling of inline LaTeX commands was
incorrect, preventing correct recognition of inline LaTeX commands which
contain spaces. This fix ensures that only trailing whitespace is cut
off.
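Trimming only trailing whitespace is a one-liner with `dropWhileEnd`. The sketch below is illustrative and not the reader's actual code. It shows that spaces inside a captured command, for example in its argument, survive while whitespace at the end is removed.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Remove whitespace only at the end of a captured command;
-- spaces inside the command (e.g. in its argument) are kept.
trimTrailing :: String -> String
trimTrailing = dropWhileEnd isSpace

main :: IO ()
main = putStrLn (trimTrailing "\\textit{two words}   ")  -- \textit{two words}
```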
|
|
|
|
Org reader: table parsing code refactoring and fixes
|
|
The org reader was dropping the space after unescaped LaTeX-style symbol
commands: `\ForAll \Auml` resulted in `∀Ä` but should give `∀ Ä`
instead. This seems to be because the LaTeX-reader treats the
command-terminating space as part of the command. Dropping the trailing
space from the symbol-command fixes this issue.
|
|
This fixes Org mode parsing of some corner cases regarding empty cells
and rows. Empty cells weren't parsed correctly, e.g. `|||` should be
two empty cells, but would be parsed as a single cell containing a pipe
character. Empty rows were parsed as alignment rows and dropped from
the output.
This fixes #2616.
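A sketch of splitting a single table row into cells so that `|||` becomes two empty cells. The function name is hypothetical, and alignment rows such as `|---+---|` are deliberately out of scope. The leading pipe opens the row and each further pipe closes a cell.

```haskell
-- Split one Org table row, e.g. "| a | b |", into trimmed cell strings.
-- The leading pipe opens the row and every further pipe closes a cell,
-- so "|||" yields two empty cells rather than one cell containing "|".
splitRowCells :: String -> [String]
splitRowCells row = case dropWhile (== ' ') row of
  '|' : rest -> go rest
  _          -> []                          -- not a table row
  where
    go s = case break (== '|') s of
      (cell, '|' : more) -> trim cell : go more
      (cell, _)
        | all (== ' ') cell -> []           -- nothing after the last pipe
        | otherwise         -> [trim cell]  -- keep a last cell with no closing pipe
    trim = reverse . dropWhile (== ' ') . reverse . dropWhile (== ' ')

main :: IO ()
main = mapM_ (print . splitRowCells) ["|||", "| a | b |", "| a | b"]
```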
|
|
This refactors the code converting a list of table lines to an org table
ADT. The old code was simplified and is now slightly less ugly.
|
|
Emacs Org-mode doesn't add any padding to table rows. The first
row (header or first body row) is used to determine the column count, no
other magic is performed.
The org reader was padding rows to the length of the longest table row.
This was done due to a misunderstanding of how Org handles tables. This
feature reflected how Org-mode handles tables when pressing <TAB>. The
Org exporter however, which is what the reader should implement, doesn't
do any of this. So this was a mis-feature that made the reader more
complex and reduced compatibility. It has therefore been removed.
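A small sketch contrasting the two behaviours on rows that are already split into cells. `columnCount` follows the rule described above, where the first row decides. `padToLongest` models the removed padding step and is included only for comparison. Both names are hypothetical.

```haskell
-- Column count as described: taken from the first row, no further magic.
columnCount :: [[String]] -> Int
columnCount []      = 0
columnCount (r : _) = length r

-- The removed mis-feature, for comparison: pad every row with empty
-- cells up to the length of the longest row.
padToLongest :: [[String]] -> [[String]]
padToLongest rows = [ r ++ replicate (width - length r) "" | r <- rows ]
  where width = maximum (0 : map length rows)

main :: IO ()
main = do
  let rows = [["a", "b"], ["c", "d", "e"]]
  print (columnCount rows)   -- 2
  print (padToLongest rows)  -- [["a","b",""],["c","d","e"]]
```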
|
|
According to http://docutils.sourceforge.net/docs/ref/rst/directives.html#code,
the code directive supports the ":class:" option.
|
|
Commit 91dc3342 made `readDocx` throw PandocError if there was an
unarchiving error. This extends that fix to `readOdt` and `readEPUB`.
|
|
Previously, readDocx would error out if zip-archive failed. We change
the archive extraction step from `toArchive` to `toArchiveOrFail`, which
returns an Either value.
|
|
Fixes #2862
Also fix up tab handling for leading whitespace in code blocks.
|
|
`moveTo` and `moveFrom` are track-changes tags that are used when a
block of text is moved in the document. We now recognize these tags and
treat them the same as `insert` and `delete`, respectively. So,
`--track-changes=accept` will show the moved version, while
`--track-changes=reject` will show the original version.
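The mapping can be modelled with a few small types. This is a hypothetical sketch and not the docx reader's data model. Moves are normalized to insertions and deletions, and the accept/reject setting then decides which kind of changed text is kept.

```haskell
-- Hypothetical model of tracked-change kinds in a docx body.
data Change = Insertion | Deletion | MoveTo | MoveFrom
  deriving (Eq, Show)

data TrackChanges = AcceptChanges | RejectChanges
  deriving (Eq, Show)

-- moveTo behaves like an insertion, moveFrom like a deletion.
normalize :: Change -> Change
normalize MoveTo   = Insertion
normalize MoveFrom = Deletion
normalize c        = c

-- Is text inside a change of this kind kept in the output?
keepText :: TrackChanges -> Change -> Bool
keepText AcceptChanges c = normalize c == Insertion  -- show the moved version
keepText RejectChanges c = normalize c == Deletion   -- show the original version

main :: IO ()
main = mapM_ print
  [ (opt, c, keepText opt c)
  | opt <- [AcceptChanges, RejectChanges]
  , c   <- [MoveTo, MoveFrom]
  ]
```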
|
|
Closes #2799.
Also added -s to markdown-reader-more test.
|
|
We are now more forgiving about parsing invalid HTML with
unescaped `&` as raw HTML. (Previously any unescaped `&`
would cause pandoc not to recognize the string as raw HTML.)
Closes #2410.
|
|
This was a regression introduced by the rewrite of `htmlInBalanced`
(from `Text.Pandoc.Readers.HTML`) in 1.17.
It caused newlines to be omitted in raw HTML blocks.
Closes #2804.
|
|
Some Word functions, especially graphics, offer several alternative
versions of their content for backwards compatibility. We follow this
largely undocumented feature by working through the choices until we
find one that works.
Note that we had to split out the processing of child elems of runs into
a separate function so we can recurse properly. Any processing of an
element *within* a run (other than a plain run) should go into
`childElemToRun`.
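The idea of "work through the choices until one works" maps naturally onto `asum` over `Maybe`. The sketch below uses a hypothetical, simplified `Alternative` type and converter rather than the reader's XML types. It keeps the first alternative the converter can handle.

```haskell
import Data.Foldable (asum)

-- Hypothetical, simplified content alternative: what it requires, and its payload.
data Alternative = Alternative { requires :: String, payload :: String }

-- Try each alternative in order; the first successful conversion wins.
firstUsable :: (Alternative -> Maybe a) -> [Alternative] -> Maybe a
firstUsable convert = asum . map convert

-- Pretend we can only handle alternatives that require nothing special.
convertIfSupported :: Alternative -> Maybe String
convertIfSupported alt
  | null (requires alt) = Just (payload alt)
  | otherwise           = Nothing

main :: IO ()
main = print (firstUsable convertIfSupported
  [Alternative "wps" "shape", Alternative "" "fallback picture"])
-- Just "fallback picture"
```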
|
|
Word uses list numbering styles to number its headings. We only call
something a numbered list if it does not also have a heading style.
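Stated as a predicate over a hypothetical, simplified view of paragraph properties (not the reader's actual types), the rule reads as follows.

```haskell
-- Hypothetical, simplified view of a Word paragraph's properties.
data ParProps = ParProps
  { numbering    :: Maybe Int  -- list numbering id, if any
  , headingLevel :: Maybe Int  -- heading level from the paragraph style, if any
  }

-- A numbered paragraph counts as a list item only if it has no heading style.
isNumberedListItem :: ParProps -> Bool
isNumberedListItem p = case (numbering p, headingLevel p) of
  (Just _, Nothing) -> True
  _                 -> False

main :: IO ()
main = do
  print (isNumberedListItem (ParProps (Just 1) Nothing))   -- True
  print (isNumberedListItem (ParProps (Just 1) (Just 2)))  -- False: numbered heading
```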
|
|
The regular readDocx just becomes a special case.
|
|
In order to be able to collect warnings during parsing, we add a state
monad transformer to the D monad. At the moment, this only includes a
list of warning strings (nothing currently triggers them, however). We
use StateT instead of WriterT to correspond more closely with the
warnings behavior in T.P.Parsing.
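A sketch of the layering using `StateT` from the transformers package. The `D` here is a hypothetical stand-in (a failing computation over `Either String`), not the reader's real monad stack. `parseWidth` is a made-up example action that warns on input it cannot parse instead of failing.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, modify, runStateT)

-- Hypothetical state carried through parsing: just a list of warnings.
newtype DState = DState { warnings :: [String] }

-- Hypothetical stand-in for the D monad: may fail, and collects warnings.
type D = StateT DState (Either String)

warn :: String -> D ()
warn msg = modify (\s -> s { warnings = warnings s ++ [msg] })

failD :: String -> D a
failD = lift . Left

-- Example action: a missing value is an error, junk only triggers a warning.
parseWidth :: String -> D Int
parseWidth "" = failD "missing width"
parseWidth s  = case reads s of
  [(n, "")] -> pure n
  _         -> do
    warn ("could not parse width: " ++ s)
    pure 0

runD :: D a -> Either String (a, [String])
runD d = fmap (fmap warnings) (runStateT d (DState []))

main :: IO ()
main = mapM_ (print . runD . parseWidth) ["12", "12px", ""]
```

With `StateT` outermost over `Either`, any warnings collected so far are discarded when the computation fails. A different stacking would be needed to keep them.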
|
|
+ If the base path does not end with a slash, the last component
will be replaced. E.g. base = `http://example.com/foo`
combines with `bar.html` to give `http://example.com/bar.html`.
+ If the href begins with a slash, the whole path of the base
is replaced. E.g. base = `http://example.com/foo/` combines
with `/bar.html` to give `http://example.com/bar.html`.
Closes #2777.
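The two rules can be sketched with plain string functions. The function names are hypothetical, and the sketch ignores absolute hrefs, query strings, fragments and `..` segments. For full RFC 3986 reference resolution, the network-uri package provides `relativeTo`.

```haskell
import Data.List (isPrefixOf)

-- Split "http://example.com/foo" into ("http://", "example.com/foo").
splitScheme :: String -> (String, String)
splitScheme = go ""
  where
    go acc r@(c : cs)
      | "://" `isPrefixOf` r = (reverse acc ++ "://", drop 3 r)
      | otherwise            = go (c : acc) cs
    go acc [] = ("", reverse acc)  -- no scheme found

-- Combine a base URL with a relative href following the two rules above.
combineUrl :: String -> String -> String
combineUrl base href
  | "/" `isPrefixOf` href = scheme ++ host ++ href         -- replace whole path
  | otherwise             = scheme ++ host ++ dir ++ href  -- replace last component
  where
    (scheme, rest) = splitScheme base
    (host, path)   = break (== '/') rest
    -- Keep the path up to and including its last slash ("/" if there is none).
    dir = case reverse (dropWhile (/= '/') (reverse path)) of
      "" -> "/"
      d  -> d

main :: IO ()
main = mapM_ putStrLn
  [ combineUrl "http://example.com/foo" "bar.html"    -- http://example.com/bar.html
  , combineUrl "http://example.com/foo/" "/bar.html"  -- http://example.com/bar.html
  , combineUrl "http://example.com/foo/" "bar.html"   -- http://example.com/foo/bar.html
  ]
```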
|
|
Fixes #2765.
Added test case.
|
|
|
|
We already allowed them in the header, but not in the body
rows, for some reason. This gives compatibility with org-mode
tables.
|
|
Previously an emph element could be parsed across the newline
at the end of the pipe table row.
I thought this would help with #2765, but it doesn't.
|
|
The feature checklist in the source code was out of date. Update.
|
|
e.g. `$$\hbox{$i$}$$`.
Partially addresses #2743.
|
|
The docx reader used to use a Modifiable typeclass to combine both
Blocks and Inlines. But all the work was in the inlines. So most of the
generality was wasted, at the expense of making the code harder to
understand. This gets rid of the generality, and adds functions for
Blocks and Inlines. It should be a bit easier to work with going forward.
|
|
This should give better performance.
See #2730.
|
|
|
|
Prefix even empty figure names with "fig:"
|
|
Org reader: Refactor link-target processing
|
|
This version avoids an exponential performance problem with `<script>` tags,
and it should be faster in general.
Closes #2730.
|
|
Closes #2718.
|
|
Previously smart quotes were incorrect in the following:
'$\neg(x \in x)$'.
(because of the following period). This commit fixes the problem,
which was introduced by commit 4229cf2d92faf5774fe1a3a9c89a5de885cf75cd.
|
|
We want to make sure that links have their spaces removed, and are
appropriately smushed together.
This closes #2689
|
|
Cleanup of the code for link target handling. Most notably, the
canonicalization of a link is handled by a separate function.
This fixes #2684.
|
|
This gives better results when people write e.g. `\TeX{}` in Markdown.
\TeX{} and \LaTeX{}
now works as expected with `pandoc -f markdown -t latex`.
Closes #2687.
|
|
Put them in a list in the metadata so they are all
preserved, rather than (as before) throwing out all
but one.
|
|
See #2171.
|
|
Closes #2674.
|
|
This avoids performance problems in documents with many identically
named headers.
Closes #2671.
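The commit does not say how the slowdown was fixed. One common way to avoid quadratic work when many headers share a name is to keep the used identifiers in a `Set` and remember, per base name, where the numeric-suffix search stopped. The sketch below takes that approach with hypothetical names; pandoc's actual change may differ.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Identifiers handed out so far, plus the next numeric suffix to try per base name.
data IdState = IdState
  { usedIds    :: Set.Set String
  , nextSuffix :: Map.Map String Int
  }

-- Return a unique identifier for `base`, resuming the suffix search where the
-- previous duplicate left off instead of rescanning from 1.
uniqueIdent :: IdState -> String -> (IdState, String)
uniqueIdent st base
  | base `Set.notMember` usedIds st = (record base 1, base)
  | otherwise = go (Map.findWithDefault 1 base (nextSuffix st))
  where
    go n
      | candidate `Set.member` usedIds st = go (n + 1)
      | otherwise = (record candidate (n + 1), candidate)
      where candidate = base ++ "-" ++ show n
    record ident next =
      IdState (Set.insert ident (usedIds st)) (Map.insert base next (nextSuffix st))

assignIds :: [String] -> [String]
assignIds = snd . mapAccumL uniqueIdent (IdState Set.empty Map.empty)

main :: IO ()
main = print (assignIds ["intro", "intro", "intro", "other"])
-- ["intro","intro-1","intro-2","other"]
```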
|
|
The convention used by pandoc for figures is to mark them by prefixing
the name with "fig:". The org reader failed to do this if a figure had
no name. The test for this was broken as well.
This fixes #2643.
|