Age | Commit message (Collapse) | Author | Files | Lines |
|
* The new reader is more robust, accurate, and extensible.
It is still quite incomplete, but it should be easier
now to add features.
* Text.Pandoc.Parsing: Added withRaw combinator.
* Markdown reader: do escapedChar before raw latex inline.
Otherwise we capture commands like \{.
* Fixed latex citation tests for new citeproc.
* Handle \include{} commands in latex.
This is done in pandoc.hs, not the (pure) latex reader.
But the reader exports the needed function, handleIncludes.
* Moved err and warn from pandoc.hs to Shared.
* Fixed tests - raw tex should sometimes have trailing space.
* Updated lhs-test for highlighting-kate changes.
|
|
|
|
Previously the ID attribute got lost if it didn't come first.
Now attributes can come in any order.
|
|
You can now write
```ruby
x = 2
```
instead of
~~~ {.ruby}
x = 2
~~~~
|
|
Closes #348. Closes #108.
|
|
Top line of table must not be followed by a blank line.
This bug caused slowdown on some files with hrules and tables,
and pandoc tried to interpret the hrules as the tops of
multiline tables.
|
|
This change also means that
[link with [link](/url)](/url)
will turn into
<p><a href="/url">link with link</a></p>
instead of
<p><a href="/url">link with [link](/url)</a></p>
|
|
Pandoc previously behaved like Markdown.pl for consecutive
lists of different styles. Thus, the following would be parsed
as a single ordered list, rather than an ordered list followed
by an unordered list:
1. one
2. two
- one
- two
This patch makes pandoc behave more sensibly, parsing this as
two lists. Any change in list type (ordered/unordered) or in
list number style will trigger a new list. Thus, the following
will also be parsed as two lists:
1. one
2. two
a. one
b. two
Since we regard this as a bug in Markdown.pl, and not something
anyone would ever rely on, we do not preserve the old behavior
even when `--strict` is selected.
|
|
* Added stateLastStrPos to ParserState. This lets us keep track
of whether we're parsing the position immediately after a 'str'.
If we encounter a ' in such a location, it must be an apostrophe,
and can't be a single quote start.
* Set this in the markdown, textile, html, and rst str parsers.
* Closes #360.
|
|
|
|
|
|
This solves a problem stemming from the fact that a parser
doesn't know what came *before* in the input stream.
Previously pandoc would parse
D'oh l'*aide*
as containing a single quoted "oh l", when both `'`s should
be apostrophes. (Issue #360.) There are two issues here.
(a) It is obvious that the first `'` is not an open quote,
becaues of the preceding `D`. This patch solves the problem.
(b) It is obvious to us that the second `'` is not an
open quote, because we see that *aide* is some text.
But getting a good algorithm that has good performance is
a bit tricky. You can't assume that `'` followed by `*`
is always an apostrophe:
*'this is quoted'*
This patch does not fix (b).
|
|
Closes #312.
|
|
|
|
|
|
Refactored escapedChar into escapedChar', escapedChar.
|
|
Previously `[@item1 and nowhere else]` yielded the locator ", and nowhere
else", or, with the new citeproc-hs, "and nowhere else".
Now it yields " and nowhere else".
|
|
The characters '.',':',';','$','<','>','~','#','-','_' can
be used only between two letters or digits in a citation key.
This means that '@item1.' will be parsed as a citation, 'item1',
followed by a period, instead of a citation 'item1.', as was the
case previously.
Thanks to David Sanson for alerting us to the problem.
|
|
|
|
It requires parsec 3, and currently pandoc can build with parsec 2.
|
|
Ported code from pandoc2.
Now all tests pass.
|
|
Resolves Issue #304: problems with
(@item1; @item2)
because the final paren was being parsed as part of
the item key.
|
|
These previously caused infinite looping and stack overflows.
For example:
[^1]
[^1]: See [^1]
Note references are allowed in reST notes, so this isn't a full
implementation of reST. That can come later. For now we need to
prevent the stack overflows.
Partially resolves Issue #297.
|
|
The point of the change is to allow html tags to be used freely
at the left margin of a markdown+lhs document.
Thanks to Conal Elliot for the suggestion.
|
|
|
|
The last patch did not handle cases with > 4 spaces.
Also added a more general test case.
|
|
The problem was in input like this:
[^1]: note
not in note.
Also added a test case for this.
|
|
Previously pandoc got confused by blank rows in the header.
|
|
Additional related changes:
* URLs in Code in autolinks now use class "url".
* Require highlighting-kate 0.2.8.2, which omits the final <br/> tag,
essential for inline code.
|
|
|
|
|
|
The old TeX, HtmlInline and RawHtml elements have been removed
and replaced by generic RawInline and RawBlock elements.
All modules updated to use the new raw elements.
|
|
|
|
|
|
This speeds up the markdown reader.
|
|
specialChars, strChar, specialCharsMinusLt.
|
|
|
|
'(_hi_)' was being parsed with literal underscores (no emphasis).
The fix: the 'str' parser now only parses alphanumerics and
embedded underscores. All other symbols are handled by the
'symbol' parser. This has a slight effect on the AST, since
you'll get [Str "hi",Str ":"] insntead of [Str "hi:"]. But there
should not be a visible effect in any of the writers.
Thanks to gwern for pointing out the regression.
|
|
* The new reader is faster and more accurate.
* API changes for Text.Pandoc.Readers.HTML:
- removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag,
anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType,
htmlBlockElement, htmlComment
- added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag
* tagsoup is a new dependency.
* Text.Pandoc.Parsing: Generalized type on readWith.
* Benchmark.hs: Added length calculation to force full evaluation.
* Updated HTML reader tests.
* Updated markdown and textile readers to use the functions from
the HTML reader.
* Note: The markdown reader now correctly handles some cases it did not
before. For example:
<hr/>
is reproduced without adding a space.
<script>
a = '<b>';
</script>
is parsed correctly.
|
|
|
|
There was a bug in parsing '_emph_, ...': when followed by
a comma, underscore emphasis did not register. (Thanks to
gwern for pointing this out.)
This bug was introduced by the change in
c66921f2acea456af527b93e2daa1d8594798642
|
|
This allows different writers to handle punctuation in the suffix
differently.
|
|
|
|
E.g., Mr.
Frank.
|
|
|
|
* The recent change allowing spaces and newlines in the URL
caused problems when reference keys are stacked up without
blank lines between. This is now fixed.
* Added test.
|
|
Moved inlineNote parser after superscript parser,
so ^[link](/foo)^ gets recognized as a superscripted
link, not an inline note followed by garbage.
Thanks to Conal Elliott for pointing out the problem.
|
|
|
|
This is better done on the resulting HTML; use the xss-sanitize library
for this. xss-sanitize is based on pandoc's sanitization, but improves
it.
- Removed stateSanitize from ParserState.
- Removed --sanitize-html option.
|
|
Also, a string of consecutive spaces or tabs is now parsed
as a single space. If you have multiple spaces in your URL,
use %20%20.
|