Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
|
|
This speeds up the markdown reader.
|
|
specialChars, strChar, specialCharsMinusLt.
|
|
|
|
'(_hi_)' was being parsed with literal underscores (no emphasis).
The fix: the 'str' parser now only parses alphanumerics and
embedded underscores. All other symbols are handled by the
'symbol' parser. This has a slight effect on the AST, since
you'll get [Str "hi",Str ":"] insntead of [Str "hi:"]. But there
should not be a visible effect in any of the writers.
Thanks to gwern for pointing out the regression.
|
|
* The new reader is faster and more accurate.
* API changes for Text.Pandoc.Readers.HTML:
- removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag,
anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType,
htmlBlockElement, htmlComment
- added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag
* tagsoup is a new dependency.
* Text.Pandoc.Parsing: Generalized type on readWith.
* Benchmark.hs: Added length calculation to force full evaluation.
* Updated HTML reader tests.
* Updated markdown and textile readers to use the functions from
the HTML reader.
* Note: The markdown reader now correctly handles some cases it did not
before. For example:
<hr/>
is reproduced without adding a space.
<script>
a = '<b>';
</script>
is parsed correctly.
|
|
|
|
There was a bug in parsing '_emph_, ...': when followed by
a comma, underscore emphasis did not register. (Thanks to
gwern for pointing this out.)
This bug was introduced by the change in
c66921f2acea456af527b93e2daa1d8594798642
|
|
This allows different writers to handle punctuation in the suffix
differently.
|
|
|
|
E.g., Mr.
Frank.
|
|
|
|
* The recent change allowing spaces and newlines in the URL
caused problems when reference keys are stacked up without
blank lines between. This is now fixed.
* Added test.
|
|
Moved inlineNote parser after superscript parser,
so ^[link](/foo)^ gets recognized as a superscripted
link, not an inline note followed by garbage.
Thanks to Conal Elliott for pointing out the problem.
|
|
|
|
This is better done on the resulting HTML; use the xss-sanitize library
for this. xss-sanitize is based on pandoc's sanitization, but improves
it.
- Removed stateSanitize from ParserState.
- Removed --sanitize-html option.
|
|
Also, a string of consecutive spaces or tabs is now parsed
as a single space. If you have multiple spaces in your URL,
use %20%20.
|
|
This change avoids repeated parsing of inline lists for 'plain'
blocks.
|
|
Don't skipNonindentSpaces in noteMarker, since it's also
used in the inline note parser.
|
|
Now “Hi” gets parsed as a Quoted DoubleQuote inline.
|
|
|
|
+ Parameterized smartPunctuation on an inline parser.
+ Handle smartPunctuation in Textile reader.
|
|
The 'str' parser now reads internal _'s as part of the string.
This prevents pandoc from getting started looking for an emphasized
block, which can cause exponential slowdowns in some cases.
Resolves Issue #182.
|
|
Previously, curly quotes were just parsed literally, leading
to problems in some output formats. Now they are parsed as
Quoted inlines, if --smart is specified.
Resolves Issue #270.
|
|
This broke when we added the Key type. We had assumed that
the custom case-insensitive Ord instance would ensure case-insensitive
matching, but that is not how Data.Map works.
* Added a test case for case-insensitivity in markdown-reader-more
* Removed old refsMatch from Text.Pandoc.Parsing module;
* hid the 'Key' constructor;
* dropped the custom Ord and Eq instances, deriving instead;
* added fromKey and toKey to convert between Keys and Inline lists;
* toKey ensures that keys are case-insensitive, since this is the
only way the API provides to construct a Key.
Resolves Issue #272.
|
|
Conflicts:
src/Text/Pandoc/Definition.hs
|
|
|
|
|
|
Do a quick lookahead to make sure what follows looks like a setext
header before parsing any Inlines. This gives a 15% performance
boost in one benchmark. Many thanks to knieriem for finding
the problem (in peg-markdown):
https://github.com/jgm/peg-markdown/issues/issue/3
|
|
If suffix doesn't begin with punctuation, include opening
comma and space in result.
Previously,
@item [only a suffix]
would result in something like
Doe (2002only a suffix)
because there was no opening delimiter.
|
|
Patch from Nathan Gass.
|
|
|
|
|
|
Now we handle a suffix after a bare locator, e.g.
@item1 [p. 30, suffix]
The suffix now includes any punctuation that introduces it.
A few tests fail because of problems with citeproc (extra space
before the suffix, missing space after comma separating multiple
page ranges in the locator).
|
|
Suffixes and prefixes are now [Inline]. The locator is separated
from the citation key by a blank space. The locator consists of
one introductory word and any number of words containing at
least one digit. The suffix, if any, is separated from the locator
by a comma, and continues til the end of the citation.
|
|
|
|
citationPrefix now [Inline] rather than String;
citationSuffix added.
This change presupposes no changes in citeproc-hs.
It passes a string for these values to citeproc-hs.
Eventually, citeproc-hs should use an [Inline] for
these as well.
|
|
|
|
Added a form for in-text citations:
@doe99 [30; see also @smith99].
|
|
Patch from Andrea Rossato.
|
|
|
|
So,
aaa <!-- comment --> bbb
can be a single paragraph.
|
|
By Cabal policy, the API should not change depending on flags.
|
|
|
|
Patch from Andrea Rossato.
Note: the markdown syntax is preliminary and will probably change.
|
|
|
|
Example:
\newcommand{\plus}[2]{#1 + #2}
$\plus{3}{4}$
yields:
3+4
|
|
The refs are now replaced by numbers at the final stage, using
processWith.
|
|
|