Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
'(_hi_)' was being parsed with literal underscores (no emphasis).
The fix: the 'str' parser now only parses alphanumerics and
embedded underscores. All other symbols are handled by the
'symbol' parser. This has a slight effect on the AST, since
you'll get [Str "hi",Str ":"] instead of [Str "hi:"]. But there
should not be a visible effect in any of the writers.
Thanks to gwern for pointing out the regression.
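
The division of labor between the two parsers can be illustrated with a small tokenizer. This is a Python sketch, not pandoc's Haskell code: the name `tokenize` and the regular expression are hypothetical, and emphasis handling itself is not modeled.

```python
import re

# Hypothetical illustration, not pandoc's implementation.
# A "str" token is a run of alphanumerics that may contain embedded
# underscores (each underscore run must sit between alphanumerics).
# Every other non-space character becomes its own "symbol" token.
TOKEN = re.compile(r"[^\W_]+(?:_+[^\W_]+)*|\S")

def tokenize(text):
    """Split text into str and symbol tokens, dropping whitespace."""
    return TOKEN.findall(text)

print(tokenize("hi:"))      # ['hi', ':']
print(tokenize("(_hi_)"))   # ['(', '_', 'hi', '_', ')']
print(tokenize("foo_bar"))  # ['foo_bar']
```

Because the leading and trailing underscores in '(_hi_)' come out as separate tokens, a later emphasis parser gets a chance to see them.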
|
|
|
|
|
|
|
|
|
|
* The new reader is faster and more accurate.
* API changes for Text.Pandoc.Readers.HTML:
- removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag,
anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType,
htmlBlockElement, htmlComment
- added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag
* tagsoup is a new dependency.
* Text.Pandoc.Parsing: Generalized type on readWith.
* Benchmark.hs: Added length calculation to force full evaluation.
* Updated HTML reader tests.
* Updated markdown and textile readers to use the functions from
the HTML reader.
* Note: The markdown reader now correctly handles some cases it did not
before. For example:
    <hr/>
is reproduced without adding a space.
    <script>
    a = '<b>';
    </script>
is parsed correctly.
|
|
|
|
I had previously assumed that we needed to ignore
</script> occurring in a string literal or JavaScript
comment. It turns out, though, that browsers aren't
that smart.
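
The simpler rule described here can be sketched as follows. This is a Python illustration under stated assumptions, not the reader's actual code: `script_body` and `SCRIPT_END` are hypothetical names, and the input is assumed to start just after the opening tag.

```python
import re

# Hypothetical sketch: the script element ends at the first closing tag,
# even if that tag sits inside a JavaScript string literal or comment.
SCRIPT_END = re.compile(r"</script\s*>", re.IGNORECASE)

def script_body(text):
    """Return the text before the first </script>, or None if it is missing."""
    match = SCRIPT_END.search(text)
    return None if match is None else text[:match.start()]

source = "a = '</script>'; b = 1;</script>"
print(repr(script_body(source)))  # "a = '"
```

No attempt is made to track string literals or comments, which matches the behavior attributed to browsers above.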
|
|
It did not work before, because - and quotes were gobbled
up by the str parser.
|
|
(So they can trigger Quoted environments.)
|
|
This was achieved by rearranging the parsers in inline.
Benchmarks went from 500ms to 307ms -- not quite back to the
279ms we had in 1.6, before supporting smart punctuation and
footnotes, but close.
|
|
|
|
Resolves Issue #274.
|
|
|
|
There was a bug in parsing '_emph_, ...': when followed by
a comma, underscore emphasis did not register. (Thanks to
gwern for pointing this out.)
This bug was introduced by the change in
c66921f2acea456af527b93e2daa1d8594798642
|
|
This allows different writers to handle punctuation in the suffix
differently.
|
|
|
|
|
|
E.g., Mr.
Frank.
|
|
|
|
* The recent change allowing spaces and newlines in the URL
caused problems when reference keys are stacked up without
blank lines between. This is now fixed.
* Added test.
|
|
Moved inlineNote parser after superscript parser,
so ^[link](/foo)^ gets recognized as a superscripted
link, not an inline note followed by garbage.
Thanks to Conal Elliott for pointing out the problem.
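
Why the order of alternatives matters can be shown with a toy ordered-choice parser. This is a hedged Python sketch: `superscript`, `inline_note`, and `first_match` are hypothetical stand-ins, and their regular expressions are far simpler than the real parsers.

```python
import re

# Hypothetical mini-parsers: each returns (node, rest) or None on failure.
def superscript(text):
    m = re.match(r"\^([^\s^]+)\^", text)
    return (("Superscript", m.group(1)), text[m.end():]) if m else None

def inline_note(text):
    m = re.match(r"\^\[([^\]]*)\]", text)
    return (("Note", m.group(1)), text[m.end():]) if m else None

def first_match(parsers, text):
    """Ordered choice: return the result of the first parser that succeeds."""
    for parser in parsers:
        result = parser(text)
        if result is not None:
            return result
    return None

text = "^[link](/foo)^"
print(first_match([inline_note, superscript], text))
# (('Note', 'link'), '(/foo)^')
print(first_match([superscript, inline_note], text))
# (('Superscript', '[link](/foo)'), '')
```

With the note parser first, the input is split into a note and leftover text; with the superscript parser first, the whole span is consumed.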
|
|
Previously you'd get unexpected behavior on a document that
contained '\begin{document}' in, say, a verbatim block.
|
|
|
|
This is better done on the resulting HTML; use the xss-sanitize library
for this. xss-sanitize is based on pandoc's sanitization, but improves
it.
- Removed stateSanitize from ParserState.
- Removed --sanitize-html option.
|
|
Also, a string of consecutive spaces or tabs is now parsed
as a single space. If you have multiple spaces in your URL,
use %20%20.
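
A minimal Python sketch of the collapsing rule, assuming it applies to the raw URL text before any escaping; `collapse_url_whitespace` is a hypothetical name, not a pandoc function.

```python
import re

def collapse_url_whitespace(url):
    """Replace each run of spaces or tabs with a single space."""
    return re.sub(r"[ \t]+", " ", url)

print(collapse_url_whitespace("/my  \t file.html"))   # /my file.html
print(collapse_url_whitespace("/my%20%20file.html"))  # /my%20%20file.html
```

Percent-encoded spaces are left untouched, which is why %20%20 preserves a double space.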
|
|
This change avoids repeated parsing of inline lists for 'plain'
blocks.
|
|
|
|
We now parse PBS(Public Broadcasting System) as if it were
"PBS (Public Broadcasting System)".
|
|
Resolves issue #258.
Note that there are some differences in how docutils and
pandoc treat footnotes. Currently pandoc ignores the numeral
or symbol used in the note; footnotes are put in an auto-numbered
ordered list.
|
|
Don't skipNonindentSpaces in noteMarker, since it's also
used in the inline note parser.
|
|
|
|
|
|
|
|
Now “Hi” gets parsed as a Quoted DoubleQuote inline.
|
|
|
|
+ Parameterized smartPunctuation on an inline parser.
+ Handle smartPunctuation in Textile reader.
|
|
|
|
The 'str' parser now reads internal _'s as part of the string.
This prevents pandoc from getting started looking for an emphasized
block, which can cause exponential slowdowns in some cases.
Resolves Issue #182.
|
|
Previously, curly quotes were just parsed literally, leading
to problems in some output formats. Now they are parsed as
Quoted inlines, if --smart is specified.
Resolves Issue #270.
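
The effect can be sketched in Python, with tuples standing in for pandoc's inline constructors. This is only an illustration: `parse_inlines` and the `smart` flag are hypothetical, and nested quotes are not handled.

```python
import re

# Hypothetical sketch: curly double or single quote pairs become Quoted nodes.
CURLY = re.compile(
    "\u201c([^\u201c\u201d]*)\u201d|\u2018([^\u2018\u2019]*)\u2019"
)

def parse_inlines(text, smart=True):
    """Split text into Str and Quoted tuples; leave it literal if not smart."""
    if not smart:
        return [("Str", text)] if text else []
    inlines, pos = [], 0
    for m in CURLY.finditer(text):
        if m.start() > pos:
            inlines.append(("Str", text[pos:m.start()]))
        if m.group(1) is not None:
            inlines.append(("Quoted", "DoubleQuote", m.group(1)))
        else:
            inlines.append(("Quoted", "SingleQuote", m.group(2)))
        pos = m.end()
    if pos < len(text):
        inlines.append(("Str", text[pos:]))
    return inlines

print(parse_inlines("\u201cHi\u201d there"))
# [('Quoted', 'DoubleQuote', 'Hi'), ('Str', ' there')]
```

A writer can then choose its own quote representation for each Quoted node instead of copying the curly characters through.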
|
|
This broke when we added the Key type. We had assumed that
the custom case-insensitive Ord instance would ensure case-insensitive
matching, but that is not how Data.Map works.
* Added a test case for case-insensitivity in markdown-reader-more.
* Removed old refsMatch from Text.Pandoc.Parsing module.
* Hid the 'Key' constructor.
* Dropped the custom Ord and Eq instances, deriving instead.
* Added fromKey and toKey to convert between Keys and Inline lists.
* toKey ensures that keys are case-insensitive, since this is the
only way the API provides to construct a Key.
Resolves Issue #272.
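
The design (normalize once at construction, then rely on ordinary derived equality) can be sketched with a Python analogue. This is not pandoc's code: a plain string stands in for the Inline list, and `to_key` and `from_key` merely mirror the names toKey and fromKey.

```python
from dataclasses import dataclass

# Hypothetical analogue of the Key type. Equality and hashing are the
# generated defaults; case-insensitivity comes only from to_key.
@dataclass(frozen=True)
class Key:
    text: str  # treat as private: build keys with to_key

def to_key(label):
    """The only intended constructor: lower-cases the label."""
    return Key(label.lower())

def from_key(key):
    """Recover the normalized label from a Key."""
    return key.text

refs = {to_key("My Link"): "/url"}
print(refs.get(to_key("my LINK")))  # /url
```

Since every key passes through the normalizing constructor, lookups match regardless of case without any custom comparison logic.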
|
|
Conflicts:
src/Text/Pandoc/Definition.hs
|
|
The smartPunctuation parser from the markdown parser
was being used, but this creates two problems:
* smart punctuation rules are slightly different in textile;
for example, a single dash with space around it becomes an
en dash.
* the following gets parsed as a double quoted string followed
by a colon, rather than as a link:
"emphasized text":http://my.url.com
This needs rethinking.
|
|
|
|
|
|
This is consistent with how the other readers work.
|
|
* A single hyphen between two word characters is no longer a
potential strikeout-starter.
* Acronym explanations are dropped.
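
The hyphen rule in the first item can be sketched as a simple check. This is a Python illustration with assumptions: `can_start_strikeout` is a hypothetical helper, and "word character" is approximated by `str.isalnum`.

```python
def can_start_strikeout(text, i):
    """Return True if the hyphen at index i may open a strikeout span."""
    if text[i] != "-":
        return False
    before = text[i - 1] if i > 0 else " "
    after = text[i + 1] if i + 1 < len(text) else " "
    # A hyphen flanked by word characters on both sides is ordinary text.
    return not (before.isalnum() and after.isalnum())

print(can_start_strikeout("well-known", 4))  # False
print(can_start_strikeout("-deleted-", 0))   # True
```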
|
|
It's part of the textile spec to allow raw HTML,
just as with markdown.
-R is no longer needed in test suite.
|
|
|