Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
* Added stateLastStrPos to ParserState. This lets us keep track
of whether we're parsing the position immediately after a 'str'.
If we encounter a ' in such a location, it must be an apostrophe,
and can't be a single quote start.
* Set this in the markdown, textile, html, and rst str parsers.
* Closes #360.
|
|
It was always possible to include raw DocBook tags in a markdown
document, but now pandoc will be able to distinguish block from
inline tags and behave accordingly. Thus, for example,
<sidebar>
hello
</sidebar>
will not be wrapped in `<para>` tags.
|
|
See bug #274, which was not completely fixed by the last patch.
|
|
These aren't valid in HTML, but many HTML files produced by
Windows tools contain them. We substitute correct unicode
characters.
|
|
For example, in
Just a few glitches remaining.
<ul><li> In this situation, one loses the list.
</ul>
And in this, the preformatting.
<pre>Preformatted text not starting with its own blank line.
</pre>
Thansk to Dirk Laurie for noticing the issue.
|
|
Closes #274.
|
|
* Skip spaces after <b>, <emph>, etc.
* Convert Plain elements into Para when they're in a list
item with Para, Pre, BlockQuote, CodeBlock.
An example of HTML that pandoc handles better now:
~~~~
<h4> Testing html to markdown </h4>
<ul>
<li>
<b> An item in a list </b>
<p> An introductory sentence.
<pre>
Some preformatted text
at this stage comes next.
But alas! much havoc
is wrought by Pandoc.
</pre>
</ul>
~~~~
Thanks to Dirk Laurie for reporting the issues.
|
|
Additional related changes:
* URLs in Code in autolinks now use class "url".
* Require highlighting-kate 0.2.8.2, which omits the final <br/> tag,
essential for inline code.
|
|
The old TeX, HtmlInline and RawHtml elements have been removed
and replaced by generic RawInline and RawBlock elements.
All modules updated to use the new raw elements.
|
|
Resolves Issue #106. Thanks to Rodja Trappe for the idea
and some sample code.
|
|
This avoids the need for manual parsing all over the place.
|
|
|
|
|
|
* The new reader is faster and more accurate.
* API changes for Text.Pandoc.Readers.HTML:
- removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag,
anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType,
htmlBlockElement, htmlComment
- added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag
* tagsoup is a new dependency.
* Text.Pandoc.Parsing: Generalized type on readWith.
* Benchmark.hs: Added length calculation to force full evaluation.
* Updated HTML reader tests.
* Updated markdown and textile readers to use the functions from
the HTML reader.
* Note: The markdown reader now correctly handles some cases it did not
before. For example:
<hr/>
is reproduced without adding a space.
<script>
a = '<b>';
</script>
is parsed correctly.
|
|
I had previously assumed that we needed to ignore
</script> occuring in a string literal or javascript
comment. It turns out, though, that browsers aren't
that smart.
|
|
It did not work before, because - and quotes were gobbled
up by the str parser.
|
|
Resolves Issue #274.
|
|
This is better done on the resulting HTML; use the xss-sanitize library
for this. xss-sanitize is based on pandoc's sanitization, but improves
it.
- Removed stateSanitize from ParserState.
- Removed --sanitize-html option.
|
|
|
|
|
|
Previously '<code><a>x</a></code>' would be parsed as
Code "<a>x</a>", which is not what you want.
|
|
Partially resolves Issue #247.
|
|
+ Text.Pandoc.Parsing
|
|
|
|
|
|
* Added stringToURI to Shared. This is used in the HTML
writer for all URIs. It properly URI-encodes high
characters (> 127), leaving everything else (including
symbols and spaces) the same.
* Modified unsanitaryURI to allow UTF8 characters in a URI.
(First, we convert the URI to URI-encoded octets, then we
pass through parseURIReference.)
This resolves gitit Issue #99. Previously
'[abc](http://gitit.net/测试)' would not be rendered as
a link when --sanitize was selected.
|
|
Resolves Issue #216.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1837 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
The following is not valid xhtml, but the intent is clear:
<ol>
<li>one</li>
<ol><li>sub</li></ol>
<li>two</li>
</ol>
We'll treat the <ol> as if it's in a <li>.
Resolves Issue #215.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1836 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
+ Incorporated idea (from HXT) that an element can be closed
by an open tag for another element.
+ Javascript is partially parsed to make sure that a <script>
section is not closed by a </script> in a comment or string.
+ More lenient non-quoted attribute values.
Now we accept anything but a space character, quote, or <>.
This helps in parsing e.g. www.google.com!
+ Bare & signs are now parsed as a string. This is a common
HTML mistake.
+ Skip a bare < in malformed HTML.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1825 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1750 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
Meta [Inline] [[Inline]] [Inline] rather than
Meta [Inline] [String] String.
This is a breaking change for libraries that use pandoc and
manipulate the metadata.
Changed .native files in test suite for new Meta format.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1699 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
Definition lists are now more compatible with PHP Markdown Extra.
Resolves Issue #24.
+ You can have multiple definitions for a term (but still not
multiple terms).
+ Multi-block definitions no longer need a
column before each block (indeed, this will now cause
multiple definitions).
+ The marker no longer needs to be flush with the left margin,
but can be indented at or two spaces. Also, ~ as well as :
can be used as the marker (this suggestion due to David
Wheeler.)
+ There can now be a blank line between the term and
the definitions.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1656 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
Resolves Issue #108.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1645 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
(Added a needed try.)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1621 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
Example:
- a
<!--
- b
-->
- c
Resolves Issue #142.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1615 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1608 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
Resolves Issue #157. ('try' in the wrong place.)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1605 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1567 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1528 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1104 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
are more trouble than they're worth.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1064 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
+ Added library Text.Pandoc.Include, with a template haskell
function $(includeStrFrom fname) to include a file as a string
constant at compile time.
+ This removes the need for the 'templates' directory or Makefile
target. These have been removed.
+ The base source directory has been changed from src to .
+ A new 'data' directory has been added, containing the ASCIIMathML.js
script, writer headers, and S5 files.
+ The src/wrappers directory has been moved to 'wrappers'.
+ The Text.Pandoc.ASCIIMathML library is no longer needed, since
Text.Pandoc.Writers.HTML can use includeStrFrom to include the
ASCIIMathML.js code directly. It has been removed.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1063 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
from contents of <pre>...</pre> in codeBlock parser.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1023 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
+ <code> tag is no longer needed. <pre> suffices.
+ all HTML tags in the code block (e.g. for syntax highlighting)
are skipped, because they are not portable to other output formats.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1022 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1016 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
+ <code>...</code> not surrounded by <pre> should count as
inline HTML, not code block.
+ parser for minimized attributes should not swallow trailing spaces
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1015 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
(HTML reader).
git-svn-id: https://pandoc.googlecode.com/svn/trunk@863 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
+ LaTeX reader: skip anything after \end{document}
+ HTML reader: fixed bug skipping material after </html> -- previously,
stuff at the end was skipped even if no </html> was present, which
meant only part of the file would be parsed and no error issued
+ HTML reader: added new constant eitherBlockOrInline with elements that
may count either as block-level or inline
+ Modified isInline and isBlock to take this into account
+ modified rawHtmlBlock to accept any tag (even an inline tag);
this is innocuous, because rawHtmlBlock is tried only if a regular
inline element can't be parsed.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@862 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
git-svn-id: https://pandoc.googlecode.com/svn/trunk@844 788f1e2b-df1e-0410-8736-df70ead52e1b
|