|
These aren't valid in HTML, but many HTML files produced by
Windows tools contain them. We substitute correct unicode
characters.
|
|
For example, in
Just a few glitches remaining.
<ul><li> In this situation, one loses the list.
</ul>
And in this, the preformatting.
<pre>Preformatted text not starting with its own blank line.
</pre>
Thanks to Dirk Laurie for noticing the issue.
|
|
Closes #274.
|
|
* Skip spaces after <b>, <emph>, etc.
* Convert Plain elements into Para when they're in a list
item with Para, Pre, BlockQuote, CodeBlock.
An example of HTML that pandoc handles better now:
~~~~
<h4> Testing html to markdown </h4>
<ul>
<li>
<b> An item in a list </b>
<p> An introductory sentence.
<pre>
Some preformatted text
at this stage comes next.
But alas! much havoc
is wrought by Pandoc.
</pre>
</ul>
~~~~
Thanks to Dirk Laurie for reporting the issues.
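The Plain-to-Para rule above can be sketched roughly as follows. This is a minimal illustration, not pandoc's actual code: the Block type is a simplified stand-in (inline content is a plain String), the name fixPlains is hypothetical, and HTML <pre> content is assumed to arrive as CodeBlock.

```haskell
-- Simplified stand-in for pandoc's Block type (assumption: only the
-- constructors needed for this illustration; inlines are a String).
data Block
  = Plain String
  | Para String
  | CodeBlock String
  | BlockQuote [Block]
  deriving (Eq, Show)

-- Block kinds that, per the commit message, should turn Plain
-- siblings in the same list item into Para.
forcesPara :: Block -> Bool
forcesPara (Para _)       = True
forcesPara (CodeBlock _)  = True
forcesPara (BlockQuote _) = True
forcesPara _              = False

-- Hypothetical helper: rewrite the blocks of one list item.
fixPlains :: [Block] -> [Block]
fixPlains item
  | any forcesPara item = map plainToPara item
  | otherwise           = item
  where
    plainToPara (Plain s) = Para s
    plainToPara b         = b
```

A list item holding only Plain blocks is left untouched, so tight lists stay tight.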
|
|
Additional related changes:
* URLs in Code in autolinks now use class "url".
* Require highlighting-kate 0.2.8.2, which omits the final <br/> tag,
essential for inline code.
|
|
The old TeX, HtmlInline and RawHtml elements have been removed
and replaced by generic RawInline and RawBlock elements.
All modules updated to use the new raw elements.
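As a rough picture of what this migration means for client code, here is a sketch with simplified before/after types. The Old* names, the upgrade functions, and the format tags "tex" and "html" are assumptions made for illustration; the commit message does not say which tags pandoc uses.

```haskell
-- Simplified "before" types (assumption: only the raw constructors
-- named in the commit message plus one ordinary constructor each).
data OldInline = OldStr String | TeX String | HtmlInline String
  deriving (Eq, Show)

data OldBlock = OldPara [OldInline] | RawHtml String
  deriving (Eq, Show)

-- Simplified "after" types: one generic raw constructor per level,
-- tagged with a format name.
type Format = String

data Inline = Str String | RawInline Format String
  deriving (Eq, Show)

data Block = Para [Inline] | RawBlock Format String
  deriving (Eq, Show)

-- Hypothetical migration helpers; the format tags are assumed.
upgradeInline :: OldInline -> Inline
upgradeInline (OldStr s)     = Str s
upgradeInline (TeX s)        = RawInline "tex" s
upgradeInline (HtmlInline s) = RawInline "html" s

upgradeBlock :: OldBlock -> Block
upgradeBlock (OldPara ils) = Para (map upgradeInline ils)
upgradeBlock (RawHtml s)   = RawBlock "html" s
```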
|
|
Resolves Issue #106. Thanks to Rodja Trappe for the idea
and some sample code.
|
|
This avoids the need for manual parsing all over the place.
|
|
* The new reader is faster and more accurate.
* API changes for Text.Pandoc.Readers.HTML:
- removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag,
anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType,
htmlBlockElement, htmlComment
- added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag
* tagsoup is a new dependency.
* Text.Pandoc.Parsing: Generalized type on readWith.
* Benchmark.hs: Added length calculation to force full evaluation.
* Updated HTML reader tests.
* Updated markdown and textile readers to use the functions from
the HTML reader.
* Note: The markdown reader now correctly handles some cases it did not
before. For example:
<hr/>
is reproduced without adding a space.
<script>
a = '<b>';
</script>
is parsed correctly.
|
|
I had previously assumed that we needed to ignore
</script> occurring in a string literal or JavaScript
comment. It turns out, though, that browsers aren't
that smart.
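The simpler behaviour described here, ending the script at the first closing tag no matter where it appears, might look like the sketch below. The function name splitScript is hypothetical, and the case-insensitive match is an assumption.

```haskell
import Data.Char (toLower)
import Data.List (isPrefixOf)

-- Split the text that follows an opening <script> tag into the
-- script body and the remainder, which starts at the first
-- "</script" found. String literals and comments are NOT treated
-- specially.
splitScript :: String -> (String, String)
splitScript = go []
  where
    go acc [] = (reverse acc, [])
    go acc rest@(c : cs)
      | "</script" `isPrefixOf` map toLower rest = (reverse acc, rest)
      | otherwise = go (c : acc) cs
```

With this rule, a script containing `var s = '</script>';` is cut off inside the string literal.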
|
|
It did not work before, because - and quotes were gobbled
up by the str parser.
|
|
Resolves Issue #274.
|
|
This is better done on the resulting HTML; use the xss-sanitize library
for this. xss-sanitize is based on pandoc's sanitization, but improves
it.
- Removed stateSanitize from ParserState.
- Removed --sanitize-html option.
|
|
Previously '<code><a>x</a></code>' would be parsed as
Code "<a>x</a>", which is not what you want.
|
|
Partially resolves Issue #247.
|
|
+ Text.Pandoc.Parsing
|
|
* Added stringToURI to Shared. This is used in the HTML
writer for all URIs. It properly URI-encodes high
characters (> 127), leaving everything else (including
symbols and spaces) the same.
* Modified unsanitaryURI to allow UTF8 characters in a URI.
(First, we convert the URI to URI-encoded octets, then we
pass through parseURIReference.)
This resolves gitit Issue #99. Previously
'[abc](http://gitit.net/测试)' would not be rendered as
a link when --sanitize was selected.
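The encoding rule described for stringToURI can be sketched as below. This is not pandoc's implementation: encodeHighChars is a hypothetical name, and the sketch assumes that characters above 127 are percent-encoded as their UTF-8 octets while everything else passes through unchanged.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord, toUpper)
import Numeric (showHex)

-- UTF-8 octets of a single character.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n      = ord c
    cont x = 0x80 .|. (x .&. 0x3F)

-- Percent-encode one octet. Only octets >= 0x80 reach this function,
-- so the hex form always has two digits.
percent :: Int -> String
percent b = '%' : map toUpper (showHex b "")

-- Hypothetical stand-in for the behaviour described above: encode
-- characters above 127, leave symbols, spaces, and ASCII alone.
encodeHighChars :: String -> String
encodeHighChars = concatMap enc
  where
    enc c
      | ord c > 127 = concatMap percent (utf8Bytes c)
      | otherwise   = [c]
```

Under these assumptions, the URL from the example above becomes `http://gitit.net/%E6%B5%8B%E8%AF%95`.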
|
|
Resolves Issue #216.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1837 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
The following is not valid xhtml, but the intent is clear:
<ol>
<li>one</li>
<ol><li>sub</li></ol>
<li>two</li>
</ol>
We'll treat the <ol> as if it's in a <li>.
Resolves Issue #215.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1836 788f1e2b-df1e-0410-8736-df70ead52e1b
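One way to picture this repair is the sketch below, which works on a toy element tree rather than pandoc's parser. The Node type and wrapStrayLists are hypothetical, and wrapping the stray list in a new <li> of its own is an assumption about what "as if it's in a <li>" means.

```haskell
-- Toy HTML tree (assumption: attributes are omitted).
data Node = Text String | Element String [Node]
  deriving (Eq, Show)

-- Wrap any <ol>/<ul> that appears as a direct child of a list
-- in a fresh <li>, recursively.
wrapStrayLists :: Node -> Node
wrapStrayLists (Text s) = Text s
wrapStrayLists (Element name kids)
  | isList name = Element name (map wrap kids')
  | otherwise   = Element name kids'
  where
    kids' = map wrapStrayLists kids
    isList n = n `elem` ["ol", "ul"]
    wrap k@(Element n _) | isList n = Element "li" [k]
    wrap k = k
```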
|
|
+ Incorporated idea (from HXT) that an element can be closed
by an open tag for another element.
+ Javascript is partially parsed to make sure that a <script>
section is not closed by a </script> in a comment or string.
+ More lenient non-quoted attribute values.
Now we accept anything but a space character, quote, or <>.
This helps in parsing e.g. www.google.com!
+ Bare & signs are now parsed as a string. This is a common
HTML mistake.
+ Skip a bare < in malformed HTML.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1825 788f1e2b-df1e-0410-8736-df70ead52e1b
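The lenient rule for non-quoted attribute values can be sketched as a character predicate. The function names are hypothetical, and "quote" is assumed to cover both single and double quotes.

```haskell
import Data.Char (isSpace)

-- Characters allowed in an unquoted attribute value under the
-- lenient rule: anything but whitespace, a quote, '<' or '>'.
isUnquotedValueChar :: Char -> Bool
isUnquotedValueChar c = not (isSpace c) && c `notElem` "\"'<>"

-- Split an unquoted attribute value off the front of the input.
unquotedValue :: String -> (String, String)
unquotedValue = span isUnquotedValueChar
```

Characters such as `/`, `?`, `&` and `=` are accepted, so an unquoted URL is read as a single value.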
|
|
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1750 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
Meta [Inline] [[Inline]] [Inline] rather than
Meta [Inline] [String] String.
This is a breaking change for libraries that use pandoc and
manipulate the metadata.
Changed .native files in test suite for new Meta format.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1699 788f1e2b-df1e-0410-8736-df70ead52e1b
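For library users, the change in shape might be handled along these lines. The sketch assumes the three fields are title, authors, and date in that order, uses a reduced Inline type, and invents the names OldMeta, toInlines, and upgradeMeta for illustration.

```haskell
import Data.List (intersperse)

-- Reduced Inline type (assumption: only what the sketch needs).
data Inline = Str String | Space
  deriving (Eq, Show)

-- Old shape: Meta [Inline] [String] String
data OldMeta = OldMeta [Inline] [String] String
  deriving (Eq, Show)

-- New shape: Meta [Inline] [[Inline]] [Inline]
data Meta = Meta [Inline] [[Inline]] [Inline]
  deriving (Eq, Show)

-- Turn a plain string into words separated by Space.
toInlines :: String -> [Inline]
toInlines = intersperse Space . map Str . words

-- Hypothetical migration of old metadata values.
upgradeMeta :: OldMeta -> Meta
upgradeMeta (OldMeta title authors date) =
  Meta title (map toInlines authors) (toInlines date)
```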
|
|
Definition lists are now more compatible with PHP Markdown Extra.
Resolves Issue #24.
+ You can have multiple definitions for a term (but still not
multiple terms).
+ Multi-block definitions no longer need a
colon before each block (indeed, this will now cause
multiple definitions).
+ The marker no longer needs to be flush with the left margin,
but can be indented one or two spaces. Also, ~ as well as :
can be used as the marker (this suggestion due to David
Wheeler.)
+ There can now be a blank line between the term and
the definitions.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1656 788f1e2b-df1e-0410-8736-df70ead52e1b
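The marker rule described above can be sketched as a small line recognizer. This is an illustration only: definitionMarker is a hypothetical name, the marker is accepted with up to two spaces of indentation, and requiring whitespace after the marker is an assumption.

```haskell
import Data.Char (isSpace)

-- If the line starts a definition, return the text after the marker.
-- Accepts ':' or '~' preceded by at most two spaces.
definitionMarker :: String -> Maybe String
definitionMarker line =
  case span (== ' ') line of
    (indent, m : rest)
      | length indent <= 2, m `elem` ":~", startsWithSpace rest
      -> Just (dropWhile isSpace rest)
    _ -> Nothing
  where
    startsWithSpace (c : _) = isSpace c
    startsWithSpace []      = False
```

Because each marker line starts a new definition, a term followed by two marker lines yields two definitions for that term.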
|
|
Resolves Issue #108.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1645 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
(Added a needed try.)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1621 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
Example:
- a
<!--
- b
-->
- c
Resolves Issue #142.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1615 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1608 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
Resolves Issue #157. ('try' in the wrong place.)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1605 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1567 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1528 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1104 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
are more trouble than they're worth.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1064 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
+ Added library Text.Pandoc.Include, with a template haskell
function $(includeStrFrom fname) to include a file as a string
constant at compile time.
+ This removes the need for the 'templates' directory or Makefile
target. These have been removed.
+ The base source directory has been changed from src to .
+ A new 'data' directory has been added, containing the ASCIIMathML.js
script, writer headers, and S5 files.
+ The src/wrappers directory has been moved to 'wrappers'.
+ The Text.Pandoc.ASCIIMathML library is no longer needed, since
Text.Pandoc.Writers.HTML can use includeStrFrom to include the
ASCIIMathML.js code directly. It has been removed.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1063 788f1e2b-df1e-0410-8736-df70ead52e1b
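A compile-time include of this kind can be written in a few lines of Template Haskell. The sketch below is one plausible shape for such a function, not necessarily the code in Text.Pandoc.Include.

```haskell
import Language.Haskell.TH

-- Read a file while the splice is being expanded and produce a
-- string literal expression holding its contents.
includeStrFrom :: FilePath -> Q Exp
includeStrFrom fname = runIO (readFile fname) >>= stringE
```

Because of Template Haskell's stage restriction, the splice has to be used from a different module than the one defining the function, with the TemplateHaskell extension enabled there, e.g. `script = $(includeStrFrom "data/ASCIIMathML.js")` (the path is illustrative).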
|
|
from contents of <pre>...</pre> in codeBlock parser.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1023 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
+ <code> tag is no longer needed. <pre> suffices.
+ all HTML tags in the code block (e.g. for syntax highlighting)
are skipped, because they are not portable to other output formats.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1022 788f1e2b-df1e-0410-8736-df70ead52e1b
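Skipping tags inside a code block can be pictured with the naive sketch below. The function stripTags is hypothetical and deliberately simple: it assumes every '<' opens a tag that ends at the next '>', and it does not decode character references.

```haskell
-- Drop everything between '<' and the next '>' (inclusive),
-- keeping the text in between tags.
stripTags :: String -> String
stripTags [] = []
stripTags ('<' : rest) = stripTags (drop 1 (dropWhile (/= '>') rest))
stripTags (c : rest) = c : stripTags rest
```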
|
|
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1016 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
+ <code>...</code> not surrounded by <pre> should count as
inline HTML, not code block.
+ parser for minimized attributes should not swallow trailing spaces
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1015 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
(HTML reader).
git-svn-id: https://pandoc.googlecode.com/svn/trunk@863 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
+ LaTeX reader: skip anything after \end{document}
+ HTML reader: fixed bug skipping material after </html> -- previously,
stuff at the end was skipped even if no </html> was present, which
meant only part of the file would be parsed and no error issued
+ HTML reader: added new constant eitherBlockOrInline with elements that
may count either as block-level or inline
+ Modified isInline and isBlock to take this into account
+ modified rawHtmlBlock to accept any tag (even an inline tag);
this is innocuous, because rawHtmlBlock is tried only if a regular
inline element can't be parsed.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@862 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
git-svn-id: https://pandoc.googlecode.com/svn/trunk@844 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
reformatted, etc.) More major changes are documented below:
+ Removed Text.Pandoc.ParserCombinators and moved all its definitions
to Text.Pandoc.Shared.
+ In Text.Pandoc.Shared:
- Removed unneeded 'try' in blanklines.
- Removed endsWith function and rewrote functions to use isSuffixOf instead.
- Added >>~ combinator.
- Rewrote stripTrailingNewlines, removeLeadingSpaces.
+ Moved Text.Pandoc.Entities -> Text.Pandoc.CharacterReferences.
- Removed unneeded functions charToEntity, charToNumericalEntity.
- Renamed functions using proper terminology (character references,
not entities). decodeEntities -> decodeCharacterReferences,
characterEntity -> characterReference.
- Moved escapeStringToXML to Docbook writer, which is the only thing
that uses it.
- Removed old entity parser in HTML and Markdown readers; replaced with
new charRef parser in Text.Pandoc.Shared.
+ Fixed accent bug in Text.Pandoc.Readers.LaTeX: \^{} now correctly
parses as a '^' character.
+ Text.Pandoc.ASCIIMathML is no longer an exported module.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@835 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
+ The changes are documented in README, under Lists.
+ The OrderedList block element now stores information
about list number style, list number delimiter, and
starting number.
+ The readers parse this information, when possible.
+ The writers use this information to style ordered
lists.
+ Test suites have been changed accordingly.
Motivation: It's often useful to start lists with
numbers other than 1, and to have control over the
style of the list.
Added to Text.Pandoc.Shared:
+ camelCaseToHyphenated
+ toRomanNumeral
+ anyOrderedListMarker
+ orderedListMarker
+ orderedListMarkers
Added to Text.Pandoc.ParserCombinators:
+ charsInBalanced'
+ withHorizDisplacement
+ romanNumeral
RST writer:
+ Force blank line before lists, so that sublists will be handled
correctly.
LaTeX reader:
+ Fixed bug in parsing of footnotes containing multiple paragraphs,
introduced by use of charsInBalanced. Fix: use charsInBalanced'
instead.
LaTeX header:
+ use mathletters option in ucs package, so that basic unicode Greek
letters will work properly.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@834 788f1e2b-df1e-0410-8736-df70ead52e1b
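To make the three stored attributes concrete, here is a sketch of how a writer might turn them into list markers. The types are reduced versions modelled on the description above (only three number styles), and toRoman and markers are hypothetical names, not the functions added to Text.Pandoc.Shared.

```haskell
import Data.Char (toLower)

-- Reduced attribute types (assumption: alphabetic styles omitted).
data ListNumberStyle = Decimal | LowerRoman | UpperRoman
  deriving (Eq, Show)

data ListNumberDelim = Period | OneParen | TwoParens
  deriving (Eq, Show)

-- (starting number, number style, delimiter)
type ListAttributes = (Int, ListNumberStyle, ListNumberDelim)

-- Upper-case roman numeral for a positive number; "" otherwise.
toRoman :: Int -> String
toRoman n
  | n <= 0    = ""
  | otherwise = go n table
  where
    table = [ (1000, "M"), (900, "CM"), (500, "D"), (400, "CD")
            , (100, "C"), (90, "XC"), (50, "L"), (40, "XL")
            , (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I") ]
    go 0 _ = ""
    go _ [] = ""
    go k ((v, s) : rest)
      | k >= v    = s ++ go (k - v) ((v, s) : rest)
      | otherwise = go k rest

-- Infinite list of markers for the given attributes.
markers :: ListAttributes -> [String]
markers (start, style, delim) = map (wrap . number) [start ..]
  where
    number = case style of
      Decimal    -> show
      UpperRoman -> toRoman
      LowerRoman -> map toLower . toRoman
    wrap s = case delim of
      Period    -> s ++ "."
      OneParen  -> s ++ ")"
      TwoParens -> "(" ++ s ++ ")"
```

For instance, `take 3 (markers (4, LowerRoman, OneParen))` gives the markers for a list that starts at four with lower-case roman numerals and a closing parenthesis.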
|
|
be caused by raw HTML when the parse-raw option isn't selected.)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@787 788f1e2b-df1e-0410-8736-df70ead52e1b
|
|
git-svn-id: https://pandoc.googlecode.com/svn/trunk@786 788f1e2b-df1e-0410-8736-df70ead52e1b
|