pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2013-01-12	HTML reader: Added html5 tags to list of block-level tags.	John MacFarlane	1	-5/+8

2013-01-09	Added Attr field to Header.	John MacFarlane	1	-2/+4
	Previously header ids were autogenerated by the writers. Now they are generated (unless supplied explicitly) in the markdown parser, if the `header_identifiers` extension is selected. In addition, the textile reader now supports id attributes on headers.
2012-09-15	HTML reader: Modified htmlTag for fewer false positives.	John MacFarlane	1	-1/+1
	A tag must start with `<` followed by `!`, `?`, `/`, or a letter. This makes it more useful in the MediaWiki and markdown parsers.
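The rule above can be sketched as a small predicate. This is an illustrative Python sketch, not pandoc's Haskell `htmlTag`; the function name `looks_like_tag_start` is hypothetical, and restricting "letter" to ASCII letters is an assumption.

```python
import string

def looks_like_tag_start(s: str) -> bool:
    """Return True if s begins the way the commit message requires of a tag:
    '<' followed by '!', '?', '/', or a letter (ASCII assumed here)."""
    if len(s) < 2 or s[0] != "<":
        return False
    nxt = s[1]
    return nxt in "!?/" or nxt in string.ascii_letters
```

Under this rule, input such as `< 5` or `<3` is rejected early, which is what reduces false positives when a reader probes for inline HTML.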
2012-09-13	MediaWiki reader: Use MWState instead of ParserState.	John MacFarlane	1	-1/+1

2012-09-09	HTML reader: Handle nested `<q>` tags properly.	John MacFarlane	1	-1/+9

2012-09-09	HTML reader: Parse <q> as Quoted DoubleQuote.	John MacFarlane	1	-0/+4

2012-08-15	Moved renderTags' from HTML reader & SelfContained to Shared.	John MacFarlane	1	-13/+1
	Improved removal of markdown="1" attribute in Markdown reader.
2012-07-26	Fixed whitespace errors.	John MacFarlane	1	-5/+5

2012-07-26	Use readerExtensions instead of readerStrict in readers.	John MacFarlane	1	-26/+19
	Test individually for the extensions.
2012-07-25	Changed reader parameters from ParserState to ReaderOptions.	John MacFarlane	1	-3/+3

2012-07-25	Moved ParseRaw from ParserState to ReaderOptions.	John MacFarlane	1	-4/+4

2012-07-25	Options -> ReaderOptions.	John MacFarlane	1	-2/+2
	Better to keep reader and writer options separate.
2012-07-25	Put smart, strict in separate options field in state.	John MacFarlane	1	-2/+3
	This is the beginning of a larger transition that will make Options, not ParserState, the parameter of the read functions. (Options will also be used in writers, in place of WriterOptions.) Next step is to remove strict, replacing it with granular tests for different extensions.
2012-07-24	HTML reader: Fixed bug in htmlBalanced.	John MacFarlane	1	-2/+1
	This caused hangs in parsing certain markdown input using --strict.
2012-07-20	Use Parser as type synonym for Parsec.	John MacFarlane	1	-8/+8

2012-07-20	Text.Pandoc.Parsing: Export all Parsec functions used in pandoc code.	John MacFarlane	1	-2/+0
	No other module directly imports Parsec. This will make it easier to change the parsing backend in the future, if we want to.
2012-07-20	Use Text.Parsec instead of Text.ParserCombinators.Parsec.	John MacFarlane	1	-12/+12

2012-04-29	HTML reader: Support `<col>` and `<caption>` in tables.	John MacFarlane	1	-1/+3
	Closes #486.
2012-04-28	HTML reader: Don't skip nonbreaking spaces.	John MacFarlane	1	-1/+7
	Previously a paragraph containing just `&nbsp;` would be rendered as an empty paragraph. Thanks to Paul Vorbach for pointing out the bug.
2012-02-17	Don't escape `<` in `<style>` tags with `--self-contained`.	John MacFarlane	1	-2/+10
	Closes #422: highlighting lost using `--self-contained`.
2012-01-12	Added "title" to list of docbook block-level tags.	John MacFarlane	1	-1/+1

2011-12-29	Better smart quote parsing.	John MacFarlane	1	-2/+6
	* Added stateLastStrPos to ParserState. This lets us keep track of whether we're parsing the position immediately after a 'str'. If we encounter a ' in such a location, it must be an apostrophe, and can't be a single quote start.
	* Set this in the markdown, textile, html, and rst str parsers.
	* Closes #360.
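The idea of remembering where the last 'str' ended can be sketched as follows. This is a simplified Python illustration, not pandoc's parser: the function name `single_quote_positions` is hypothetical, and treating a 'str' as a run of alphanumeric characters is an assumption.

```python
def single_quote_positions(text: str) -> list:
    """For each ' in text, report (index, can_start_quote).

    A ' found exactly where the last run of word characters ended
    is treated as an apostrophe, so it cannot open a single quote.
    """
    results = []
    last_str_end = -1  # index just past the most recent 'str' (hypothetical analogue of stateLastStrPos)
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isalnum():
            j = i
            while j < len(text) and text[j].isalnum():
                j += 1
            last_str_end = j
            i = j
        elif ch == "'":
            results.append((i, i != last_str_end))
            i += 1
        else:
            i += 1
    return results
```

For the input `don't 'quote'`, only the quote after the space is a candidate for opening a single-quoted span; the other two follow a 'str' directly.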
2011-10-25	HTML reader now recognizes DocBook block and inline tags.	John MacFarlane	1	-5/+24
	It was always possible to include raw DocBook tags in a markdown document, but now pandoc will be able to distinguish block from inline tags and behave accordingly. Thus, for example, <sidebar> hello </sidebar> will not be wrapped in `<para>` tags.
2011-08-01	HTML reader: Fixed bug parsing tables w both thead and tbody.	John MacFarlane	1	-0/+1
	See bug #274, which was not completely fixed by the last patch.
2011-07-23	Properly handle characters in the 128..159 range.	John MacFarlane	1	-2/+41
	These aren't valid in HTML, but many HTML files produced by Windows tools contain them. We substitute correct unicode characters.
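A minimal sketch of such a substitution, assuming the replacement table is the Windows-1252 code page (the commit message does not name the table). The function names are hypothetical, and this is Python rather than pandoc's Haskell.

```python
def fix_c1_char(ch: str) -> str:
    """Map a character in the 128..159 range to its Windows-1252 meaning.

    Code points that Windows-1252 leaves undefined (for example 0x81)
    are returned unchanged.
    """
    code = ord(ch)
    if 128 <= code <= 159:
        try:
            return bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            return ch
    return ch

def fix_c1_text(text: str) -> str:
    """Apply fix_c1_char to every character of text."""
    return "".join(fix_c1_char(c) for c in text)
```

With this mapping, the characters 0x93 and 0x94 become the curly double quotes U+201C and U+201D, and 0x80 becomes the euro sign U+20AC.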
2011-07-16	HTML reader: treat Plain as Para when needed.	John MacFarlane	1	-9/+12
	For example, in Just a few glitches remaining. <ul><li> In this situation, one loses the list. </ul> And in this, the preformatting. <pre>Preformatted text not starting with its own blank line. </pre> Thanks to Dirk Laurie for noticing the issue.
2011-07-15	HTML reader: Handle tbody, thead in simple tables.	John MacFarlane	1	-7/+17
	Closes #274.
2011-07-10	Make HTML reader more forgiving of bad HTML.	John MacFarlane	1	-4/+16
	* Skip spaces after <b>, <emph>, etc.
	* Convert Plain elements into Para when they're in a list item with Para, Pre, BlockQuote, CodeBlock.
	An example of HTML that pandoc handles better now:
	~~~~
	<h4> Testing html to markdown </h4>
	<ul>
	<li> <b> An item in a list </b> <p> An introductory sentence.
	<pre> Some preformatted text at this stage comes next. But alas! much havoc is wrought by Pandoc. </pre>
	</ul>
	~~~~
	Thanks to Dirk Laurie for reporting the issues.
2011-01-26	Add support for attributes in inline Code.	John MacFarlane	1	-2/+6
	Additional related changes:
	* URLs in Code in autolinks now use class "url".
	* Require highlighting-kate 0.2.8.2, which omits the final <br/> tag, essential for inline code.
2011-01-26	Bumped version to 1.8; depend on pandoc-types 1.8.	John MacFarlane	1	-2/+2
	The old TeX, HtmlInline and RawHtml elements have been removed and replaced by generic RawInline and RawBlock elements. All modules updated to use the new raw elements.
2011-01-14	HTML reader: parse simple tables.	John MacFarlane	1	-2/+22
	Resolves Issue #106. Thanks to Rodja Trappe for the idea and some sample code.
2011-01-14	HTML reader: parse location tags in pSatisfy.	John MacFarlane	1	-13/+17
	This avoids the need for manual parsing all over the place.
2011-01-06	HTML reader: Fixed bug in htmlTag for comments.	John MacFarlane	1	-2/+9

2010-12-30	HTML reader: Fixed some parsing bugs.	John MacFarlane	1	-22/+28

2010-12-30	New HTML reader using tagsoup as a lexer.	John MacFarlane	1	-582/+379
	* The new reader is faster and more accurate.
	* API changes for Text.Pandoc.Readers.HTML:
		- removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag, anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType, htmlBlockElement, htmlComment
		- added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag
	* tagsoup is a new dependency.
	* Text.Pandoc.Parsing: Generalized type on readWith.
	* Benchmark.hs: Added length calculation to force full evaluation.
	* Updated HTML reader tests.
	* Updated markdown and textile readers to use the functions from the HTML reader.
	* Note: The markdown reader now correctly handles some cases it did not before. For example: <hr/> is reproduced without adding a space. <script> a = '<b>'; </script> is parsed correctly.
2010-12-22	HTML reader: Simplified parsing of <script> sections.	John MacFarlane	1	-24/+1
	I had previously assumed that we needed to ignore </script> occurring in a string literal or javascript comment. It turns out, though, that browsers aren't that smart.
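The simplified behaviour amounts to ending the script body at the first closing tag, wherever it appears. A rough Python sketch, not pandoc's code; the function name `split_script` and the exact closing-tag pattern are assumptions.

```python
import re

# First closing script tag, case-insensitive; string literals and
# comments in the JavaScript are deliberately not inspected.
_SCRIPT_CLOSE = re.compile(r"</script\s*>", re.IGNORECASE)

def split_script(after_open_tag: str):
    """Split the text following an opening <script> tag into
    (script_body, remainder_after_close_tag)."""
    m = _SCRIPT_CLOSE.search(after_open_tag)
    if m is None:
        # No closing tag: treat everything as script body.
        return after_open_tag, ""
    return after_open_tag[:m.start()], after_open_tag[m.end():]
```

A closing tag inside a string literal therefore ends the section early, matching what the commit message says browsers do.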
2010-12-22	Made --smart work with HTML reader.	John MacFarlane	1	-4/+13
	It did not work before, because - and quotes were gobbled up by the str parser.
2010-12-15	HTML reader: allow : in tags.	John MacFarlane	1	-2/+6
	Resolves Issue #274.
2010-12-10	Removed HTML sanitization.	John MacFarlane	1	-90/+5
	This is better done on the resulting HTML; use the xss-sanitize library for this. xss-sanitize is based on pandoc's sanitization, but improves it.
	- Removed stateSanitize from ParserState.
	- Removed --sanitize-html option.
2010-12-07	Make --smart work in HTML reader.	John MacFarlane	1	-2/+3

2010-12-03	Basic Textile Reader	paul.rivier	1	-1/+2

2010-11-11	HTML reader: don't parse raw HTML inside <code> tag.	John MacFarlane	1	-2/+2
	Previously '<code><a>x</a></code>' would be parsed as Code "<a>x</a>", which is not what you want.
2010-07-14	HTML reader: code cleanup + parse <tt> as Code.	John MacFarlane	1	-34/+47
	Partially resolves Issue #247.
2010-07-05	Moved parsing functions from Text.Pandoc.Shared to new module.	John MacFarlane	1	-1/+2
	+ Text.Pandoc.Parsing
2010-03-23	Properly escape URIs in all readers.	John MacFarlane	1	-3/+3

2010-03-23	Updated copyright notices.	John MacFarlane	1	-2/+2

2010-03-23	Fixed treatment of unicode characters in URIs.	John MacFarlane	1	-1/+1
	* Added stringToURI to Shared. This is used in the HTML writer for all URIs. It properly URI-encodes high characters (> 127), leaving everything else (including symbols and spaces) the same.
	* Modified unsanitaryURI to allow UTF8 characters in a URI. (First, we convert the URI to URI-encoded octets, then we pass through parseURIReference.) This resolves gitit Issue #99. Previously '[abc](http://gitit.net/测试)' would not be rendered as a link when --sanitize was selected.
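The encoding rule described for stringToURI can be sketched as below. This is a Python illustration under stated assumptions, not the Haskell function itself: the name `string_to_uri` is hypothetical, and the use of UTF-8 octets with uppercase hex digits is assumed.

```python
def string_to_uri(s: str) -> str:
    """Percent-encode only characters above code point 127, as UTF-8 octets.

    Everything else, including spaces and symbols, is left untouched.
    """
    out = []
    for ch in s:
        if ord(ch) > 127:
            out.append("".join("%%%02X" % b for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)
```

Applied to the URL from the commit message, `http://gitit.net/测试` becomes `http://gitit.net/%E6%B5%8B%E8%AF%95`, which contains only ASCII characters.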
2010-02-12	HTML reader: handle spaces before <html>.	fiddlosopher	1	-0/+1
	Resolves Issue #216.
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@1837 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-12	HTML reader: Be forgiving in parsing a bare list within a list.	fiddlosopher	1	-2/+6
	The following is not valid xhtml, but the intent is clear:
	<ol> <li>one</li> <ol><li>sub</li></ol> <li>two</li> </ol>
	We'll treat the <ol> as if it's in a <li>. Resolves Issue #215.
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@1836 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-02	Made HTML reader much more forgiving.	fiddlosopher	1	-29/+106
	+ Incorporated idea (from HXT) that an element can be closed by an open tag for another element.
	+ Javascript is partially parsed to make sure that a <script> section is not closed by a </script> in a comment or string.
	+ More lenient non-quoted attribute values. Now we accept anything but a space character, quote, or <>. This helps in parsing e.g. www.google.com!
	+ Bare & signs are now parsed as a string. This is a common HTML mistake.
	+ Skip a bare < in malformed HTML.
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@1825 788f1e2b-df1e-0410-8736-df70ead52e1b
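The more lenient rule for non-quoted attribute values listed above can be sketched with a character class. This is an illustrative Python sketch, not the reader's Haskell parser; the function name `unquoted_attr_value` is hypothetical, and reading "quote" as either `"` or `'` is an assumption.

```python
import re

# Anything except a space, a quote character, '<' or '>'.
_UNQUOTED_VALUE = re.compile(r"""[^ "'<>]+""")

def unquoted_attr_value(s: str):
    """Return the unquoted attribute value at the start of s,
    or None if s does not start with one."""
    m = _UNQUOTED_VALUE.match(s)
    return m.group(0) if m else None
```

A value such as `http://x.com/a?b=1&c=2` is accepted whole, since characters like `:`, `/`, `?`, `=`, and `&` are all permitted under this rule.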