pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2011-07-23	Properly handle characters in the 128..159 range.	John MacFarlane	1	-2/+41
	These aren't valid in HTML, but many HTML files produced by Windows tools contain them. We substitute the correct Unicode characters.
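A minimal Python sketch of this substitution. It assumes, as the commit implies but does not state, that the stray code points are Windows-1252 bytes that were read as Latin-1, so each one is reinterpreted through the cp1252 codec. `fix_c1_controls` is a hypothetical name, not pandoc's function.

```python
def fix_c1_controls(text: str) -> str:
    """Replace code points U+0080..U+009F with their Windows-1252 meaning."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 128 <= cp <= 159:
            try:
                # Reinterpret the code point as a single cp1252 byte.
                out.append(bytes([cp]).decode("cp1252"))
            except UnicodeDecodeError:
                # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in cp1252,
                # so they are left unchanged.
                out.append(ch)
        else:
            out.append(ch)
    return "".join(out)
```

For example, `\x93` and `\x94` become curly double quotes and `\x80` becomes the euro sign.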
2011-07-16	HTML reader: treat Plain as Para when needed.	John MacFarlane	1	-9/+12
	For example, in Just a few glitches remaining. <ul><li> In this situation, one loses the list. </ul> And in this, the preformatting. <pre>Preformatted text not starting with its own blank line. </pre> Thanks to Dirk Laurie for noticing the issue.
2011-07-15	HTML reader: Handle tbody, thead in simple tables.	John MacFarlane	1	-7/+17
	Closes #274.
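The sketch below, in Python with the standard `html.parser` module standing in for pandoc's Haskell parser, shows why `<thead>` and `<tbody>` are easy to accommodate in a simple table: they can be treated as transparent wrappers around the rows. It assumes a simple table has one header row of `<th>` cells and no `rowspan`/`colspan`; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class SimpleTableParser(HTMLParser):
    """Collect header cells and body rows from one simple HTML table.

    <thead> and <tbody> need no special handling: rows are gathered the
    same way whether or not those wrappers are present.
    """

    def __init__(self):
        super().__init__()
        self.headers = []     # text of <th> cells
        self.rows = []        # body rows, each a list of <td> texts
        self._row = None      # row currently being built
        self._cell = None     # text fragments of the current cell
        self._cell_tag = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell, self._cell_tag = [], tag
        # "table", "thead" and "tbody" are simply ignored.

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = "".join(self._cell).strip()
            if self._cell_tag == "th":
                self.headers.append(text)
            elif self._row is not None:
                self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # a header-only row adds no body row
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_simple_table(html):
    parser = SimpleTableParser()
    parser.feed(html)
    parser.close()
    return parser.headers, parser.rows
```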
2011-07-10	Make HTML reader more forgiving of bad HTML.	John MacFarlane	1	-4/+16
	* Skip spaces after <b>, <em>, etc. * Convert Plain elements into Para when they're in a list item with Para, Pre, BlockQuote, CodeBlock. An example of HTML that pandoc handles better now: ~~~~ <h4> Testing html to markdown </h4> <ul> <li> <b> An item in a list </b> <p> An introductory sentence. <pre> Some preformatted text at this stage comes next. But alas! much havoc is wrought by Pandoc. </pre> </ul> ~~~~ Thanks to Dirk Laurie for reporting the issues.
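A sketch of the Plain-to-Para rule in Python. Blocks are encoded as hypothetical `(kind, content)` tuples rather than pandoc's Haskell `Block` type, and the set of Para-like kinds is copied from the commit message.

```python
# Block kinds that, when present in a list item, force Plain -> Para.
PARA_LIKE = {"Para", "Pre", "BlockQuote", "CodeBlock"}


def fix_plains(item):
    """Turn Plain blocks into Para if the list item also holds a
    Para-like block; otherwise return the item unchanged."""
    if any(kind in PARA_LIKE for kind, _ in item):
        return [("Para", content) if kind == "Plain" else (kind, content)
                for kind, content in item]
    return item
```

Applied to the list item in the example above, the bold `An item in a list` would become its own paragraph instead of running into the following blocks.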
2011-01-26	Add support for attributes in inline Code.	John MacFarlane	1	-2/+6
	Additional related changes: * URLs in Code in autolinks now use class "url". * Require highlighting-kate 0.2.8.2, which omits the final <br/> tag, essential for inline code.
2011-01-26	Bumped version to 1.8; depend on pandoc-types 1.8.	John MacFarlane	1	-2/+2
	The old TeX, HtmlInline and RawHtml elements have been removed and replaced by generic RawInline and RawBlock elements. All modules updated to use the new raw elements.
2011-01-14	HTML reader: parse simple tables.	John MacFarlane	1	-2/+22
	Resolves Issue #106. Thanks to Rodja Trappe for the idea and some sample code.
2011-01-14	HTML reader: parse location tags in pSatisfy.	John MacFarlane	1	-13/+17
	This avoids the need for manual parsing all over the place.
2011-01-06	HTML reader: Fixed bug in htmlTag for comments.	John MacFarlane	1	-2/+9

2010-12-30	HTML reader: Fixed some parsing bugs.	John MacFarlane	1	-22/+28

2010-12-30	New HTML reader using tagsoup as a lexer.	John MacFarlane	1	-582/+379
	* The new reader is faster and more accurate. * API changes for Text.Pandoc.Readers.HTML: - removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag, anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType, htmlBlockElement, htmlComment - added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag * tagsoup is a new dependency. * Text.Pandoc.Parsing: Generalized type on readWith. * Benchmark.hs: Added length calculation to force full evaluation. * Updated HTML reader tests. * Updated markdown and textile readers to use the functions from the HTML reader. * Note: The markdown reader now correctly handles some cases it did not before. For example: <hr/> is reproduced without adding a space. <script> a = '<b>'; </script> is parsed correctly.
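To illustrate what using tagsoup as a lexer means, here is a Python sketch that flattens HTML into a token list shaped like tagsoup's `TagOpen`/`TagClose`/`TagText`/`TagComment`, with the standard `html.parser` module as a stand-in for tagsoup. `BLOCK_TAGS` is an illustrative subset, not pandoc's `isBlockTag` list, and `lex`/`is_block_tag` are hypothetical names.

```python
from html.parser import HTMLParser

# Illustrative subset only; pandoc's own block-tag list is longer.
BLOCK_TAGS = {"p", "div", "pre", "ul", "ol", "li", "table", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6", "hr"}


class TagLexer(HTMLParser):
    """Flatten HTML into a flat list of tagsoup-like tokens."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("TagOpen", tag, dict(attrs)))

    def handle_startendtag(self, tag, attrs):
        # <hr/> yields an open tag immediately followed by a close tag.
        self.tokens.append(("TagOpen", tag, dict(attrs)))
        self.tokens.append(("TagClose", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("TagClose", tag))

    def handle_data(self, data):
        self.tokens.append(("TagText", data))

    def handle_comment(self, data):
        self.tokens.append(("TagComment", data))


def lex(html):
    lexer = TagLexer()
    lexer.feed(html)
    lexer.close()
    return lexer.tokens


def is_block_tag(token):
    return token[0] in ("TagOpen", "TagClose") and token[1] in BLOCK_TAGS
```

The parser proper then works on this token stream instead of raw characters. In this sketch, the body of `<script> a = '<b>'; </script>` comes back as text with no `<b>` tag.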
2010-12-22	HTML reader: Simplified parsing of <script> sections.	John MacFarlane	1	-24/+1
	I had previously assumed that we needed to ignore </script> occurring in a string literal or JavaScript comment. It turns out, though, that browsers aren't that smart.
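A Python sketch of the simplified rule: the script body ends at the first `</script` in any letter case, wherever it occurs. `split_script` is a hypothetical helper, and it does not check that the tag name actually ends after `script`.

```python
import re

# A script body ends at the first "</script" (any case), even inside a
# JavaScript string literal or comment, since that is what browsers do.
SCRIPT_END = re.compile(r"</script", re.IGNORECASE)


def split_script(rest):
    """Given the text just after an opening <script> tag, return
    (script_body, remaining_html)."""
    m = SCRIPT_END.search(rest)
    if m is None:
        return rest, ""          # unterminated script: take everything
    end = rest.find(">", m.end())
    if end == -1:
        return rest[:m.start()], ""
    return rest[:m.start()], rest[end + 1:]
```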
2010-12-22	Made --smart work with HTML reader.	John MacFarlane	1	-4/+13
	It did not work before, because hyphens and quote characters were gobbled up by the str parser.
2010-12-15	HTML reader: allow : in tags.	John MacFarlane	1	-2/+6
	Resolves Issue #274.
2010-12-10	Removed HTML sanitization.	John MacFarlane	1	-90/+5
	This is better done on the resulting HTML; use the xss-sanitize library for this. xss-sanitize is based on pandoc's sanitization, but improves it. - Removed stateSanitize from ParserState. - Removed --sanitize-html option.
2010-12-07	Make --smart work in HTML reader.	John MacFarlane	1	-2/+3

2010-12-03	Basic Textile Reader	paul.rivier	1	-1/+2

2010-11-11	HTML reader: don't parse raw HTML inside <code> tag.	John MacFarlane	1	-2/+2
	Previously '<code><a>x</a></code>' would be parsed as Code "<a>x</a>", which is not what you want.
2010-07-14	HTML reader: code cleanup + parse <tt> as Code.	John MacFarlane	1	-34/+47
	Partially resolves Issue #247.
2010-07-05	Moved parsing functions from Text.Pandoc.Shared to new module.	John MacFarlane	1	-1/+2
	+ Text.Pandoc.Parsing
2010-03-23	Properly escape URIs in all readers.	John MacFarlane	1	-3/+3

2010-03-23	Updated copyright notices.	John MacFarlane	1	-2/+2

2010-03-23	Fixed treatment of unicode characters in URIs.	John MacFarlane	1	-1/+1
	* Added stringToURI to Shared. This is used in the HTML writer for all URIs. It properly URI-encodes high characters (> 127), leaving everything else (including symbols and spaces) the same. * Modified unsanitaryURI to allow UTF-8 characters in a URI. (First, we convert the URI to URI-encoded octets, then we pass through parseURIReference.) This resolves gitit Issue #99. Previously '[abc](http://gitit.net/测试)' would not be rendered as a link when --sanitize was selected.
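A Python sketch of the `stringToURI` behavior described here: characters above 127 become percent-encoded UTF-8 octets, and everything else is left alone. The function name `string_to_uri` and the choice of upper-case hex digits are assumptions.

```python
def string_to_uri(s: str) -> str:
    """Percent-encode every character above U+007F as its UTF-8 octets,
    leaving ASCII (including spaces and symbols) untouched."""
    out = []
    for ch in s:
        if ord(ch) > 127:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)
```

With this, the gitit URL from the example becomes `http://gitit.net/%E6%B5%8B%E8%AF%95`, which a URI parser accepts.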
2010-02-12	HTML reader: handle spaces before <html>.	fiddlosopher	1	-0/+1
	Resolves Issue #216. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1837 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-12	HTML reader: Be forgiving in parsing a bare list within a list.	fiddlosopher	1	-2/+6
	The following is not valid xhtml, but the intent is clear: <ol> <li>one</li> <ol><li>sub</li></ol> <li>two</li> </ol> We'll treat the <ol> as if it's in a <li>. Resolves Issue #215. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1836 788f1e2b-df1e-0410-8736-df70ead52e1b
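One way to realize this rule, sketched in Python: a stray `<ol>` or `<ul>` found directly among a list's children is appended to the preceding `<li>`. The `(tag, children)` tuple encoding is hypothetical, and the commit does not say what happens when no `<li>` precedes the sublist; the sketch wraps it in a new item.

```python
def attach_bare_sublists(children):
    """Move an <ol>/<ul> that appears directly inside a list (outside any
    <li>) into the preceding <li>, as if it had been written there."""
    items = []
    for tag, content in children:
        if tag in ("ol", "ul") and items:
            prev_tag, prev_content = items[-1]
            items[-1] = (prev_tag, prev_content + [(tag, content)])
        elif tag in ("ol", "ul"):
            # No preceding item: give the stray list an item of its own.
            items.append(("li", [(tag, content)]))
        else:
            items.append((tag, content))
    return items
```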
2010-02-02	Made HTML reader much more forgiving.	fiddlosopher	1	-29/+106
	+ Incorporated idea (from HXT) that an element can be closed by an open tag for another element. + Javascript is partially parsed to make sure that a <script> section is not closed by a </script> in a comment or string. + More lenient non-quoted attribute values. Now we accept anything but a space character, quote, or <>. This helps in parsing e.g. www.google.com! + Bare & signs are now parsed as a string. This is a common HTML mistake. + Skip a bare < in malformed HTML. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1825 788f1e2b-df1e-0410-8736-df70ead52e1b
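A Python sketch of the more lenient attribute rule: an unquoted value may contain anything except whitespace, a quote, `<`, or `>`, which is what lets `href=www.google.com!` parse. The rest of the attribute grammar here is simplified, and `parse_attrs` is a hypothetical name.

```python
import re

# name, then optionally "=" and a value that is double-quoted,
# single-quoted, or bare (anything but whitespace, quotes, < or >).
ATTR = re.compile(
    r"""([A-Za-z_:][-A-Za-z0-9_:.]*)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'<>]+)))?"""
)


def parse_attrs(s):
    attrs = {}
    for m in ATTR.finditer(s):
        name = m.group(1).lower()
        # First non-None value group; minimized attributes get "".
        value = next((g for g in m.groups()[1:] if g is not None), "")
        attrs[name] = value
    return attrs
```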
2009-12-31	Removed redundant imports (found by ghc 6.12).	fiddlosopher	1	-1/+1
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@1750 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-31	Changed Meta author and date types to Inline lists instead of Strings.	fiddlosopher	1	-4/+4
	Meta [Inline] [[Inline]] [Inline] rather than Meta [Inline] [String] String. This is a breaking change for libraries that use pandoc and manipulate the metadata. Changed .native files in test suite for new Meta format. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1699 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-07	Improved syntax for markdown definition lists.	fiddlosopher	1	-2/+2
	Definition lists are now more compatible with PHP Markdown Extra. Resolves Issue #24. + You can have multiple definitions for a term (but still not multiple terms). + Multi-block definitions no longer need a colon before each block (indeed, this will now cause multiple definitions). + The marker no longer needs to be flush with the left margin, but can be indented one or two spaces. Also, ~ as well as : can be used as the marker (this suggestion is due to David Wheeler). + There can now be a blank line between the term and the definitions. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1656 788f1e2b-df1e-0410-8736-df70ead52e1b
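A rough Python sketch of the new marker rules: a definition starts with `:` or `~`, indented by at most two spaces, and each marker line begins a separate definition. The helper names are hypothetical, and block structure inside a definition (indented paragraphs, code blocks) is flattened to joined text.

```python
import re

# ':' or '~', indented by zero to two spaces, then whitespace and text.
DEF_MARKER = re.compile(r"^ {0,2}[:~][ \t]+(.*)$")


def definition_text(line):
    """Return the definition text if `line` opens a definition, else None."""
    m = DEF_MARKER.match(line)
    return m.group(1) if m else None


def parse_definitions(lines):
    """Split the lines after a term into separate definitions: each marker
    line starts a new definition; other non-blank lines continue it."""
    defs = []
    for line in lines:
        text = definition_text(line)
        if text is not None:
            defs.append([text])
        elif defs and line.strip():
            defs[-1].append(line.strip())
    return [" ".join(d) for d in defs]
```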
2009-12-05	Added "head" to list of HTML block-level tags.	fiddlosopher	1	-1/+1
	Resolves Issue #108. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1645 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-21	Fixed htmlComment parser.	fiddlosopher	1	-1/+1
	(Added a needed try.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@1621 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-01	Properly handle commented-out list items in markdown.	fiddlosopher	1	-0/+1
	Example: - a <!-- - b --> - c Resolves Issue #142. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1615 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-10-04	Added haddock comments warning that readers assume \n line endings.	fiddlosopher	1	-1/+1
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@1608 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-07-21	Fixed bug in HTML comment parser.	fiddlosopher	1	-2/+2
	Resolves Issue #157. ('try' in the wrong place.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@1605 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-04-29	Made htmlComment parser more efficient.	fiddlosopher	1	-1/+3
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@1567 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-01-24	Moved all haskell source to src subdirectory.	fiddlosopher	1	-0/+675
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@1528 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-11-29	Moved everything from src into the top-level directory.	fiddlosopher	1	-496/+0
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@1104 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-11-03	Reverted back to state as of r1062. The Template Haskell changes	fiddlosopher	1	-0/+496
	are more trouble than they're worth. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1064 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-11-03	Use Template Haskell to avoid the need for templates:	fiddlosopher	1	-496/+0
	+ Added library Text.Pandoc.Include, with a Template Haskell function $(includeStrFrom fname) to include a file as a string constant at compile time. + This removes the need for the 'templates' directory or Makefile target. These have been removed. + The base source directory has been changed from src to . + A new 'data' directory has been added, containing the ASCIIMathML.js script, writer headers, and S5 files. + The src/wrappers directory has been moved to 'wrappers'. + The Text.Pandoc.ASCIIMathML library is no longer needed, since Text.Pandoc.Writers.HTML can use includeStrFrom to include the ASCIIMathML.js code directly. It has been removed. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1063 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-09-17	Remove just one leading and one trailing newline	fiddlosopher	1	-3/+11
	from contents of <pre>...</pre> in codeBlock parser. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1023 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-09-17	Changed parsing of code blocks in HTML reader:	fiddlosopher	1	-7/+8
	+ <code> tag is no longer needed. <pre> suffices. + all HTML tags in the code block (e.g. for syntax highlighting) are skipped, because they are not portable to other output formats. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1022 788f1e2b-df1e-0410-8736-df70ead52e1b
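The two `<pre>` changes (this entry and the one above it, which removes just one leading and one trailing newline) can be sketched together in Python. The regex tag stripper, the decoding of character references with `html.unescape`, and the name `pre_to_code` are assumptions made for illustration.

```python
import html
import re

TAG = re.compile(r"<[^>]*>")


def pre_to_code(inner):
    """Turn the raw contents of <pre>...</pre> into code-block text."""
    # Drop every tag (e.g. syntax-highlighting <span>s), since they are
    # not portable to other output formats. Unescape afterwards so that
    # an escaped "&lt;b&gt;" survives as literal text.
    text = html.unescape(TAG.sub("", inner))
    # Remove just one leading and one trailing newline.
    if text.startswith("\n"):
        text = text[1:]
    if text.endswith("\n"):
        text = text[:-1]
    return text
```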
2007-09-15	Simplified HTML attribute parsing (HTML reader).	fiddlosopher	1	-10/+5
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@1016 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-09-14	Fixed two bugs in HTML reader:	fiddlosopher	1	-11/+4
	+ <code>...</code> not surrounded by <pre> should count as inline HTML, not code block. + parser for minimized attributes should not swallow trailing spaces git-svn-id: https://pandoc.googlecode.com/svn/trunk@1015 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-19	Added a necessary "try" in definition of "para"	fiddlosopher	1	-1/+2
	(HTML reader). git-svn-id: https://pandoc.googlecode.com/svn/trunk@863 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-18	Bug fixes in readers:	fiddlosopher	1	-6/+18
	+ LaTeX reader: skip anything after \end{document} + HTML reader: fixed bug skipping material after </html> -- previously, stuff at the end was skipped even if no </html> was present, which meant only part of the file would be parsed and no error issued + HTML reader: added new constant eitherBlockOrInline with elements that may count either as block-level or inline + Modified isInline and isBlock to take this into account + modified rawHtmlBlock to accept any tag (even an inline tag); this is innocuous, because rawHtmlBlock is tried only if a regular inline element can't be parsed. git-svn-id: https://pandoc.googlecode.com/svn/trunk@862 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-15	Allow htmlComments as rawHtmlInline in HTML reader.	fiddlosopher	1	-2/+3
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@844 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-15	Major code cleanup in all modules. (Removed unneeded imports,	fiddlosopher	1	-202/+160
	reformatted, etc.) More major changes are documented below: + Removed Text.Pandoc.ParserCombinators and moved all its definitions to Text.Pandoc.Shared. + In Text.Pandoc.Shared: - Removed unneeded 'try' in blanklines. - Removed endsWith function and rewrote functions to use isSuffixOf instead. - Added >>~ combinator. - Rewrote stripTrailingNewlines, removeLeadingSpaces. + Moved Text.Pandoc.Entities -> Text.Pandoc.CharacterReferences. - Removed unneeded functions charToEntity, charToNumericalEntity. - Renamed functions using proper terminology (character references, not entities). decodeEntities -> decodeCharacterReferences, characterEntity -> characterReference. - Moved escapeStringToXML to Docbook writer, which is the only thing that uses it. - Removed old entity parser in HTML and Markdown readers; replaced with new charRef parser in Text.Pandoc.Shared. + Fixed accent bug in Text.Pandoc.Readers.LaTeX: \^{} now correctly parses as a '^' character. + Text.Pandoc.ASCIIMathML is no longer an exported module. git-svn-id: https://pandoc.googlecode.com/svn/trunk@835 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-08	Major change in the way ordered lists are handled:	fiddlosopher	1	-2/+17
	+ The changes are documented in README, under Lists. + The OrderedList block element now stores information about list number style, list number delimiter, and starting number. + The readers parse this information, when possible. + The writers use this information to style ordered lists. + Test suites have been changed accordingly. Motivation: It's often useful to start lists with numbers other than 1, and to have control over the style of the list. Added to Text.Pandoc.Shared: + camelCaseToHyphenated + toRomanNumeral + anyOrderedListMarker + orderedListMarker + orderedListMarkers Added to Text.Pandoc.ParserCombinators: + charsInBalanced' + withHorizDisplacement + romanNumeral RST writer: + Force blank line before lists, so that sublists will be handled correctly. LaTeX reader: + Fixed bug in parsing of footnotes containing multiple paragraphs, introduced by use of charsInBalanced. Fix: use charsInBalanced' instead. LaTeX header: + use mathletters option in ucs package, so that basic unicode Greek letters will work properly. git-svn-id: https://pandoc.googlecode.com/svn/trunk@834 788f1e2b-df1e-0410-8736-df70ead52e1b
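Python sketches of two helpers named in this commit, `toRomanNumeral` and `camelCaseToHyphenated`. The Python names are hypothetical translations, and the use of the hyphenated form for CSS values such as `lower-roman` is an assumption based on pandoc's list-style constructor names like `LowerRoman`.

```python
def to_roman_numeral(n: int) -> str:
    """Convert a positive integer to an upper-case Roman numeral."""
    if n <= 0:
        raise ValueError("Roman numerals need a positive integer")
    table = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, numeral in table:
        count, n = divmod(n, value)
        out.append(numeral * count)
    return "".join(out)


def camel_case_to_hyphenated(s: str) -> str:
    """Lower-case a CamelCase name, hyphenating at each inner capital."""
    out = []
    for i, ch in enumerate(s):
        if ch.isupper() and i > 0:
            out.append("-")
        out.append(ch.lower())
    return "".join(out)
```

A writer could, for instance, turn a `LowerRoman` list style into the CSS value `lower-roman`, and a reader could use the numeral table in reverse to recover a starting number such as `iv`.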
2007-07-23	In HTML reader, filter Nulls in lists of blocks. (These can	fiddlosopher	1	-2/+2
	be caused by raw HTML when the parse-raw option isn't selected.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@787 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-23	Fixed bug in spanStrikeout: case was not exhaustive.	fiddlosopher	1	-1/+1
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@786 788f1e2b-df1e-0410-8736-df70ead52e1b