pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2010-12-30	Textile reader: Slight speed improvement.	John MacFarlane	1	-5/+5

2010-12-30	New HTML reader using tagsoup as a lexer.	John MacFarlane	3	-625/+421
	* The new reader is faster and more accurate.
	* API changes for Text.Pandoc.Readers.HTML:
	  - removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag, anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType, htmlBlockElement, htmlComment
	  - added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag
	* tagsoup is a new dependency.
	* Text.Pandoc.Parsing: Generalized type on readWith.
	* Benchmark.hs: Added length calculation to force full evaluation.
	* Updated HTML reader tests.
	* Updated markdown and textile readers to use the functions from the HTML reader.
	* Note: The markdown reader now correctly handles some cases it did not before. For example: <hr/> is reproduced without adding a space. <script> a = '<b>'; </script> is parsed correctly.
2010-12-24	Use functions from Text.Pandoc.Generic instead of processWith(M).	John MacFarlane	1	-1/+2

2010-12-22	HTML reader: Simplified parsing of <script> sections.	John MacFarlane	1	-24/+1
	I had previously assumed that we needed to ignore </script> occurring in a string literal or JavaScript comment. It turns out, though, that browsers aren't that smart.
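The simplified rule can be sketched over a plain `String`: like a browser, stop at the first `</script>` (matched case-insensitively), even when it sits inside a JavaScript string or comment. `scriptBody` is a hypothetical helper for illustration, not the reader's actual code.

```haskell
import Data.Char (toLower)
import Data.List (isPrefixOf)

-- Hypothetical helper: split the text that follows an opening <script> tag
-- into the script body and the input remaining after the closing tag.
-- Like a browser, it stops at the first "</script" (case-insensitive),
-- even if that sequence appears inside a string literal or a comment.
scriptBody :: String -> (String, String)
scriptBody = go ""
  where
    -- acc holds the body read so far, in reverse order.
    go acc [] = (reverse acc, [])
    go acc s@(c : cs)
      | "</script" `isPrefixOf` map toLower (take 8 s) =
          -- Skip past the '>' that ends the closing tag.
          (reverse acc, drop 1 (dropWhile (/= '>') s))
      | otherwise = go (c : acc) cs
```

For example, in `a = '</script>'; b` the body ends inside the string literal, exactly as a browser would end it.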
2010-12-22	Made --smart work with HTML reader.	John MacFarlane	1	-4/+13
	It did not work before, because dashes (-) and quotes were gobbled up by the str parser.
2010-12-22	RST reader: Added unicode quote characters to specialChars.	John MacFarlane	1	-1/+1
	(So they can trigger Quoted environments.)
2010-12-22	RST reader: recouped speed loss due to addition of --smart.	John MacFarlane	1	-4/+4
	This was achieved by rearranging the parsers in inline. Benchmarks went from 500ms to 307ms -- not quite back to the 279ms we had in 1.6, before supporting smart punctuation and footnotes, but close.
2010-12-21	Shared: Made splitBy take a test instead of an element.	John MacFarlane	1	-1/+1
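With this change `splitBy` is parameterized on a predicate instead of a separator element. A minimal sketch of a function with that shape follows; it illustrates the idea and is not guaranteed to match the code in Shared line for line. Consecutive separators count as a single break, and separators are dropped from the result.

```haskell
-- Split a list wherever the predicate holds. Consecutive separators are
-- treated as a single break, and separators are not kept in the output.
-- A leading separator yields an empty first chunk.
splitBy :: (a -> Bool) -> [a] -> [[a]]
splitBy _ [] = []
splitBy isSep xs =
  let (chunk, rest) = break isSep xs
  in  chunk : splitBy isSep (dropWhile isSep rest)
```

Taking a predicate means one call can split on several characters at once, e.g. `splitBy (\c -> c == ',' || c == ';')`.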

2010-12-15	HTML reader: allow : in tags.	John MacFarlane	1	-2/+6
	Resolves Issue #274.
2010-12-14	Fixed preamble parsing in LaTeX reader.	John MacFarlane	1	-2/+8

2010-12-14	Fixed regression in parsing _emph_	John MacFarlane	1	-1/+1
	There was a bug in parsing '_emph_, ...': when followed by a comma, underscore emphasis did not register. (Thanks to gwern for pointing this out.) This bug was introduced by the change in c66921f2acea456af527b93e2daa1d8594798642
2010-12-13	Moved special handling of punctuation in suffix out of markdown reader.	Nathan Gass	1	-7/+2
	This allows different writers to handle punctuation in the suffix differently.
2010-12-13	Added support for latex cite commands in latex reader.	Nathan Gass	1	-8/+109

2010-12-13	Markdown reader: Further fix to abbrevs.	John MacFarlane	1	-1/+1

2010-12-13	Markdown reader: Fixed abbrev handler to allow abbrev at end of line.	John MacFarlane	1	-2/+2
	E.g., Mr. Frank.
2010-12-13	Markdown reader: Fixed referenceKey parser to allow space after newline.	John MacFarlane	1	-2/+1

2010-12-13	Markdown reader: Fixed regression in reference key parser.	John MacFarlane	1	-0/+1
	* The recent change allowing spaces and newlines in the URL caused problems when reference keys are stacked up without blank lines between. This is now fixed.
	* Added test.
2010-12-12	Markdown reader: fix superscripts with links.	John MacFarlane	1	-1/+1
	Moved inlineNote parser after superscript parser, so ^[link](/foo)^ gets recognized as a superscripted link, not an inline note followed by garbage. Thanks to Conal Elliott for pointing out the problem.
2010-12-10	LaTeX reader: Improved parsing of preamble.	John MacFarlane	1	-11/+6
	Previously you'd get unexpected behavior on a document that contained '\begin{document}' in, say, a verbatim block.
2010-12-10	Markdown reader: small cosmetic code improvements.	John MacFarlane	1	-8/+6

2010-12-10	Removed HTML sanitization.	John MacFarlane	2	-101/+10
	This is better done on the resulting HTML; use the xss-sanitize library for this. xss-sanitize is based on pandoc's sanitization, but improves it.
	- Removed stateSanitize from ParserState.
	- Removed --sanitize-html option.
2010-12-10	Markdown reader: Allow linebreaks in URLs (treat as spaces).	John MacFarlane	1	-6/+21
	Also, a string of consecutive spaces or tabs is now parsed as a single space. If you have multiple spaces in your URL, use %20%20.
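The whitespace rule above can be sketched as a separate normalization pass over the URL text. `collapseUrlSpace` is a hypothetical helper, not the reader's actual parser code: it turns a line break into a space and collapses any run of spaces, tabs, or newlines into one space.

```haskell
-- Hypothetical helper: treat a line break inside a URL as a space and
-- collapse every run of spaces, tabs, and newlines into a single space.
collapseUrlSpace :: String -> String
collapseUrlSpace [] = []
collapseUrlSpace (c : cs)
  | isUrlSpace c = ' ' : collapseUrlSpace (dropWhile isUrlSpace cs)
  | otherwise    = c : collapseUrlSpace cs
  where
    isUrlSpace x = x `elem` " \t\r\n"
```

Percent-encoded spaces such as `%20%20` are ordinary characters to this pass, so they come through unchanged, which is why the entry recommends them for URLs that really contain several spaces.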
2010-12-10	Markdown reader: Rewrote para parser for better efficiency.	John MacFarlane	1	-10/+8
	This change avoids repeated parsing of inline lists for 'plain' blocks.
2010-12-09	textile redcloth definition lists	paul.rivier	1	-2/+29

2010-12-09	Textile reader: better treatment of acronyms.	John MacFarlane	1	-1/+1
	We now parse PBS(Public Broadcasting System) as if it were "PBS (Public Broadcasting System)".
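The effect described here can be pictured as a string-level rewrite that inserts a space between an all-caps word and a parenthesized explanation that immediately follows it. `spaceAcronym` is a hypothetical sketch, and treating an acronym as two or more consecutive capital letters is an assumption, not necessarily the Textile reader's exact rule.

```haskell
import Data.Char (isUpper)

-- Hypothetical helper: "PBS(Public Broadcasting System)" becomes
-- "PBS (Public Broadcasting System)". An acronym is assumed to be a run of
-- two or more capital letters directly followed by '('.
spaceAcronym :: String -> String
spaceAcronym [] = []
spaceAcronym s@(c : cs)
  | isUpper c =
      let (caps, rest) = span isUpper s
      in  case rest of
            ('(' : _) | length caps >= 2 -> caps ++ " " ++ spaceAcronym rest
            _                            -> caps ++ spaceAcronym rest
  | otherwise = c : spaceAcronym cs
```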
2010-12-08	RST reader: Added footnote support.	John MacFarlane	1	-3/+51
	Resolves issue #258. Note that there are some differences in how docutils and pandoc treat footnotes. Currently pandoc ignores the numeral or symbol used in the note; footnotes are put in an auto-numbered ordered list.
2010-12-08	Markdown reader: minor footnote changes.	John MacFarlane	1	-2/+3
	Don't skipNonindentSpaces in noteMarker, since it's also used in the inline note parser.
2010-12-08	Textile reader: Implemented footnotes.	John MacFarlane	1	-4/+43

2010-12-07	Made --smart work with RST reader.	John MacFarlane	1	-2/+3

2010-12-07	Make --smart work in HTML reader.	John MacFarlane	1	-2/+3

2010-12-07	Smart punctuation: recognize entities.	John MacFarlane	1	-1/+1
	Now &ldquo;Hi&rdquo; gets parsed as a Quoted DoubleQuote inline.
2010-12-07	Markdown reader: Moved smartPunctuation parser, for slight speed bump.	John MacFarlane	1	-1/+1

2010-12-07	Moved smartPunctuation from Markdown to Parsing.	John MacFarlane	2	-99/+7
	+ Parameterized smartPunctuation on an inline parser.
	+ Handle smartPunctuation in Textile reader.
2010-12-07	Textile reader: implemented acronyms, (tm), (r), (c).	John MacFarlane	1	-6/+29

2010-12-06	Markdown reader: better handling of intraword _.	John MacFarlane	1	-3/+5
	The 'str' parser now reads internal _'s as part of the string. This prevents pandoc from getting started looking for an emphasized block, which can cause exponential slowdowns in some cases. Resolves Issue #182.
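A minimal sketch of the idea, assuming a hypothetical scanner `strChunk` that reads one word and absorbs an underscore whenever another alphanumeric character follows it. A leading or trailing `_` is left unconsumed, so only those can go on to open or close emphasis. Using `isAlphaNum` as the definition of a word character is an assumption.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical str-like scanner: read a run of alphanumeric characters,
-- absorbing any '_' that is immediately followed by another alphanumeric
-- character. Returns the word read and the unconsumed input.
strChunk :: String -> (String, String)
strChunk s =
  case s of
    (c : rest) | isAlphaNum c -> extend [c] rest
    _                         -> ("", s)
  where
    -- acc holds the characters read so far, in reverse order.
    extend acc ('_' : d : more)
      | isAlphaNum d = extend (d : '_' : acc) more
    extend acc (d : more)
      | isAlphaNum d = extend (d : acc) more
    extend acc more = (reverse acc, more)
```

For example, `strChunk "snake_case_name rest"` reads the whole identifier as one word, while for `"_emph_"` nothing is consumed and the leading underscore is left for the emphasis parser.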
2010-12-06	Markdown reader: handle curly quotes better.	John MacFarlane	1	-15/+14
	Previously, curly quotes were just parsed literally, leading to problems in some output formats. Now they are parsed as Quoted inlines, if --smart is specified. Resolves Issue #270.
2010-12-05	Fix regression: markdown references should be case-insensitive.	John MacFarlane	2	-9/+10
	This broke when we added the Key type. We had assumed that the custom case-insensitive Ord instance would ensure case-insensitive matching, but that is not how Data.Map works.
	* Added a test case for case-insensitivity in markdown-reader-more.
	* Removed old refsMatch from Text.Pandoc.Parsing module.
	* Hid the 'Key' constructor.
	* Dropped the custom Ord and Eq instances, deriving instead.
	* Added fromKey and toKey to convert between Keys and Inline lists.
	* toKey ensures that keys are case-insensitive, since this is the only way the API provides to construct a Key.
	Resolves Issue #272.
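The pattern behind this fix is a smart constructor: once the `Key` constructor is hidden, `toKey` is the only way to build a key, so normalizing case there makes every `Data.Map` lookup case-insensitive while the derived `Eq` and `Ord` instances stay ordinary. A minimal sketch, assuming keys are plain `String`s (in pandoc they are built from `Inline` lists); `RefTable` and `lookupRef` are hypothetical names.

```haskell
import Data.Char (toLower)
import qualified Data.Map as M

-- In a real module the export list would name `Key` without its
-- constructor, so toKey is the only way to create a Key.
newtype Key = Key String
  deriving (Show, Eq, Ord)

-- Normalize case once, at construction time.
toKey :: String -> Key
toKey = Key . map toLower

fromKey :: Key -> String
fromKey (Key s) = s

-- Hypothetical reference table: link label -> URL.
type RefTable = M.Map Key String

lookupRef :: String -> RefTable -> Maybe String
lookupRef label = M.lookup (toKey label)
```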
2010-12-03	Merge branch 'citeproc' into master.	John MacFarlane	1	-37/+92
	Conflicts: src/Text/Pandoc/Definition.hs
2010-12-03	Textile reader: temporarily removed smartPunctuation.	John MacFarlane	1	-2/+2
	The smartPunctuation parser from the markdown parser was being used, but this creates two problems:
	* Smart punctuation rules are slightly different in textile; for example, a single dash with spaces around it becomes an en dash.
	* The following gets parsed as a double-quoted string followed by a colon, rather than as a link: "emphasized text":http://my.url.com
	This needs rethinking.
2010-12-03	Textile reader: added hrule parser.	John MacFarlane	1	-0/+13

2010-12-03	Textile reader: Turn on smart punctuation by default.	John MacFarlane	1	-2/+2

2010-12-03	Textile reader: drop leading, trailing newline in pre block.	John MacFarlane	1	-2/+10
	This is consistent with how the other readers work.
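Concretely, a newline right at the start or end of the pre block's contents is not treated as part of the code. A minimal sketch, assuming a hypothetical `trimPreNewlines` applied to the raw block contents and assuming that at most one newline is dropped at each end:

```haskell
-- Hypothetical helper: drop at most one newline at the start and at the
-- end of a pre block's contents (how many are dropped is an assumption).
trimPreNewlines :: String -> String
trimPreNewlines = dropTrailing . dropLeading
  where
    dropLeading ('\n' : rest) = rest
    dropLeading s             = s
    dropTrailing s = case reverse s of
      ('\n' : rest) -> reverse rest
      _             -> s
```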
2010-12-03	Textile reader: modified str to handle acronyms, hyphens.	John MacFarlane	1	-3/+16
	* A single hyphen between two word characters is no longer a potential strikeout-starter.
	* Acronym explanations are dropped.
2010-12-03	Textile reader: parse raw by default.	John MacFarlane	1	-0/+2
	It's part of the textile spec to allow raw HTML, just as with markdown. -R is no longer needed in test suite.
2010-12-03	punctuation handling, and more html-specific handling	paul.rivier	2	-8/+33

2010-12-03	html inlines and html blocks handling in textile reader	Paul Rivier	1	-17/+26

2010-12-03	textile reader now ignores html/css attributes	Paul Rivier	1	-8/+34

2010-12-03	removed support for textile Inserted construct	Paul Rivier	1	-5/+1

2010-12-03	fix autolink by promoting it in the parser list, fix table parabreak	Paul Rivier	1	-7/+5

2010-12-03	more support for Textile reader (explicit links, images), tests and cabal entries	Paul Rivier	1	-17/+44