pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2014-07-07	Revamped raw HTML block parsing in markdown.	John MacFarlane	1	-7/+22
	- We no longer include trailing spaces and newlines in the raw blocks. - We look for closing tags for elements (but without backtracking). - Each block-level tag is its own RawBlock; we no longer try to consolidate them (though `--normalize` will do so). Closes #1330.
2013-08-18	Adjusted writers and tests for change in parsing of div/span.	John MacFarlane	1	-14/+4
	Textile, MediaWiki, Markdown, Org, RST will emit raw HTML div tags for divs. Otherwise Div and Span are "transparent" block containers.
2013-08-16	Updated tests for latest pandoc-types changes.	John MacFarlane	1	-1/+1

2013-08-14	Updated for removed unMeta, unFormat in pandoc-types.	John MacFarlane	1	-19/+19

2013-08-10	Updated tests for new Format.	John MacFarlane	1	-18/+18

2013-06-25	Some test suite fixes for new metadata.	John MacFarlane	1	-2/+2

2013-06-24	Use new flexible metadata type.	John MacFarlane	1	-2/+2
	* Depend on pandoc 1.12. * Added yaml dependency. * `Text.Pandoc.XML`: Removed `stripTags`. (API change.) * `Text.Pandoc.Shared`: Added `metaToJSON`. This will be used in writers to create a JSON object for use in the templates from the pandoc metadata. * Revised readers and writers to use the new Meta type. * `Text.Pandoc.Options`: Added `Ext_yaml_title_block`. * Markdown reader: Added support for YAML metadata block. Note that it must come at the beginning of the document. * `Text.Pandoc.Parsing.ParserState`: Replace `stateTitle`, `stateAuthors`, `stateDate` with `stateMeta`. * RST reader: Improved metadata. Treat initial field list as metadata when standalone is specified. Previously ALL fields "title", "author", "date" in field lists were treated as metadata, even if not at the beginning. Use `subtitle` metadata field for subtitle. * `Text.Pandoc.Templates`: Export `renderTemplate'` that takes a string instead of a compiled template. * OPML template: Use 'for' loop for authors. * Org template: '#+TITLE:' is inserted before the title. Previously the writer did this.
2013-01-15	Use 'fig:' instead of '\SOH' in title to indicate figure.	John MacFarlane	1	-1/+1
	Revises 1a4b47e93368bfbd31daccdfedbd9527ee740201
2013-01-14	Implemented Ext_implicit_figures.	John MacFarlane	1	-1/+1
	* In markdown reader, add a '\1' character to the beginning of the title of an image that is alone in its paragraph, if implicit_figures extension is selected. * In writers, check for Para [Image alt (src,'\1':tit)] and treat it as a figure if possible. * Updated tests. This is a bit of a hack, but it allows us to make implicit_figures an extension of the markdown reader, rather than the writers.
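The reader/writer handshake described above can be sketched roughly as follows. This is an illustration in Python, not pandoc's Haskell code: the dictionary-based `Image` representation and the function names are assumptions made for the example, and it uses the 'fig:' prefix that the later commit switched to instead of '\1'.

```python
FIGURE_PREFIX = "fig:"  # marker placed at the start of the image title

def mark_implicit_figure(paragraph):
    """Reader side (hypothetical): tag an image that is alone in its paragraph."""
    if len(paragraph) == 1 and paragraph[0]["type"] == "Image":
        image = dict(paragraph[0])  # copy so the input is not mutated
        image["title"] = FIGURE_PREFIX + image["title"]
        return [image]
    return paragraph

def as_figure(paragraph):
    """Writer side (hypothetical): return (src, title) for a marked figure, else None."""
    if len(paragraph) == 1 and paragraph[0]["type"] == "Image":
        image = paragraph[0]
        if image["title"].startswith(FIGURE_PREFIX):
            return image["src"], image["title"][len(FIGURE_PREFIX):]
    return None
```

A writer that cannot render figures can simply strip the prefix and emit an ordinary image.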
2013-01-09	Added Attr field to Header.	John MacFarlane	1	-31/+31
	Previously header ids were autogenerated by the writers. Now they are generated (unless supplied explicitly) in the markdown parser, if the `header_identifiers` extension is selected. In addition, the textile reader now supports id attributes on headers.
2013-01-06	Don't put the text of an autolink in Code font.	John MacFarlane	1	-4/+4

2012-08-01	Major rewrite of markdown reader.	John MacFarlane	1	-134/+134
	* Use Builder's Inlines/Blocks instead of lists. * Return values in the reader monad, which are then run (at the end of parsing) against the final parser state. This allows links, notes, and example numbers to be resolved without a second parser pass. * An effect of using Builder is that everything is normalized automatically. * New exports from Text.Pandoc.Parsing: widthsFromIndices, NoteTable', KeyTable', Key', toKey', withQuoteContext, singleQuoteStart, singleQuoteEnd, doubleQuoteStart, doubleQuoteEnd, ellipses, apostrophe, dash * Updated opendocument tests. * Don't derive Show for ParserState. * Benchmarks: markdown reader takes 82% of the time it took before. Markdown writer takes 92% of the time (here the speedup is probably due to the fact that everything is normalized by default).
2011-12-27	Replaced Apostrophe, Ellipses, EmDash, EnDash w/ unicode strings.	John MacFarlane	1	-20/+20

2011-01-26	Add support for attributes in inline Code.	John MacFarlane	1	-10/+10
	Additional related changes: * URLs in Code in autolinks now use class "url". * Require highlighting-kate 0.2.8.2, which omits the final <br/> tag, essential for inline code.
2011-01-26	Updated tests.	John MacFarlane	1	-1/+1

2011-01-26	Bumped version to 1.8; depend on pandoc-types 1.8.	John MacFarlane	1	-18/+18
	The old TeX, HtmlInline and RawHtml elements have been removed and replaced by generic RawInline and RawBlock elements. All modules updated to use the new raw elements.
2011-01-21	Make sure native output ends in newline with --standalone.	John MacFarlane	1	-1/+1

2011-01-20	Updated tests for new native format.	John MacFarlane	1	-416/+395

2011-01-01	Fixed regression in markdown reader.	John MacFarlane	1	-97/+97
	'(_hi_)' was being parsed with literal underscores (no emphasis). The fix: the 'str' parser now only parses alphanumerics and embedded underscores. All other symbols are handled by the 'symbol' parser. This has a slight effect on the AST, since you'll get [Str "hi",Str ":"] instead of [Str "hi:"]. But there should not be a visible effect in any of the writers. Thanks to gwern for pointing out the regression.
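A rough Python sketch of the split between the two parsers, to show why "hi:" now yields two tokens. The regular expression and the `tokenize` helper are assumptions for illustration only; pandoc's real parsers are Parsec combinators in Haskell.

```python
import re

# Group 1: alphanumerics with embedded (not leading or trailing) underscores.
# Group 2: any other single non-space character, handled as a symbol.
TOKEN = re.compile(r"([^\W_]+(?:_[^\W_]+)*)|(\S)")

def tokenize(text):
    """Return (kind, text) pairs; whitespace is skipped in this sketch."""
    return [("Str" if m.group(1) else "Symbol", m.group(0))
            for m in TOKEN.finditer(text)]
```

With this split, the underscores in '(_hi_)' reach the symbol handling, where emphasis can be recognized, while 'foo_bar' stays a single string.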
2010-07-20	Made spacing at end of output more consistent.	John MacFarlane	1	-1/+0
	Previously some of the writers added spurious whitespace. This has been removed, resolving Issue #232. NOTE: If your application combines pandoc's output with other text, for example in a template, you may need to add spacing. For example, a pandoc-generated markdown file will not have a blank line after the final block element. If you are inserting it into another markdown file, you will need to make sure there is a blank line between it and the next block element.
2010-02-28	Added accessors (docTitle, docAuthors, docDate) to Meta type.	fiddlosopher	1	-1/+1
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@1853 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-31	Changed Meta author and date types to Inline lists instead of Strings.	fiddlosopher	1	-1/+1
	Meta [Inline] [[Inline]] [Inline] rather than Meta [Inline] [String] String. This is a breaking change for libraries that use pandoc and manipulate the metadata. Changed .native files in test suite for new Meta format. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1699 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-07	Improved syntax for markdown definition lists.	fiddlosopher	1	-36/+67
	Definition lists are now more compatible with PHP Markdown Extra. Resolves Issue #24. + You can have multiple definitions for a term (but still not multiple terms). + Multi-block definitions no longer need a colon before each block (indeed, this will now cause multiple definitions). + The marker no longer needs to be flush with the left margin, but can be indented one or two spaces. Also, ~ as well as : can be used as the marker (this suggestion due to David Wheeler.) + There can now be a blank line between the term and the definitions. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1656 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-04-30	Markdown reader: improved efficiency of abbreviation parsing.	fiddlosopher	1	-1/+1
	Instead of a separate abbrev parser, we just check for abbreviations each time we parse a string. This gives a huge performance boost with -S. Resolves Issue #141. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1570 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-08-13	Support for display math; changed ASCIIMathML -> LaTeXMathML:	fiddlosopher	1	-8/+7
	Resolves Issue #47. + Added a DisplayMath/InlineMath selector to Math inlines. + Markdown parser yields DisplayMath for $$...$$. + LaTeX parser yields DisplayMath when appropriate. Removed mathBlock parsers, since the same effect is achieved by the math inline parsers, now that they handle display math. + Writers handle DisplayMath as appropriate for the format. + Changed -m option to use LaTeXMathML rather than ASCIIMathML. LaTeXMathML is closer to LaTeX in its display of math, and supports many non-math LaTeX environments. + Modified HTML writer to print raw TeX when LaTeXMathML is being used instead of suppressing it. + Removed ASCIIMathML files from data/ and added LaTeXMathML. + Replaced ASCIIMathML with LaTeXMathML in source files. + Modified README and pandoc man page source. + Modified web page. + Added --latexmathml option (kept --asciimathml as a synonym for backwards compatibility) + Modified tests accordingly; added new tests for display math. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1409 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-07-15	Fixed bug in Markdown parser: regular $s triggering math mode.	fiddlosopher	1	-0/+1
	For example: "shoes ($20) and socks ($5)." The fix consists in two new restrictions: + the $ that ends a math span may not be directly followed by a digit. + no blank lines may be included within a math span. Thanks to Joseph Reagle for noticing the bug. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1326 788f1e2b-df1e-0410-8736-df70ead52e1b
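The two restrictions can be approximated with a regular expression. The sketch below is a Python illustration under that assumption, not the Markdown reader's actual logic, and `math_spans` is a hypothetical helper name.

```python
import re

# A math span: $...$ where
#   - the content contains no blank line, and
#   - the closing $ is not directly followed by a digit.
MATH_SPAN = re.compile(r"\$((?:[^$\n]|\n(?![ \t]*\n))+?)\$(?!\d)")

def math_spans(text):
    """Return the contents of every span the sketch treats as math."""
    return [m.group(1) for m in MATH_SPAN.finditer(text)]
```

On the sentence from the bug report, `math_spans` finds nothing, because the only candidate closing `$` is followed by the digit 5.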
2008-07-11	In smart mode, use nonbreaking spaces after abbreviations in markdown parser.	fiddlosopher	1	-1/+1
	Thus, for example, "Mr. Brown" comes out as "Mr.~Brown" in LaTeX, and does not produce a sentence-separating space. Resolves Issue #75. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1298 788f1e2b-df1e-0410-8736-df70ead52e1b
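A minimal Python sketch of the idea: the reader replaces the space after a known abbreviation with a nonbreaking space, and a LaTeX-style writer renders that character as `~`. The abbreviation list and both function names are assumptions for illustration; the set pandoc actually recognizes is not given in this log.

```python
import re

ABBREVIATIONS = ("Mr.", "Mrs.", "Dr.")  # hypothetical sample list
NBSP = "\u00a0"

_ABBREV_SPACE = re.compile(
    r"\b(" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r") "
)

def protect_abbreviations(text):
    """Reader side: swap the space after an abbreviation for a nonbreaking space."""
    return _ABBREV_SPACE.sub(lambda m: m.group(1) + NBSP, text)

def to_latex(text):
    """Writer side: LaTeX spells a nonbreaking space as ~."""
    return text.replace(NBSP, "~")
```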
2008-07-11	Treat '\ ' in (extended) markdown as nonbreaking space.	fiddlosopher	1	-2/+2
	Print nonbreaking space appropriately in each writer (e.g. ~ in LaTeX). git-svn-id: https://pandoc.googlecode.com/svn/trunk@1297 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-06-08	Markdown smart typography: Em dashes no longer eat surrounding whitespace.	fiddlosopher	1	-1/+1
	Resolves Issue #69. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1279 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-02-09	Updated test suite to new baseline (but no tests yet for new code block syntax).	fiddlosopher	1	-11/+11
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@1210 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-02-09	Modified tests for new argument in CodeBlock.	fiddlosopher	1	-11/+11
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@1201 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-12-08	Removed support for "box-style" block quotes in markdown.	fiddlosopher	1	-14/+0
	This adds unneeded complexity and makes pandoc diverge further than necessary from other markdown extensions. Brought documentation, tests, and debian/changelog up to date. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1141 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-11-29	Fixed small error in testsuite.native.	fiddlosopher	1	-1/+1
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@1116 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-11-29	Changed tests to use new Math block element.	fiddlosopher	1	-7/+7
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@1111 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-23	Added new rule for enhanced markdown ordered lists: if the list marker	fiddlosopher	1	-0/+1
	is a capital letter followed by a period (including a single-letter capital roman numeral), then it must be followed by at least two spaces. The point of this is to avoid accidentally treating people's initials as list markers: a paragraph may begin: B. Russell was an English philosopher. and this shouldn't be treated as a list. Modified Markdown reader and README documentation. Added a test case. git-svn-id: https://pandoc.googlecode.com/svn/trunk@880 788f1e2b-df1e-0410-8736-df70ead52e1b
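The rule can be sketched as a small predicate. This Python version is an illustration only: it covers just single-letter and decimal markers that end in a period, and `is_ordered_list_start` is a hypothetical name, not a function from the Markdown reader.

```python
import re

# A marker (single letter or decimal number), a period, then one or more spaces.
_MARKER = re.compile(r"([A-Za-z]|\d+)\.( +)")

def is_ordered_list_start(line):
    """Return True if the line would start an ordered list item in this sketch."""
    m = _MARKER.match(line)
    if not m:
        return False
    marker, spaces = m.groups()
    # A single capital letter could be someone's initial, so require two spaces.
    if len(marker) == 1 and marker.isupper():
        return len(spaces) >= 2
    return True
```

Under this check, "B. Russell was an English philosopher." stays a paragraph, while "B.  Second point" (two spaces) starts a list item.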
2007-08-18	+ Fixed bug in markdown ordered list parsing. The problem was	fiddlosopher	1	-0/+2
	that anyOrderedListStart did not check for a space following the ordered list marker. So, 'A.B. 2007' would be parsed as a list item, then fail because of the lack of space after 'A.' (required by orderedListStart). Resolves Issue #22. + Fixed a similar problem in RST reader. + Added regression test. git-svn-id: https://pandoc.googlecode.com/svn/trunk@861 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-08	Major change in the way ordered lists are handled:	fiddlosopher	1	-10/+38
	+ The changes are documented in README, under Lists. + The OrderedList block element now stores information about list number style, list number delimiter, and starting number. + The readers parse this information, when possible. + The writers use this information to style ordered lists. + Test suites have been changed accordingly. Motivation: It's often useful to start lists with numbers other than 1, and to have control over the style of the list. Added to Text.Pandoc.Shared: + camelCaseToHyphenated + toRomanNumeral + anyOrderedListMarker + orderedListMarker + orderedListMarkers Added to Text.Pandoc.ParserCombinators: + charsInBalanced' + withHorizDisplacement + romanNumeral RST writer: + Force blank line before lists, so that sublists will be handled correctly. LaTeX reader: + Fixed bug in parsing of footnotes containing multiple paragraphs, introduced by use of charsInBalanced. Fix: use charsInBalanced' instead. LaTeX header: + use mathletters option in ucs package, so that basic unicode Greek letters will work properly. git-svn-id: https://pandoc.googlecode.com/svn/trunk@834 788f1e2b-df1e-0410-8736-df70ead52e1b
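To illustrate how a writer might use the stored starting number and number style, here is a Python sketch. The commit adds a Haskell `toRomanNumeral` to Text.Pandoc.Shared; the function below is an independent analogue, and the style names, the `list_markers` helper, and the error handling are all assumptions made for the example.

```python
ROMAN_VALUES = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

def to_roman_numeral(n):
    """Convert 1..3999 to an uppercase roman numeral."""
    if not 0 < n < 4000:
        raise ValueError("out of range for this sketch")
    parts = []
    for value, symbol in ROMAN_VALUES:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)

def list_markers(start, style, count, delimiter="."):
    """Render `count` markers beginning at `start` in a hypothetical style name."""
    render = {
        "decimal": str,
        "upper-roman": to_roman_numeral,
        "lower-roman": lambda n: to_roman_numeral(n).lower(),
    }[style]
    return [render(n) + delimiter for n in range(start, start + count)]
```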
2007-07-28	Brought test suite up to date.	fiddlosopher	1	-2/+3
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@828 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-28	Updated testsuite.native - autolinks should have the	fiddlosopher	1	-2/+2
	URL in Code, not Str. git-svn-id: https://pandoc.googlecode.com/svn/trunk@812 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-22	Updated test suite for writers, adding tests for	fiddlosopher	1	-0/+4
	strikeout, superscript, subscript. git-svn-id: https://pandoc.googlecode.com/svn/trunk@766 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-09	Added a test case with an inline link containing bracketed text.	fiddlosopher	1	-1/+1
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@667 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-05-10	Updated test suite with new tests for definition lists.	fiddlosopher	1	-0/+45
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@597 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10	Extensive changes stemming from a rethinking of the Pandoc data	fiddlosopher	1	-62/+36
	structure. Key and Note blocks have been removed. Link and image URLs are now stored directly in Link and Image inlines, and note blocks are stored in Note inlines. This requires changes in both parsers and writers. Markdown and RST parsers need to extract data from key and note blocks and insert them into the relevant inline elements. Other parsers can be simplified, since there is no longer any need to construct separate key and note blocks. Markdown, RST, and HTML writers need to construct lists of notes; Markdown and RST writers need to construct lists of link references (when the --reference-links option is specified); and the RST writer needs to construct a list of image substitution references. All writers have been rewritten to use the State monad when state is required. This rewrite yields a small speed boost and considerably cleaner code. * Text/Pandoc/Definition.hs: + blocks: removed Key and Note + inlines: removed NoteRef, added Note + modified Target: there is no longer a 'Ref' target; all targets are explicit URL, title pairs * Text/Pandoc/Shared.hs: + Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump', used in some of the readers. + Removed 'generateReference', 'keyTable', 'replaceReferenceLinks', 'replaceRefLinksBlockList', along with some auxiliary functions used only by them. These are no longer needed, since reference links are resolved in the Markdown and RST readers. + Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented' to the Docbook writer, since that is now the only module that uses them. + Changed name of 'escapeSGMLString' to 'escapeStringForXML' + Added KeyTable and NoteTable types + Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed', 'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'. Added 'stateKeys' and 'stateNotes'. + Added clause for Note to 'prettyBlock'. + Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions. 
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and 'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML' * Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw line block up to and including following blank lines. * Main.hs: Replaced --inline-links with --reference-links. * README: + Documented --reference-links and removed description of --inline-links. + Added note that footnotes may occur anywhere in the document, but must be at the outer level, not embedded in block elements. * man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links option, added --reference-links option * Markdown and RST readers: + Rewrote to fit new Pandoc definition. Since there are no longer Note or Key blocks, all note and key blocks are parsed on a first pass through the document. Once tables of notes and keys have been constructed, the remaining parts of the document are reassembled and parsed. + Refactored link parsers. * LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since there are no longer Note or Key blocks, notes and references can be parsed in a single pass through the document. * RST, Markdown, and HTML writers: Rewrote using the State monad and the new Pandoc definition. State is used to hold lists of references and footnotes to be printed at the end of the document. * RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because of the different treatment of footnotes, the "notes" parameter is no longer needed in the block and inline conversion functions.) * Docbook writer: + Moved the functions 'attributeList', 'inTags', 'selfClosingTag', 'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since they are now used only by the Docbook writer. + Rewrote using new Pandoc definition. (Because of the different treatment of footnotes, the "notes" parameter is no longer needed in the block and inline conversion functions.) 
* Updated test suite * Throughout: old haskell98 module names replaced by hierarchical module names, e.g. List by Data.List. * debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev in "Build-Depends." * cabalize: + Remove haskell98 from BASE_DEPENDS (since now the new hierarchical module names are being used throughout) + Added mtl to BASE_DEPENDS (needed for state monad) + Removed html from GHC66_DEPENDS (not needed since xhtml is now used) git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-27	Cleaned up handling of embedded quotes in link titles.	fiddlosopher	1	-3/+3
	Now these are stored as a '"' character, not as '&quot;'. The function escapeLinkTitle in the Markdown writer is unnecessary and was removed. Tests modified accordingly. git-svn-id: https://pandoc.googlecode.com/svn/trunk@517 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-27	Changes in entity handling:	fiddlosopher	1	-3/+3
	+ Entities are parsed (and unicode characters returned) in both Markdown and HTML readers. + Parsers characterEntity, namedEntity, decimalEntity, hexEntity added to Entities.hs; these parse a string and return a unicode character. + Changed 'entity' parser in HTML reader to use the 'characterEntity' parser from Entities.hs. + Added new 'entity' parser to Markdown reader, and added '&' as a special character. Adjusted test suite accordingly since now we get 'Str "AT",Str "&",Str "T"' instead of 'Str "AT&T"'. + stringToSGML moved to Entities.hs. escapeSGML removed as redundant, given encodeEntities. + stringToSGML, encodeEntities, and specialCharToEntity are given a boolean parameter that causes only numerical entities to be used. This is used in the docbook writer. The HTML writer uses named entities where possible, but not all docbook-consumers know about the named entities without special instructions, so it seems safer to use numerical entities there. + decodeEntities is rewritten in a way that avoids Text.Regex, using the new parsers. + charToEntity and charToNumericalEntity added to Entities.hs. + Moved specialCharToEntity from Shared.hs to Entities.hs. + Removed unneeded 'decodeEntities' from 'str' parser in HTML and Markdown readers. + Removed sgmlHexEntity, sgmlDecimalEntity, sgmlNamedEntity, and sgmlCharacterEntity from Shared.hs. + Modified Docbook writer so that it doesn't rely on Text.Regex for detecting "mailto" links. git-svn-id: https://pandoc.googlecode.com/svn/trunk@515 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-06	Fixed bug in Markdown reader's handling of underscores and other	fiddlosopher	1	-0/+1
	inline formatting markers inside reference labels: for example, in '[A_B]: /url/a_b', the material between underscores was being parsed as emphasized inlines. git-svn-id: https://pandoc.googlecode.com/svn/trunk@442 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-06	Merged changes from 'quotes' branch since r431. Smart typography	fiddlosopher	1	-57/+57
	is now handled in the Markdown and LaTeX readers, rather than in the writers. The HTML writer has been rewritten to use the prettyprinting library. git-svn-id: https://pandoc.googlecode.com/svn/trunk@436 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-30	Merged 'strict' branch from r324. This adds a '--strict'	fiddlosopher	1	-8/+8
	option to pandoc, which forces it to stay as close as possible to official Markdown syntax. git-svn-id: https://pandoc.googlecode.com/svn/trunk@347 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-21	+ Added regression tests with footnotes in quote blocks and lists.	fiddlosopher	1	-0/+12
	+ This uncovered an existing bug in the RTF writer, which got indentation wrong on footnotes occurring in indented blocks like lists. Fixed this bug. git-svn-id: https://pandoc.googlecode.com/svn/trunk@263 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-19	Merged changes to footnotes branch r219-r240.	fiddlosopher	1	-5/+9
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@241 788f1e2b-df1e-0410-8736-df70ead52e1b