pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2007-04-15	Moved escape and nullBlock parsers from ParserCombinators/Pandoc	fiddlosopher	1	-3/+18
	to Pandoc/Shared. Reason: ParserCombinators/Pandoc is for general-purpose parsers that don't require Pandoc.Definition. Also removed some unnecessary imports from Pandoc/Shared. git-svn-id: https://pandoc.googlecode.com/svn/trunk@584 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-13	Added Table to prettyBlock in Shared.hs.	fiddlosopher	1	-0/+7
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@582 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10	Extensive changes stemming from a rethinking of the Pandoc data	fiddlosopher	13	-1095/+996
	structure. Key and Note blocks have been removed. Link and image URLs are now stored directly in Link and Image inlines, and note blocks are stored in Note inlines. This requires changes in both parsers and writers. Markdown and RST parsers need to extract data from key and note blocks and insert them into the relevant inline elements. Other parsers can be simplified, since there is no longer any need to construct separate key and note blocks. Markdown, RST, and HTML writers need to construct lists of notes; Markdown and RST writers need to construct lists of link references (when the --reference-links option is specified); and the RST writer needs to construct a list of image substitution references. All writers have been rewritten to use the State monad when state is required. This rewrite yields a small speed boost and considerably cleaner code. * Text/Pandoc/Definition.hs: + blocks: removed Key and Note + inlines: removed NoteRef, added Note + modified Target: there is no longer a 'Ref' target; all targets are explicit URL, title pairs * Text/Pandoc/Shared.hs: + Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump', used in some of the readers. + Removed 'generateReference', 'keyTable', 'replaceReferenceLinks', 'replaceRefLinksBlockList', along with some auxiliary functions used only by them. These are no longer needed, since reference links are resolved in the Markdown and RST readers. + Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented' to the Docbook writer, since that is now the only module that uses them. + Changed name of 'escapeSGMLString' to 'escapeStringForXML' + Added KeyTable and NoteTable types + Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed', 'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'. Added 'stateKeys' and 'stateNotes'. + Added clause for Note to 'prettyBlock'. + Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions. 
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and 'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML' * Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw line block up to and including following blank lines. * Main.hs: Replaced --inline-links with --reference-links. * README: + Documented --reference-links and removed description of --inline-links. + Added note that footnotes may occur anywhere in the document, but must be at the outer level, not embedded in block elements. * man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links option, added --reference-links option * Markdown and RST readers: + Rewrote to fit new Pandoc definition. Since there are no longer Note or Key blocks, all note and key blocks are parsed on a first pass through the document. Once tables of notes and keys have been constructed, the remaining parts of the document are reassembled and parsed. + Refactored link parsers. * LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since there are no longer Note or Key blocks, notes and references can be parsed in a single pass through the document. * RST, Markdown, and HTML writers: Rewrote using state monad and new Pandoc definition. State is used to hold lists of references and footnotes to be printed at the end of the document. * RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because of the different treatment of footnotes, the "notes" parameter is no longer needed in the block and inline conversion functions.) * Docbook writer: + Moved the functions 'attributeList', 'inTags', 'selfClosingTag', 'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since they are now used only by the Docbook writer. + Rewrote using new Pandoc definition. (Because of the different treatment of footnotes, the "notes" parameter is no longer needed in the block and inline conversion functions.)
* Updated test suite * Throughout: old haskell98 module names replaced by hierarchical module names, e.g. List by Data.List. * debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev in "Build-Depends." * cabalize: + Remove haskell98 from BASE_DEPENDS (since now the new hierarchical module names are being used throughout) + Added mtl to BASE_DEPENDS (needed for state monad) + Removed html from GHC66_DEPENDS (not needed since xhtml is now used) git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
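A minimal sketch of how a writer might use the State monad to collect notes and print them at the end of the document, as the entry above describes. The Inline type, inlineToText, and writeDoc are hypothetical and much reduced; they are not Pandoc's actual definitions.

```haskell
import Control.Monad.State

-- Hypothetical, much-reduced inline type; not Pandoc's actual definition.
data Inline = Str String | Note String

-- Render one inline. A Note becomes a numbered marker, and its text is
-- appended to the list of notes held in the state.
inlineToText :: Inline -> State [String] String
inlineToText (Str s) = return s
inlineToText (Note n) = do
  modify (++ [n])
  notes <- get
  return ("[" ++ show (length notes) ++ "]")

-- Render all inlines, then append the collected notes after the body.
writeDoc :: [Inline] -> String
writeDoc inlines =
  let (body, notes) = runState (mapM inlineToText inlines) []
      footer = zipWith (\i n -> "[" ++ show i ++ "] " ++ n) [(1 :: Int) ..] notes
  in unlines (concat body : footer)

main :: IO ()
main = putStr (writeDoc [Str "See", Note "A footnote.", Str " here."])
```

Because the notes live in the state, the conversion functions do not need a separate "notes" parameter threaded through every call.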
2007-04-08	Fixed bug in email obfuscation (issue #15). If the text to be obfuscated	fiddlosopher	1	-1/+2
	contains an entity, this needs to be decoded before obfuscation. Thanks to thsutton for the patch. git-svn-id: https://pandoc.googlecode.com/svn/trunk@579 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-17	Removed Blank block element as unnecessary.	fiddlosopher	10	-12/+4
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@578 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-17	Consolidated 'text', 'special', and 'inline' into 'inline'.	fiddlosopher	1	-8/+23
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@577 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-16	Added trys to two list start routines. Reason:	fiddlosopher	1	-4/+4
	<|> only parses second parser when first hasn't consumed input. git-svn-id: https://pandoc.googlecode.com/svn/trunk@576 788f1e2b-df1e-0410-8736-df70ead52e1b
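A small Parsec sketch of the behavior this entry refers to. The two list-start parsers (dotStart, parenStart) are hypothetical stand-ins, not the routines changed in the commit.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical list-start parsers: a number followed by '.' or by ')'.
dotStart, parenStart :: Parser String
dotStart   = many1 digit <* char '.'
parenStart = many1 digit <* char ')'

-- Without 'try': on "1)" dotStart consumes the digit before failing on
-- '.', so <|> never runs parenStart.
withoutTry :: Parser String
withoutTry = dotStart <|> parenStart

-- With 'try': a failed dotStart is rewound, and parenStart gets its turn.
withTry :: Parser String
withTry = try dotStart <|> parenStart

main :: IO ()
main = do
  print (parse withoutTry "" "1)")  -- a parse error (Left)
  print (parse withTry "" "1)")     -- Right "1"
```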
2007-03-12	Added clauses for DefinitionList and Table to replaceReferenceLinks in	fiddlosopher	1	-0/+8
	Text/Pandoc/Shared.hs. This ensures that reference-style links inside tables and definition lists will be handled properly. git-svn-id: https://pandoc.googlecode.com/svn/trunk@575 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-12	Simplified keyTable, using assumption that key blocks are not	fiddlosopher	1	-20/+0
	inside other block elements (an assumption that the Markdown reader uses in making its initial pass anyway). git-svn-id: https://pandoc.googlecode.com/svn/trunk@574 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-11	Changes to Markdown reader relating to definition lists:	fiddlosopher	1	-5/+11
	+ fixed bug in indentSpaces (which didn't properly handle cases with mixed spaces and tabs) + rewrote definition list code to conform to new syntax + include definition lists in list block + failIfStrict on definition lists git-svn-id: https://pandoc.googlecode.com/svn/trunk@572 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-11	Added support for DefinitionList blocks to HTML writer.	fiddlosopher	1	-5/+11
	Cleaned up bullet and ordered list code by using ordList and unordList instead of raw olist and ulist. git-svn-id: https://pandoc.googlecode.com/svn/trunk@571 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-11	Fixed bug in HTML email obfuscation using --strict mode.	fiddlosopher	1	-1/+3
	The problem is that the "href" function escapes &, so (href "&#108;") is 'href="&amp;#108;"'. Fixed by using primHtml for the whole link. Resolves issue 9. git-svn-id: https://pandoc.googlecode.com/svn/trunk@569 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-10	Changed syntax of definition lists in Markdown parser:	fiddlosopher	1	-6/+7
	+ definition blocks must be indented throughout (not just in first line) + compact lists can be formed by leaving no blank line between a definition and the next term git-svn-id: https://pandoc.googlecode.com/svn/trunk@568 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-10	Added parser for definition lists, derived from reStructuredText	fiddlosopher	1	-3/+32
	syntax: term 1 Definition 1 Paragraph 2 of definition 1. term 2 There must be whitespace between entries. Any kind of block may serve as a definition, but the first line of each block must be indented. terms can contain any inline elements If you want to be lazy, you can just indent the first line of the definition block. git-svn-id: https://pandoc.googlecode.com/svn/trunk@566 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-10	Modified prettyPandoc to handle DefinitionList elements.	fiddlosopher	1	-0/+4
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@565 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-10	Added definition for DefinitionList block element.	fiddlosopher	1	-0/+3
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@564 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-09	Change in ordered lists in Markdown reader:	fiddlosopher	1	-6/+12
	+ Lists may begin with lowercase letters only, and only 'a' through 'n'. Otherwise first initials and page references (e.g., p. 400) are too easily parsed as lists. + Numbers beginning list items must end with '.' (not ')', which is now allowed only after letters). NOTE: This change may cause documents to be parsed differently. Users should take care in upgrading. git-svn-id: https://pandoc.googlecode.com/svn/trunk@561 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-07	More smart quote adjustments:	fiddlosopher	1	-4/+2
	+ remove support for all-caps contractions (too much potential for conflict with things like 'M. Mitterand') + add support for 'm as a contraction git-svn-id: https://pandoc.googlecode.com/svn/trunk@560 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-07	Smart quote parsing in Markdown reader:	fiddlosopher	1	-2/+4
	treat ' followed by ll, re, ve, then a non-letter, as a contraction. (e.g. I've, you're, he'll) git-svn-id: https://pandoc.googlecode.com/svn/trunk@559 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-04	Fixed bug in noscript part of email obfuscation:	fiddlosopher	1	-1/+1
	&amp; instead of & git-svn-id: https://pandoc.googlecode.com/svn/trunk@557 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-03	Made image parsing in HTML reader sensitive to the	fiddlosopher	1	-3/+6
	--inline-links option. git-svn-id: https://pandoc.googlecode.com/svn/trunk@556 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-03	Added --inline-links option to force links in HTML to be parsed	fiddlosopher	2	-3/+8
	as inline links, rather than reference links. (Addresses Issue #4.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@554 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-02-27	Changes to test suite for new XHTML output.	fiddlosopher	1	-8/+7
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@550 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-02-26	Modified HTML writer to use the Text.XHtml library. This results	fiddlosopher	1	-153/+142
	in cleaner, faster code, and it makes it easier to use Pandoc in other projects, like wikis, that use Text.XHtml. Two functions are now provided, writeHtml and writeHtmlString: the former outputs an Html structure, the latter a rendered string. The S5 writer is also changed, in parallel ways (writeS5, writeS5String). The Html header is now written programmatically, so it has been removed from the 'headers' directory. The S5 header is still needed, but the doctype and some of the meta declarations have been removed, since they are written programmatically. The INSTALL file and cabalize have been updated to reflect the new dependency on the xhtml package. git-svn-id: https://pandoc.googlecode.com/svn/trunk@549 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-02-21	Added defaultWriterOptions to Shared.hs.	fiddlosopher	1	-0/+15
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@545 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-02-17	In writing Markdown, print unicode nonbreaking space	fiddlosopher	1	-1/+8
	(160) as "&nbsp;", since otherwise it is hard to distinguish from a regular space. (Addresses Issue #3.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@541 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-02-17	Escape non-breaking space in SGML as '&nbsp;' instead of	fiddlosopher	1	-1/+2
	printing a unicode non-breaking space, which is hard to distinguish visually from a regular space. (Resolves issue #3.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@540 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-02-15	Refactored str and strong in Markdown reader, for clarity.	fiddlosopher	1	-5/+9
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@539 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-02-15	Got rid of two unneeded 'getState's. Note that	fiddlosopher	2	-4/+3
	lookAhead automatically saves and restores the state. git-svn-id: https://pandoc.googlecode.com/svn/trunk@538 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-02-15	Use lookAhead instead of getInput/setInput in RST reader.	fiddlosopher	1	-3/+1
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@537 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-02-15	Use lookAhead parser for the "first pass" looking for	fiddlosopher	1	-5/+3
	reference keys in Markdown parser, instead of parsing normally, then using setInput to reset input. Slight performance improvement. git-svn-id: https://pandoc.googlecode.com/svn/trunk@536 788f1e2b-df1e-0410-8736-df70ead52e1b
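A sketch of the lookAhead idiom, assuming a hypothetical first pass that merely counts '[' characters (the real first pass collects reference keys). lookAhead runs the parser and then restores the input, so no getInput/setInput bookkeeping is needed.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical first pass: scan the whole input and count '[' characters.
firstPass :: Parser Int
firstPass = length . filter (== '[') <$> many anyChar

-- Run the first pass under lookAhead, then parse the same input again.
twoPass :: Parser (Int, String)
twoPass = do
  n    <- lookAhead firstPass  -- input position is restored afterwards
  rest <- many anyChar         -- the second pass sees the full input again
  return (n, rest)

main :: IO ()
main = print (parse twoPass "" "[a] b")
```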
2007-02-15	Removed followedBy' parser from Text/ParserCombinators/Pandoc,	fiddlosopher	3	-8/+8
	replacing it with the 'lookAhead' parser from Text/ParserCombinators/Parsec. git-svn-id: https://pandoc.googlecode.com/svn/trunk@535 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-02-14	Introduced a new map, reverseEntityTable, for lookups	fiddlosopher	1	-6/+8
	of entity by character, in Entities.hs. This yields a small performance improvement. git-svn-id: https://pandoc.googlecode.com/svn/trunk@534 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-02-14	Changed Entities.hs to use Data.Map rather than	fiddlosopher	1	-9/+7
	an association list, for a slight performance boost. git-svn-id: https://pandoc.googlecode.com/svn/trunk@532 788f1e2b-df1e-0410-8736-df70ead52e1b
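A sketch of the two lookup tables described in this entry and the one before it. The name entityTable and the four sample entries are assumptions for illustration; only reverseEntityTable is named in the log, and the real table in Entities.hs is much larger.

```haskell
import qualified Data.Map as Map
import Data.Tuple (swap)

-- A tiny illustrative subset of entity names and their characters.
entityList :: [(String, Char)]
entityList = [("amp", '&'), ("lt", '<'), ("gt", '>'), ("quot", '"')]

-- Lookup of character by entity name (name assumed for this sketch).
entityTable :: Map.Map String Char
entityTable = Map.fromList entityList

-- Lookup of entity name by character, built by swapping each pair.
reverseEntityTable :: Map.Map Char String
reverseEntityTable = Map.fromList (map swap entityList)

main :: IO ()
main = do
  print (Map.lookup "lt" entityTable)
  print (Map.lookup '&' reverseEntityTable)
```

Data.Map lookups take logarithmic time in the size of the table, whereas lookup on an association list is linear, and a second map avoids scanning the first one by value.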
2007-02-14	Fixed issue #8: slow performance in parsing inline literals in	fiddlosopher	1	-0/+2
	RST reader. The problem was that ``#`` was seen by 'inline' as a potential link or image. Fix: insert 'notFollowedBy (char '`')' in link parsers. git-svn-id: https://pandoc.googlecode.com/svn/trunk@529 788f1e2b-df1e-0410-8736-df70ead52e1b
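A sketch of the fix, using hypothetical and heavily simplified link and literal parsers rather than the RST reader's own. The notFollowedBy check makes the link parser give up as soon as it sees a second backtick, instead of scanning ahead for a link that is not there.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical inline literal: ``text``
literal :: Parser String
literal = try $ do
  _ <- string "``"
  manyTill anyChar (try (string "``"))

-- Hypothetical link: `label`_
link :: Parser String
link = try $ do
  _ <- char '`'
  notFollowedBy (char '`')  -- a second backtick means a literal: fail fast
  label <- many1 (noneOf "`")
  _ <- string "`_"
  return label

inline :: Parser String
inline = link <|> literal

main :: IO ()
main = do
  print (parse inline "" "``#``")
  print (parse inline "" "`home`_")
```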
2007-02-12	Replaced "choice [(try (string ...), ...]" idiom with	fiddlosopher	1	-5/+5
	"oneOfStrings" in LaTeX reader. git-svn-id: https://pandoc.googlecode.com/svn/trunk@528 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-02-12	+ Added some needed "try"s before multicharacter parsers,	fiddlosopher	3	-7/+7
	especially in "option" contexts. + Removed the "try" from the "end" parser in "enclosed" (Text.Pandoc.Shared). Now "enclosed" behaves like "option", "manyTill", etc. git-svn-id: https://pandoc.googlecode.com/svn/trunk@527 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-02-12	Added 'try' in front of 'string', where needed, or	fiddlosopher	1	-6/+8
	used a different parser, in RST reader. This fixes a bug where ````` would not be correctly parsed as a verbatim `. git-svn-id: https://pandoc.googlecode.com/svn/trunk@526 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-02-12	Allow the URI in a RST hyperlink target to start on the line	fiddlosopher	1	-0/+3
	after the reference key. git-svn-id: https://pandoc.googlecode.com/svn/trunk@525 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-28	Changed 'encodeEntities' to 'escapeSGMLString'.	fiddlosopher	4	-26/+26
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@520 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-28	+ Simplified entity handling by removing stringToSGML from Entities.hs.	fiddlosopher	6	-43/+25
	It is no longer needed now that all entities are processed in the markdown and HTML readers. All calls to stringToSGML have been replaced by calls to encodeEntities. + Since inTag's attribute handling already encodes entities, calls to encodeEntities are no longer needed for attribute values, so they've been removed. + The HTML and Markdown readers now call decodeEntities on all raw strings (e.g. authors, dates, link titles), to ensure that no unprocessed entities are included in the native representation of the document. (In the HTML reader, most of this work is done by a change in extractAttributeName.) + The result is a small speed improvement (around 5% on my benchmark) and cleaner code. git-svn-id: https://pandoc.googlecode.com/svn/trunk@519 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-27	Use encodeEntities rather than stringToSGML for contents of	fiddlosopher	2	-2/+2
	Str inline in Docbook and HTML writers, since now these strings should not contain literal entity references. git-svn-id: https://pandoc.googlecode.com/svn/trunk@518 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-27	Cleaned up handling of embedded quotes in link titles.	fiddlosopher	2	-9/+4
	Now these are stored as a '"' character, not as '&quot;'. The function escapeLinkTitle in the Markdown writer is unnecessary and was removed. Tests modified accordingly. git-svn-id: https://pandoc.googlecode.com/svn/trunk@517 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-27	More changes in entity handling: Instead of using entities for characters	fiddlosopher	4	-52/+51
	above 128 in HTML and Docbook output, we now just use unicode. After all, we're declaring UTF-8 content in the header. This makes the HTML and docbook files produced by pandoc much more readable and editable. Changes to Entities.hs: + Removed specialCharToEntity + Added escapeSGMLChar (which just escapes the basic four, <>&") + Modified encodeEntities and stringToSGML to use escapeSGMLChar + Removed encodeEntitiesNumerical + Rewrote encodeEntities for better performance + Rewrote stringToSGML for better performance git-svn-id: https://pandoc.googlecode.com/svn/trunk@516 788f1e2b-df1e-0410-8736-df70ead52e1b
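A sketch of an escaping function that handles only "the basic four" and passes every other character through unchanged. The function names come from the log, but the bodies are an assumption about their behavior, not the code in Entities.hs.

```haskell
-- Escape only <, >, & and "; everything else, including characters
-- above 128, is returned as-is.
escapeSGMLChar :: Char -> String
escapeSGMLChar '<' = "&lt;"
escapeSGMLChar '>' = "&gt;"
escapeSGMLChar '&' = "&amp;"
escapeSGMLChar '"' = "&quot;"
escapeSGMLChar c   = [c]

-- Escape a whole string one character at a time.
escapeSGMLString :: String -> String
escapeSGMLString = concatMap escapeSGMLChar

main :: IO ()
main = putStrLn (escapeSGMLString "AT&T <\"x\">")
```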
2007-01-27	Changes in entity handling:	fiddlosopher	7	-141/+133
	+ Entities are parsed (and unicode characters returned) in both Markdown and HTML readers. + Parsers characterEntity, namedEntity, decimalEntity, hexEntity added to Entities.hs; these parse a string and return a unicode character. + Changed 'entity' parser in HTML reader to use the 'characterEntity' parser from Entities.hs. + Added new 'entity' parser to Markdown reader, and added '&' as a special character. Adjusted test suite accordingly since now we get 'Str "AT",Str "&",Str "T"' instead of 'Str "AT&T"'. + stringToSGML moved to Entities.hs. escapeSGML removed as redundant, given encodeEntities. + stringToSGML, encodeEntities, and specialCharToEntity are given a boolean parameter that causes only numerical entities to be used. This is used in the docbook writer. The HTML writer uses named entities where possible, but not all docbook-consumers know about the named entities without special instructions, so it seems safer to use numerical entities there. + decodeEntities is rewritten in a way that avoids Text.Regex, using the new parsers. + charToEntity and charToNumericalEntity added to Entities.hs. + Moved specialCharToEntity from Shared.hs to Entities.hs. + Removed unneeded 'decodeEntities' from 'str' parser in HTML and Markdown readers. + Removed sgmlHexEntity, sgmlDecimalEntity, sgmlNamedEntity, and sgmlCharacterEntity from Shared.hs. + Modified Docbook writer so that it doesn't rely on Text.Regex for detecting "mailto" links. git-svn-id: https://pandoc.googlecode.com/svn/trunk@515 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24	Rewrote functions in Text/Pandoc/Shared so as not to use Text.Regex,	fiddlosopher	1	-28/+51
	which does not support unicode: - escapePreservingRegex removed - stringToSGML rewritten using Parsec parser - new parsers for SGML character entities - escapeSGML rewritten using specialCharToEntity - new function specialCharToEntity git-svn-id: https://pandoc.googlecode.com/svn/trunk@514 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24	Changed Markdown autoLink parsing to conform better to	fiddlosopher	1	-6/+6
	Markdown.pl's behavior. <google.com> is not treated as a link, but <http://google.com>, <ftp://google.com>, and <mailto:google@google.com> are. git-svn-id: https://pandoc.googlecode.com/svn/trunk@513 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24	Fixed bug in 'extractTagType' in HTML reader: previous	fiddlosopher	1	-1/+4
	version was not skipping / in close tags. git-svn-id: https://pandoc.googlecode.com/svn/trunk@512 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24	Refactored markdown reader so that Text.Regex is not used.	fiddlosopher	1	-14/+19
	Replaced email regex test with a custom email autolink parser (autoLinkEmail). Also replaced 'selfClosingTag' with a custom function 'isSelfClosingTag'. git-svn-id: https://pandoc.googlecode.com/svn/trunk@511 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24	Fixed a bug in extractTagType in HTML Reader: the previous	fiddlosopher	1	-6/+2
	version extracted the attributes, too, which is not wanted. git-svn-id: https://pandoc.googlecode.com/svn/trunk@510 788f1e2b-df1e-0410-8736-df70ead52e1b