path: root/src
Age  Commit message  Author  Files  Lines
2007-05-03  Add -asxhtml flag to tidy in html2markdown. This will perhaps help the parser.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@590 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -2/+2
2007-05-03  Changed definition list syntax in markdown reader and simplified the parsing code. A colon is now required before every block in a definition. This fixes a problem with the old syntax, in which the last block in the following was ambiguous between a regular paragraph in the definition and a code block following the definition list: term : definition is this code or more definition?
git-svn-id: https://pandoc.googlecode.com/svn/trunk@589 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -10/+9
2007-04-22  Resolved issue #10: instead of adding "\n\n" to the end of strings in Main, do it in readMarkdown and readRST. (Note: the point of this is to ensure that a block at the end of the file gets treated as if it has blank space after it, which is generally what is wanted.)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@588 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 3  Lines: -3/+3
2007-04-22  Fixed bug in anyLine parser. Previously anyLine would parse an empty string "". But it should fail on an empty string, or we get an error from its use inside "many" combinators.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@587 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -1/+2
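The anyLine fix can be illustrated with a toy parser type. This is only a sketch, not the Parsec-based code the commit touches: the `Parser` synonym and `manyP` are hypothetical stand-ins, and only the name `anyLine` comes from the log. The point is that a repetition combinator terminates only if its argument fails (rather than succeeding without consuming) on empty input.

```haskell
-- Toy parser type (an assumption for illustration, not Parsec):
-- Nothing on failure, Just (result, remaining input) on success.
type Parser a = String -> Maybe (a, String)

-- | Parse one line, dropping its trailing newline if present.
-- Fails on empty input, which is what lets repetition terminate.
anyLine :: Parser String
anyLine "" = Nothing
anyLine s  = Just (line, drop 1 rest)
  where
    (line, rest) = break (== '\n') s

-- | Apply a parser zero or more times, stopping at its first failure.
-- If the argument succeeded on "" without consuming anything,
-- this loop would never end.
manyP :: Parser a -> Parser [a]
manyP p s = case p s of
  Nothing        -> Just ([], s)
  Just (x, rest) -> case manyP p rest of
    Just (xs, final) -> Just (x : xs, final)
    Nothing          -> Just ([x], rest)
```

In GHCi, `manyP anyLine "a\nb\n"` consumes both lines and stops once the input is exhausted.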
2007-04-21  Support for definition lists in LaTeX writer.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@586 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -1/+6
2007-04-15  Fixed export declarations; removed unneeded import of Pandoc.Shared.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@585 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -3/+0
2007-04-15  Moved escape and nullBlock parsers from ParserCombinators/Pandoc to Pandoc/Shared. Reason: ParserCombinators/Pandoc is for general-purpose parsers that don't require Pandoc.Definition. Also removed some unnecessary imports from Pandoc/Shared.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@584 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 2  Lines: -19/+18
2007-04-13  Added Table to prettyBlock in Shared.hs.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@582 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -0/+7
2007-04-11  Added Text.Pandoc module that exports basic readers, writers, definitions, and utility functions.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@581 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -0/+79
2007-04-10  Extensive changes stemming from a rethinking of the Pandoc data structure. Key and Note blocks have been removed. Link and image URLs are now stored directly in Link and Image inlines, and note blocks are stored in Note inlines. This requires changes in both parsers and writers. Markdown and RST parsers need to extract data from key and note blocks and insert them into the relevant inline elements. Other parsers can be simplified, since there is no longer any need to construct separate key and note blocks. Markdown, RST, and HTML writers need to construct lists of notes; Markdown and RST writers need to construct lists of link references (when the --reference-links option is specified); and the RST writer needs to construct a list of image substitution references. All writers have been rewritten to use the State monad when state is required. This rewrite yields a small speed boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
  + blocks: removed Key and Note
  + inlines: removed NoteRef, added Note
  + modified Target: there is no longer a 'Ref' target; all targets are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
  + Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump', used in some of the readers.
  + Removed 'generateReference', 'keyTable', 'replaceReferenceLinks', 'replaceRefLinksBlockList', along with some auxiliary functions used only by them. These are no longer needed, since reference links are resolved in the Markdown and RST readers.
  + Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented' to the Docbook writer, since that is now the only module that uses them.
  + Changed name of 'escapeSGMLString' to 'escapeStringForXML'
  + Added KeyTable and NoteTable types
  + Removed fields from ParserState: 'stateKeyBlocks', 'stateKeysUsed', 'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'. Added 'stateKeys' and 'stateNotes'.
  + Added clause for Note to 'prettyBlock'.
  + Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and 'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
  + Documented --reference-links and removed description of --inline-links.
  + Added note that footnotes may occur anywhere in the document, but must be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links option, added --reference-links option
* Markdown and RST readers:
  + Rewrote to fit new Pandoc definition. Since there are no longer Note or Key blocks, all note and key blocks are parsed on a first pass through the document. Once tables of notes and keys have been constructed, the remaining parts of the document are reassembled and parsed.
  + Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since there are no longer Note or Key blocks, notes and references can be parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad and new Pandoc definition. State is used to hold lists of references and footnotes to be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because of the different treatment of footnotes, the "notes" parameter is no longer needed in the block and inline conversion functions.)
* Docbook writer:
  + Moved the functions 'attributeList', 'inTags', 'selfClosingTag', 'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since they are now used only by the Docbook writer.
  + Rewrote using new Pandoc definition. (Because of the different treatment of footnotes, the "notes" parameter is no longer needed in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev in "Build-Depends."
* cabalize:
  + Remove haskell98 from BASE_DEPENDS (since now the new hierarchical module names are being used throughout)
  + Added mtl to BASE_DEPENDS (needed for state monad)
  + Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 15  Lines: -1108/+1016
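The idea of storing notes inline and having the writer collect them for output at the end of the document can be sketched roughly as follows. This is a heavily simplified illustration, not the contents of Text/Pandoc/Definition.hs or of any pandoc writer: the data types are cut down to a few constructors, all function names are hypothetical, and the state is threaded explicitly with `mapAccumL` from base instead of the State monad the commit message describes.

```haskell
import Data.List (intercalate, mapAccumL)

-- Simplified stand-ins for the document types (assumptions, not pandoc's).
type Target = (String, String) -- explicit (URL, title) pair

data Inline
  = Str String
  | Space
  | Link [Inline] Target -- the URL lives in the inline itself
  | Note [Block]         -- the note body lives in the inline itself

data Block = Para [Inline]

-- The "state": rendered note bodies collected so far, in order.
type Notes = [String]

inlineToText :: Notes -> Inline -> (Notes, String)
inlineToText notes (Str s) = (notes, s)
inlineToText notes Space = (notes, " ")
inlineToText notes (Link label (url, _)) =
  let (notes', txt) = inlinesToText notes label
  in (notes', "[" ++ txt ++ "](" ++ url ++ ")")
inlineToText notes (Note blocks) =
  let (notes', body) = blocksToText notes blocks
      n = length notes' + 1
  in (notes' ++ [body], "[^" ++ show n ++ "]")

inlinesToText :: Notes -> [Inline] -> (Notes, String)
inlinesToText notes ils =
  let (notes', parts) = mapAccumL inlineToText notes ils
  in (notes', concat parts)

blockToText :: Notes -> Block -> (Notes, String)
blockToText notes (Para ils) = inlinesToText notes ils

blocksToText :: Notes -> [Block] -> (Notes, String)
blocksToText notes blocks =
  let (notes', parts) = mapAccumL blockToText notes blocks
  in (notes', intercalate "\n\n" parts)

-- | Render the body, then append the collected notes at the end.
writeDoc :: [Block] -> String
writeDoc blocks =
  let (notes, body) = blocksToText [] blocks
      footnotes = zipWith (\i t -> "[^" ++ show i ++ "]: " ++ t) [1 :: Int ..] notes
  in intercalate "\n\n" (body : footnotes)
```

With a State monad the `Notes` argument and result would be hidden behind `get` and `put`, which is presumably where the "considerably cleaner code" comes from.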
2007-04-08  Fixed bug in email obfuscation (issue #15). If the text to be obfuscated contains an entity, this needs to be decoded before obfuscation. Thanks to thsutton for the patch.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@579 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -1/+2
2007-03-17  Removed Blank block element as unnecessary.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@578 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 11  Lines: -19/+4
2007-03-17  Consolidated 'text', 'special', and 'inline' into 'inline'.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@577 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -8/+23
2007-03-16  Added trys to two list start routines. Reason: <|> only parses second parser when first hasn't consumed input.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@576 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -4/+4
2007-03-12  Added clauses for DefinitionList and Table to replaceReferenceLinks in Text/Pandoc/Shared.hs. This ensures that reference-style links inside tables and definition lists will be handled properly.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@575 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -0/+8
2007-03-12  Simplified keyTable, using assumption that key blocks are not inside other block elements (an assumption that the Markdown reader uses in making its initial pass anyway).
git-svn-id: https://pandoc.googlecode.com/svn/trunk@574 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -20/+0
2007-03-11  Changes to Markdown reader relating to definition lists:
+ fixed bug in indentSpaces (which didn't properly handle cases with mixed spaces and tabs)
+ rewrote definition list code to conform to new syntax
+ include definition lists in list block
+ failIfStrict on definition lists
git-svn-id: https://pandoc.googlecode.com/svn/trunk@572 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -5/+11
2007-03-11  Added support for DefinitionList blocks to HTML writer. Cleaned up bullet and ordered list code by using ordList and unordList instead of raw olist and ulist.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@571 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -5/+11
2007-03-11  Fixed bug in HTML email obfuscation using --strict mode. The problem is that the "href" function escapes &, so (href "&#108;") is 'href="&amp;#108;"'. Fixed by using primHtml for the whole link. Resolves issue 9.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@569 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -1/+3
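The double-escaping problem can be reproduced without any HTML library. In this sketch `escapeAttr`, `hrefEscaped`, and `hrefRaw` are hypothetical helpers standing in for an attribute-escaping `href` and for emitting the markup unescaped; they are not the Text.XHtml or Text.Html API.

```haskell
-- | Escape characters that are special in an HTML attribute value.
escapeAttr :: String -> String
escapeAttr = concatMap escapeChar
  where
    escapeChar '&' = "&amp;"
    escapeChar '<' = "&lt;"
    escapeChar '>' = "&gt;"
    escapeChar '"' = "&quot;"
    escapeChar c   = [c]

-- | Builds the attribute and escapes its value, so an already-encoded
-- numeric entity such as "&#108;" is escaped a second time.
hrefEscaped :: String -> String
hrefEscaped url = "href=\"" ++ escapeAttr url ++ "\""

-- | Emits the value untouched, which is what an obfuscated address
-- made of numeric entities needs.
hrefRaw :: String -> String
hrefRaw url = "href=\"" ++ url ++ "\""
```

Here `hrefEscaped "&#108;"` yields `href="&amp;#108;"`, the broken form quoted in the commit message, while `hrefRaw "&#108;"` keeps the entity intact.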
2007-03-10  Changed syntax of definition lists in Markdown parser:
+ definition blocks must be indented throughout (not just in first line)
+ compact lists can be formed by leaving no blank line between a definition and the next term
git-svn-id: https://pandoc.googlecode.com/svn/trunk@568 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -6/+7
2007-03-10  Added parser for definition lists, derived from reStructuredText syntax:

    term 1
        Definition 1

        Paragraph 2 of definition 1.

    term 2
        There must be whitespace between entries. Any kind of block may serve as a definition, but the first line of each block must be indented.

    terms can contain any *inline* elements
        If you want to be lazy, you can just indent the first line of the definition block.

git-svn-id: https://pandoc.googlecode.com/svn/trunk@566 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -3/+32
2007-03-10  Modified prettyPandoc to handle DefinitionList elements.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@565 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -0/+4
2007-03-10  Added definition for DefinitionList block element.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@564 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -0/+3
2007-03-09  Change in ordered lists in Markdown reader:
+ Lists may begin with lowercase letters only, and only 'a' through 'n'. Otherwise first initials and page references (e.g., p. 400) are too easily parsed as lists.
+ Numbers beginning list items must end with '.' (not ')', which is now allowed only after letters).
NOTE: This change may cause documents to be parsed differently. Users should take care in upgrading.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@561 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -6/+12
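The marker rules above can be summarised as a small predicate. This is a sketch of one reading of the rules, not the Markdown reader's parser: `isOrderedListMarker` is a hypothetical name, and it assumes that a letter marker is a single character and may be followed by either '.' or ')'.

```haskell
import Data.Char (isDigit)

-- | Decide whether a token such as "3." or "b)" is an ordered list marker
-- under the rules described above (assumed reading; see lead-in).
isOrderedListMarker :: String -> Bool
isOrderedListMarker marker =
  case span isDigit marker of
    -- One or more digits must be followed by exactly "."
    (_ : _, ".") -> True
    -- Otherwise: a single letter 'a'..'n' followed by '.' or ')'
    _ -> case marker of
      [c, end] -> c `elem` ['a' .. 'n'] && end `elem` ".)"
      _        -> False
```

Under this reading "12." and "a)" are markers, while "12)", "A." and "p." (as in "p. 400") are not.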
2007-03-07  More smart quote adjustments:
+ remove support for all-caps contractions (too much potential for conflict with things like 'M. Mitterrand')
+ add support for 'm as a contraction
git-svn-id: https://pandoc.googlecode.com/svn/trunk@560 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -4/+2
2007-03-07  Smart quote parsing in Markdown reader: treat ' followed by ll, re, ve, then a non-letter, as a contraction. (e.g. I've, you're, he'll)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@559 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -2/+4
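The contraction test described here can be sketched as a plain function on the remaining input. `isContraction` is a hypothetical name and this is not the reader's Parsec code; treating end of input as a non-letter is an assumption made for the sketch.

```haskell
import Data.Char (isLetter)
import Data.List (stripPrefix)

-- | True if the input starts with an apostrophe followed by "ll", "re",
-- or "ve" and then something that is not a letter.
isContraction :: String -> Bool
isContraction ('\'' : rest) = any matches ["ll", "re", "ve"]
  where
    matches suffix = case stripPrefix suffix rest of
      Just (c : _) -> not (isLetter c)
      Just []      -> True -- assumption: end of input counts as a non-letter
      Nothing      -> False
isContraction _ = False
```

The trailing non-letter check is what keeps an opening single quote before a word like "very" from being read as a contraction.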
2007-03-04  Fixed bug in noscript part of email obfuscation: & instead of &amp;
git-svn-id: https://pandoc.googlecode.com/svn/trunk@557 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -1/+1
2007-03-03  Made image parsing in HTML reader sensitive to the --inline-links option.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@556 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -3/+6
2007-03-03  Added --inline-links option to force links in HTML to be parsed as inline links, rather than reference links. (Addresses Issue #4.)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@554 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 3  Lines: -9/+23
2007-02-27  Changes to test suite for new XHTML output.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@550 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -8/+7
2007-02-26  Modified HTML writer to use the Text.XHtml library. This results in cleaner, faster code, and it makes it easier to use Pandoc in other projects, like wikis, that use Text.XHtml. Two functions are now provided, writeHtml and writeHtmlString: the former outputs an Html structure, the latter a rendered string. The S5 writer is also changed, in parallel ways (writeS5, writeS5String). The Html header is now written programmatically, so it has been removed from the 'headers' directory. The S5 header is still needed, but the doctype and some of the meta declarations have been removed, since they are written programmatically. The INSTALL file and cabalize have been updated to reflect the new dependency on the xhtml package.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@549 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 6  Lines: -180/+165
2007-02-21  Added defaultWriterOptions to Shared.hs.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@545 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -0/+15
2007-02-17  In writing Markdown, print unicode nonbreaking space (160) as "&nbsp;", since otherwise it is hard to distinguish from a regular space. (Addresses Issue #3.)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@541 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -1/+8
2007-02-17  Escape non-breaking space in SGML as '&nbsp;' instead of printing a unicode non-breaking space, which is hard to distinguish visually from a regular space. (Resolves issue #3.)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@540 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -1/+2
2007-02-15  Refactored str and strong in Markdown reader, for clarity.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@539 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -5/+9
2007-02-15  Got rid of two unneeded 'getState's. Note that lookAhead automatically saves and restores the state.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@538 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 2  Lines: -4/+3
2007-02-15  Use lookAhead instead of getInput/setInput in RST reader.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@537 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -3/+1
2007-02-15  Use lookAhead parser for the "first pass" looking for reference keys in Markdown parser, instead of parsing normally, then using setInput to reset input. Slight performance improvement.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@536 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -5/+3
2007-02-15  Removed followedBy' parser from Text/ParserCombinators/Pandoc, replacing it with the 'lookAhead' parser from Text/ParserCombinators/Parsec.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@535 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 4  Lines: -18/+8
2007-02-14  Introduced a new map, reverseEntityTable, for lookups of entity by character, in Entities.hs. This yields a small performance improvement.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@534 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -6/+8
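A reverse lookup table of this kind might look like the following. Only the name `reverseEntityTable` comes from the log; the entity list is a tiny illustrative sample, and `entityList`, `entityTable`, and `entityForChar` are hypothetical names rather than the actual contents of Entities.hs.

```haskell
import qualified Data.Map as Map
import Data.Tuple (swap)

-- A small sample of (entity name, character) pairs; the real table
-- would be much longer.
entityList :: [(String, Char)]
entityList =
  [ ("amp", '&')
  , ("lt", '<')
  , ("gt", '>')
  , ("quot", '"')
  , ("nbsp", '\160')
  ]

-- | Forward table: entity name to character.
entityTable :: Map.Map String Char
entityTable = Map.fromList entityList

-- | Reverse table: character to entity name. Building it once avoids
-- scanning the forward table for every character written.
reverseEntityTable :: Map.Map Char String
reverseEntityTable = Map.fromList (map swap entityList)

-- | Look up the entity reference for a character, if it has one.
entityForChar :: Char -> Maybe String
entityForChar c =
  fmap (\name -> "&" ++ name ++ ";") (Map.lookup c reverseEntityTable)
```

A `Data.Map` lookup is logarithmic in the table size, whereas `lookup` on an association list is linear, which is the likely source of the speedups mentioned in this commit and the preceding one.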
2007-02-14  Changed Entities.hs to use Data.Map rather than an association list, for a slight performance boost.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@532 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -9/+7
2007-02-14  Fixed issue #8: slow performance in parsing inline literals in RST reader. The problem was that ``#`` was seen by 'inline' as a potential link or image. Fix: insert 'notFollowedBy (char '`')' in link parsers.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@529 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -0/+2
2007-02-12  Replaced "choice [(try (string ...), ...]" idiom with "oneOfStrings" in LaTeX reader.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@528 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -5/+5
2007-02-12
+ Added some needed "try"s before multicharacter parsers, especially in "option" contexts.
+ Removed the "try" from the "end" parser in "enclosed" (Text.Pandoc.Shared). Now "enclosed" behaves like "option", "manyTill", etc.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@527 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 4  Lines: -9/+9
2007-02-12  Added 'try' in front of 'string', where needed, or used a different parser, in RST reader. This fixes a bug where ````` would not be correctly parsed as a verbatim `.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@526 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -6/+8
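Why a multicharacter parser needs 'try' inside a choice can be shown with a toy model of consumed/empty results. This is an illustrative model written from scratch, not Parsec itself: `Parser`, `string'`, `orElse`, `try'`, and `parse` are hypothetical names that mimic the behaviour of Parsec's `string`, `<|>`, and `try`.

```haskell
-- Outcome of a parse, and whether any input was consumed on the way.
data Reply a = Ok a String | Fail
data Result a = Consumed (Reply a) | Empty (Reply a)

newtype Parser a = Parser { runParser :: String -> Result a }

-- | Match a literal string one character at a time. A mismatch after
-- the first character is a failure that has already consumed input.
string' :: String -> Parser String
string' pat = Parser (go pat False)
  where
    go :: String -> Bool -> String -> Result String
    go [] consumed rest = wrap consumed (Ok pat rest)
    go (p : ps) _ (x : xs) | p == x = go ps True xs
    go _ consumed _ = wrap consumed Fail
    wrap True  = Consumed
    wrap False = Empty

-- | Choice: the second parser runs only if the first failed
-- without consuming input.
orElse :: Parser a -> Parser a -> Parser a
orElse (Parser p) (Parser q) = Parser $ \s -> case p s of
  Empty Fail -> q s
  other      -> other

-- | Make a consuming failure look like a non-consuming one.
try' :: Parser a -> Parser a
try' (Parser p) = Parser $ \s -> case p s of
  Consumed Fail -> Empty Fail
  other         -> other

-- | Run a parser and return its result, if any.
parse :: Parser a -> String -> Maybe a
parse p s = case runParser p s of
  Consumed (Ok a _) -> Just a
  Empty (Ok a _)    -> Just a
  _                 -> Nothing
```

In this model, `string' "1." `orElse` string' "1)"` fails on the input "1)" because the first alternative consumed the "1" before failing; wrapping the first alternative in `try'` lets the second alternative run.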
2007-02-12  Allow the URI in a RST hyperlink target to start on the line after the reference key.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@525 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -0/+3
2007-01-31  Use "gaps" in copyrightMessage string for cleaner code formatting.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@521 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 1  Lines: -1/+4
2007-01-28  Changed 'encodeEntities' to 'escapeSGMLString'.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@520 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 4  Lines: -26/+26
2007-01-28
+ Simplified entity handling by removing stringToSGML from Entities.hs. It is no longer needed now that all entities are processed in the markdown and HTML readers. All calls to stringToSGML have been replaced by calls to encodeEntities.
+ Since inTag's attribute handling already encodes entities, calls to encodeEntities are no longer needed for attribute values, so they've been removed.
+ The HTML and Markdown readers now call decodeEntities on all raw strings (e.g. authors, dates, link titles), to ensure that no unprocessed entities are included in the native representation of the document. (In the HTML reader, most of this work is done by a change in extractAttributeName.)
+ The result is a small speed improvement (around 5% on my benchmark) and cleaner code.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@519 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 6  Lines: -43/+25
2007-01-27  Use encodeEntities rather than stringToSGML for contents of Str inline in Docbook and HTML writers, since now these strings should not contain literal entity references.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@518 788f1e2b-df1e-0410-8736-df70ead52e1b
Author: fiddlosopher  Files: 2  Lines: -2/+2