aboutsummaryrefslogtreecommitdiff
path: root/src/Text/Pandoc/Readers
AgeCommit message (Collapse)AuthorFilesLines
2010-02-12HTML reader: handle spaces before <html>.fiddlosopher1-0/+1
Resolves Issue #216. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1837 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-12HTML reader: Be forgiving in parsing a bare list within a list.fiddlosopher1-2/+6
The following is not valid xhtml, but the intent is clear: <ol> <li>one</li> <ol><li>sub</li></ol> <li>two</li> </ol> We'll treat the <ol> as if it's in a <li>. Resolves Issue #215. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1836 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-03Require two spaces after capital letter + period for list item.fiddlosopher1-2/+2
Otherwise "E. coli" starts a list. This might change the semantics of some existing documents, since previously the two-space requirement was only enforced when the second word started with a capital letter. But it is consistent with the existing documentation and follows the principle of least surprise. Resolves Issue #212. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1829 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-02Made HTML reader much more forgiving.fiddlosopher1-29/+106
+ Incorporated idea (from HXT) that an element can be closed by an open tag for another element. + Javascript is partially parsed to make sure that a <script> section is not closed by a </script> in a comment or string. + More lenient non-quoted attribute values. Now we accept anything but a space character, quote, or <>. This helps in parsing e.g. www.google.com! + Bare & signs are now parsed as a string. This is a common HTML mistake. + Skip a bare < in malformed HTML. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1825 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-31Removed redundant imports (found by ghc 6.12).fiddlosopher2-2/+2
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1750 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-31Removed unneeded LANGUAGE pragmas.fiddlosopher1-1/+0
(CPP is enabled globally in the cabal file.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@1747 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-31LaTeX reader: use \\ to separate multiple authors.fiddlosopher1-3/+3
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1727 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-31Markdown reader: use ; as separator between authors.fiddlosopher1-2/+2
This allows you to use ',' within author names: e.g. "John Jones, Jr." git-svn-id: https://pandoc.googlecode.com/svn/trunk@1726 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-31Changed Meta author and date types to Inline lists instead of Strings.fiddlosopher4-24/+25
Meta [Inline] [[Inline]] [Inline] rather than Meta [Inline] [String] String. This is a breaking change for libraries that use pandoc and manipulate the metadata. Changed .native files in test suite for new Meta format. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1699 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-22RST reader: Allow :: before lhs code block.fiddlosopher1-0/+1
The RST spec requires the :: before verbatim blocks. This :: should not be treated as literal colons. Resolves Issue #189. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1668 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-07Improved syntax for markdown definition lists.fiddlosopher4-19/+42
Definition lists are now more compatible with PHP Markdown Extra. Resolves Issue #24. + You can have multiple definitions for a term (but still not multiple terms). + Multi-block definitions no longer need a column before each block (indeed, this will now cause multiple definitions). + The marker no longer needs to be flush with the left margin, but can be indented at or two spaces. Also, ~ as well as : can be used as the marker (this suggestion due to David Wheeler.) + There can now be a blank line between the term and the definitions. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1656 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-05Allow markdown tables without headers.fiddlosopher1-28/+52
Resolves Issue #50. The new syntax is described in README. Also allow optional line of dashes at bottom of simple tables. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1652 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-05Markdown reader: Compensate for width of final table column.fiddlosopher1-1/+11
Resolves Issue #144. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1649 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-05Markdown reader: Treat a backslash followed by a newline as hard linebreak.fiddlosopher1-4/+4
Resolves Issue #154. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1646 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-05Added "head" to list of HTML block-level tags.fiddlosopher1-1/+1
Resolves Issue #108. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1645 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-05Changed --default-code-classes -> --indented-code-classes.fiddlosopher1-1/+2
Also changed stateDefaultCodeClasses -> stateIndentedCodeClasses. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1643 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-01Added --default-code-classes option.fiddlosopher1-1/+2
This specifies classes to use for indented code blocks. Thanks to buttock for the (slightly modified) patch. Resolves Issue #87. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1637 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-29Markdown reader: treat 4 or more * or _ in a row as literal text.fiddlosopher1-0/+7
(Instead of trying to parse as strong or emph, which leads to exponential performance problems.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@1634 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-29Markdown reader: Use + rather than %20 for spaces in URLs.fiddlosopher1-2/+2
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1633 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-28Better looking simple tables. Resolves Issue #180.fiddlosopher1-1/+4
* Markdown reader: simple tables are now given column widths of 0. * Column width of 0 is interpreted as meaning: use default column width. * Writers now include explicit column width information only for multiline tables. (Exception: RTF writer, which requires column widths. In this case, columns are given equal widths, adding up to the text width.) * Simple tables should now look better in most output formats. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1631 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-28Markdown reader: parse refs and notes in the same pass.fiddlosopher1-20/+13
Previously the markdown reader made one pass for references, a second pass for notes (which it parsed and stored in the parser state), and a third pass for the rest. This patch achieves a 10% speed improvement by storing the raw notes on the first (reference) pass, then parsing them when the notes are inserted into the AST. This eliminates the need for a second pass to parse notes. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1629 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-28Added \int to characters handled as unicode in tex math.fiddlosopher1-0/+1
Resolves Issue #177. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1628 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-21Fixed htmlComment parser.fiddlosopher1-1/+1
(Added a needed try.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@1621 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-17Support for "..code-block" directive in RST reader.fiddlosopher1-0/+11
Not core RST, but used in Sphinx for code blocks annotated with syntax information. Thanks to Luke Plant for the patch. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1619 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-03Specially mark code blocks that were "literate" in the input.fiddlosopher3-3/+3
They can then be treated differently in the writers. This allows authors to distinguish bits of the literate program they are writing from source code examples, even if the examples are marked as Haskell for highlighting. Resolves Issue #174. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1618 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-01Properly handle commented-out list items in markdown.fiddlosopher2-3/+4
Example: - a <!-- - b --> - c Resolves Issue #142. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1615 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-10-29Added % as an rst underline character.fiddlosopher1-1/+1
Resolves Issue #173. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1612 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-10-12Fix inline math parser so that \$ is allowed in math.fiddlosopher1-3/+8
Resolves Issue #169. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1609 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-10-04Added haddock comments warning that readers assume \n line endings.fiddlosopher5-5/+10
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1608 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-07-21Fixed bug in HTML comment parser.fiddlosopher1-2/+2
Resolves Issue #157. ('try' in the wrong place.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@1605 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-07-11Improved clarity of titleTransform in RST reader.fiddlosopher1-8/+6
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1592 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-07-03Allow -, _, :, . in markdown attribute names.fiddlosopher1-1/+1
These are legal in XML attribute names. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1586 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-06-25Allow continuation lines in line blocks.fiddlosopher1-4/+5
Also added test cases for line blocks for RST reader. Resolves Issue #149. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1583 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-06-03Improved LaTeX reader's coverage of math modes.fiddlosopher1-3/+7
Remove displaymath* (which is not in LaTeX) and recognize all the amsmath environments that are alternatives to eqnarray, namely equation, equation*, gather, gather*, gathered, multline, multline*, align, align*, alignat, alignat*, aligned, alignedat, split Resolves Issue #103. Thanks to shreevatsa.public for the patch. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1577 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-05-04RST reader: Allow explicit links with spaces in URL: `link <to this>`_fiddlosopher1-1/+1
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1576 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-05-01Markdown reader: change ' ' to '\160' in abbreviations.fiddlosopher1-1/+2
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1571 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-04-30Markdown reader: improved efficiency of abbreviation parsing.fiddlosopher1-29/+26
Instead of a separate abbrev parser, we just check for abbreviations each time we parse a string. This gives a huge performance boost with -S. Resolves Issue #141. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1570 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-04-29Made htmlComment parser more efficient.fiddlosopher1-1/+3
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1567 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-04-29Improved efficiency of whitespace parser.fiddlosopher1-13/+8
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1565 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-04-29Use more efficient skipNonindentSpaces instead of nonidentSpaces where possible.fiddlosopher1-7/+16
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1564 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-04-29Took out unneeded 'try' in indentSpaces parser.fiddlosopher1-2/+2
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1563 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-01-31Gobble space after Plain blocks containing only raw html inline.fiddlosopher1-1/+1
Otherwise following header blocks are not parsed correctly, since the parser sees blank space before them. Resolves Issue #124. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1534 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-01-24Moved all haskell source to src subdirectory.fiddlosopher5-0/+3632
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1528 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-11-29Moved everything from src into the top-level directory.fiddlosopher4-2704/+0
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1104 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-11-23Reverted changes in r1086 (implicit section header references).fiddlosopher1-93/+74
This caused too much of a performance hit. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1093 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-11-22Improved and simplified setextHeader parser in markdown reader.fiddlosopher1-8/+8
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1092 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-11-22Implemented implicit reference-style links to section headers in markdown.fiddlosopher1-75/+94
For example, if you have a header '# Supported architectures', you can link to it with '[Supported architectures]'. If there are multiple headers with this label, the link will point to the first of them. Implicit references are always overridden by explicitly specified references. Addresses Issue #20. + Moved isPunctuation, uniqueIdentifiers, and inlineListToIdentifier from Text.Pandoc.Writers.HTML to Text.Pandoc.Shared. + Added stHeaders to ParserState. This holds a list of header texts used in the document, and is used to construct implicit header references. + In Text.Pandoc.Readers.Markdown, added call to headerReference parser in initial parsing pass, constructing a list of section header labels. This is then passed to uniqueIdentifiers to produce identifiers, and a list of implicit references is constructed. This is added to the end of the explicitly specified references, so it will be overridden by explicitly specified references. All of this processing is skipped if --strict was specified. + Modified documentation in README. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1086 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-11-15Fixed logic in markdown smart quote parsing:fiddlosopher1-11/+14
+ Added some needed 'try' statements. + Unicode right single-quote can double as apostrophe, so treat it as a quote-end only when not followed by an alphanumeric character. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1077 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-11-15Fixed smart quote parsing in markdown reader so that unicodefiddlosopher1-6/+10
characters 8216 and 8217 are recognized as single quotes, and 8220 and 8221 as double quotes. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1075 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-11-10Fixed bug in LaTeX reader (pointed out by Mark Eli Kalderon):fiddlosopher1-1/+2
needed a "try" before "string" in parser for \[ math blocks. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1068 788f1e2b-df1e-0410-8736-df70ead52e1b