path: root/src/Text/Pandoc/Readers
Age | Commit message | Author | Files | Lines
2010-07-05 | Moved parsing functions from Text.Pandoc.Shared to new module. | John MacFarlane | 4 | -4/+8
+ Text.Pandoc.Parsing
2010-05-08 | Made KeyTable a map instead of an association list. | John MacFarlane | 2 | -23/+28
* This affects the RST and Markdown readers.
* The type for stateKeys in ParserState has also changed.
* Pandoc, Meta, Inline, and Block have been given Ord instances.
* Reference keys now have a type of their own (Key), with its own Ord instance for case-insensitive comparison.
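A minimal sketch of the idea behind this change, assuming a simplified `Key` that wraps a plain `String` (the commit does not show the real definition, and `Target`, `normalize`, and `lookupKey` are hypothetical names used only for illustration). The `Ord` instance compares lowercased keys, so a `Data.Map` lookup becomes case-insensitive:

```haskell
import Data.Char (toLower)
import qualified Data.Map as M

-- Hypothetical stand-in for the reference key type described in the commit.
newtype Key = Key String
  deriving Show

-- Lowercase the key text so that comparison ignores case.
normalize :: Key -> String
normalize (Key s) = map toLower s

instance Eq Key where
  a == b = normalize a == normalize b

instance Ord Key where
  compare a b = compare (normalize a) (normalize b)

-- Assumed target shape for this sketch: (URL, title).
type Target = (String, String)

-- The key table as a map rather than an association list.
type KeyTable = M.Map Key Target

-- Look up a reference label regardless of its capitalisation.
lookupKey :: String -> KeyTable -> Maybe Target
lookupKey = M.lookup . Key
```

With a table built from `Key "Foo"`, both `lookupKey "foo"` and `lookupKey "FOO"` find the same entry, and lookups are logarithmic rather than linear in the number of references.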
2010-04-26 | Changed rawLaTeXInline to accept '\section', '\begin', etc. | John MacFarlane | 1 | -3/+8
Use new rawLaTeXInline' in LaTeX reader, and export rawLaTeXInline for use in markdown reader. Fixes bug wherein '\section{foo}' was not recognized as raw TeX in markdown document.
2010-04-25 | Use texmath's parser in TexMath module. | John MacFarlane | 1 | -197/+53
* This replaces a lot of custom parser code, and expands the tex -> unicode conversion.
* The behavior has also changed: if the whole formula can't be converted, the whole formula is left in raw TeX. Previously, pandoc converted parts of the formula to unicode and left other parts in raw TeX.
* Added (but not yet exported) readTeXMath', which returns a Maybe.
* Updated tests
2010-04-10 | In parsing smart quotes, leave unicode curly quotes alone. | John MacFarlane | 1 | -14/+12
Resolves Issue #143.
2010-03-23 | Properly escape URIs in all readers. | John MacFarlane | 4 | -44/+37
2010-03-23 | Updated copyright notices. | John MacFarlane | 5 | -10/+10
2010-03-23 | Fixed treatment of unicode characters in URIs. | John MacFarlane | 1 | -1/+1
* Added stringToURI to Shared. This is used in the HTML writer for all URIs. It properly URI-encodes high characters (> 127), leaving everything else (including symbols and spaces) the same.
* Modified unsanitaryURI to allow UTF8 characters in a URI. (First, we convert the URI to URI-encoded octets, then we pass through parseURIReference.) This resolves gitit Issue #99. Previously '[abc](http://gitit.net/测试)' would not be rendered as a link when --sanitize was selected.
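The encoding rule described for stringToURI can be sketched as follows. This is an illustration under stated assumptions, not pandoc's code: `escapeHighChars`, `utf8Bytes`, and `percent` are hypothetical names, and the sketch assumes the high characters are percent-encoded as UTF-8 octets, as the second bullet suggests.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (intToDigit, ord, toUpper)

-- UTF-8 encode a single character into its byte values.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- Render one byte as a percent escape such as "%E6".
percent :: Int -> String
percent b = ['%', hex (b `shiftR` 4), hex (b .&. 0xF)]
  where hex = toUpper . intToDigit

-- Percent-encode characters above 127; leave everything else
-- (including spaces and symbols) unchanged.
escapeHighChars :: String -> String
escapeHighChars = concatMap esc
  where
    esc c
      | ord c > 127 = concatMap percent (utf8Bytes c)
      | otherwise   = [c]
```

Applied to the URL from the commit message, the two non-ASCII characters become six percent-escaped octets while the rest of the string is untouched.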
2010-03-14 | Markdown(+lhs) reader: handle "inverse bird tracks" | fiddlosopher | 1 | -7/+15
Inverse bird tracks (<) are used for haskell example code that is not part of the literate Haskell program. Resolves Issue #211. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1888 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-03-14 | LaTeX reader: ignore \section, \pdfannot, \pdfstringdef. | fiddlosopher | 1 | -15/+17
Resolves Issue #202. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1887 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-03-14 | LaTeX reader: Ignore alt title in section headers. | fiddlosopher | 1 | -0/+1
Partially resolves Issue #202. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1886 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-03-13 | LaTeX reader: don't treat \section as inline LaTeX. | fiddlosopher | 1 | -1/+2
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1885 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-03-13 | LaTeX reader: recognize nonbreaking space ~. | fiddlosopher | 1 | -1/+5
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1884 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-03-06 | Markdown reader: Added p., pp., sec., ch., as abbreviations. | fiddlosopher | 1 | -1/+2
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1861 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-03-06 | Disallow blank lines in inline code span. | fiddlosopher | 1 | -1/+1
Also added additional test cases for markdown code spans. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1860 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-03-01 | Markdown reader: Allow footnotes to be indented < 4 spaces. | fiddlosopher | 1 | -2/+2
This fixes a regression. A test case has been added in testsuite.txt. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1859 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-28 | Allow multi-line titles and authors in meta block. | fiddlosopher | 1 | -3/+11
Based on a patch by Justin Bogner. Titles may span multiple lines, provided continuation lines begin with a space character. Separate authors may be put on multiple lines, provided each line after the first begins with a space character. Each author must fit on one line. Multiple authors on a single line may still be separated by a semicolon. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1854 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-27 | RST reader: Improved grid tables. | fiddlosopher | 1 | -20/+34
+ Table cells can now contain multiple block elements, such as lists or paragraphs.
+ Table parser is now forgiving of spaces at ends of lines.
+ Added test cases.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1852 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-27 | Markdown reader: Use simpler approach for URLs - just escape spaces. | fiddlosopher | 1 | -9/+5
Markdown.pl doesn't URI-escape anything, so we won't do that either, except for spaces, which can cause problems if not escaped. Resolves Issue #220 and partially reverts r1847. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1851 788f1e2b-df1e-0410-8736-df70ead52e1b
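A minimal sketch of the "just escape spaces" approach. The commit message does not say which escape is used, so `%20` is an assumption here, and `escapeSpaces` is a hypothetical name:

```haskell
-- Escape only spaces in a URL; every other character passes through as written.
-- Assumption: spaces become "%20" (the commit message does not name the escape).
escapeSpaces :: String -> String
escapeSpaces = concatMap esc
  where
    esc ' ' = "%20"
    esc c   = [c]
```

Because nothing else is touched, escapes the author already wrote (such as `%20`) are preserved rather than double-encoded.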
2010-02-27 | Markdown reader: properly escape URIs. | fiddlosopher | 1 | -2/+10
+ Resolves Issue #220.
+ Added escapeURI function to Markdown reader. This escapes links in a way that makes sense for markdown. If they've used URI escapes like %20 in their link, these will be preserved. But if they've used a special character or space without escaping it, it will be escaped. This should make sense in most cases.
+ Previously pandoc collapsed adjacent spaces and replaced these sequences of spaces with + characters. That isn't correct for a URI path (+ is to be used only in the query part). We've also removed the space-collapsing behavior.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1847 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-27 | LaTeX reader: handle \ (interword space). | fiddlosopher | 1 | -5/+9
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1846 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-26 | LaTeX reader: allow any special character to be escaped. | fiddlosopher | 1 | -1/+1
Resolves Issue #221. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1845 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-20 | Incomplete support for RST tables (simple and grid). | fiddlosopher | 1 | -2/+193
Thanks to Eric Kow. Note TODO for future improvement in RST reader code comments. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1840 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-12 | LaTeX reader: treat \paragraph and \subparagraph as level 4, 5 headers. | fiddlosopher | 1 | -2/+2
Resolves Issue #207. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1838 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-12 | HTML reader: handle spaces before <html>. | fiddlosopher | 1 | -0/+1
Resolves Issue #216. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1837 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-12 | HTML reader: Be forgiving in parsing a bare list within a list. | fiddlosopher | 1 | -2/+6
The following is not valid xhtml, but the intent is clear:
    <ol>
    <li>one</li>
    <ol><li>sub</li></ol>
    <li>two</li>
    </ol>
We'll treat the <ol> as if it's in a <li>. Resolves Issue #215.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1836 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-03 | Require two spaces after capital letter + period for list item. | fiddlosopher | 1 | -2/+2
Otherwise "E. coli" starts a list. This might change the semantics of some existing documents, since previously the two-space requirement was only enforced when the second word started with a capital letter. But it is consistent with the existing documentation and follows the principle of least surprise. Resolves Issue #212. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1829 788f1e2b-df1e-0410-8736-df70ead52e1b
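The rule can be illustrated with a small predicate. This is a simplified sketch, not the reader's parser: `startsUpperAlphaItem` is a hypothetical name, and it ignores leading indentation and every other list-marker style.

```haskell
import Data.Char (isUpper)

-- True if the line opens a list item whose marker is a capital letter
-- followed by a period and at least two spaces.
startsUpperAlphaItem :: String -> Bool
startsUpperAlphaItem (c : '.' : ' ' : ' ' : _) = isUpper c
startsUpperAlphaItem _                         = False
```

Under this rule `"E. coli is a bacterium"` is ordinary text because only one space follows the period, while `"A.  First item"` still starts a list.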
2010-02-02 | Made HTML reader much more forgiving. | fiddlosopher | 1 | -29/+106
+ Incorporated idea (from HXT) that an element can be closed by an open tag for another element.
+ Javascript is partially parsed to make sure that a <script> section is not closed by a </script> in a comment or string.
+ More lenient non-quoted attribute values. Now we accept anything but a space character, quote, or <>. This helps in parsing e.g. www.google.com!
+ Bare & signs are now parsed as a string. This is a common HTML mistake.
+ Skip a bare < in malformed HTML.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1825 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-31 | Removed redundant imports (found by ghc 6.12). | fiddlosopher | 2 | -2/+2
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1750 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-31 | Removed unneeded LANGUAGE pragmas. | fiddlosopher | 1 | -1/+0
(CPP is enabled globally in the cabal file.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@1747 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-31 | LaTeX reader: use \\ to separate multiple authors. | fiddlosopher | 1 | -3/+3
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1727 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-31 | Markdown reader: use ; as separator between authors. | fiddlosopher | 1 | -2/+2
This allows you to use ',' within author names: e.g. "John Jones, Jr." git-svn-id: https://pandoc.googlecode.com/svn/trunk@1726 788f1e2b-df1e-0410-8736-df70ead52e1b
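A small sketch of splitting an author line on semicolons, so that commas inside a name survive. `splitAuthors` is a hypothetical helper written for illustration; it works on plain strings rather than on the reader's parser state.

```haskell
import Data.Char (isSpace)

-- Split an author line on ';' and trim surrounding whitespace from each name.
-- Commas are left alone, so they can appear inside a name.
splitAuthors :: String -> [String]
splitAuthors s =
  case break (== ';') s of
    (a, ';' : rest) -> trim a : splitAuthors rest
    (a, _)          -> [trim a]
  where
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse
```

For example, `"John Jones, Jr.; Jane Doe"` yields two authors, the first of which keeps its comma.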
2009-12-31 | Changed Meta author and date types to Inline lists instead of Strings. | fiddlosopher | 4 | -24/+25
Meta [Inline] [[Inline]] [Inline] rather than Meta [Inline] [String] String. This is a breaking change for libraries that use pandoc and manipulate the metadata. Changed .native files in test suite for new Meta format. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1699 788f1e2b-df1e-0410-8736-df70ead52e1b
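For library users affected by this breaking change, the difference between the two shapes can be sketched like this. The `Inline` type below is a deliberately reduced stand-in (the real type has many more constructors), and `OldMeta`, `toInlines`, and `upgradeMeta` are hypothetical names introduced only for the example.

```haskell
import Data.List (intersperse)

-- Reduced stand-in for pandoc's Inline type, for illustration only.
data Inline = Str String | Space
  deriving (Eq, Show)

-- Old shape: title as inlines, authors and date as plain strings.
data OldMeta = OldMeta [Inline] [String] String
  deriving (Eq, Show)

-- New shape: title, each author, and the date are all inline lists.
data Meta = Meta [Inline] [[Inline]] [Inline]
  deriving (Eq, Show)

-- Turn a plain string into words separated by Space.
toInlines :: String -> [Inline]
toInlines = intersperse Space . map Str . words

-- Convert metadata in the old shape to the new one.
upgradeMeta :: OldMeta -> Meta
upgradeMeta (OldMeta title authors date) =
  Meta title (map toInlines authors) (toInlines date)
```

Code that pattern-matched on the old `String` fields would need a conversion along these lines, or its own rendering of inline lists back to text.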
2009-12-22 | RST reader: Allow :: before lhs code block. | fiddlosopher | 1 | -0/+1
The RST spec requires the :: before verbatim blocks. This :: should not be treated as literal colons. Resolves Issue #189. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1668 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-07 | Improved syntax for markdown definition lists. | fiddlosopher | 4 | -19/+42
Definition lists are now more compatible with PHP Markdown Extra. Resolves Issue #24.
+ You can have multiple definitions for a term (but still not multiple terms).
+ Multi-block definitions no longer need a colon before each block (indeed, this will now cause multiple definitions).
+ The marker no longer needs to be flush with the left margin, but can be indented one or two spaces. Also, ~ as well as : can be used as the marker (this suggestion due to David Wheeler).
+ There can now be a blank line between the term and the definitions.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1656 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-05 | Allow markdown tables without headers. | fiddlosopher | 1 | -28/+52
Resolves Issue #50. The new syntax is described in README. Also allow optional line of dashes at bottom of simple tables. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1652 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-05 | Markdown reader: Compensate for width of final table column. | fiddlosopher | 1 | -1/+11
Resolves Issue #144. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1649 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-05 | Markdown reader: Treat a backslash followed by a newline as hard linebreak. | fiddlosopher | 1 | -4/+4
Resolves Issue #154. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1646 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-05 | Added "head" to list of HTML block-level tags. | fiddlosopher | 1 | -1/+1
Resolves Issue #108. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1645 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-05 | Changed --default-code-classes -> --indented-code-classes. | fiddlosopher | 1 | -1/+2
Also changed stateDefaultCodeClasses -> stateIndentedCodeClasses. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1643 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-01 | Added --default-code-classes option. | fiddlosopher | 1 | -1/+2
This specifies classes to use for indented code blocks. Thanks to buttock for the (slightly modified) patch. Resolves Issue #87. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1637 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-29 | Markdown reader: treat 4 or more * or _ in a row as literal text. | fiddlosopher | 1 | -0/+7
(Instead of trying to parse as strong or emph, which leads to exponential performance problems.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@1634 788f1e2b-df1e-0410-8736-df70ead52e1b
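The check can be sketched as a function over plain strings. This is an illustration of the rule only, assuming nothing about how the reader's parser is written; `literalRun` is a hypothetical name.

```haskell
-- If the input begins with four or more of the same '*' or '_' character,
-- return that run (to be emitted as literal text) and the remaining input.
-- Shorter runs return Nothing and are left to the emphasis rules.
literalRun :: String -> Maybe (String, String)
literalRun s@(c : _)
  | c `elem` "*_", length run >= 4 = Just (run, rest)
  where
    (run, rest) = span (== c) s
literalRun _ = Nothing
```

Consuming the whole run in one step means the emphasis and strong parsers are never tried on it, which is the behaviour the commit relies on to avoid the exponential blow-up.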
2009-11-29 | Markdown reader: Use + rather than %20 for spaces in URLs. | fiddlosopher | 1 | -2/+2
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1633 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-28 | Better looking simple tables. Resolves Issue #180. | fiddlosopher | 1 | -1/+4
* Markdown reader: simple tables are now given column widths of 0.
* Column width of 0 is interpreted as meaning: use default column width.
* Writers now include explicit column width information only for multiline tables. (Exception: RTF writer, which requires column widths. In this case, columns are given equal widths, adding up to the text width.)
* Simple tables should now look better in most output formats.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1631 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-28 | Markdown reader: parse refs and notes in the same pass. | fiddlosopher | 1 | -20/+13
Previously the markdown reader made one pass for references, a second pass for notes (which it parsed and stored in the parser state), and a third pass for the rest. This patch achieves a 10% speed improvement by storing the raw notes on the first (reference) pass, then parsing them when the notes are inserted into the AST. This eliminates the need for a second pass to parse notes. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1629 788f1e2b-df1e-0410-8736-df70ead52e1b
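The strategy of storing raw notes and parsing them only on insertion can be sketched over a list of lines. Everything here is a simplification with hypothetical names (`NoteTable`, `parseNoteDef`, `collectNotes`, `insertNote`): it handles only single-line `[^label]: text` definitions and takes the note parser as a parameter.

```haskell
import qualified Data.Map as M

-- Label mapped to the raw, still unparsed note text.
type NoteTable = M.Map String String

-- Recognise a single-line note definition of the form "[^label]: text".
parseNoteDef :: String -> Maybe (String, String)
parseNoteDef ('[' : '^' : xs) =
  case break (== ']') xs of
    (label, ']' : ':' : raw)
      | not (null label) -> Just (label, dropWhile (== ' ') raw)
    _ -> Nothing
parseNoteDef _ = Nothing

-- First pass: pull note definitions out of the document, keeping them raw,
-- and return the remaining lines for the main pass.
collectNotes :: [String] -> (NoteTable, [String])
collectNotes = foldr step (M.empty, [])
  where
    step line (notes, rest) =
      case parseNoteDef line of
        Just (label, raw) -> (M.insert label raw notes, rest)
        Nothing           -> (notes, line : rest)

-- Main pass: parse a note only at the point where it is inserted.
insertNote :: (String -> a) -> NoteTable -> String -> Maybe a
insertNote parse notes label = fmap parse (M.lookup label notes)
```

Since `insertNote` runs the parser on demand, no separate pass over the notes is needed, which matches the saving described in the commit message.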
2009-11-28 | Added \int to characters handled as unicode in tex math. | fiddlosopher | 1 | -0/+1
Resolves Issue #177. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1628 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-21 | Fixed htmlComment parser. | fiddlosopher | 1 | -1/+1
(Added a needed try.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@1621 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-17 | Support for "..code-block" directive in RST reader. | fiddlosopher | 1 | -0/+11
Not core RST, but used in Sphinx for code blocks annotated with syntax information. Thanks to Luke Plant for the patch. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1619 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-03 | Specially mark code blocks that were "literate" in the input. | fiddlosopher | 3 | -3/+3
They can then be treated differently in the writers. This allows authors to distinguish bits of the literate program they are writing from source code examples, even if the examples are marked as Haskell for highlighting. Resolves Issue #174. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1618 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-01 | Properly handle commented-out list items in markdown. | fiddlosopher | 2 | -3/+4
Example:
    - a
    <!--
    - b
    -->
    - c
Resolves Issue #142.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1615 788f1e2b-df1e-0410-8736-df70ead52e1b