path: root/src/Text/Pandoc/Readers
Age | Commit message | Author | Files | Lines
2010-07-06 | Minor comment change. | John MacFarlane | 2 | -2/+1
2010-07-06 | Allow language-neutral table captions. | John MacFarlane | 1 | -1/+1
+ Captions may now begin simply with ':', instead of 'Table:'.
+ Captions may now appear either above or below the table.
+ Resolves Issue #227.
2010-07-05 | More refactoring of grid table code. | John MacFarlane | 2 | -99/+25
2010-07-05 | Moved generic grid table functions from RST reader -> Parsing. | John MacFarlane | 1 | -73/+6
Here they can be used by the Markdown reader as well.
2010-07-05 | Moved parsing functions from Text.Pandoc.Shared to new module. | John MacFarlane | 4 | -4/+8
+ Text.Pandoc.Parsing
2010-05-08 | Made KeyTable a map instead of an association list. | John MacFarlane | 2 | -23/+28
* This affects the RST and Markdown readers.
* The type for stateKeys in ParserState has also changed.
* Pandoc, Meta, Inline, and Block have been given Ord instances.
* Reference keys now have a type of their own (Key), with its own Ord instance for case-insensitive comparison.
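Note: a minimal Haskell sketch of the idea behind a map keyed on a case-insensitive reference key. It is not pandoc's actual code: here Key wraps a plain String and Target is a (URL, title) pair, and lookupRef and exampleTable are hypothetical names chosen for illustration.

```haskell
import Data.Char (toLower)
import qualified Data.Map as M

-- Hypothetical stand-in for a reference key: wraps the raw label text.
newtype Key = Key String deriving (Show)

-- Equality and ordering both ignore case, so the two instances agree.
instance Eq Key where
  Key a == Key b = map toLower a == map toLower b

instance Ord Key where
  compare (Key a) (Key b) = compare (map toLower a) (map toLower b)

-- Assumed shape of a link target: (URL, title).
type Target = (String, String)

-- A map gives logarithmic lookup instead of a linear scan of an association list.
type KeyTable = M.Map Key Target

-- Look up a reference label without regard to case.
lookupRef :: String -> KeyTable -> Maybe Target
lookupRef label = M.lookup (Key label)

exampleTable :: KeyTable
exampleTable = M.fromList [(Key "Pandoc", ("http://example.com/", "Home"))]
```

With these definitions, lookupRef "PANDOC" exampleTable and lookupRef "pandoc" exampleTable find the same entry, because the Ord instance decides which keys count as equal inside the map.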
2010-04-26 | Changed rawLaTeXInline to accept '\section', '\begin', etc. | John MacFarlane | 1 | -3/+8
Use new rawLaTeXInline' in LaTeX reader, and export rawLaTeXInline for use in markdown reader. Fixes bug wherein '\section{foo}' was not recognized as raw TeX in markdown document.
2010-04-25 | Use texmath's parser in TexMath module. | John MacFarlane | 1 | -197/+53
* This replaces a lot of custom parser code, and expands the tex -> unicode conversion.
* The behavior has also changed: if the whole formula can't be converted, the whole formula is left in raw TeX. Previously, pandoc converted parts of the formula to unicode and left other parts in raw TeX.
* Added (but not yet exported) readTeXMath', which returns a Maybe.
* Updated tests.
2010-04-10 | In parsing smart quotes, leave unicode curly quotes alone. | John MacFarlane | 1 | -14/+12
Resolves Issue #143.
2010-03-23 | Properly escape URIs in all readers. | John MacFarlane | 4 | -44/+37
2010-03-23 | Updated copyright notices. | John MacFarlane | 5 | -10/+10
2010-03-23 | Fixed treatment of unicode characters in URIs. | John MacFarlane | 1 | -1/+1
* Added stringToURI to Shared. This is used in the HTML writer for all URIs. It properly URI-encodes high characters (> 127), leaving everything else (including symbols and spaces) the same.
* Modified unsanitaryURI to allow UTF8 characters in a URI. (First, we convert the URI to URI-encoded octets, then we pass through parseURIReference.) This resolves gitit Issue #99. Previously '[abc](http://gitit.net/测试)' would not be rendered as a link when --sanitize was selected.
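Note: a rough Haskell sketch of the encoding rule described above, namely that characters above 127 become percent-encoded UTF-8 octets while everything else passes through unchanged. The names encodeHighChars, utf8Octets and percent are hypothetical; this is not the stringToURI implementation from the commit, only an illustration of the rule under those assumptions.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (intToDigit, ord, toUpper)

-- UTF-8 octets of a single character (1 to 4 bytes).
utf8Octets :: Char -> [Int]
utf8Octets c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- Render one octet as %XX with upper-case hex digits.
percent :: Int -> String
percent b = ['%', hex (b `shiftR` 4), hex (b .&. 0xF)]
  where hex = toUpper . intToDigit

-- Percent-encode only characters above 127; ASCII, including spaces
-- and symbols, is left exactly as written.
encodeHighChars :: String -> String
encodeHighChars = concatMap enc
  where
    enc c
      | ord c > 127 = concatMap percent (utf8Octets c)
      | otherwise   = [c]
```

Applied to the path from the example above, encodeHighChars "http://gitit.net/\x6D4B\x8BD5" yields "http://gitit.net/%E6%B5%8B%E8%AF%95".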
2010-03-14 | Markdown(+lhs) reader: handle "inverse bird tracks" | fiddlosopher | 1 | -7/+15
Inverse bird tracks (<) are used for haskell example code that is not part of the literate Haskell program. Resolves Issue #211. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1888 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-03-14 | LaTeX reader: ignore \section, \pdfannot, \pdfstringdef. | fiddlosopher | 1 | -15/+17
Resolves Issue #202. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1887 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-03-14 | LaTeX reader: Ignore alt title in section headers. | fiddlosopher | 1 | -0/+1
Partially resolves Issue #202. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1886 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-03-13 | LaTeX reader: don't treat \section as inline LaTeX. | fiddlosopher | 1 | -1/+2
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1885 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-03-13 | LaTeX reader: recognize nonbreaking space ~. | fiddlosopher | 1 | -1/+5
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1884 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-03-06 | Markdown reader: Added p., pp., sec., ch., as abbreviations. | fiddlosopher | 1 | -1/+2
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1861 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-03-06 | Disallow blank lines in inline code span. | fiddlosopher | 1 | -1/+1
Also added additional test cases for markdown code spans. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1860 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-03-01 | Markdown reader: Allow footnotes to be indented < 4 spaces. | fiddlosopher | 1 | -2/+2
This fixes a regression. A test case has been added in testsuite.txt. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1859 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-28 | Allow multi-line titles and authors in meta block. | fiddlosopher | 1 | -3/+11
Based on a patch by Justin Bogner. Titles may span multiple lines, provided continuation lines begin with a space character. Separate authors may be put on multiple lines, provided each line after the first begins with a space character. Each author must fit on one line. Multiple authors on a single line may still be separated by a semicolon. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1854 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-27 | RST reader: Improved grid tables. | fiddlosopher | 1 | -20/+34
+ Table cells can now contain multiple block elements, such as lists or paragraphs.
+ Table parser is now forgiving of spaces at ends of lines.
+ Added test cases.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1852 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-27 | Markdown reader: Use simpler approach for URLs - just escape spaces. | fiddlosopher | 1 | -9/+5
Markdown.pl doesn't URI-escape anything, so we won't do that either, except for spaces, which can cause problems if not escaped. Resolves Issue #220 and partially reverts r1847. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1851 788f1e2b-df1e-0410-8736-df70ead52e1b
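Note: a minimal Haskell sketch of the "escape only spaces" rule. The function name escapeSpaces and the choice of %20 as the replacement are assumptions made for illustration; the log does not say which escape the reader emits.

```haskell
-- Replace each space with %20 and leave every other character,
-- including existing percent escapes, untouched.
escapeSpaces :: String -> String
escapeSpaces = concatMap escapeChar
  where
    escapeChar ' ' = "%20"
    escapeChar c   = [c]
```

Under these assumptions, escapeSpaces "http://example.com/my file.html" gives "http://example.com/my%20file.html", and a link that already contains %20 comes back unchanged.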
2010-02-27 | Markdown reader: properly escape URIs. | fiddlosopher | 1 | -2/+10
+ Resolves Issue #220.
+ Added escapeURI function to Markdown reader. This escapes links in a way that makes sense for markdown. If they've used URI escapes like %20 in their link, these will be preserved. But if they've used a special character or space without escaping it, it will be escaped. This should make sense in most cases.
+ Previously pandoc collapsed adjacent spaces and replaced these sequences of spaces with + characters. That isn't correct for a URI path (+ is to be used only in the query part). We've also removed the space-collapsing behavior.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1847 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-27 | LaTeX reader: handle \ (interword space). | fiddlosopher | 1 | -5/+9
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1846 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-26 | LaTeX reader: allow any special character to be escaped. | fiddlosopher | 1 | -1/+1
Resolves Issue #221. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1845 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-20 | Incomplete support for RST tables (simple and grid). | fiddlosopher | 1 | -2/+193
Thanks to Eric Kow. Note TODO for future improvement in RST reader code comments. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1840 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-12 | LaTeX reader: treat \paragraph and \subparagraph as level 4, 5 headers. | fiddlosopher | 1 | -2/+2
Resolves Issue #207. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1838 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-12 | HTML reader: handle spaces before <html>. | fiddlosopher | 1 | -0/+1
Resolves Issue #216. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1837 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-12 | HTML reader: Be forgiving in parsing a bare list within a list. | fiddlosopher | 1 | -2/+6
The following is not valid xhtml, but the intent is clear:
    <ol>
    <li>one</li>
    <ol><li>sub</li></ol>
    <li>two</li>
    </ol>
We'll treat the <ol> as if it's in a <li>. Resolves Issue #215.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1836 788f1e2b-df1e-0410-8736-df70ead52e1b
2010-02-03 | Require two spaces after capital letter + period for list item. | fiddlosopher | 1 | -2/+2
Otherwise "E. coli" starts a list. This might change the semantics of some existing documents, since previously the two-space requirement was only enforced when the second word started with a capital letter. But it is consistent with the existing documentation and follows the principle of least surprise. Resolves Issue #212. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1829 788f1e2b-df1e-0410-8736-df70ead52e1b
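Note: a small Haskell sketch of the rule stated above, under the assumption that the check only needs to look at the first four characters of a line. The function name startsUpperAlphaItem is hypothetical, and the real reader may also accept other whitespace (for example a tab) after the period.

```haskell
import Data.Char (isUpper)

-- A line opens an upper-alpha list item only when a capital letter and
-- a period are followed by at least two spaces.
startsUpperAlphaItem :: String -> Bool
startsUpperAlphaItem (c : '.' : ' ' : ' ' : _) = isUpper c
startsUpperAlphaItem _                         = False
```

With this check, "E. coli is a bacterium" is ordinary text, while "E.  Fifth item" (two spaces) starts a list item.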
2010-02-02 | Made HTML reader much more forgiving. | fiddlosopher | 1 | -29/+106
+ Incorporated idea (from HXT) that an element can be closed by an open tag for another element.
+ Javascript is partially parsed to make sure that a <script> section is not closed by a </script> in a comment or string.
+ More lenient non-quoted attribute values. Now we accept anything but a space character, quote, or <>. This helps in parsing e.g. www.google.com!
+ Bare & signs are now parsed as a string. This is a common HTML mistake.
+ Skip a bare < in malformed HTML.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1825 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-31 | Removed redundant imports (found by ghc 6.12). | fiddlosopher | 2 | -2/+2
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1750 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-31 | Removed unneeded LANGUAGE pragmas. | fiddlosopher | 1 | -1/+0
(CPP is enabled globally in the cabal file.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@1747 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-31 | LaTeX reader: use \\ to separate multiple authors. | fiddlosopher | 1 | -3/+3
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1727 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-31 | Markdown reader: use ; as separator between authors. | fiddlosopher | 1 | -2/+2
This allows you to use ',' within author names: e.g. "John Jones, Jr." git-svn-id: https://pandoc.googlecode.com/svn/trunk@1726 788f1e2b-df1e-0410-8736-df70ead52e1b
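Note: a minimal Haskell sketch of splitting an author line on semicolons so that commas can stay inside a name. The function splitAuthors is a hypothetical helper working on plain strings; it is not the reader's actual parser.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Split on ';', trim surrounding whitespace, and drop empty entries.
splitAuthors :: String -> [String]
splitAuthors = filter (not . null) . map trim . splitOn ';'
  where
    -- Break the string at every occurrence of the delimiter.
    splitOn d s = case break (== d) s of
      (a, [])       -> [a]
      (a, _ : rest) -> a : splitOn d rest
    trim = dropWhileEnd isSpace . dropWhile isSpace
```

For instance, splitAuthors "John Jones, Jr.; Jane Doe" returns ["John Jones, Jr.", "Jane Doe"], keeping the comma inside the first name.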
2009-12-31 | Changed Meta author and date types to Inline lists instead of Strings. | fiddlosopher | 4 | -24/+25
Meta [Inline] [[Inline]] [Inline] rather than Meta [Inline] [String] String. This is a breaking change for libraries that use pandoc and manipulate the metadata. Changed .native files in test suite for new Meta format. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1699 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-22 | RST reader: Allow :: before lhs code block. | fiddlosopher | 1 | -0/+1
The RST spec requires the :: before verbatim blocks. This :: should not be treated as literal colons. Resolves Issue #189. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1668 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-07 | Improved syntax for markdown definition lists. | fiddlosopher | 4 | -19/+42
Definition lists are now more compatible with PHP Markdown Extra. Resolves Issue #24.
+ You can have multiple definitions for a term (but still not multiple terms).
+ Multi-block definitions no longer need a colon before each block (indeed, this will now cause multiple definitions).
+ The marker no longer needs to be flush with the left margin, but can be indented one or two spaces. Also, ~ as well as : can be used as the marker (this suggestion due to David Wheeler).
+ There can now be a blank line between the term and the definitions.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1656 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-05 | Allow markdown tables without headers. | fiddlosopher | 1 | -28/+52
Resolves Issue #50. The new syntax is described in README. Also allow optional line of dashes at bottom of simple tables. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1652 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-05 | Markdown reader: Compensate for width of final table column. | fiddlosopher | 1 | -1/+11
Resolves Issue #144. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1649 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-05 | Markdown reader: Treat a backslash followed by a newline as hard linebreak. | fiddlosopher | 1 | -4/+4
Resolves Issue #154. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1646 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-05 | Added "head" to list of HTML block-level tags. | fiddlosopher | 1 | -1/+1
Resolves Issue #108. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1645 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-05 | Changed --default-code-classes -> --indented-code-classes. | fiddlosopher | 1 | -1/+2
Also changed stateDefaultCodeClasses -> stateIndentedCodeClasses. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1643 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-12-01 | Added --default-code-classes option. | fiddlosopher | 1 | -1/+2
This specifies classes to use for indented code blocks. Thanks to buttock for the (slightly modified) patch. Resolves Issue #87. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1637 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-29 | Markdown reader: treat 4 or more * or _ in a row as literal text. | fiddlosopher | 1 | -0/+7
(Instead of trying to parse as strong or emph, which leads to exponential performance problems.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@1634 788f1e2b-df1e-0410-8736-df70ead52e1b
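Note: a minimal Haskell sketch of the shortcut described above, written over plain strings rather than the reader's parser combinators. The function name literalRun is hypothetical. It recognizes a run of four or more identical * or _ characters at the start of the input and hands it back as literal text, so no emphasis parsing needs to be attempted on it.

```haskell
-- If the input starts with four or more identical '*' or '_' characters,
-- return that run (to be emitted as literal text) and the remaining input.
literalRun :: String -> Maybe (String, String)
literalRun s@(c : _)
  | c `elem` "*_", length run >= 4 = Just (run, rest)
  where
    (run, rest) = span (== c) s
literalRun _ = Nothing
```

For example, literalRun "*****text" gives Just ("*****", "text"), while literalRun "***text" gives Nothing and would fall through to the normal strong and emph parsers.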
2009-11-29 | Markdown reader: Use + rather than %20 for spaces in URLs. | fiddlosopher | 1 | -2/+2
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1633 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-28 | Better looking simple tables. Resolves Issue #180. | fiddlosopher | 1 | -1/+4
* Markdown reader: simple tables are now given column widths of 0.
* Column width of 0 is interpreted as meaning: use default column width.
* Writers now include explicit column width information only for multiline tables. (Exception: RTF writer, which requires column widths. In this case, columns are given equal widths, adding up to the text width.)
* Simple tables should now look better in most output formats.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1631 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-28 | Markdown reader: parse refs and notes in the same pass. | fiddlosopher | 1 | -20/+13
Previously the markdown reader made one pass for references, a second pass for notes (which it parsed and stored in the parser state), and a third pass for the rest. This patch achieves a 10% speed improvement by storing the raw notes on the first (reference) pass, then parsing them when the notes are inserted into the AST. This eliminates the need for a second pass to parse notes. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1629 788f1e2b-df1e-0410-8736-df70ead52e1b
2009-11-28 | Added \int to characters handled as unicode in tex math. | fiddlosopher | 1 | -0/+1
Resolves Issue #177. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1628 788f1e2b-df1e-0410-8736-df70ead52e1b