path: root/src/Text/Pandoc/Readers
Age | Commit message | Author | Files | Lines
2012-01-19 | Added Docx writer. | John MacFarlane | 1 | -4/+7
* New module `Text.Pandoc.Docx`.
* New output format `docx`.
* Added reference.docx.
* New option `--reference-docx`.

The writer includes support for highlighted code blocks and math (which is converted from TeX to OMML using texmath's new OMML module).
2012-01-12 | Added "title" to list of docbook block-level tags. | John MacFarlane | 1 | -1/+1
2012-01-10 | Markdown reader: fixed bug in table/hrule parsing. | John MacFarlane | 1 | -1/+1
Top line of table must not be followed by a blank line. This bug caused slowdown on some files with hrules and tables, because pandoc tried to interpret the hrules as the tops of multiline tables.
2012-01-08 | Markdown reader: Allow links in image captions. | John MacFarlane | 1 | -13/+10
This change also means that `[link with [link](/url)](/url)` will turn into `<p><a href="/url">link with link</a></p>` instead of `<p><a href="/url">link with [link](/url)</a></p>`.
2012-01-02 | Markdown reader: Fix parsing of consecutive lists. | John MacFarlane | 1 | -10/+12
Pandoc previously behaved like Markdown.pl for consecutive lists of different styles. Thus, the following would be parsed as a single ordered list, rather than an ordered list followed by an unordered list:

    1. one
    2. two

    - one
    - two

This patch makes pandoc behave more sensibly, parsing this as two lists. Any change in list type (ordered/unordered) or in list number style will trigger a new list. Thus, the following will also be parsed as two lists:

    1. one
    2. two

    a. one
    b. two

Since we regard this as a bug in Markdown.pl, and not something anyone would ever rely on, we do not preserve the old behavior even when `--strict` is selected.
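The new rule can be illustrated with a minimal Haskell sketch. It is not pandoc's reader: it assumes list items have already been reduced to `(marker, text)` pairs, and `ListStyle`, `markerStyle`, and `groupLists` are hypothetical names. A change of marker style between adjacent items starts a new list, so both examples in the commit message yield two lists.

```haskell
import Data.Char (isDigit, isLower)
import Data.Function (on)
import Data.List (groupBy)

-- Hypothetical, simplified marker styles. A real reader tracks more
-- (delimiter style, roman numerals, start number).
data ListStyle = Bullet | Decimal | LowerAlpha | OtherStyle
  deriving (Eq, Show)

-- Classify an item by its marker, e.g. "-", "1.", "a.".
markerStyle :: String -> ListStyle
markerStyle m
  | m `elem` ["-", "*", "+"]   = Bullet
  | not (null ds), rest == "." = Decimal
  | [c, '.'] <- m, isLower c   = LowerAlpha
  | otherwise                  = OtherStyle
  where
    (ds, rest) = span isDigit m

-- Start a new list whenever the marker style changes between items.
groupLists :: [(String, String)] -> [[String]]
groupLists = map (map snd) . groupBy ((==) `on` (markerStyle . fst))
```

For example, `groupLists [("1.","one"),("2.","two"),("a.","one"),("b.","two")]` splits at the switch from decimal to alphabetic markers.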
2012-01-01 | New treatment of dashes in --smart mode. | John MacFarlane | 1 | -1/+2
* `---` is always em-dash, `--` is always en-dash.
* pandoc no longer tries to guess when `-` should be en-dash.
* A new option, `--old-dashes`, is provided for legacy documents.

Rationale: The rules for en-dash are too complex and language-dependent for a guesser to work reliably. This change gives users greater control. The alternative of using unicode isn't very good, since unicode em- and en- dashes are barely distinguishable in a monospace font.
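A minimal Haskell sketch of the new `--smart` dash rule, applied to a plain string (pandoc itself works on parsed inline elements, and `smartDashes` is a hypothetical name). Trying `---` before `--` is what makes the rule unambiguous.

```haskell
-- "---" becomes an em-dash (U+2014) and "--" an en-dash (U+2013);
-- a lone "-" is never converted. The longer pattern is tried first.
smartDashes :: String -> String
smartDashes ('-':'-':'-':rest) = '\x2014' : smartDashes rest
smartDashes ('-':'-':rest)     = '\x2013' : smartDashes rest
smartDashes (c:rest)           = c : smartDashes rest
smartDashes []                 = []
```

Because there is no guessing step, `well-known` keeps its hyphen and `3--5` always gets an en-dash.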
2011-12-31 | Support for math in RST reader and writer. | John MacFarlane | 1 | -4/+5
Inline math uses the :math:`...` construct. Display math uses

    .. math:: ...

or if multiline

    .. math::

       ...

These seem to be supported now by rst2latex.py.
2011-12-30 | Support Sphinx style math in RST reader. | John MacFarlane | 1 | -4/+35
Inline:

    :math:`E=mc^2`

Block:

    .. math:: E = mc^2

    .. math::

       E = mc^2

       a = b^2

(This latter will turn into a paragraph with two display math elements.)

Closes #117.
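A minimal sketch of how the body of a `.. math::` directive might be split into separate display equations, as in the two-equation example. It assumes the indented body has already been collected as lines and that blank lines separate equations; `splitEquations` is a hypothetical helper, not pandoc's RST reader.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Split a math directive body into equations. Blank lines separate
-- equations; the lines of one equation are trimmed and joined with spaces.
splitEquations :: [String] -> [String]
splitEquations ls = case dropWhile isBlank ls of
    []   -> []
    body -> let (eqn, rest) = break isBlank body
            in unwords (map trim eqn) : splitEquations rest
  where
    isBlank = all isSpace
    trim    = dropWhileEnd isSpace . dropWhile isSpace
```

Under these assumptions, the body `["E = mc^2", "", "a = b^2"]` produces two equations, matching the "two display math elements" described above.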
2011-12-29 | Better smart quote parsing. | John MacFarlane | 4 | -3/+15
* Added `stateLastStrPos` to `ParserState`. This lets us keep track of whether we're parsing the position immediately after a 'str'. If we encounter a `'` in such a location, it must be an apostrophe, and can't be a single quote start.
* Set this in the markdown, textile, html, and rst str parsers.
* Closes #360.
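The idea behind `stateLastStrPos` can be shown on a plain string. In this minimal sketch (hypothetical names, not pandoc's parser), each `'` that directly follows a word character is marked as unable to open a single quote; such a `'` is an apostrophe or, inside a quotation, a closing quote.

```haskell
import Data.Char (isAlphaNum)

data QuoteStart = CannotOpen | MayOpen
  deriving (Eq, Show)

-- For every ' in the input, decide whether it may start a single quote.
-- Pairing each character with its predecessor plays the role of
-- remembering where the last 'str' ended.
classifyQuotes :: String -> [QuoteStart]
classifyQuotes s =
  [ role prev | (prev, '\'') <- zip (Nothing : map Just s) s ]
  where
    role (Just c) | isAlphaNum c = CannotOpen
    role _                       = MayOpen
```

Both `'`s in `D'oh l'*aide*` come out as `CannotOpen`, since the check looks backward at the preceding letter rather than forward at the `*`, while the first `'` in `*'this is quoted'*` may still open a quote.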
2011-12-27 | Replaced Apostrophe, Ellipses, EmDash, EnDash w/ unicode strings. | John MacFarlane | 2 | -11/+9
2011-12-27 | LaTeX reader: Return Str instead of Apostrophe. | John MacFarlane | 1 | -1/+1
2011-12-27 | Markdown reader: Improved previous patch to allow unicode apostrophe. | John MacFarlane | 1 | -1/+2
2011-12-26 | Modified str parser to capture apostrophes in smart mode. | John MacFarlane | 1 | -2/+9
This solves a problem stemming from the fact that a parser doesn't know what came *before* in the input stream. Previously pandoc would parse

    D'oh l'*aide*

as containing a single quoted "oh l", when both `'`s should be apostrophes. (Issue #360.)

There are two issues here.

(a) It is obvious that the first `'` is not an open quote, because of the preceding `D`. This patch solves the problem.

(b) It is obvious to us that the second `'` is not an open quote, because we see that *aide* is some text. But getting a good algorithm that has good performance is a bit tricky. You can't assume that `'` followed by `*` is always an apostrophe:

    *'this is quoted'*

This patch does not fix (b).
2011-12-05 | Markdown reader: Fixed backslash escapes in reference links. | John MacFarlane | 1 | -4/+3
Closes #312.
2011-12-05 | Markdown: Better handling of escapes in link URLs and titles. | John MacFarlane | 1 | -10/+8
2011-12-05 | Changes to fit new charsInBalanced. | John MacFarlane | 2 | -8/+13
2011-12-05 | Markdown reader: internal changes. | John MacFarlane | 1 | -5/+9
Refactored escapedChar into escapedChar', escapedChar.
2011-12-05 | Parsing: Changed type of escaped to return Char | John MacFarlane | 2 | -2/+3
2011-11-12 | LaTeX reader: Don't crash on commands like `\itemsep`. | John MacFarlane | 1 | -1/+2
Closes #314.
2011-11-12 | LaTeX reader: Ignore empty groups {}, { }. | John MacFarlane | 1 | -0/+8
Closes #322.
2011-11-09 | Markdown citations: don't strip off initial space in locator. | John MacFarlane | 1 | -1/+5
Previously `[@item1 and nowhere else]` yielded the locator ", and nowhere else", or, with the new citeproc-hs, "and nowhere else". Now it yields " and nowhere else".
2011-11-08 | TeXMath writer: Use unicode thin spaces for thin spaces. | John MacFarlane | 1 | -1/+7
Partially resolves issue #333.
2011-11-06 | Markdown reader: allow punctuation only internally in cite keys. | John MacFarlane | 1 | -1/+2
The characters '.',':',';','$','<','>','~','#','-','_' can be used only between two letters or digits in a citation key. This means that '@item1.' will be parsed as a citation, 'item1', followed by a period, instead of a citation 'item1.', as was the case previously. Thanks to David Sanson for alerting us to the problem.
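A minimal sketch of the key rule, using a hypothetical `citeKey` that operates on the text after `@` (not pandoc's actual parser). Because `(` and `)` are not in the allowed set, it also reflects the 2011-05-22 change that forbids parentheses in keys.

```haskell
import Data.Char (isAlphaNum)

-- Punctuation that may appear inside a citation key, but only between
-- two letters or digits.
internalPunct :: String
internalPunct = ".:;$<>~#-_"

-- Split the text after '@' into (key, rest).
citeKey :: String -> (String, String)
citeKey = go []
  where
    -- acc holds the key read so far, reversed.
    go acc (c:cs)
      | isAlphaNum c = go (c : acc) cs
    go acc@(a:_) (p:c:cs)
      | p `elem` internalPunct, isAlphaNum a, isAlphaNum c
      = go (c : p : acc) cs
    go acc rest = (reverse acc, rest)
```

With this rule `citeKey "item1."` gives `("item1", ".")`, so the period stays outside the citation, while `item1.2` remains a single key.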
2011-10-25 | HTML reader now recognizes DocBook block and inline tags. | John MacFarlane | 1 | -5/+24
It was always possible to include raw DocBook tags in a markdown document, but now pandoc will be able to distinguish block from inline tags and behave accordingly. Thus, for example, `<sidebar> hello </sidebar>` will not be wrapped in `<para>` tags.
2011-08-23 | allow footnotes followed by newline without space chars | takahashim | 1 | -2/+2
2011-08-01 | HTML reader: Fixed bug parsing tables w both thead and tbody. | John MacFarlane | 1 | -0/+1
See bug #274, which was not completely fixed by the last patch.
2011-07-30 | Added PRAGMA needed for ghc 6.12. | John MacFarlane | 1 | -0/+1
2011-07-30 | Removed applicative stuff in Markdown reader. | John MacFarlane | 1 | -16/+16
It requires parsec 3, and currently pandoc can build with parsec 2.
2011-07-30 | Markdown reader: Improved emph/strong parsing. | John MacFarlane | 1 | -13/+34
Ported code from pandoc2. Now all tests pass.
2011-07-23 | RST reader: Partial support for labeled footnotes. | John MacFarlane | 1 | -7/+20
Also made simpleReferenceName parser more accurate, which affects several other parsers.
2011-07-23 | Properly handle characters in the 128..159 range. | John MacFarlane | 1 | -2/+41
These aren't valid in HTML, but many HTML files produced by Windows tools contain them. We substitute correct unicode characters.
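The stray code points are the Windows-1252 typographic characters. A minimal sketch of the substitution, assuming a direct character-for-character mapping; only part of the table is shown, and `cp1252` and `fixC1` are hypothetical names rather than pandoc's code.

```haskell
import Data.Maybe (fromMaybe)

-- Part of the Windows-1252 table for code points 128..159, mapped to
-- the Unicode characters those bytes stand for.
cp1252 :: [(Char, Char)]
cp1252 =
  [ ('\x80', '\x20AC')  -- euro sign
  , ('\x85', '\x2026')  -- horizontal ellipsis
  , ('\x91', '\x2018')  -- left single quotation mark
  , ('\x92', '\x2019')  -- right single quotation mark
  , ('\x93', '\x201C')  -- left double quotation mark
  , ('\x94', '\x201D')  -- right double quotation mark
  , ('\x96', '\x2013')  -- en dash
  , ('\x97', '\x2014')  -- em dash
  ]

-- Replace any character found in the table; leave others unchanged.
fixC1 :: String -> String
fixC1 = map (\c -> fromMaybe c (lookup c cp1252))
```

A complete version would cover every assigned code point in the range; the few unassigned ones (such as 0x81) have no Windows-1252 meaning to recover.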
2011-07-21 | LaTeX reader: Handle \subtitle command. | John MacFarlane | 1 | -1/+10
If there's a subtitle, it is added to the title, separated by a colon and linebreak. Closes #280.
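The title/subtitle combination can be sketched with a hypothetical two-constructor inline type; pandoc's real `Inline` type has many more constructors, `withSubtitle` is not its function name, and emitting the colon as a separate `Str` is an assumption of this sketch.

```haskell
-- Minimal stand-in for pandoc's inline elements.
data Inline = Str String | LineBreak
  deriving (Eq, Show)

-- Append the subtitle, if any, to the title after a colon and a line break.
withSubtitle :: [Inline] -> Maybe [Inline] -> [Inline]
withSubtitle title Nothing    = title
withSubtitle title (Just sub) = title ++ [Str ":", LineBreak] ++ sub
```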
2011-07-21 | LaTeX reader & writer: Use \and to separate authors. | John MacFarlane | 1 | -2/+4
Closes #279.
2011-07-16 | HTML reader: treat Plain as Para when needed. | John MacFarlane | 1 | -9/+12
For example, in

    Just a few glitches remaining.
    <ul><li>
    In this situation, one loses the list.
    </ul>
    And in this, the preformatting.
    <pre>Preformatted text not starting with its own blank line.
    </pre>

Thanks to Dirk Laurie for noticing the issue.
2011-07-15 | HTML reader: Handle tbody, thead in simple tables. | John MacFarlane | 1 | -7/+17
Closes #274.
2011-07-11 | Merge pull request #273 from qerub/master | John MacFarlane | 1 | -1/+1
Textile reader: Make it possible to have colons after links.
2011-07-10 | LaTeX reader: Gobble option & space after linebreak \\[10pt]. | John MacFarlane | 1 | -1/+5
2011-07-10 | Make HTML reader more forgiving of bad HTML. | John MacFarlane | 1 | -4/+16
* Skip spaces after <b>, <emph>, etc.
* Convert Plain elements into Para when they're in a list item with Para, Pre, BlockQuote, CodeBlock.

An example of HTML that pandoc handles better now:

~~~~
<h4>
Testing html to markdown
</h4>
<ul>
<li>
<b>
An item in a list
</b>
<p>
An introductory sentence.
<pre>
Some preformatted text at this stage comes next.
But alas! much havoc is wrought by Pandoc.
</pre>
</ul>
~~~~

Thanks to Dirk Laurie for reporting the issues.
2011-07-10 | Textile reader: Make it possible to have colons after links. | Christoffer Sawicki | 1 | -1/+1
2011-06-22 | Support \dots as well as \ldots in LaTeX reader. | John MacFarlane | 1 | -2/+6
2011-05-22 | Forbid ()s in citation item keys. | John MacFarlane | 1 | -1/+1
Resolves Issue #304: problems with (@item1; @item2) because the final paren was being parsed as part of the item key.
2011-04-20 | Disallow notes within notes in reST and markdown. | John MacFarlane | 2 | -6/+19
These previously caused infinite looping and stack overflows. For example:

    [^1]

    [^1]: See [^1]

Note references are allowed in reST notes, so this isn't a full implementation of reST. That can come later. For now we need to prevent the stack overflows.

Partially resolves Issue #297.
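One way to get the same termination guarantee, sketched with hypothetical types: expand note references, but leave any reference that appears inside a note body as literal text. The commit itself changes the reST and markdown readers; this sketch instead applies the restriction in a separate expansion step, so it illustrates the idea rather than reproducing the fix.

```haskell
-- Minimal stand-in types for the sketch.
data Inline = Str String | NoteRef String | Note [Inline]
  deriving (Eq, Show)

-- Expand note references using the collected definitions. References
-- inside a note body are not expanded, so "[^1]: See [^1]" terminates.
resolveNotes :: [(String, [Inline])] -> [Inline] -> [Inline]
resolveNotes defs = map (expand True)
  where
    expand allowNotes (NoteRef r)
      | allowNotes, Just body <- lookup r defs
          = Note (map (expand False) body)
      | otherwise = Str ("[^" ++ r ++ "]")
    expand _ x = x
```

For the self-referential example, the outer `[^1]` becomes a note whose body reads "See [^1]" with the inner reference left as plain text.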
2011-04-11 | Allow '|' followed by newline in RST line block. | John MacFarlane | 1 | -2/+5
2011-03-18 | Changed uri parser so it doesn't include trailing punctuation. | John MacFarlane | 1 | -1/+1
So, in RST, 'http://google.com.' should be parsed as a link to 'http://google.com' followed by a period. The parser is smart enough to recognize balanced parentheses, as often occur in wikipedia links: 'http://foo.bar/baz_(bam)'. Also added ()s to RST specialChars, so '(http://google.com)' will be parsed as a link in parens. Added test cases. Resolves Issue #291.
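The trailing-punctuation rule can be sketched as a post-processing step on a candidate URI. This is not pandoc's `uri` parser, which works forward with Parsec; `trimUri` is a hypothetical helper that returns the URI and the punctuation that should stay outside the link.

```haskell
-- Split a candidate URI into (uri, trailing punctuation). A final ')'
-- is kept when it closes a '(' inside the URI, as in Wikipedia links.
trimUri :: String -> (String, String)
trimUri s = go (reverse s) ""
  where
    -- Walk backwards from the end; 'trailing' collects removed
    -- characters in their original order.
    go (c:rc) trailing
      | c `elem` ".,;:!?"        = go rc (c : trailing)
      | c == ')', noOpenParen rc = go rc (c : trailing)
    go rc trailing = (reverse rc, trailing)
    -- True if the text before this ')' has no unmatched '(' to close.
    noOpenParen rc = count '(' rc <= count ')' rc
    count x = length . filter (== x)
```

With this rule `trimUri "http://google.com."` yields `("http://google.com", ".")`, `http://foo.bar/baz_(bam)` is kept whole, and the closing paren of `(http://google.com)` stays outside the link.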
2011-03-12 | Fixed bug in RST field list parser. | John MacFarlane | 1 | -7/+6
The bug affected field lists with multi-line items at the end of the list.
2011-03-02 | Markdown+lhs reader: Require space after inverse bird tracks. | John MacFarlane | 1 | -1/+3
The point of the change is to allow html tags to be used freely at the left margin of a markdown+lhs document. Thanks to Conal Elliot for the suggestion.
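A minimal sketch of the stricter test, with hypothetical `stripInverseTrack` and `inverseTrackBlock`: a line counts as an inverse bird track only when `<` is followed by a space, so an HTML tag at the left margin is not mistaken for one. Requiring every line of a block to carry the track is an assumption of this sketch.

```haskell
-- Return the code on an inverse-bird-track line ("< code"), or Nothing.
-- Requiring the space keeps "<div>" and similar HTML out of code blocks.
stripInverseTrack :: String -> Maybe String
stripInverseTrack ('<' : ' ' : code) = Just code
stripInverseTrack _                  = Nothing

-- In this sketch, a block qualifies only if every line carries the track.
inverseTrackBlock :: [String] -> Maybe [String]
inverseTrackBlock = mapM stripInverseTrack
```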
2011-02-01 | Markdown reader: Simplified and corrected footnote block parser. | John MacFarlane | 1 | -7/+10
2011-01-31 | Improved fix to markdown noteBlock parser. | John MacFarlane | 1 | -1/+1
The last patch did not handle cases with > 4 spaces. Also added a more general test case.
2011-01-31 | Markdown reader: Fixed whitespace footnote bug (Jesse Rosenthal). | John MacFarlane | 1 | -1/+2
The problem was in input like this:

    [^1]: note

    not in note.

Also added a test case for this.
2011-01-30 | LaTeX reader: Fixed bug with whitespace at beginning of file. | John MacFarlane | 1 | -2/+2
Previously a file beginning " hi" would cause a parse error. Also cleaned up comment parsing.