path: root/src/Text/Pandoc/Readers
Age | Commit message | Author | Files | Lines
2011-12-26Modified str parser to capture apostrophes in smart mode.John MacFarlane1-2/+9
This solves a problem stemming from the fact that a parser doesn't know what came *before* in the input stream. Previously pandoc would parse `D'oh l'*aide*` as containing a single-quoted "oh l", when both `'`s should be apostrophes. (Issue #360.) There are two issues here. (a) It is obvious that the first `'` is not an open quote, because of the preceding `D`. This patch solves that problem. (b) It is obvious to us that the second `'` is not an open quote, because we see that *aide* is some text. But finding an algorithm that handles this case and still performs well is a bit tricky. You can't assume that `'` followed by `*` is always an apostrophe: `*'this is quoted'*`. This patch does not fix (b).
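Case (a) can be illustrated with a small sketch. It is not the code from the patch (pandoc's parser is written in Haskell); the regular expression and the name `words_with_apostrophes` are assumptions made for illustration. The idea is that a word token absorbs a `'` only when alphanumerics sit on both sides of it, so the quote parser never sees that character.

```python
import re

# A word is a run of letters/digits that may contain internal apostrophes.
# Each apostrophe must be followed by at least one more letter or digit,
# so a trailing ' (as in "l'" before "*aide*") is NOT absorbed.
WORD = re.compile(r"[^\W_]+(?:'[^\W_]+)*")


def words_with_apostrophes(text):
    """Return the word tokens of text, keeping word-internal apostrophes."""
    return WORD.findall(text)


print(words_with_apostrophes("D'oh l'*aide*"))
print(words_with_apostrophes("*'this is quoted'*"))
```

In the first input `D'oh` comes out as one token, while the `'` after `l` is left over for later parsing, which corresponds to the unresolved case (b). In the second input neither `'` is absorbed, so both remain available as quote marks.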
2011-12-05Markdown reader: Fixed backslash escapes in reference links.John MacFarlane1-4/+3
Closes #312.
2011-12-05Markdown: Better handling of escapes in link URLs and titles.John MacFarlane1-10/+8
2011-12-05Changes to fit new charsInBalanced.John MacFarlane2-8/+13
2011-12-05Markdown reader: internal changes.John MacFarlane1-5/+9
Refactored escapedChar into escapedChar', escapedChar.
2011-12-05Parsing: Changed type of escaped to return CharJohn MacFarlane2-2/+3
2011-11-12LaTeX reader: Don't crash on commands like `\itemsep`.John MacFarlane1-1/+2
Closes #314.
2011-11-12LaTeX reader: Ignore empty groups {}, { }.John MacFarlane1-0/+8
Closes #322.
2011-11-09Markdown citations: don't strip off initial space in locator.John MacFarlane1-1/+5
Previously `[@item1 and nowhere else]` yielded the locator ", and nowhere else", or, with the new citeproc-hs, "and nowhere else". Now it yields " and nowhere else".
2011-11-08TeXMath writer: Use unicode thin spaces for thin spaces.John MacFarlane1-1/+7
Partially resolves issue #333.
2011-11-06Markdown reader: allow punctuation only internally in cite keys.John MacFarlane1-1/+2
The characters '.',':',';','$','<','>','~','#','-','_' can be used only between two letters or digits in a citation key. This means that '@item1.' will be parsed as a citation, 'item1', followed by a period, instead of a citation 'item1.', as was the case previously. Thanks to David Sanson for alerting us to the problem.
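The rule can be sketched as a regular expression. This is an illustration in Python, not pandoc's parser; the names `CITE_KEY` and `find_cite_keys` are hypothetical, and the sketch assumes that a key starts and ends with a letter or digit and that two punctuation characters never appear in a row.

```python
import re

# Letters/digits, optionally joined by single punctuation characters from
# the list in the commit message. A punctuation character is accepted only
# when another letter or digit follows it.
CITE_KEY = re.compile(r"@([^\W_]+(?:[.:;$<>~#_-][^\W_]+)*)")


def find_cite_keys(text):
    """Return all citation keys found in text, without the leading @."""
    return CITE_KEY.findall(text)


print(find_cite_keys("See @item1. Also (@item1; @item2) and @doe:2011-a."))
```

Because trailing punctuation is never followed by a letter or digit, the period after `@item1.` and the closing parenthesis after `@item2)` stay outside the key.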
2011-10-25HTML reader now recognizes DocBook block and inline tags.John MacFarlane1-5/+24
It was always possible to include raw DocBook tags in a markdown document, but now pandoc will be able to distinguish block from inline tags and behave accordingly. Thus, for example, <sidebar> hello </sidebar> will not be wrapped in `<para>` tags.
2011-08-23allow footnotes followed by newline without space charstakahashim1-2/+2
2011-08-01HTML reader: Fixed bug parsing tables with both thead and tbody.John MacFarlane1-0/+1
See bug #274, which was not completely fixed by the last patch.
2011-07-30Added PRAGMA needed for ghc 6.12.John MacFarlane1-0/+1
2011-07-30Removed applicative stuff in Markdown reader.John MacFarlane1-16/+16
It requires parsec 3, and currently pandoc can build with parsec 2.
2011-07-30Markdown reader: Improved emph/strong parsing.John MacFarlane1-13/+34
Ported code from pandoc2. Now all tests pass.
2011-07-23RST reader: Partial support for labeled footnotes.John MacFarlane1-7/+20
Also made simpleReferenceName parser more accurate, which affects several other parsers.
2011-07-23Properly handle characters in the 128..159 range.John MacFarlane1-2/+41
These aren't valid in HTML, but many HTML files produced by Windows tools contain them. We substitute correct unicode characters.
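A minimal sketch of such a substitution follows. It assumes that the intended characters are the ones Windows-1252 assigns to bytes 128..159; the commit message does not say which table pandoc uses, and the function names here are hypothetical.

```python
def fix_c1_char(ch):
    """Map a character in the 128..159 range to its Windows-1252 meaning.

    Assumption: the file was really Windows-1252. Five positions
    (0x81, 0x8D, 0x8F, 0x90, 0x9D) are unassigned there and are left as-is.
    """
    code = ord(ch)
    if 128 <= code <= 159:
        try:
            return bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            return ch
    return ch


def fix_c1_text(text):
    """Apply fix_c1_char to every character of text."""
    return "".join(fix_c1_char(ch) for ch in text)


print(fix_c1_text("\x93quoted\x94"))
```

Under this assumption `\x93` and `\x94` become the curly double quotes U+201C and U+201D, and `\x80` becomes the euro sign.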
2011-07-21LaTeX reader: Handle \subtitle command.John MacFarlane1-1/+10
If there's a subtitle, it is added to the title, separated by a colon and linebreak. Closes #280.
2011-07-21LaTeX reader & writer: Use \and to separate authors.John MacFarlane1-2/+4
Closes #279.
2011-07-16HTML reader: treat Plain as Para when needed.John MacFarlane1-9/+12
For example, in: Just a few glitches remaining. <ul><li> In this situation, one loses the list. </ul> And in this, the preformatting. <pre>Preformatted text not starting with its own blank line. </pre> Thanks to Dirk Laurie for noticing the issue.
2011-07-15HTML reader: Handle tbody, thead in simple tables.John MacFarlane1-7/+17
Closes #274.
2011-07-11Merge pull request #273 from qerub/masterJohn MacFarlane1-1/+1
Textile reader: Make it possible to have colons after links.
2011-07-10LaTeX reader: Gobble option & space after linebreak \\[10pt].John MacFarlane1-1/+5
2011-07-10Make HTML reader more forgiving of bad HTML.John MacFarlane1-4/+16
* Skip spaces after <b>, <emph>, etc.
* Convert Plain elements into Para when they're in a list item with Para, Pre, BlockQuote, CodeBlock.

An example of HTML that pandoc handles better now:

~~~~
<h4> Testing html to markdown </h4>
<ul>
<li>
<b> An item in a list </b>
<p> An introductory sentence.
<pre>
Some preformatted text at this stage comes next. But alas! much havoc is wrought by Pandoc.
</pre>
</ul>
~~~~

Thanks to Dirk Laurie for reporting the issues.
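The Plain-to-Para conversion can be sketched as follows. This is an illustration only: blocks are modelled as `(kind, content)` tuples rather than pandoc's Haskell types, and `normalize_list_item` is a hypothetical name.

```python
# Block kinds that, per the commit message, force sibling Plain blocks
# in the same list item to become Para.
BLOCK_LEVEL = {"Para", "Pre", "BlockQuote", "CodeBlock"}


def normalize_list_item(blocks):
    """Turn Plain into Para when the item also holds block-level content."""
    if any(kind in BLOCK_LEVEL for kind, _ in blocks):
        return [("Para" if kind == "Plain" else kind, content)
                for kind, content in blocks]
    return blocks


item = [("Plain", "An item in a list"), ("Para", "An introductory sentence.")]
print(normalize_list_item(item))
```

A list item that contains only Plain blocks is returned unchanged, so tight lists keep their compact form.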
2011-07-10Textile reader: Make it possible to have colons after links.Christoffer Sawicki1-1/+1
2011-06-22Support \dots as well as \ldots in LaTeX reader.John MacFarlane1-2/+6
2011-05-22Forbid ()s in citation item keys.John MacFarlane1-1/+1
Resolves Issue #304: problems with (@item1; @item2) because the final paren was being parsed as part of the item key.
2011-04-20Disallow notes within notes in reST and markdown.John MacFarlane2-6/+19
These previously caused infinite looping and stack overflows. For example:

    [^1]

    [^1]: See [^1]

Note references are allowed in reST notes, so this isn't a full implementation of reST. That can come later. For now we need to prevent the stack overflows. Partially resolves Issue #297.
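One way to see why a guard stops the recursion is the sketch below. It is not pandoc's implementation; the `<note>` output format, the `inside_note` flag, and the function name are assumptions for illustration. While a note body is being expanded, note references are left as literal text instead of being expanded again.

```python
import re

NOTE_REF = re.compile(r"\[\^([^\]\s]+)\]")


def expand_notes(text, notes, inside_note=False):
    """Replace [^label] with the note body, but never inside a note."""
    def replace(match):
        label = match.group(1)
        if inside_note or label not in notes:
            return match.group(0)  # keep the reference as plain text
        body = expand_notes(notes[label], notes, inside_note=True)
        return "<note>" + body + "</note>"

    return NOTE_REF.sub(replace, text)


print(expand_notes("[^1]", {"1": "See [^1]"}))
```

Without the `inside_note` check, the self-referencing note in the example would be expanded without end.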
2011-04-11Allow '|' followed by newline in RST line block.John MacFarlane1-2/+5
2011-03-18Changed uri parser so it doesn't include trailing punctuation.John MacFarlane1-1/+1
So, in RST, 'http://google.com.' should be parsed as a link to 'http://google.com' followed by a period. The parser is smart enough to recognize balanced parentheses, as often occur in wikipedia links: 'http://foo.bar/baz_(bam)'. Also added ()s to RST specialChars, so '(http://google.com)' will be parsed as a link in parens. Added test cases. Resolves Issue #291.
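The trimming rule can be sketched like this. The set of punctuation characters and the name `split_trailing_punctuation` are assumptions; the commit message only states that trailing punctuation is excluded and that balanced parentheses are kept.

```python
TRAILING = ".,;:!?"  # assumed set of punctuation to strip


def split_trailing_punctuation(candidate):
    """Split candidate into (uri, trailing punctuation).

    A closing parenthesis is stripped only when it has no matching
    opening parenthesis earlier in the candidate.
    """
    end = len(candidate)
    while end > 0:
        ch = candidate[end - 1]
        head = candidate[:end]
        if ch in TRAILING:
            end -= 1
        elif ch == ")" and head.count("(") < head.count(")"):
            end -= 1
        else:
            break
    return candidate[:end], candidate[end:]


print(split_trailing_punctuation("http://google.com."))
print(split_trailing_punctuation("http://foo.bar/baz_(bam)"))
```

The first call separates the final period from the link, while the second keeps the whole string because its parentheses are balanced.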
2011-03-12Fixed bug in RST field list parser.John MacFarlane1-7/+6
The bug affected field lists with multi-line items at the end of the list.
2011-03-02Markdown+lhs reader: Require space after inverse bird tracks.John MacFarlane1-1/+3
The point of the change is to allow html tags to be used freely at the left margin of a markdown+lhs document. Thanks to Conal Elliot for the suggestion.
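The effect of the rule is easy to show in a sketch. The function name is hypothetical and the check is a simplification of what a real parser would do: a line counts as an inverse bird track only when `<` is followed by a space.

```python
def is_inverse_bird_track(line):
    """True when line starts with '<' followed by a space."""
    return line.startswith("< ")


print(is_inverse_bird_track("< main = print 1"))
print(is_inverse_bird_track("<div class=\"note\">"))
```

The second line starts with `<` but has no space after it, so it can be treated as an HTML tag at the left margin.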
2011-02-01Markdown reader: Simplified and corrected footnote block parser.John MacFarlane1-7/+10
2011-01-31Improved fix to markdown noteBlock parser.John MacFarlane1-1/+1
The last patch did not handle cases with > 4 spaces. Also added a more general test case.
2011-01-31Markdown reader: Fixed whitespace footnote bug (Jesse Rosenthal).John MacFarlane1-1/+2
The problem was in input like this: [^1]: note not in note. Also added a test case for this.
2011-01-30LaTeX reader: Fixed bug with whitespace at beginning of file.John MacFarlane1-2/+2
Previously a file beginning " hi" would cause a parse error. Also cleaned up comment parsing.
2011-01-29Markdown reader tables: Fixed bug in alignments.John MacFarlane1-4/+5
Previously pandoc got confused by blank rows in the header.
2011-01-28RST reader: skip blanklines at beginning, not all leading spaces.John MacFarlane1-1/+1
If you skip all spaces, it becomes impossible to start with a blockquote.
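The difference between the two strategies can be sketched as follows; the function name is an assumption and the code only illustrates the idea. Dropping leading blank lines keeps the indentation of the first real line, which is what marks a blockquote in reStructuredText.

```python
def skip_leading_blank_lines(source):
    """Drop blank lines at the start, keeping the first line's indentation."""
    lines = source.split("\n")
    first = 0
    while first < len(lines) and lines[first].strip() == "":
        first += 1
    return "\n".join(lines[first:])


doc = "\n  \n    A quoted paragraph.\nPlain text."
print(repr(skip_leading_blank_lines(doc)))
print(repr(doc.lstrip()))  # skipping ALL leading whitespace loses the indent
```

The first result still begins with four spaces, while `lstrip()` removes them and with them the blockquote.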
2011-01-28RST reader: Skip blank space at beginning.John MacFarlane1-0/+1
Resolves Debian Bug #611328.
2011-01-26Add support for attributes in inline Code.John MacFarlane6-13/+19
Additional related changes:

* URLs in Code in autolinks now use class "url".
* Require highlighting-kate 0.2.8.2, which omits the final <br/> tag, essential for inline code.
2011-01-26RST reader: Improved field lists.John MacFarlane1-59/+56
Field lists now work properly with block content. (Thanks to Lachlan Musicman for pointing out the bug.) In addition, definition list items are now always Para instead of Plain -- which matches behavior of rst2xml.py. Finally, in image blocks, the alt attribute is parsed properly and used for the alt, not also the title.
2011-01-26LaTeX reader: Fixed an incomplete pattern match.John MacFarlane1-1/+3
2011-01-26RST reader: Include line breaks in raw field list parser output.John MacFarlane1-1/+3
Note: field list items can have lists, etc. as values.
2011-01-26RST reader: Allow spaces in field list names.John MacFarlane1-1/+1
2011-01-26Markdown reader: Don't parse latex/context environments as inline.John MacFarlane1-9/+15
2011-01-26Distinguish latex & context environments; blank line after in writers.John MacFarlane1-3/+4
2011-01-26Bumped version to 1.8; depend on pandoc-types 1.8.John MacFarlane5-19/+22
The old TeX, HtmlInline and RawHtml elements have been removed and replaced by generic RawInline and RawBlock elements. All modules updated to use the new raw elements.
2011-01-23Textile writer: Don't HTML-escape between @'s.John MacFarlane1-1/+1