path: root/src/Text/Pandoc/Readers/Markdown.hs
Age  Commit message  Author  Files  Lines
2012-07-20Use Text.Parsec instead of Text.ParserCombinators.Parsec.John MacFarlane1-137/+137
2012-06-04Markdown reader: Added cf. and cp. to list of likely abbreviations.John MacFarlane1-1/+1
2012-05-08Treat four or more `~` or `^` in an inline context as regular text.John MacFarlane1-3/+3
This avoids exponential parsing blowups with long strings of these characters. Closes #507.
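The rule in this commit can be sketched roughly as follows. This is a minimal illustration using only the Prelude, not pandoc's actual parser: the `Tok` type and `classifyRun` function are hypothetical names, and the threshold of four is taken from the commit summary above.

```haskell
-- Hypothetical token type for illustration; not part of pandoc's AST.
data Tok
  = Literal String   -- run treated as ordinary text
  | Delim String     -- run still eligible to act as a delimiter
  deriving (Eq, Show)

-- Classify a leading run of '~' or '^'. Runs of four or more are returned
-- as literal text, so no delimiter matching is ever attempted on them.
classifyRun :: String -> Maybe (Tok, String)
classifyRun [] = Nothing
classifyRun s@(c:_)
  | c `elem` "~^" =
      let (run, rest) = span (== c) s
      in if length run >= 4
           then Just (Literal run, rest)
           else Just (Delim run, rest)
  | otherwise = Nothing
```

Consuming the whole run in one step is what avoids the blowup in this sketch: the scanner never backtracks over each character of a long run.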
2012-04-13Markdown reader: Allow lists as list items.John MacFarlane1-6/+8
So, for example: 1. * x * y 2. * z * w
2012-04-12Markdown: don't recognize references inside delimited code blocks.John MacFarlane1-0/+1
Previously pandoc would produce incorrect results on this: ~~~ [not a link]: /url ~~~ [not a link] because it would recognize "not a link" as a reference link definition on the first pass. This fix causes the first pass to skip delimited code blocks.
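A rough sketch of a first pass that skips delimited code blocks, assuming a much simplified fence rule (any line starting with `~~~` toggles the fence) and a simplified reference syntax. `referenceLabels` and `parseDef` are hypothetical helpers, not functions from the Markdown reader.

```haskell
import Data.List (isPrefixOf)

-- Collect the labels of reference definitions, ignoring every line that
-- lies between a pair of "~~~" fence lines.
referenceLabels :: [String] -> [String]
referenceLabels = go False
  where
    go _ [] = []
    go inFence (l:ls)
      | "~~~" `isPrefixOf` l = go (not inFence) ls   -- fence opens or closes
      | inFence              = go True ls            -- skip code block content
      | otherwise =
          case parseDef l of
            Just lab -> lab : go False ls
            Nothing  -> go False ls

-- Recognize a line of the form "[label]: ..." and return the label.
parseDef :: String -> Maybe String
parseDef ('[':rest) =
  case break (== ']') rest of
    (lab, ']':':':_) | not (null lab) -> Just lab
    _                                 -> Nothing
parseDef _ = Nothing
```

With the input from the commit message, the definition inside the fence is not collected, so the later `[not a link]` has nothing to resolve against.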
2012-02-21Added support for markdown-extra tables in the markdown parserFrançois Gannaz1-0/+5
Only tables whose lines begin with a "|" are supported. There are 2 warnings about unused variables when compiling.
2012-02-08Improvements to markdown attributes syntax (on code blocks).John MacFarlane1-4/+5
(1) Attributes can contain line breaks. (2) Values in key-value attributes can be surrounded by either double or single quotes, or left unquoted if they contain no spaces.
2012-02-07Limit nesting of strong/emph.John MacFarlane1-2/+14
This avoids exponential lookahead in parasitic cases, like a**a*a**a*a**a*a**a*a**a*a**a*a**a*a**. Added stateMaxNestingLevel to ParserState. We set this to 6, so you can still have Emph inside Emph, just not indefinitely.
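The idea of capping nesting depth can be illustrated on a toy grammar. The sketch below uses distinct `[` and `]` brackets instead of emphasis markers, and the names `Node`, `parseNodes`, and `maxDepth` are hypothetical; only the limit of 6 comes from the commit message. It is not how the Markdown reader implements stateMaxNestingLevel.

```haskell
-- Toy document tree: plain characters and bracketed groups.
data Node = Txt String | Group [Node]
  deriving (Eq, Show)

-- Nesting cap, mirroring the value 6 mentioned in the commit message.
maxNestingLevel :: Int
maxNestingLevel = 6

-- Parse nodes at the given depth. Returns the nodes and the unconsumed input.
-- Once the cap is reached, '[' no longer opens a group and is kept as text.
parseNodes :: Int -> String -> ([Node], String)
parseNodes _ [] = ([], [])
parseNodes depth (']':rest)
  | depth > 0 = ([], rest)                       -- close the current group
parseNodes depth ('[':rest)
  | depth < maxNestingLevel =
      let (inner, rest')  = parseNodes (depth + 1) rest
          (more,  rest'') = parseNodes depth rest'
      in (Group inner : more, rest'')
parseNodes depth (c:rest) =
  let (more, rest') = parseNodes depth rest
  in (Txt [c] : more, rest')

-- Deepest level of Group nesting in a parsed result.
maxDepth :: [Node] -> Int
maxDepth ns = maximum (0 : [1 + maxDepth g | Group g <- ns])
```

Groups can still nest, just not indefinitely: ten consecutive `[` characters yield six nested groups, with the remaining four brackets kept as text.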
2012-02-05Removed module Text.Pandoc.CharacterReferences.John MacFarlane1-3/+3
Moved characterReference parser to Text.Pandoc.Parsing. decodeCharacterReferences is now replaced by fromEntities in Text.Pandoc.XML.
2012-02-04Complete rewrite of LaTeX reader.John MacFarlane1-12/+9
* The new reader is more robust, accurate, and extensible. It is still quite incomplete, but it should be easier now to add features. * Text.Pandoc.Parsing: Added withRaw combinator. * Markdown reader: do escapedChar before raw latex inline. Otherwise we capture commands like \{. * Fixed latex citation tests for new citeproc. * Handle \include{} commands in latex. This is done in pandoc.hs, not the (pure) latex reader. But the reader exports the needed function, handleIncludes. * Moved err and warn from pandoc.hs to Shared. * Fixed tests - raw tex should sometimes have trailing space. * Updated lhs-test for highlighting-kate changes.
2012-01-28Removed an unnecessary `many spaceChar`.John MacFarlane1-1/+1
2012-01-28Markdown reader: Fixed bug in code block attribute parser.John MacFarlane1-3/+4
Previously the ID attribute got lost if it didn't come first. Now attributes can come in any order.
2012-01-28Support github syntax for fenced code blocks.John MacFarlane1-10/+14
You can now write ```ruby x = 2 ``` instead of ~~~ {.ruby} x = 2 ~~~~
2012-01-27Fixed table parsing with wide or combining characters.John MacFarlane1-4/+4
Closes #348. Closes #108.
2012-01-10Markdown reader: fixed bug in table/hrule parsing.John MacFarlane1-1/+1
Top line of table must not be followed by a blank line. This bug caused slowdown on some files with hrules and tables, as pandoc tried to interpret the hrules as the tops of multiline tables.
2012-01-08Markdown reader: Allow links in image captions.John MacFarlane1-13/+10
This change also means that [link with [link](/url)](/url) will turn into <p><a href="/url">link with link</a></p> instead of <p><a href="/url">link with [link](/url)</a></p>
2012-01-02Markdown reader: Fix parsing of consecutive lists.John MacFarlane1-10/+12
Pandoc previously behaved like Markdown.pl for consecutive lists of different styles. Thus, the following would be parsed as a single ordered list, rather than an ordered list followed by an unordered list: 1. one 2. two - one - two This patch makes pandoc behave more sensibly, parsing this as two lists. Any change in list type (ordered/unordered) or in list number style will trigger a new list. Thus, the following will also be parsed as two lists: 1. one 2. two a. one b. two Since we regard this as a bug in Markdown.pl, and not something anyone would ever rely on, we do not preserve the old behavior even when `--strict` is selected.
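The splitting rule described above (a change in list type or number style starts a new list) can be sketched with `groupBy`. This is an illustration under simplifying assumptions: only bullets, decimal markers, and single lowercase letters are recognized, and `ListStyle`, `markerStyle`, and `splitLists` are hypothetical names rather than pandoc's own.

```haskell
import Data.Char (isDigit, isLower)
import Data.Function (on)
import Data.List (groupBy)

-- Simplified set of list styles, for illustration only.
data ListStyle = Bullet | Decimal | LowerAlpha
  deriving (Eq, Show)

-- Determine the style of a list marker such as "-", "2." or "b.".
markerStyle :: String -> Maybe ListStyle
markerStyle m
  | m `elem` ["-", "*", "+"] = Just Bullet
  | otherwise = case span isDigit m of
      (_:_, ".") -> Just Decimal
      _ -> case m of
        [c, '.'] | isLower c -> Just LowerAlpha
        _ -> Nothing

-- Split consecutive (marker, text) items into separate lists whenever
-- the marker style changes.
splitLists :: [(String, a)] -> [[(String, a)]]
splitLists = groupBy ((==) `on` (markerStyle . fst))
```

Applied to the first example in the commit message, the four items come back as two groups: one ordered list and one unordered list.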
2011-12-29Better smart quote parsing.John MacFarlane1-0/+2
* Added stateLastStrPos to ParserState. This lets us keep track of whether we're parsing the position immediately after a 'str'. If we encounter a ' in such a location, it must be an apostrophe, and can't be a single quote start. * Set this in the markdown, textile, html, and rst str parsers. * Closes #360.
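A minimal sketch of the position-tracking idea, assuming a plain character scan in place of the real parser state. The names `QuoteKind` and `classifyQuotes` are hypothetical; `lastStrEnd` plays the role that stateLastStrPos plays in the description above.

```haskell
import Data.Char (isAlphaNum)

-- Whether a given ' could start a single-quoted span.
data QuoteKind = CannotOpenQuote | MayOpenQuote
  deriving (Eq, Show)

-- Scan the input, remembering the position just past the most recent
-- alphanumeric character. A ' found exactly at that position directly
-- follows a word, so it cannot be an opening quote.
classifyQuotes :: String -> [(Int, QuoteKind)]
classifyQuotes = go Nothing 0
  where
    go _ _ [] = []
    go lastStrEnd pos (c:cs)
      | isAlphaNum c = go (Just (pos + 1)) (pos + 1) cs
      | c == '\''    = (pos, kind) : go lastStrEnd (pos + 1) cs
      | otherwise    = go lastStrEnd (pos + 1) cs
      where
        kind = if lastStrEnd == Just pos then CannotOpenQuote else MayOpenQuote
```

On `D'oh` the quote directly follows `D` and is ruled out as a quote start, while a `'` at the start of the input remains a candidate.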
2011-12-27Replaced Apostrophe, Ellipses, EmDash, EnDash w/ unicode strings.John MacFarlane1-8/+6
2011-12-27Markdown reader: Improved previous patch to allow unicode apostrophe.John MacFarlane1-1/+2
2011-12-26Modified str parser to capture apostrophes in smart mode.John MacFarlane1-2/+9
This solves a problem stemming from the fact that a parser doesn't know what came *before* in the input stream. Previously pandoc would parse D'oh l'*aide* as containing a single quoted "oh l", when both `'`s should be apostrophes. (Issue #360.) There are two issues here. (a) It is obvious that the first `'` is not an open quote, because of the preceding `D`. This patch solves the problem. (b) It is obvious to us that the second `'` is not an open quote, because we see that *aide* is some text. But getting a good algorithm that has good performance is a bit tricky. You can't assume that `'` followed by `*` is always an apostrophe: *'this is quoted'* This patch does not fix (b).
2011-12-05Markdown reader: Fixed backslash escapes in reference links.John MacFarlane1-4/+3
Closes #312.
2011-12-05Markdown: Better handling of escapes in link URLs and titles.John MacFarlane1-10/+8
2011-12-05Changes to fit new charsInBalanced.John MacFarlane1-6/+11
2011-12-05Markdown reader: internal changes.John MacFarlane1-5/+9
Refactored escapedChar into escapedChar', escapedChar.
2011-11-09Markdown citations: don't strip off initial space in locator.John MacFarlane1-1/+5
Previously `[@item1 and nowhere else]` yielded the locator ", and nowhere else", or, with the new citeproc-hs, "and nowhere else". Now it yields " and nowhere else".
2011-11-06Markdown reader: allow punctuation only internally in cite keys.John MacFarlane1-1/+2
The characters '.',':',';','$','<','>','~','#','-','_' can be used only between two letters or digits in a citation key. This means that '@item1.' will be parsed as a citation, 'item1', followed by a period, instead of a citation 'item1.', as was the case previously. Thanks to David Sanson for alerting us to the problem.
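The "punctuation only internally" rule can be sketched as a small scanner over the text following `@`. This is an illustration, not the reader's actual parser: `citeKey` and `keyTail` are hypothetical names, and requiring the key to start with a letter or digit is an assumption made for the sketch.

```haskell
import Data.Char (isAlphaNum)

-- Punctuation allowed only between two letters or digits.
internalPunct :: String
internalPunct = ".:;$<>~#-_"

-- Split the text after '@' into (key, remaining input).
citeKey :: String -> (String, String)
citeKey (c:cs)
  | isAlphaNum c = let (k, r) = keyTail cs in (c : k, r)
citeKey s = ("", s)

-- Continue a key; only ever called right after a letter or digit.
keyTail :: String -> (String, String)
keyTail (c:cs)
  | isAlphaNum c = let (k, r) = keyTail cs in (c : k, r)
keyTail (p:c:cs)
  | p `elem` internalPunct && isAlphaNum c =
      let (k, r) = keyTail cs in (p : c : k, r)   -- punctuation accepted with its follower
keyTail rest = ("", rest)
```

Because a punctuation character is consumed only together with the letter or digit after it, a trailing period is left in the remaining input, matching the `@item1.` example above.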
2011-07-30Added PRAGMA needed for ghc 6.12.John MacFarlane1-0/+1
2011-07-30Removed applicative stuff in Markdown reader.John MacFarlane1-16/+16
It requires parsec 3, and currently pandoc can build with parsec 2.
2011-07-30Markdown reader: Improved emph/strong parsing.John MacFarlane1-13/+34
Ported code from pandoc2. Now all tests pass.
2011-05-22Forbid ()s in citation item keys.John MacFarlane1-1/+1
Resolves Issue #304: problems with (@item1; @item2) because the final paren was being parsed as part of the item key.
2011-04-20Disallow notes within notes in reST and markdown.John MacFarlane1-1/+8
These previously caused infinite looping and stack overflows. For example: [^1] [^1]: See [^1] Note references are allowed in reST notes, so this isn't a full implementation of reST. That can come later. For now we need to prevent the stack overflows. Partially resolves Issue #297.
2011-03-02Markdown+lhs reader: Require space after inverse bird tracks.John MacFarlane1-1/+3
The point of the change is to allow html tags to be used freely at the left margin of a markdown+lhs document. Thanks to Conal Elliot for the suggestion.
2011-02-01Markdown reader: Simplified and corrected footnote block parser.John MacFarlane1-7/+10
2011-01-31Improved fix to markdown noteBlock parser.John MacFarlane1-1/+1
The last patch did not handle cases with > 4 spaces. Also added a more general test case.
2011-01-31Markdown reader: Fixed whitespace footnote bug (Jesse Rosenthal).John MacFarlane1-1/+2
The problem was in input like this: [^1]: note not in note. Also added a test case for this.
2011-01-29Markdown reader tables: Fixed bug in alignments.John MacFarlane1-4/+5
Previously pandoc got confused by blank rows in the header.
2011-01-26Add support for attributes in inline Code.John MacFarlane1-2/+3
Additional related changes: * URLs in Code in autolinks now use class "url". * Require highlighting-kate 0.2.8.2, which omits the final <br/> tag, essential for inline code.
2011-01-26Markdown reader: Don't parse latex/context environments as inline.John MacFarlane1-9/+15
2011-01-26Distinguish latex & context environments; blank line after in writers.John MacFarlane1-3/+4
2011-01-26Bumped version to 1.8; depend on pandoc-types 1.8.John MacFarlane1-10/+10
The old TeX, HtmlInline and RawHtml elements have been removed and replaced by generic RawInline and RawBlock elements. All modules updated to use the new raw elements.
2011-01-22Markdown reader: slight speedup by moving whitespace parser.John MacFarlane1-2/+2
2011-01-19Replaced more noneOf/oneOf parsers.John MacFarlane1-5/+11
2011-01-19Replaced uses of oneOf with more efficient parsers.John MacFarlane1-12/+19
This speeds up the markdown reader.
2011-01-04Markdown reader: Removed unneeded definitions.John MacFarlane1-10/+8
specialChars, strChar, specialCharsMinusLt.
2011-01-04Moved 'macro' and 'applyMacros'' from markdown reader to Parsing.John MacFarlane1-24/+0
2011-01-01Fixed regression in markdown reader.John MacFarlane1-3/+3
'(_hi_)' was being parsed with literal underscores (no emphasis). The fix: the 'str' parser now only parses alphanumerics and embedded underscores. All other symbols are handled by the 'symbol' parser. This has a slight effect on the AST, since you'll get [Str "hi",Str ":"] instead of [Str "hi:"]. But there should not be a visible effect in any of the writers. Thanks to gwern for pointing out the regression.
2010-12-30New HTML reader using tagsoup as a lexer.John MacFarlane1-28/+27
* The new reader is faster and more accurate. * API changes for Text.Pandoc.Readers.HTML: - removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag, anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType, htmlBlockElement, htmlComment - added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag * tagsoup is a new dependency. * Text.Pandoc.Parsing: Generalized type on readWith. * Benchmark.hs: Added length calculation to force full evaluation. * Updated HTML reader tests. * Updated markdown and textile readers to use the functions from the HTML reader. * Note: The markdown reader now correctly handles some cases it did not before. For example: <hr/> is reproduced without adding a space. <script> a = '<b>'; </script> is parsed correctly.
2010-12-24Use functions from Text.Pandoc.Generic instead of processWith(M).John MacFarlane1-1/+2
2010-12-14Fixed regression in parsing _emph_John MacFarlane1-1/+1
There was a bug in parsing '_emph_, ...': when followed by a comma, underscore emphasis did not register. (Thanks to gwern for pointing this out.) This bug was introduced by the change in c66921f2acea456af527b93e2daa1d8594798642