path: root/src/Text/Pandoc/Parsing.hs
Age | Commit message | Author | Files | Lines
2018-11-01 | Make `uri` accept any stream with Char tokens | Alexander Krotov | 1 | -1/+1
2018-11-01 | Rewrite "uri" without "withRaw" | Alexander Krotov | 1 | -17/+16
2018-10-31 | Generalize gridTableWith to any streams with Char tokens | Alexander Krotov | 1 | -16/+18
2018-10-31 | Generalize parseFromString' | Alexander Krotov | 1 | -3/+3
2018-10-31 | Generalize parseFromString to any streams with Char token | Alexander Krotov | 1 | -4/+5
2018-10-29 | LaTeX reader: allow space at end of math after `\`. | John MacFarlane | 1 | -1/+1
Closes #5010. Expose trimMath from T.P.Shared.
2018-10-10 | Pandoc.Parsing: rewrite nonspaceChar using noneOf | Alexander Krotov | 1 | -1/+1
2018-08-10 | Avoid incomplete pattern match. | John MacFarlane | 1 | -5/+8
2018-08-10 | Avoid non-exhaustive pattern match. | John MacFarlane | 1 | -11/+5
2018-07-02 | Spellcheck comments | Alexander Krotov | 1 | -1/+1
2018-05-09 | Parsing: Lookahead for non-whitespace after single/double quote start. | John MacFarlane | 1 | -2/+4
Closes #4637.
2018-04-19 | Parsing.uri: don't treat `*` characters at end as part of URI. | John MacFarlane | 1 | -1/+1
This fixes #4561, a bug parsing emphasized bare links in RST.
2018-04-09 | Fix a comment | Alexander Krotov | 1 | -1/+1
2018-03-21 | Parsing: Fix romanNumeral parser. | John MacFarlane | 1 | -3/+3
We previously accepted 'DDC' as 1100. Closes #4480.
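The strictness this fix is after can be sketched in plain Haskell. This is not pandoc's `romanNumeral` parser: `place`, `digit`, and `romanToInt` are hypothetical names, and the sketch assumes upper-case input only. Each decimal place is read at most once, so a repeated `D` is rejected rather than summed.

```haskell
import Data.List (stripPrefix)

-- | Numerals for one decimal place, e.g. place 'C' 'D' 'M' 100 covers 100..900.
-- Ordered so that every numeral appears before any of its own prefixes.
place :: Char -> Char -> Char -> Int -> [(String, Int)]
place one five ten unit =
  [ ([one, ten], 9 * unit)
  , ([five, one, one, one], 8 * unit)
  , ([five, one, one], 7 * unit)
  , ([five, one], 6 * unit)
  , ([five], 5 * unit)
  , ([one, five], 4 * unit)
  , ([one, one, one], 3 * unit)
  , ([one, one], 2 * unit)
  , ([one], unit)
  ]

-- | Consume at most one numeral from the table; an absent place counts as 0.
digit :: [(String, Int)] -> String -> (Int, String)
digit table s =
  case [ (v, rest) | (n, v) <- table, Just rest <- [stripPrefix n s] ] of
    (hit : _) -> hit
    []        -> (0, s)

-- | Convert an upper-case Roman numeral, or return Nothing if it is malformed.
romanToInt :: String -> Maybe Int
romanToInt s
  | null leftover && total > 0 = Just total
  | otherwise                  = Nothing
  where
    (ms, r0)      = span (== 'M') s
    (h, r1)       = digit (place 'C' 'D' 'M' 100) r0
    (t, r2)       = digit (place 'X' 'L' 'C' 10) r1
    (o, leftover) = digit (place 'I' 'V' 'X' 1) r2
    total         = 1000 * length ms + h + t + o
```

With this sketch, `romanToInt "MC"` yields `Just 1100`, while `romanToInt "DDC"` yields `Nothing` because the second `D` is left over after the hundreds place has been read.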
2018-03-18 | Use NoImplicitPrelude and explicitly import Prelude. | John MacFarlane | 1 | -0/+2
This seems to be necessary if we are to use our custom Prelude with ghci. Closes #4464.
2018-03-16 | Monoid/Semigroup cleanup relying on custom Prelude. | John MacFarlane | 1 | -9/+0
2018-03-15 | Remove redundant import. | John MacFarlane | 1 | -2/+0
2018-03-13 | Require pandoc-types 1.17.4. | John MacFarlane | 1 | -2/+14
And a few tweaks related to the Semigroups/Monoid change. Closes #4448.
2018-02-23 | Export improved sepBy1 from Text.Pandoc.Parsing | Alexander Krotov | 1 | -5/+11
2018-02-19 | Move manyUntil to Text.Pandoc.Parsing and use it in Txt2Tags reader | Alexander Krotov | 1 | -0/+15
2018-01-31 | Export list marker parsers from Text.Pandoc.Parsing | Alexander Krotov | 1 | -0/+5
2018-01-19 | hlint code improvements. | John MacFarlane | 1 | -14/+10
2018-01-14 | Markdown reader: Improved inlinesInBalancedBrackets. | John MacFarlane | 1 | -0/+1
The change both improves performance and fixes a regression whereby normal citations inside inline notes were not parsed correctly. Closes jgm/pandoc-citeproc#315.
2018-01-05 | Update copyright notices to include 2018 | Albert Krewinkel | 1 | -2/+2
2017-11-19 | Allow spaces after `\(` and before `\)` with `tex_math_single_backslash`. | John MacFarlane | 1 | -2/+2
Previously `\( \frac{1}{a} < \frac{1}{b} \)` was not parsed as math in `markdown` or `html` `+tex_math_single_backslash`.
2017-11-14 | Text.Pandoc.Parsing.uri: allow `&` and `=` as word characters. | John MacFarlane | 1 | -1/+1
This fixes a bug where pandoc would stop parsing a URI with an empty attribute: for example, `&a=&b=` would stop at `a`. (The uri parser tries to guess which punctuation characters are part of the URI and which might be punctuation after it.) Closes #4068.
2017-11-01 | hlint | Alexander Krotov | 1 | -18/+18
2017-10-29 | Source code reformatting. | John MacFarlane | 1 | -65/+64
2017-10-23 | Implemented fenced Divs. | John MacFarlane | 1 | -0/+2
+ Added Ext_fenced_divs to Extensions (default for pandoc Markdown).
+ Document fenced_divs extension in manual.
+ Implemented fenced code divs in Markdown reader.
+ Added test.

Closes #168.
2017-08-28 | RST reader: handle blank lines correctly in line blocks (#3881) | Alexander | 1 | -1/+1
Previously pandoc would sometimes combine two line blocks separated by blanks, and ignore trailing blank lines within the line block. The test has been checked for consistency with http://rst.ninjs.org/
2017-08-19 | Markdown reader: use CommonMark rules for list item nesting. | John MacFarlane | 1 | -8/+28
Closes #3511.

Previously pandoc used the four-space rule: continuation paragraphs, sublists, and other block level content had to be indented 4 spaces. Now the indentation required is determined by the first line of the list item: to be included in the list item, blocks must be indented to the level of the first non-space content after the list marker. Exception: if there are 5 or more spaces after the list marker, then the content is interpreted as an indented code block, and continuation paragraphs must be indented two spaces beyond the end of the list marker. See the CommonMark spec for more details and examples.

Documents that adhere to the four-space rule should, in most cases, be parsed the same way by the new rules. Here are some examples of texts that will be parsed differently:

    - a
      - b

will be parsed as a list item with a sublist; under the four-space rule, it would be a list with two items.

    - a

          code

Here we have an indented code block under the list item, even though it is only indented six spaces from the margin, because it is four spaces past the point where a continuation paragraph could begin. With the four-space rule, this would be a regular paragraph rather than a code block.

    - a

            code

Here the code block will start with two spaces, whereas under the four-space rule, it would start with `code`. With the four-space rule, indented code under a list item always must be indented eight spaces from the margin, while the new rules require only that it be indented four spaces from the beginning of the first non-space text after the list marker (here, `a`).

This change was motivated by a slew of bug reports from people who expected lists to work differently (#3125, #2367, #2575, #2210, #1990, #1137, #744, #172, #137, #128) and by the growing prevalence of CommonMark (now used by GitHub, for example). Users who want to use the old rules can select the `four_space_rule` extension.

* Added `four_space_rule` extension.
* Added `Ext_four_space_rule` to `Extensions`.
* `Parsing` now exports `gobbleAtMostSpaces`, and the type of `gobbleSpaces` has been changed so that a `ReaderOptions` parameter is not needed.
2017-08-08 | Parsing: added gobbleSpaces. | John MacFarlane | 1 | -0/+12
This is a utility function to use in list parsing.
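A minimal sketch of what such indentation helpers might look like with plain parsec `String` parsers. The names `gobbleSpaces` and `gobbleAtMostSpaces` come from the log, but the bodies below are assumptions rather than pandoc's source: they ignore tabs and state handling, and `stripIndent` and `leadingSpaces` are hypothetical wrappers added only to make the behaviour easy to try.

```haskell
import Text.Parsec (char, count, getInput, option, parse, try)
import Text.Parsec.String (Parser)

-- | Consume exactly n spaces, or fail without consuming any input.
-- (Assumption: tabs are not handled in this sketch.)
gobbleSpaces :: Int -> Parser ()
gobbleSpaces n = try (() <$ count n (char ' '))

-- | Consume up to n spaces and report how many were actually consumed.
gobbleAtMostSpaces :: Int -> Parser Int
gobbleAtMostSpaces n
  | n <= 0    = return 0
  | otherwise = option 0 (char ' ' >> fmap (+ 1) (gobbleAtMostSpaces (n - 1)))

-- | Strip a fixed indent from a line; Nothing if the line is indented less.
stripIndent :: Int -> String -> Maybe String
stripIndent n = either (const Nothing) Just . parse (gobbleSpaces n >> getInput) ""

-- | Count how many of the first n columns of a line are spaces.
leadingSpaces :: Int -> String -> Int
leadingSpaces n = either (const 0) id . parse (gobbleAtMostSpaces n) ""
```

A list parser could use the exact variant to require the item's content indent on continuation lines, and the at-most variant to tolerate lines that are indented less.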
2017-07-14 | Fix ghc 8.2.1 compiler warnings. | John MacFarlane | 1 | -23/+26
2017-07-14 | Revert "Fixed some ghc 8.2 compiler warnings." | John MacFarlane | 1 | -14/+14
This reverts commit e22dc98a70d030cc6b4056d14ddd6462c7790f97.
2017-07-14 | Fixed some ghc 8.2 compiler warnings. | John MacFarlane | 1 | -14/+14
(Unnecessary type constraints.)
2017-07-07 | Parsing: added takeP, takeWhileP for efficient parsing of [Char]. | John MacFarlane | 1 | -2/+33
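One way helpers like these can avoid per-character parser overhead is to split the remaining input directly and then fix up the source position. The sketch below assumes plain parsec `String` parsers; the function names come from the log, but the bodies are illustrative guesses, not pandoc's code, and `digitsAndColumn` is a hypothetical demo wrapper.

```haskell
import Data.Char (isDigit)
import Text.Parsec (getInput, getPosition, parse, parserFail, setInput,
                    setPosition, sourceColumn)
import Text.Parsec.Pos (updatePosString)
import Text.Parsec.String (Parser)

-- | Take the longest prefix whose characters satisfy the predicate.
takeWhileP :: (Char -> Bool) -> Parser String
takeWhileP f = do
  inp <- getInput
  pos <- getPosition
  let (xs, rest) = span f inp
  setInput rest
  setPosition (updatePosString pos xs)  -- keep error positions accurate
  return xs

-- | Take exactly n characters, failing if fewer remain.
takeP :: Int -> Parser String
takeP n = do
  inp <- getInput
  pos <- getPosition
  let (xs, rest) = splitAt n inp
  if length xs < n
    then parserFail "takeP: not enough input"
    else do
      setInput rest
      setPosition (updatePosString pos xs)
      return xs

-- | Run takeWhileP isDigit and report the digits plus the resulting column.
digitsAndColumn :: String -> Maybe (String, Int)
digitsAndColumn = either (const Nothing) Just . parse p ""
  where p = (,) <$> takeWhileP isDigit <*> (sourceColumn <$> getPosition)
```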
2017-07-07 | Rewrote LaTeX reader with proper tokenization. | John MacFarlane | 1 | -45/+22
This rewrite is primarily motivated by the need to get macros working properly. A side benefit is that the reader is significantly faster (27s -> 19s in one benchmark, and there is a lot of room for further optimization).

We now tokenize the input text, then parse the token stream. Macros modify the token stream, so they should now be effective in any context, including math. Thus, we no longer need the clunky macro processing capacities of texmath.

A custom state LaTeXState is used instead of ParserState. This, plus the tokenization, will require some rewriting of the exported functions rawLaTeXInline, inlineCommand, rawLaTeXBlock.

* Added Text.Pandoc.Readers.LaTeX.Types (new exported module). Exports Macro, Tok, TokType, Line, Column. [API change]
* Text.Pandoc.Parsing: adjusted type of `insertIncludedFile` so it can be used with token parser.
* Removed old texmath macro stuff from Parsing. Use Macro from Text.Pandoc.Readers.LaTeX.Types instead.
* Removed texmath macro material from Markdown reader.
* Changed types for Text.Pandoc.Readers.LaTeX's rawLaTeXInline and rawLaTeXBlock. (Both now return a String, and they are polymorphic in state.)
* Added orgMacros field to OrgState. [API change]
* Removed readerApplyMacros from ReaderOptions. Now we just check the `latex_macros` reader extension.
* Allow `\newcommand\foo{blah}` without braces.

Fixes #1390. Fixes #2118. Fixes #3236. Fixes #3779. Fixes #934. Fixes #982.
2017-06-19 | Tracing: give less misleading line information with parseWithString. | John MacFarlane | 1 | -1/+2
Previously positions would be reported past the end of the chunk. We now reset the source position within the chunk and report positions "in chunk."
2017-05-28 | Parsing: `many1Till`: Check for the end condition before parsing | Herwig Stuetz | 1 | -2/+3
By not checking for the end condition before the first parse, the parser was applied too often, consuming too much of the input. This fixes the behaviour of `testStringWith (many1Till (oneOf "ab") (string "aa")) "aaa"` which before incorrectly returned `Right "a"`. With this change, it instead correctly fails with `Left (PandocParsecError ...)` because it is not able to parse at least one occurrence of `oneOf "ab"` that is not `"aa"`. Note that this only affects `many1Till p end` where `p` matches on a prefix of `end`.
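The idea of checking the end condition first can be sketched with plain parsec. This is a simplified stand-in, not necessarily the code in `Text.Pandoc.Parsing`: it uses parsec's `notFollowedBy` (hence the `Show` constraint) and fixes the stream type to `String`; `abUntilAA` is a hypothetical wrapper around the example from the commit message.

```haskell
import Text.Parsec (manyTill, notFollowedBy, oneOf, parse, string)
import Text.Parsec.String (Parser)

-- | Parse one or more p, stopping at end; fail if end matches right away.
many1Till :: Show end => Parser a -> Parser end -> Parser [a]
many1Till p end = do
  notFollowedBy end        -- the check added before the first parse
  first <- p
  rest <- manyTill p end
  return (first : rest)

-- | The example from the commit message, with errors collapsed to Nothing.
abUntilAA :: String -> Maybe String
abUntilAA =
  either (const Nothing) Just . parse (many1Till (oneOf "ab") (string "aa")) ""
```

Here `abUntilAA "aaa"` is `Nothing`, because the input already starts with the end marker, while `abUntilAA "baa"` is `Just "b"`.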
2017-05-25 | Markdown reader: warn for notes defined but not used. | John MacFarlane | 1 | -2/+5
Closes #1718. Parsing.ParserState: Make stateNotes' a Map, add stateNoteRefs.
2017-05-24 | Parsing: Provide parseFromString'. | John MacFarlane | 1 | -1/+17
This is a version of parseFromString specialized to ParserState, which resets stateLastStrPos at the end. This is almost always what we want.

This fixes a bug where `_hi_` wasn't treated as emphasis in the following, because pandoc got confused about the position of the last word:

    - [o] _hi_

Closes #3690.
2017-05-23 | Shared: Provide custom isURI that rejects unknown schemes [isURI] | Albert Krewinkel | 1 | -26/+1
We also export the set of known `schemes`. The new function replaces the function of the same name from `Network.URI`, as the latter did not check whether a scheme is well-known. E.g. MediaWiki wikis frequently feature pages with names like `User:John`. These links were interpreted as URIs, thus turning internal links into global links. This is prevented by also checking whether the scheme of a URI is frequently used (i.e. is IANA registered or an otherwise well-known scheme).

Fixes: #2713

Update set of well-known URIs from IANA list

All official IANA schemes (as of 2017-05-22) are included in the set of known schemes. The four non-official schemes doi, isbn, javascript, and pmid are kept.
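The scheme check can be illustrated without any URI library. The sketch below is a simplification and not the function from `Text.Pandoc.Shared`: `knownSchemes` is a tiny hypothetical subset of the real set, `hasKnownScheme` is an assumed name, the lower-casing of the scheme is an assumption, and no URI syntax validation is performed.

```haskell
import Data.Char (toLower)

-- | A small illustrative subset; the real set covers all IANA schemes
-- plus the non-official doi, isbn, javascript, and pmid.
knownSchemes :: [String]
knownSchemes =
  ["http", "https", "ftp", "mailto", "doi", "isbn", "javascript", "pmid"]

-- | True if the text before the first colon is a known scheme.
hasKnownScheme :: String -> Bool
hasKnownScheme s =
  case break (== ':') s of
    (scheme, ':' : _) -> map toLower scheme `elem` knownSchemes
    _                 -> False
```

Under this sketch `hasKnownScheme "User:John"` is `False`, so a wiki page name of that shape would stay an internal link, while `hasKnownScheme "https://pandoc.org"` is `True`.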
2017-05-22 | Move indentWith to Text.Pandoc.Parsing (#3687) | Alexander Krotov | 1 | -0/+12
2017-05-17 | Merge pull request #3677 from labdsf/anylinenewline | John MacFarlane | 1 | -0/+5
Move anyLineNewline to Parsing.hs
2017-05-17 | Move anyLineNewline to Parsing.hs | Alexander Krotov | 1 | -0/+5
2017-05-14 | Parsing: add `insertIncludedFilesF` which returns F blocks | Albert Krewinkel | 1 | -7/+24
The `insertIncludeFiles` function was generalized and renamed to `insertIncludedFiles'`; the specialized versions are based on that.
2017-05-14 | Parsing: introduce `HasIncludeFiles` type class | Albert Krewinkel | 1 | -9/+22
The `insertIncludeFile` function is generalized to work with all parser states which are instances of that class.
2017-05-14 | Parsing: replace partial with total function | Albert Krewinkel | 1 | -1/+1
Calling `tail` on an empty list raises an exception, while calling the otherwise equivalent `drop 1` will return the empty list again.
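The difference is easy to see in isolation. In this small sketch `dropFirst` is a hypothetical name used only for illustration; it is not taken from the commit.

```haskell
-- | Everything after the first element. Unlike 'tail', this is total:
-- the empty list maps to the empty list instead of raising an exception.
dropFirst :: [a] -> [a]
dropFirst = drop 1
```

For non-empty input the two agree, so `dropFirst "abc"` is `"bc"`, and `dropFirst ""` is `""`, where `tail ""` would raise an exception.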
2017-05-13 | Update dates in copyright notices | Albert Krewinkel | 1 | -2/+2
This follows the suggestions given by the FSF for GPL licensed software. <https://www.gnu.org/prep/maintain/html_node/Copyright-Notices.html>
2017-05-11 | Combine grid table parsers | Albert Krewinkel | 1 | -18/+51
The grid table parsers for markdown and rst were combined into a single parser, slightly changing the parsing behavior of both:

- The markdown parser now compactifies block content cell-wise: pure text blocks in cells are now treated as paragraphs only if the cell contains multiple paragraphs, and as plain blocks otherwise. Before, this was true only for single-column tables.
- The rst parser now accepts newlines and multiple blocks in header cells.

Closes: #3638