path: root/src/Text/Pandoc/Parsing.hs
Age  Commit message  Author  Files  Lines
2016-10-23Tighten up parsing of raw email addresses.John MacFarlane1-4/+13
Technically `**@user` is a valid email address, but if we allow things like this, we get bad results in markdown flavors that autolink raw email addresses. (See #2940.) So we exclude a few valid email addresses in order to avoid these more common bad cases. Closes #2940.
2016-10-13Allow empty lines when parsing line blocksAlbert Krewinkel1-2/+5
Line blocks are allowed to contain empty lines and should be parsed as a single block in that case. Previously an empty (line block) line would have terminated parsing of the line block element.
2016-09-02Remove TagSoup compatJesse Rosenthal1-3/+3
We already lower-bound tagsoup at 0.13.7, which means we were always running the compatibility layer (it was conditional on min value 0.13). Better to just use `lookupEntity` from the library directly, and convert a string to a char if need be.
2016-09-02Remove Compat.MonoidJesse Rosenthal1-1/+1
This was only necessary for GHC versions with base below 4.5 (i.e., ghc < 7.4).
2016-07-15Use liftM since otherwise Functor type constraint needed in ghc 7.8.John MacFarlane1-1/+1
2016-07-14Fixed compiler warnings.John MacFarlane1-3/+3
2016-03-22Updated copyright dates to include 2016.John MacFarlane1-2/+2
2016-01-22Changed type of Shared.uniqueIdent argument from [String] to Set String.John MacFarlane1-6/+6
This avoids performance problems in documents with many identically named headers. Closes #2671.
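A minimal sketch of why a `Set String` helps here. The helper `uniqueIdentSketch` and its suffix scheme are assumptions made for illustration and are not pandoc's actual `uniqueIdent`; the point is only that each membership test is logarithmic in a set but linear in a list, which matters when many headers share a name.

```haskell
import qualified Data.Set as Set

-- Hypothetical helper (not pandoc's real uniqueIdent): return the first of
-- base, base-1, base-2, ... that is not yet in the set of used identifiers.
uniqueIdentSketch :: String -> Set.Set String -> String
uniqueIdentSketch base used =
  case filter (`Set.notMember` used) candidates of
    (x : _) -> x
    []      -> base -- unreachable: the candidate list is infinite
  where
    candidates = base : [base ++ "-" ++ show n | n <- [1 :: Int ..]]

main :: IO ()
main = do
  -- Five identically named headers, assigned identifiers one at a time.
  let headers = replicate 5 "intro"
      step (used, acc) h =
        let ident = uniqueIdentSketch h used
        in (Set.insert ident used, ident : acc)
      (_, idents) = foldl step (Set.empty, []) headers
  mapM_ putStrLn (reverse idents)
```

With a plain list in place of the set, every `notMember` check would walk the whole list of identifiers seen so far.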
2016-01-08Work around tagsoup bug - not allowing uppercase x in hex entities.John MacFarlane1-0/+1
Issue submitted at tagsoup.
2016-01-08Entity handling fixes:John MacFarlane1-1/+4
- Text.Pandoc.XML.fromEntities: handle entities without a semicolon. Always look up character references with the trailing ';', even if it wasn't present. And never add it when looking up numerical entities. (This is what tagsoup seems to require.)
- Text.Pandoc.Parsing.characterReference: Always look up character references with the trailing ';', and leave off the ';' when looking up numerical entities.
This fixes a regression for e.g. `&lang;`.
2015-12-12Fixed cite key parsing regression.John MacFarlane1-1/+1
We were capturing final colons as in [@foo: bar]; the citation id was being parsed as "@foo:". Closes jgm/pandoc-citeproc#201.
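One way to avoid capturing a trailing colon is to accept internal punctuation only when an alphanumeric character follows it. The sketch below uses Parsec to show that idea; the name `citeKeySketch` and the exact punctuation set are assumptions for illustration, not pandoc's actual definition.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Sketch only: the set of allowed internal punctuation is an assumption.
citeKeySketch :: Parser String
citeKeySketch = do
  _ <- char '@'
  first <- alphaNum <|> char '_'
  -- A punctuation character is part of the key only if an alphanumeric
  -- character follows; `try` undoes the consumption when the check fails.
  rest <- many (alphaNum <|> char '_'
                <|> try (oneOf ":.#$%&-+?<>~/" <* lookAhead alphaNum))
  return (first : rest)

-- Run the sketch parser, discarding the error message on failure.
parseKey :: String -> Maybe String
parseKey = either (const Nothing) Just . parse citeKeySketch ""

main :: IO ()
main = do
  print (parseKey "@foo: bar") -- the final colon is left out of the key
  print (parseKey "@foo:bar")  -- an internal colon is kept
```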
2015-11-19Merge branch 'new-image-attributes' of https://github.com/mb21/pandoc into mb21-new-image-attributesJohn MacFarlane1-2/+14
* Bumped version to 1.16.
* Added Attr field to Link and Image.
* Added `common_link_attributes` extension.
* Updated readers for link attributes.
* Updated writers for link attributes.
* Updated tests.
* Updated stack.yaml to build against unreleased versions of pandoc-types and texmath.
* Fixed various compiler warnings.
Closes #261.
TODO:
* Relative (percentage) image widths in docx writer.
* ODT/OpenDocument writer (untested, same issue about percentage widths).
* Update pandoc-citeproc.
2015-11-13Allow `://` in citation keys.John MacFarlane1-1/+2
Closes jgm/pandoc-citeproc#166.
2015-11-09Restored Text.Pandoc.Compat.Monoid.John MacFarlane1-0/+1
Don't use custom prelude for latest ghc. This is a better approach to making 'stack ghci' and 'cabal repl' work. Instead of using NoImplicitPrelude, we only use the custom prelude for older ghc versions. The custom prelude presents a uniform API that matches the current base version's prelude. So, when developing (presumably with latest ghc), we don't use a custom prelude at all and hence have no trouble with ghci. The custom prelude no longer exports (<>): we now want to match the base 4.8 prelude behavior.
2015-11-09Revert "Use -XNoImplicitPrelude and 'import Prelude' explicitly."John MacFarlane1-1/+0
This reverts commit c423dbb5a34c2d1195020e0f0ca3aae883d0749b.
2015-11-08Use -XNoImplicitPrelude and 'import Prelude' explicitly.John MacFarlane1-0/+1
This is needed for ghci to work with pandoc, given that we now use a custom prelude. Closes #2503.
2015-10-14Use custom Prelude to avoid compiler warnings.John MacFarlane1-2/+0
- The (non-exported) prelude is in prelude/Prelude.hs.
- It exports Monoid and Applicative, like base 4.8 prelude, but works with older base versions.
- It exports (<>) for mappend.
- It hides 'catch' on older base versions.
This allows us to remove many imports of Data.Monoid and Control.Applicative, and remove Text.Pandoc.Compat.Monoid. It should allow us to use -Wall again for ghc 7.10.
2015-08-05Parsing: Add `extractIdClass`, modified type of `KeyTable`.John MacFarlane1-2/+14
(mb21)
2015-07-23Parsing: toKey: strip off outer brackets.John MacFarlane1-2/+4
This makes keys with extra space at the beginning and end work: e.g., with the definition `[foo]: bar`, the reference `[ foo ]` will now be a link to bar (it wasn't before).
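A rough sketch of the normalization described above: strip one pair of outer brackets, collapse whitespace, and lowercase, so that padded and unpadded labels compare equal. `toKeySketch` is a hypothetical stand-in that returns a plain `String`; the real `toKey` and its key type may differ.

```haskell
import Data.Char (toLower)
import Data.List (isSuffixOf)

-- Hypothetical stand-in for toKey: drop one pair of outer brackets, then
-- collapse runs of whitespace (including leading/trailing) and lowercase.
toKeySketch :: String -> String
toKeySketch = map toLower . unwords . words . unbracket
  where
    unbracket ('[' : xs) | "]" `isSuffixOf` xs = init xs
    unbracket xs = xs

main :: IO ()
main = do
  -- Both labels normalize to the same key, so `[ foo ]` finds `[foo]: bar`.
  print (toKeySketch "[foo]")
  print (toKeySketch "[ foo ]")
```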
2015-07-14Improved bare autolink detection.John MacFarlane1-3/+2
Previously we disallowed `-` at the end of an autolink, and disallowed the combination `=-`. This commit liberalizes the rules for allowing punctuation in a bare URI. Added test cases. One potential drawback is that you can no longer put a bare URI in em dashes like this: `this uri---http://example.com---is an example.` But in this respect we now match github's treatment of bare URIs. Closes #2299.
2015-05-13Markdown reader: Made implicit header references case-insensitive.John MacFarlane1-1/+3
Added `stateHeaderKeys` to `ParserState`; this is a `KeyTable` like `stateKeys`, but it only gets consulted if we don't find a match in `stateKeys`, and if `Ext_implicit_header_references` is enabled. Closes #1606.
2015-05-11HTML reader: Fixed detection of self-closing tags.John MacFarlane1-1/+1
Earlier versions had a bug and would wrongly think opening tags containing attributes with slashes in them were self-closing. Closes #2146.
2015-04-26Updated copyright notices to -2015. Closes #2111.John MacFarlane1-2/+2
2015-04-18Revert "Merge pull request #1947 from mpickering/Fmonad"John MacFarlane1-22/+33
Closes #2062. This reverts commit c302bdcdbe97b38721015fe82403b2a8f488a702, reversing changes made to b983adf0d0cbc98d2da1e2751f46ae1f93352be6.
Conflicts:
src/Text/Pandoc/Parsing.hs
src/Text/Pandoc/Readers/Markdown.hs
src/Text/Pandoc/Readers/Org.hs
src/Text/Pandoc/Readers/RST.hs
2015-04-17Merge pull request #1954 from mcmtroffaes/feature/citekey-firstchar-alphanumJohn MacFarlane1-1/+1
Allow digit as first character of a citation key.
2015-04-18MD Reader: Smart `'` after inline mathNikolay Yakimov1-1/+6
Closes #1909. Adds new parser combinator to Parsing.hs `a <+?> b` : if a succeeds, applies b and mappends output (if any) to result of a. If b fails, it's just a, if a fails, whole expression fails.
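The behaviour described for `a <+?> b` can be sketched with Parsec as follows. This is an illustrative reconstruction from the description above, specialized to `Parser` from `Text.Parsec.String`; the fixity declaration is an assumption, and the actual definition in Parsing.hs may be written differently.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

infixr 5 <+?>

-- Run a; then try b. If b succeeds, append its output to a's result;
-- if b fails, backtrack and keep a's result alone. If a fails, so does
-- the whole expression.
(<+?>) :: Monoid a => Parser a -> Parser a -> Parser a
a <+?> b = do
  x <- a
  y <- option mempty (try b)
  return (x <> y)

-- Run a parser, discarding the error message on failure.
run :: Parser String -> String -> Maybe String
run p = either (const Nothing) Just . parse p ""

main :: IO ()
main = do
  let p = string "$x$" <+?> string "'s"
  print (run p "$x$'s")   -- b succeeds: both outputs are combined
  print (run p "$x$ and") -- b fails: only a's output is kept
  print (run p "y")       -- a fails: the whole parser fails
```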
2015-02-18Add Text.Pandoc.Error module with PandocError typeMatthew Pickering1-13/+6
2015-02-18Allow digit as first character of a citation key.Matthias C. M. Troffaes1-1/+1
* Update parser to recognize citation keys starting with a digit.
* Update documentation accordingly.
* Test case added.
See https://github.com/jgm/pandoc-citeproc/issues/97
2015-02-18Factor out "returnState" into Parsing moduleMatthew Pickering1-0/+5
2015-02-18Generalise signature of addWarningMatthew Pickering1-1/+1
2015-02-18Add check to see whether in a footnote to ParserState (to avoid circular footnotes)Matthew Pickering1-2/+4
2015-02-18Remove F monad from ParsingMatthew Pickering1-24/+2
2015-02-18Changed parseWithWarnings to the more general returnWarnings parser transformerMatthew Pickering1-6/+5
2015-02-18Added generalize function which can be used to lift specialised parsers.Matthew Pickering1-0/+4
Monad m => Parsec s st a -> ParsecT s st m a
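A possible way to write such a lifting function with Parsec's low-level primitives is sketched below. The name `generalizeSketch` is hypothetical, and the use of `mkPT` and `runParsecT` from `Text.Parsec.Prim` is one plausible approach rather than a confirmed copy of the committed code.

```haskell
import Data.Functor.Identity (runIdentity)
import Text.Parsec
import Text.Parsec.Prim (mkPT, runParsecT)

-- Lift a parser over the Identity monad into a parser over any monad m:
-- run the pure parser on the current state, then re-wrap each layer of
-- the result in m.
generalizeSketch :: Monad m => Parsec s st a -> ParsecT s st m a
generalizeSketch p = mkPT $ \st ->
  return (fmap (return . runIdentity) (runIdentity (runParsecT p st)))

main :: IO ()
main = do
  -- A parser written against Identity, reused inside an IO-based ParsecT.
  let digits = many1 digit :: Parsec String () String
  result <- runParserT (generalizeSketch digits) () "" "123abc"
  print result
```

The benefit is that helpers written once as plain `Parsec` parsers can be reused in readers whose parsers run over another monad.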
2014-12-15Text.Pandoc.Parsing: Change parseFromString to fail if not all input is consumed.Matthew Pickering1-1/+3
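A minimal sketch of a `parseFromString`-style combinator that fails on leftover input, using standard Parsec state functions. `parseFromStringSketch` is a hypothetical name and the details (for example, how position is restored) are assumptions; only the `eof` check reflects the change described above.

```haskell
import Text.Parsec

-- Temporarily swap the parser's input for `str`, run `parser` on it, and
-- require that all of `str` was consumed before restoring the old input.
parseFromStringSketch :: Monad m
                      => ParsecT String st m a -> String -> ParsecT String st m a
parseFromStringSketch parser str = do
  oldPos <- getPosition
  oldInput <- getInput
  setInput str
  result <- parser
  eof -- fail if any part of str is left unconsumed
  setInput oldInput
  setPosition oldPos
  return result

-- Parse digits out of an inner string while the outer input is "rest".
runInner :: String -> Maybe String
runInner inner =
  either (const Nothing) Just
    (parse (parseFromStringSketch (many1 digit) inner) "" "rest")

main :: IO ()
main = do
  print (runInner "123")    -- whole inner string consumed: succeeds
  print (runInner "123abc") -- "abc" left over: fails
```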
2014-12-15Merge pull request #1805 from bergey/rstJohn MacFarlane1-3/+20
RST Reader - Improved Role Support
2014-12-14Fixed autolinks with following punctuation.John MacFarlane1-1/+1
Closes #1811. The price of this is that autolinked bare URIs can no longer contain `>` characters, but this is not a big issue.
2014-12-12RST Reader: compute Attrs when role is definedDaniel Bergey1-3/+2
Move recursive role lookup from renderRole to addNewRole. The Attr value will be the same for every occurrence of this role, so there's no reason to compute it every time. This allows simplifying the stateRstCustomRoles map considerably. We could go even further, and remove the fmt and attr arguments to renderRole, which are null except for custom roles.
2014-12-12expose warnings from RST reader; refactorDaniel Bergey1-0/+10
This commit moves some code which was only used for the Markdown Reader into a generic form which can be used for any Reader. Otherwise, it takes naming and interface cues from the preexisting Markdown code.
2014-12-08RST Reader: Warn about skipped directivesDaniel Bergey1-0/+8
move `addWarning` to Parsing.hs, so it can be used by Markdown & RST readers.
2014-10-19Parsing: fixed `inlineMath` so it handles `\text{..}` containing `$`.John MacFarlane1-1/+23
For example: `$x = \text{the $n$th root of $y$}$`. Closes #1677.
2014-08-04Use texmath 0.7 interface.John MacFarlane1-1/+2
2014-07-27Parsing: Added isbn and pmid schemesMatthew Pickering1-2/+2
2014-07-26Generalised more in Parsing.hs to enable the use of custom stateMatthew Pickering1-40/+53
2014-07-22Exported runParserT and StreamMatthew Pickering1-0/+2
2014-07-22Generalised readWith to readWithMMatthew Pickering1-10/+19
2014-07-20Fix behavior of `markdown_attribute` extension.John MacFarlane1-0/+2
It now works as in PHP markdown extra. Setting `markdown="1"` on an outer tag affects all contained tags until it is reversed with `markdown="0"`. Closes #1378. Added `stateMarkdownAttribute` to `ParserState`.
2014-07-20readWith: reverted generalization from f201bdcb.John MacFarlane1-8/+8
We need input to be a string so we can print the offending line on an error.
2014-07-12Parsing: Simplified dash and ellipsis.John MacFarlane1-40/+13
This originated with @dubiousjim's observation in #1419 that there was a typo in the definition of enDash. It returned an em dash character instead of an en dash. I thought about why this had not been noticed before, and realized that en dashes were just being parsed as regular symbols. That made me realize that, now that we no longer have dedicated EnDash, EmDash, and Ellipses inline elements, as we used to in pandoc, we no longer need to parse the unicode characters specially. This allowed a considerable simplification of the code. Partially resolves #1419.
2014-07-12Removed space at ends of lines in source.John MacFarlane1-37/+37