path: root/src/Text/Pandoc/Parsing.hs
Age  Commit message  Author  Files  Lines
2013-03-28  Parsing: Better error reporting in readWith.  John MacFarlane  1  -4/+11
- Specialize readWith to String input.
- On error, have it print the line in which the error occurred, with a caret pointing to the column.
- This should help diagnose parsing problems in LaTeX especially.
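The caret-style report described in this entry can be sketched in a few lines. This is a minimal illustration only, not the code from the commit: `errorContext` is a hypothetical helper, and it assumes 1-based line and column numbers like those Parsec reports in a `SourcePos`.

```haskell
-- Hypothetical helper, not pandoc's actual readWith code.
-- Given the full input and a 1-based (line, column) error position,
-- return the offending line followed by a caret under the column.
errorContext :: String -> Int -> Int -> String
errorContext input line col =
  let theLine = case drop (line - 1) (lines input) of
                  (l:_) -> l
                  []    -> ""   -- position past the end of the input
      caret   = replicate (col - 1) ' ' ++ "^"
  in  theLine ++ "\n" ++ caret
```

Because the helper needs the whole input as a `String` to recover the line, it fits the first bullet's note about specializing readWith to String input.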
2013-03-28  Parsing: Further improvements to uri parser.  John MacFarlane  1  -2/+4
Don't treat punctuation before percent-encoding as final punctuation. Don't treat '+' as final punctuation.
2013-03-28  Mediawiki: Fixed regression for `<ref>URL</ref>`.  John MacFarlane  1  -1/+1
`<` is no longer allowed in URLs, according to the uri parser in Text.Pandoc.Parsing. Added a test case.
2013-02-21  Make `implicit_header_references` work with explicit header ids.  John MacFarlane  1  -2/+2
(Markdown reader.)
2013-02-15  Allow `&` in emails (for entities).  John MacFarlane  1  -1/+1
Added tests for entities in titles and links. Closes #723.
2013-02-15  Parsing: uri, email: resolve entities.  John MacFarlane  1  -2/+3
A markdown link `<http://g&ouml;ogle.com>` should be a link to http://göogle.com.
2013-02-02  Optimized oneOfStringsCI.  John MacFarlane  1  -3/+9
The call to toLower in ciMatch was very expensive (and very often used), because toLower from Data.Char calls a fully Unicode-aware function. This optimization avoids the call to toLower for the most common, ASCII cases. This dramatically reduces the speed penalty that comes from enabling the `autolink_bare_uris` extension. The penalty is still substantial (in one test, enabling the extension raised the time from 0.33s to 0.44s), but nowhere near what it used to be.
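The ASCII fast path described here might look roughly like the following. It is a sketch under assumptions: `ciMatchChar` is a hypothetical name, its first argument is assumed to be already lower case, and the actual ciMatch in the commit may be structured differently.

```haskell
import Data.Char (toLower)

-- Hypothetical sketch of a case-insensitive character match.
-- Assumption: the first argument (lc) is already lower case.
ciMatchChar :: Char -> Char -> Bool
ciMatchChar lc c
  | c == lc              = True
  | c >= 'A' && c <= 'Z' = toEnum (fromEnum c + 32) == lc  -- ASCII upper case: cheap fold
  | c < '\128'           = False                           -- other ASCII: nothing to fold
  | otherwise            = toLower c == lc                 -- non-ASCII: full Unicode toLower
```

Only the last guard pays for the Unicode-aware `toLower`, so input that is mostly ASCII rarely reaches it.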
2013-01-28  Fixed latex macro parsing.  John MacFarlane  1  -4/+4
Now latex macro definitions are preserved when output is latex, and applied when it is another format, as originally intended. Partially addresses #730. \providecommand is still not supported. For this we need changes to texmath.
2013-01-25  Parsing: More improvements of anyLine parser.  John MacFarlane  1  -6/+8
2013-01-25  More anyLine tweaks: Use incSourceLine.  John MacFarlane  1  -1/+1
2013-01-25  anyLine: Set position properly.  John MacFarlane  1  -0/+3
2013-01-25  Parsing: Much faster new version of anyLine.  John MacFarlane  1  -1/+8
Not only faster but uses less memory.
2013-01-20  Fixed bug in uri parser.  John MacFarlane  1  -1/+1
The bug prevented an autolink at the end of a string (e.g. at the end of a line block line) from counting as a link. Closes #711.
2013-01-15  Changed Ext_autolink_urls -> Ext_autolink_bare_uris.  John MacFarlane  1  -2/+5
Added tests.
2013-01-15  Case-insensitive parsing of URI schemes.  John MacFarlane  1  -1/+1
2013-01-15  Parsing: Improve oneOfStrings, export oneOfStringsCI.  John MacFarlane  1  -7/+20
oneOfStrings will now take the longest match it can in a list of strings, so if 'foo' and 'foobar' are both included, 'foobar' will match even if 'foo' is first in the list.
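The longest-match rule can be illustrated with a pure function over strings. This is only a sketch of the selection rule: `longestMatch` is a hypothetical name, whereas the real oneOfStrings is a parser combinator that consumes input.

```haskell
import Data.List (isPrefixOf, maximumBy)
import Data.Ord (comparing)

-- Hypothetical pure analogue of the rule described above:
-- among the candidates that are prefixes of the input, pick the longest.
longestMatch :: [String] -> String -> Maybe String
longestMatch candidates input =
  case filter (`isPrefixOf` input) candidates of
    [] -> Nothing
    ms -> Just (maximumBy (comparing length) ms)
```

With candidates `["foo", "foobar"]`, the input `"foobarbaz"` yields `Just "foobar"` regardless of list order, while `"foob"` falls back to `Just "foo"`.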
2013-01-15  Revised URI parser.  John MacFarlane  1  -27/+50
* It no longer uses Network.URI's URI parser, which is too restrictive (not allowing unicode URIs unless encoded).
* It allows many more schemes.
* It better handles punctuation so as to avoid capturing trailing punctuation in bare URLs.
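To make the trailing-punctuation point concrete, here is a rough sketch that splits a bare-URL candidate after the fact. It is not the parser-level approach the commit takes: `splitTrailingPunct` is a hypothetical helper, and the set of characters treated as trailing punctuation is an assumption chosen for illustration.

```haskell
-- Hypothetical helper: split a bare-URL candidate into the URL proper
-- and any trailing punctuation that should stay outside the link.
-- Assumption: only the characters in ".,;:!?" count as trailing punctuation.
splitTrailingPunct :: String -> (String, String)
splitTrailingPunct s =
  let (revPunct, revUrl) = span isTrailing (reverse s)
  in  (reverse revUrl, reverse revPunct)
  where
    isTrailing c = c `elem` ".,;:!?"
```

Under this assumed set, a sentence-final period after `http://example.com/a` is left out of the link, while a final `/` or `+` remains part of the URL.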
2013-01-14  Parsing: Fixed uri -- escape unicode URLs.  John MacFarlane  1  -2/+2
Otherwise Network.URI.parseURI fails on e.g. Chinese URLs. Changed an incorrect test in markdown-reader-more.
2013-01-14  Parsing: Simplified and improved singleQuoteStart.  John MacFarlane  1  -8/+2
This makes 's', 'l', etc. parse properly. Formerly we had some English-centric heuristics, but they are no longer needed now that we keep track of the last 'Str' position in state. Closes #698.
2013-01-13  Moved lineBlockLines to Parsing.  John MacFarlane  1  -0/+18
This will be used by both RST and markdown readers.
2013-01-09  More improvements in emailAddress parser.  John MacFarlane  1  -23/+17
2013-01-09  Made email parser more correct.  John MacFarlane  1  -12/+14
Now it's based on RFC 822, though it still doesn't implement quoted strings in email addresses.
2013-01-09  Added Attr field to Header.  John MacFarlane  1  -0/+2
Previously header ids were autogenerated by the writers. Now they are generated (unless supplied explicitly) in the markdown parser, if the `header_identifiers` extension is selected. In addition, the textile reader now supports id attributes on headers.
2013-01-04  Markdown reader: Warn about duplicate link references.  John MacFarlane  1  -0/+1
2013-01-03  Added stateWarnings.  John MacFarlane  1  -2/+4
It is not connected to anything yet.
2013-01-03  Implemented `Ext_header_identifiers`, `Ext_implicit_header_references`.  John MacFarlane  1  -0/+2
Now by default pandoc will act as if link references have been defined for all headers. So, you can do this:

    # My header

    Link to [My header]. Another link to [it][My header].

Closes #691.
2012-12-13  Fixed bug in withRaw.  John MacFarlane  1  -1/+1
Didn't correctly handle case where nothing is parsed.
2012-10-05  Revert "Added stateWarnings to ParserState, added warning function."  John MacFarlane  1  -8/+0
This reverts commit 5419b504cef0cc6e1a0f3e321b2fc0a66e12db3c.
2012-10-05  Added stateWarnings to ParserState, added warning function.  John MacFarlane  1  -0/+8
This will be used to provide warnings for things like duplicate footnote refs and link refs.
2012-09-29  Renamed removedLeadingTrailingSpace to trim.  John MacFarlane  1  -3/+2
Also renamed removeLeadingSpace to triml and removeTrailingSpace to trimr.
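For orientation, the three renamed helpers could be written along these lines. The names trim, triml, and trimr come from the entry above; the definitions and the exact set of characters treated as whitespace are assumptions made for this sketch.

```haskell
-- Sketch of the three helpers; the whitespace set is an assumption.

-- Strip leading spaces, tabs, and newlines.
triml :: String -> String
triml = dropWhile (`elem` " \r\n\t")

-- Strip trailing spaces, tabs, and newlines.
trimr :: String -> String
trimr = reverse . triml . reverse

-- Strip both ends.
trim :: String -> String
trim = triml . trimr
```

For example, `trim "  a b \n"` evaluates to `"a b"`; interior whitespace is left alone.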
2012-09-27  Parsing: Changed type of stateSubstitutions to use Inlines.  John MacFarlane  1  -2/+2
2012-09-27  Removed nullBlock.  John MacFarlane  1  -6/+0
Don't use nullBlock in Textile reader. Better to know about parsing problems than to skip stuff when we get stuck.
2012-09-27  Added stateSubstitutions to ParserState, use for RST substitutions.  John MacFarlane  1  -0/+5
2012-09-23  Revert "More intelligent handling of text encodings."  John MacFarlane  1  -1/+2
This reverts commit 7272735b3d413a644fd9ab01eeae8ae9cd5a925b.
2012-09-23  More intelligent handling of text encodings.  John MacFarlane  1  -2/+1
Previously, UTF-8 was enforced for both input and output. The new system:
* For input, UTF-8 is tried first; if an error is raised, the locale encoding is tried.
* For output, the locale encoding is always used.
2012-09-23  Revert "Use local encoding for input/output rather than forcing UTF8."  John MacFarlane  1  -1/+2
This reverts commit c69837adb648a479167be5e2d37279a02be8060c.
2012-09-23  Use local encoding for input/output rather than forcing UTF8.  John MacFarlane  1  -2/+1
Note that system templates are stored as UTF8 and will still be read as such, even if the local encoding is different. Text downloaded from URLs will also be treated as UTF-8.
2012-09-12  Export 'nested' in Parsing.  John MacFarlane  1  -0/+13
2012-09-12  Text.Pandoc.Parsing: Handle trailing slash in 'uri'.  John MacFarlane  1  -2/+3
2012-09-09  Parsing: Generalized type of withQuoteContext.  John MacFarlane  1  -2/+2
2012-08-08  Changes to literate haskell options.  John MacFarlane  1  -5/+0
- Removed writerLiterateHaskell from WriterOptions.
- Removed readerLiterateHaskell from ReaderOptions.
- Added Ext_literate_haskell to Extensions. Test for this instead of the above.
- Removed failUnlessLHS from Shared.
Note: At this point, +lhs and the .lhs extension no longer have any effect. Need to fix.
2012-08-02  Made F a newtype, moved definitions to Parser.  John MacFarlane  1  -1/+23
Parser now exports F(..), askF, asksF, runF.
2012-08-01  Parsing: removed duplication of Key and Key'.  John MacFarlane  1  -38/+5
Now we just use the former Key' (string contents), renamed Key. lookupKeySrc and fromKey are no longer exported. Key', toKey' and KeyTable' have become Key, toKey, and KeyTable.
2012-08-01  Major rewrite of markdown reader.  John MacFarlane  1  -14/+43
* Use Builder's Inlines/Blocks instead of lists.
* Return values in the reader monad, which are then run (at the end of parsing) against the final parser state. This allows links, notes, and example numbers to be resolved without a second parser pass.
* An effect of using Builder is that everything is normalized automatically.
* New exports from Text.Pandoc.Parsing: widthsFromIndices, NoteTable', KeyTable', Key', toKey', withQuoteContext, singleQuoteStart, singleQuoteEnd, doubleQuoteStart, doubleQuoteEnd, ellipses, apostrophe, dash
* Updated opendocument tests.
* Don't derive Show for ParserState.
* Benchmarks: markdown reader takes 82% of the time it took before. Markdown writer takes 92% of the time (here the speedup is probably due to the fact that everything is normalized by default).
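The idea of returning values in a reader monad and running them against the final parser state can be sketched as follows. The names F, runF, askF, and asksF appear in this log; everything else here is an assumption for illustration: the one-field `ParserState`, the association-list `stateKeys`, the instance definitions, and the `resolveLink` example.

```haskell
-- Simplified stand-in for the parser state (assumption: the real
-- ParserState has many more fields and a different key table type).
data ParserState = ParserState { stateKeys :: [(String, String)] }

-- A value that still needs the final parser state before it is known.
newtype F a = F { unF :: ParserState -> a }

instance Functor F where
  fmap f (F g) = F (f . g)

instance Applicative F where
  pure x = F (const x)
  F f <*> F g = F (\st -> f st (g st))

instance Monad F where
  F g >>= k = F (\st -> unF (k (g st)) st)

-- Run a deferred value against the final state.
runF :: F a -> ParserState -> a
runF = unF

askF :: F ParserState
askF = F id

asksF :: (ParserState -> a) -> F a
asksF = F

-- Hypothetical example: a link reference whose target is looked up
-- only once all definitions have been seen.
resolveLink :: String -> F String
resolveLink label = do
  keys <- asksF stateKeys
  return (maybe ("[" ++ label ++ "]") id (lookup label keys))
```

A reference parsed before its definition is stored as an `F String`; calling `runF` with the state left at the end of parsing resolves it, which is why no second parser pass is needed.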
2012-07-27  Removed commented-out pandoc2 code.  John MacFarlane  1  -41/+0
This will be developed in a branch, noreparsing.
2012-07-27  Parser: Changed types to use type alias Parser, not Parsec.  John MacFarlane  1  -97/+138
2012-07-26  Fixed whitespace errors.  John MacFarlane  1  -25/+25
2012-07-26  Parsing: Removed failIfStrict.  John MacFarlane  1  -5/+0
2012-07-26  Parsing: Added guardEnabled, guardDisabled.  John MacFarlane  1  -3/+14
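A guard on enabled extensions might be shaped like the sketch below. The function names match the commit title, but the signatures are assumptions: the versions added in the commit presumably read the extension set from the parser's options, whereas this sketch takes the enabled list as an explicit argument and works in any `Alternative`. The two-constructor `Extension` type is a stand-in.

```haskell
import Control.Applicative (Alternative)
import Control.Monad (guard)

-- Stand-in for the real Extension type (assumption: only two constructors).
data Extension = Ext_footnotes | Ext_autolink_bare_uris
  deriving (Eq, Show)

-- Succeed only if the extension is in the enabled list; otherwise fail.
guardEnabled :: Alternative f => [Extension] -> Extension -> f ()
guardEnabled enabled ext = guard (ext `elem` enabled)

-- Succeed only if the extension is NOT in the enabled list.
guardDisabled :: Alternative f => [Extension] -> Extension -> f ()
guardDisabled enabled ext = guard (ext `notElem` enabled)
```

Placed at the start of a parser, such a guard makes the whole alternative fail when the extension is off, so the next alternative is tried instead.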
2012-07-25  Moved stateApplyMacros, stateIndentedCodeClasses to ReaderOptions.  John MacFarlane  1  -6/+2