path: root/src/Text/Pandoc/Parsing.hs
Age | Commit message | Author | Files | Lines
2014-03-25 | API changes to HasReaderOptions, HasHeaderMap, HasIdentifierList. | John MacFarlane | 1 | -31/+39
Previously these were typeclasses of monads. They've been changed to be typeclasses of states. This simplifies the instance definitions and provides more flexibility. This is an API change! However, it should be backwards compatible unless you're defining instances of HasReaderOptions, HasHeaderMap, or HasIdentifierList. The old getOption function should work as before (albeit with a more general type). The function askReaderOption has been removed. extractReaderOptions has been added. getOption has been given a default definition. In HasHeaderMap, extractHeaderMap and updateHeaderMap have been added. Default definitions have been given for getHeaderMap, putHeaderMap, and modifyHeaderMap. In HasIdentifierList, extractIdentifierList and updateIdentifierList have been added. Default definitions have been given for getIdentifierList, putIdentifierList, and modifyIdentifierList. The ultimate goal here is to allow different parsers to use their own, tailored parser states (instead of ParserState) while still using shared functions.
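The idea of "typeclasses of states" with default definitions can be illustrated with a small, pure sketch. This is not the code from Parsing.hs: the real functions run inside a Parsec monad, whereas here they are plain functions on a state value, and `ReaderOptions`' fields, `WikiState`, and `registerIdentifier` are hypothetical names chosen for illustration.

```haskell
-- Simplified stand-in for pandoc's ReaderOptions (fields are hypothetical).
data ReaderOptions = ReaderOptions
  { optSmart   :: Bool
  , optColumns :: Int
  }

-- A typeclass over *states*: an instance only says how to find the
-- options inside the state; getOption comes for free as a default.
class HasReaderOptions st where
  extractReaderOptions :: st -> ReaderOptions
  getOption :: (ReaderOptions -> a) -> st -> a
  getOption f = f . extractReaderOptions

-- Same pattern for the identifier list: two primitive methods,
-- with put defined by default in terms of update.
class HasIdentifierList st where
  extractIdentifierList :: st -> [String]
  updateIdentifierList  :: ([String] -> [String]) -> st -> st
  putIdentifierList     :: [String] -> st -> st
  putIdentifierList ids = updateIdentifierList (const ids)

-- A tailored parser state (hypothetical), used instead of ParserState.
data WikiState = WikiState
  { wikiOptions     :: ReaderOptions
  , wikiIdentifiers :: [String]
  }

instance HasReaderOptions WikiState where
  extractReaderOptions = wikiOptions

instance HasIdentifierList WikiState where
  extractIdentifierList = wikiIdentifiers
  updateIdentifierList f st = st { wikiIdentifiers = f (wikiIdentifiers st) }

-- A shared function that works with any state having an identifier list.
registerIdentifier :: HasIdentifierList st => String -> st -> st
registerIdentifier ident = updateIdentifierList (ident :)
```

Under these assumptions, a reader with its own state type only writes the two short instances and can then reuse `getOption` and `registerIdentifier` unchanged.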
2014-03-24 | Parsing: Make F an instance of Applicative. Closes #1138. | John MacFarlane | 1 | -2/+2
2014-02-15 | Clarified field values in RstCustomRoles. | Merijn Verstraaten | 1 | -0/+4
2014-02-15 | Enhanced Pandoc's support for rST roles. | Merijn Verstraaten | 1 | -0/+2
rST parser now supports:
- All built-in rST roles
- New role definition
- Role inheritance
Issues/TODO:
- Silently ignores illegal fields on roles
- Silently drops class annotations for roles
- Only supports :format: fields with a single format for :raw: roles, requires a change to Text.Pandoc.Definition.Format to support multiple formats.
- Allows direct use of :raw: role, rST only allows indirect (i.e., inherited use of :raw:).
2013-12-19 | HLint: use `elem` and `notElem` | Henry de Valence | 1 | -2/+2
Replaces long conditional chains with calls to `elem` and `notElem`.
2013-12-06 | HTML reader: Parse LaTeX math if appropriate options are set. | John MacFarlane | 1 | -0/+35
* Moved inlineMath, displayMath from Markdown reader to Parsing.
* Export them from Parsing. (API change.)
* Generalize their types.
2013-11-17 | Parsing: Generalized type of registerHeader, using new typeclasses. | John MacFarlane | 1 | -12/+42
New type classes HasReaderOptions, HasIdentifierList, HasHeaderMap. These allow certain common functions to be reused even in parsers that use custom state (instead of ParserState), such as the MediaWiki reader. Minor API bump.
2013-09-01 | Factored out registerHeader from markdown reader, added to Parsing. | John MacFarlane | 1 | -0/+32
Text.Pandoc.Parsing now exports registerHeader, which can be used in other readers.
2013-08-18 | Parsing: Added stateMeta' to ParserState. | John MacFarlane | 1 | -0/+2
2013-08-08 | Added Text.Pandoc.Compat.TagSoupEntity. | John MacFarlane | 1 | -1/+1
This allows pandoc to compile with tagsoup 0.13.x. Thanks to Dirk Ullrich for the patch.
2013-08-03 | Removed comment that chokes recent cpp. | John MacFarlane | 1 | -1/+0
Closes #933.
2013-07-02 | Markdown reader: Better error messages for yaml headers. | John MacFarlane | 1 | -0/+2
2013-06-24 | Use new flexible metadata type. | John MacFarlane | 1 | -7/+9
* Depend on pandoc 1.12.
* Added yaml dependency.
* `Text.Pandoc.XML`: Removed `stripTags`. (API change.)
* `Text.Pandoc.Shared`: Added `metaToJSON`. This will be used in writers to create a JSON object for use in the templates from the pandoc metadata.
* Revised readers and writers to use the new Meta type.
* `Text.Pandoc.Options`: Added `Ext_yaml_title_block`.
* Markdown reader: Added support for YAML metadata block. Note that it must come at the beginning of the document.
* `Text.Pandoc.Parsing.ParserState`: Replace `stateTitle`, `stateAuthors`, `stateDate` with `stateMeta`.
* RST reader: Improved metadata. Treat initial field list as metadata when standalone specified. Previously ALL fields "title", "author", "date" in field lists were treated as metadata, even if not at the beginning. Use `subtitle` metadata field for subtitle.
* `Text.Pandoc.Templates`: Export `renderTemplate'` that takes a string instead of a compiled template.
* OPML template: Use 'for' loop for authors.
* Org template: '#+TITLE:' is inserted before the title. Previously the writer did this.
2013-06-24 | Parsing: Generalized state type on readWith. | John MacFarlane | 1 | -3/+3
2013-03-28 | Parsing: Better error reporting in readWith. | John MacFarlane | 1 | -4/+11
- Specialize readWith to String input.
- On error have it print the line in which the error occurred, with a caret pointing to the column.
- This should help diagnose parsing problems in LaTeX especially.
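The caret-style report described above can be sketched as a pure function. This is a minimal illustration, not the code in readWith: `showErrorContext` is a hypothetical name, and it assumes 1-based line and column numbers (as Parsec source positions use).

```haskell
-- | Render the offending input line followed by a second line with a
-- caret under the given column. Line and column are 1-based.
-- Returns the empty string if the line number is past the end of input.
showErrorContext :: String -> Int -> Int -> String
showErrorContext input line col =
  case drop (line - 1) (lines input) of
    (l:_) -> l ++ "\n" ++ replicate (col - 1) ' ' ++ "^"
    []    -> ""
```

For an error at line 2, column 3 of a three-line input, the result is the second line of the input followed by a line with two spaces and a caret.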
2013-03-28 | Parsing: Further improvements to uri parser. | John MacFarlane | 1 | -2/+4
Don't treat punctuation before percent-encoding as final punctuation. Don't treat '+' as final punctuation.
2013-03-28 | Mediawiki: Fixed regression for `<ref>URL</ref>`. | John MacFarlane | 1 | -1/+1
`<` is no longer allowed in URLs, according to the uri parser in Text.Pandoc.Parsing. Added a test case.
2013-02-21 | Make `implicit_header_references` work with explicit header ids. | John MacFarlane | 1 | -2/+2
(Markdown reader.)
2013-02-15 | Allow `&` in emails (for entities). | John MacFarlane | 1 | -1/+1
Added tests for entities in titles and links. Closes #723.
2013-02-15 | Parsing: uri, email: resolve entities. | John MacFarlane | 1 | -2/+3
A markdown link `<http://g&ouml;ogle.com>` should be a link to http://göogle.com.
2013-02-02 | Optimized oneOfStringsCI. | John MacFarlane | 1 | -3/+9
The call to toLower in ciMatch was very expensive (and very often used), because toLower from Data.Char calls a fully unicode aware function. This optimization avoids the call to toLower for the most common, ASCII cases. This dramatically reduces the speed penalty that comes from enabling the `autolink_bare_uris` extension. The penalty is still substantial (in one test, from 0.33s to 0.44s), but nowhere near what it used to be.
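The optimization described here amounts to an ASCII fast path in front of the Unicode-aware `toLower`. The following is a sketch of that idea under the assumption that characters are compared one at a time; `toLowerFast` is a hypothetical name, and the exact code in Parsing.hs may differ.

```haskell
import Data.Char (chr, isAscii, ord, toLower)

-- | Lowercase a character, handling the common ASCII cases with cheap
-- arithmetic and falling back to Data.Char.toLower only for non-ASCII.
toLowerFast :: Char -> Char
toLowerFast c
  | c >= 'A' && c <= 'Z' = chr (ord c + 32)  -- ASCII uppercase letter
  | isAscii c            = c                 -- other ASCII: already "lower"
  | otherwise            = toLower c         -- Unicode-aware slow path

-- | Case-insensitive character comparison built on the fast path.
ciMatch :: Char -> Char -> Bool
ciMatch x y = toLowerFast x == toLowerFast y
```

Because URI schemes and most source text are ASCII, nearly all comparisons would take one of the first two guards.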
2013-01-28 | Fixed latex macro parsing. | John MacFarlane | 1 | -4/+4
Now latex macro definitions are preserved when output is latex, and applied when it is another format, as originally intended. Partially addresses #730. \providecommand is still not supported. For this we need changes to texmath.
2013-01-25 | Parsing: More improvements of anyLine parser. | John MacFarlane | 1 | -6/+8
2013-01-25 | More anyLine tweaks: Use incSourceLine. | John MacFarlane | 1 | -1/+1
2013-01-25 | anyLine: Set position properly. | John MacFarlane | 1 | -0/+3
2013-01-25 | Parsing: Much faster new version of anyLine. | John MacFarlane | 1 | -1/+8
Not only faster but uses less memory.
2013-01-20 | Fixed bug in uri parser. | John MacFarlane | 1 | -1/+1
The bug prevented an autolink at the end of a string (e.g. at the end of a line block line) from counting as a link. Closes #711.
2013-01-15 | Changed Ext_autolink_urls -> Ext_autolink_bare_uris. | John MacFarlane | 1 | -2/+5
Added tests.
2013-01-15 | Case-insensitive parsing of URI schemes. | John MacFarlane | 1 | -1/+1
2013-01-15 | Parsing: Improve oneOfStrings, export oneOfStringsCI. | John MacFarlane | 1 | -7/+20
oneOfStrings will now take the longest match it can in a list of strings, so if 'foo' and 'foobar' are both included, 'foobar' will match even if 'foo' is first in the list.
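The longest-match behaviour can be sketched on plain strings, outside of Parsec. This is only an illustration of the rule, not the parser itself: `longestPrefixMatch` is a hypothetical helper that returns the matched candidate and the remaining input.

```haskell
import Data.List (isPrefixOf, maximumBy)
import Data.Ord (comparing)

-- | Among the candidates that are prefixes of the input, pick the longest
-- one, regardless of its position in the list. Returns the match and the
-- unconsumed rest of the input, or Nothing if no candidate matches.
longestPrefixMatch :: [String] -> String -> Maybe (String, String)
longestPrefixMatch candidates input =
  case filter (`isPrefixOf` input) candidates of
    []      -> Nothing
    matches ->
      let best = maximumBy (comparing length) matches
      in  Just (best, drop (length best) input)
```

With the candidates `["foo", "foobar"]`, the input `"foobarbaz"` matches `"foobar"` even though `"foo"` comes first in the list, while the input `"food"` still matches `"foo"`.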
2013-01-15 | Revised URI parser. | John MacFarlane | 1 | -27/+50
* It no longer uses Network.URI's URI parser, which is too restrictive (not allowing unicode URIs unless encoded).
* It allows many more schemes.
* It better handles punctuation so as to avoid capturing trailing punctuation in bare URLs.
2013-01-14 | Parsing: Fixed uri -- escape unicode URLs. | John MacFarlane | 1 | -2/+2
Otherwise Network.URI.parseURI fails on e.g. Chinese URLs. Changed an incorrect test in markdown-reader-more.
2013-01-14 | Parsing: Simplified and improved singleQuoteStart. | John MacFarlane | 1 | -8/+2
This makes 's', 'l', etc. parse properly. Formerly we had some English-centric heuristics, but they are no longer needed now that we keep track of the last 'Str' position in state. Closes #698.
2013-01-13 | Moved lineBlockLines to Parsing. | John MacFarlane | 1 | -0/+18
This will be used by both RST and markdown readers.
2013-01-09 | More improvements in emailAddress parser. | John MacFarlane | 1 | -23/+17
2013-01-09 | Made email parser more correct. | John MacFarlane | 1 | -12/+14
Now it's based on RFC 822, though it still doesn't implement quoted strings in email addresses.
2013-01-09 | Added Attr field to Header. | John MacFarlane | 1 | -0/+2
Previously header ids were autogenerated by the writers. Now they are generated (unless supplied explicitly) in the markdown parser, if the `header_identifiers` extension is selected. In addition, the textile reader now supports id attributes on headers.
2013-01-04 | Markdown reader: Warn about duplicate link references. | John MacFarlane | 1 | -0/+1
2013-01-03 | Added stateWarnings. | John MacFarlane | 1 | -2/+4
It is not connected to anything yet.
2013-01-03 | Implemented `Ext_header_identifiers`, `Ext_implicit_header_references`. | John MacFarlane | 1 | -0/+2
Now by default pandoc will act as if link references have been defined for all headers. So, you can do this:

    # My header

    Link to [My header].
    Another link to [it][My header].

Closes #691.
2012-12-13 | Fixed bug in withRaw. | John MacFarlane | 1 | -1/+1
Didn't correctly handle case where nothing is parsed.
2012-10-05 | Revert "Added stateWarnings to ParserState, added warning function." | John MacFarlane | 1 | -8/+0
This reverts commit 5419b504cef0cc6e1a0f3e321b2fc0a66e12db3c.
2012-10-05 | Added stateWarnings to ParserState, added warning function. | John MacFarlane | 1 | -0/+8
This will be used to provide warnings for things like duplicate footnote refs and link refs.
2012-09-29 | Renamed removedLeadingTrailingSpace to trim. | John MacFarlane | 1 | -3/+2
Also removeLeadingSpace to triml, removeTrailingSpace to trimr.
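For readers unfamiliar with the three renamed helpers, their intended behaviour can be sketched as follows. This is a plausible minimal definition, assuming "space" means the characters space, tab, carriage return, and newline; the definitions in pandoc itself may differ in detail.

```haskell
-- | Characters treated as whitespace in this sketch (an assumption).
isTrimmable :: Char -> Bool
isTrimmable c = c `elem` " \t\r\n"

-- | Remove leading whitespace.
triml :: String -> String
triml = dropWhile isTrimmable

-- | Remove trailing whitespace.
trimr :: String -> String
trimr = reverse . triml . reverse

-- | Remove both leading and trailing whitespace.
trim :: String -> String
trim = triml . trimr
```

Interior whitespace is left alone; only the two ends of the string are affected.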
2012-09-27 | Parsing: Changed type of stateSubstitutions to use Inlines. | John MacFarlane | 1 | -2/+2
2012-09-27 | Removed nullBlock. | John MacFarlane | 1 | -6/+0
Don't use nullBlock in Textile reader. Better to know about parsing problems than to skip stuff when we get stuck.
2012-09-27 | Added stateSubstitutions to ParserState, use for RST substitutions. | John MacFarlane | 1 | -0/+5
2012-09-23 | Revert "More intelligent handling of text encodings." | John MacFarlane | 1 | -1/+2
This reverts commit 7272735b3d413a644fd9ab01eeae8ae9cd5a925b.
2012-09-23 | More intelligent handling of text encodings. | John MacFarlane | 1 | -2/+1
Previously, UTF-8 was enforced for both input and output. The new system:
* For input, UTF-8 is tried first; if an error is raised, the locale encoding is tried.
* For output, the locale encoding is always used.
2012-09-23 | Revert "Use local encoding for input/output rather than forcing UTF8." | John MacFarlane | 1 | -1/+2
This reverts commit c69837adb648a479167be5e2d37279a02be8060c.