path: root/src/Text/Pandoc/Parsing.hs
Age | Commit message | Author | Files | Lines
2014-07-12 | Parsing: Simplified dash and ellipsis. | John MacFarlane | 1 | -40/+13
This originated with @dubiousjim's observation in #1419 that there was a typo in the definition of enDash: it returned an em dash character instead of an en dash. I thought about why this had not been noticed before, and realized that en dashes were just being parsed as regular symbols. That made me realize that, now that we no longer have dedicated EnDash, EmDash, and Ellipses inline elements, as we used to in pandoc, we no longer need to parse the unicode characters specially. This allowed a considerable simplification of the code. Partially resolves #1419.
2014-07-12 | Removed space at ends of lines in source. | John MacFarlane | 1 | -37/+37
2014-07-11 | Removed inline fmap from Parsing.hs | Matthew Pickering | 1 | -8/+8
Replaced all inline occurrences of fmap with the more idiomatic (<$>).
2014-07-11 | Removed (>>~) function | Matthew Pickering | 1 | -9/+4
This function is equivalent to the more general (<*), which is defined in Control.Applicative. This change makes pandoc code easier to understand for those not familiar with the codebase.
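To see the equivalence concretely, here is a reconstructed `(>>~)` (written from the description in this commit, not copied from pandoc's old source) compared against `(<*)` in a few standard monads. The example values are illustrative.

```haskell
-- Reconstructed for illustration: run p, then q, and keep p's result.
(>>~) :: Monad m => m a -> m b -> m a
p >>~ q = p >>= \x -> q >> return x

-- The same combination via (<*), from Control.Applicative (re-exported by Prelude).
keepLeft :: Applicative f => f a -> f b -> f a
keepLeft p q = p <* q

-- Both run the effects of p and then q, in that order, and return p's value.
examples :: [Bool]
examples =
  [ (Just 1 >>~ Just 'x') == keepLeft (Just 1) (Just 'x')     -- both are Just 1
  , (Just 1 >>~ (Nothing :: Maybe Char)) == Nothing            -- a failing q fails the whole
  , (("a", 1) >>~ ("b", 'c')) == (("ab", 1) :: (String, Int))  -- effects accumulate left to right
  ]
```

In a parser, `p <* newline` therefore reads as "parse `p`, then a newline, and return `p`'s result", the role `p >>~ newline` used to play.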
2014-07-11 | Generalised all functions in Parsing.hs | Matthew Pickering | 1 | -128/+168
Previously it wasn't possible to use these general combinators with the ParsecT transformer; with the more general types, this is now possible.
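The kind of generalization described here can be sketched with a combinator typed against `ParsecT` and a `Stream` constraint rather than a fixed `Parser` type. `skipSpaces` and `loggedWord` are hypothetical examples, not pandoc's definitions, and the sketch assumes the `parsec` and `transformers` packages.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Trans.Class (lift)
import Text.Parsec

-- Hypothetical general combinator: works for any stream type s, any user
-- state u and any underlying monad m, not only 'Parsec String ()'.
skipSpaces :: Stream s m Char => ParsecT s u m ()
skipSpaces = skipMany (oneOf " \t")

-- Reusing the general combinator in a parser whose underlying monad is IO.
loggedWord :: ParsecT String () IO String
loggedWord = do
  w <- many1 letter <* skipSpaces
  lift (putStrLn ("parsed word: " ++ w))  -- an effect in the underlying monad
  return w
```

The same `skipSpaces` runs under `parse` (underlying monad `Identity`) and under `runParserT` with `IO`, which a signature fixed to `Parsec String ()` would not allow.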
2014-07-07 | `Parsing`: Added `stateInHtmlBlock` to `ParserState`. | John MacFarlane | 1 | -0/+2
This is used to keep track of the ending tag we're waiting for when we're parsing inside HTML block tags.
2014-05-27 | Markdown reader: inline math must have nonspace before final `$`. | John MacFarlane | 1 | -4/+6
Closes #1313.
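A simplified parser showing the rule from this commit, assuming the `parsec` package. It also enforces the related constraints that pandoc's user guide describes for `$` math (a nonspace character right after the opening `$`, no digit right after the closing `$`), but it ignores backslash escapes and is not pandoc's actual `inlineMath`.

```haskell
import Control.Monad (when)
import Data.Char (isSpace)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Simplified sketch of $...$ inline math (no escapes, no display math).
inlineMath :: Parser String
inlineMath = try $ do
  _ <- char '$'
  notFollowedBy space                -- opening $ must be followed by nonspace
  body <- many1 (noneOf "$")         -- many1 guarantees 'last' is safe below
  when (isSpace (last body)) $
    unexpected "space before closing $"  -- the rule added in this commit
  _ <- char '$'
  notFollowedBy digit                -- so "$20,000 and $30,000" is not math
  return body
```

With these rules, `$x^2$` parses as math while `$20,000 and $30,000` does not, because the text before the second `$` ends in a space.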
2014-05-14 | Move `citeKey` from Readers.Markdown to Parsing | Albert Krewinkel | 1 | -0/+13
The function can be used by other readers, so it is made accessible for all parsers.
2014-05-14 | Introduce class HasLastStrPosition, generalize functions | Albert Krewinkel | 1 | -9/+23
Both `ParserState` and `OrgParserState` keep track of the parser position at which the last string ended. This patch introduces a new class `HasLastStrPosition` and makes the above types instances of that class. This enables the generalization of functions updating the state or checking if one is right after a string.
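A minimal sketch of this pattern, assuming the `parsec` package. The two record types are stripped-down, hypothetical stand-ins for `ParserState` and `OrgParserState`, and the method and helper names are modeled on the commit's description rather than taken from pandoc's source.

```haskell
import Text.Parsec

-- Hypothetical, stripped-down state types.
data ParserState = ParserState { stateLastStrPos :: Maybe SourcePos }
data OrgParserState = OrgParserState { orgStateLastStrPos :: Maybe SourcePos }

-- Any state that remembers where the last string ended.
class HasLastStrPosition st where
  setLastStrPos :: SourcePos -> st -> st
  getLastStrPos :: st -> Maybe SourcePos

instance HasLastStrPosition ParserState where
  setLastStrPos pos st = st { stateLastStrPos = Just pos }
  getLastStrPos = stateLastStrPos

instance HasLastStrPosition OrgParserState where
  setLastStrPos pos st = st { orgStateLastStrPos = Just pos }
  getLastStrPos = orgStateLastStrPos

-- Generalized helpers: usable with either state type.
updateLastStrPos :: (Monad m, HasLastStrPosition st) => ParsecT s st m ()
updateLastStrPos = getPosition >>= updateState . setLastStrPos

-- True unless the current position is exactly where the last string ended.
notAfterString :: (Monad m, HasLastStrPosition st) => ParsecT s st m Bool
notAfterString = do
  pos <- getPosition
  st <- getState
  return (getLastStrPos st /= Just pos)
```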
2014-05-09 | Update copyright notices for 2014, add missing notices | Albert Krewinkel | 1 | -2/+2
2014-05-03 | LaTeX reader: Better error messages with include files. | John MacFarlane | 1 | -1/+1
Closes #1274. Rewrote handleIncludes. We now report the actual source file and position where the error occurs, even if it is included. We do this by inserting special commands, `\PandocStartInclude` and `\PandocEndInclude`, that encode this information in the preprocessing phase. Also generalized the types of a couple functions from `Text.Pandoc.Parsing`.
2014-04-01 | Changed the smart punctuation parser to return Inlines rather than an Inline element and updated files accordingly | Matthew Pickering | 1 | -22/+21
2014-03-25 | Parsing: Added stateCaption. | John MacFarlane | 1 | -1/+2
This is primarily for use in the LaTeX reader, so far.
2014-03-25 | Parsing: Added HasMacros, simplified other typeclasses. | John MacFarlane | 1 | -28/+22
Removed updateHeaderMap, setHeaderMap, getHeaderMap, updateIdentifierList, setIdentifierList, getIdentifierList.
2014-03-25 | Whitespace change, and note: | John MacFarlane | 1 | -0/+1
Contrary to the previous commit message, there was no API change, since Text.Pandoc.Parsing is not an exposed module.
2014-03-25 | API changes to HasReaderOptions, HasHeaderMap, HasIdentifierList. | John MacFarlane | 1 | -31/+39
Previously these were typeclasses of monads. They've been changed to be typeclasses of states. This simplifies the instance definitions and provides more flexibility. This is an API change! However, it should be backwards compatible unless you're defining instances of HasReaderOptions, HasHeaderMap, or HasIdentifierList. The old getOption function should work as before (albeit with a more general type). The function askReaderOption has been removed. extractReaderOptions has been added. getOption has been given a default definition. In HasHeaderMap, extractHeaderMap and updateHeaderMap have been added. Default definitions have been given for getHeaderMap, putHeaderMap, and modifyHeaderMap. In HasIdentifierList, extractIdentifierList and updateIdentifierList have been added. Default definitions have been given for getIdentifierList, putIdentifierList, and modifyIdentifierList. The ultimate goal here is to allow different parsers to use their own, tailored parser states (instead of ParserState) while still using shared functions.
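The state-based design with a default method might look roughly like this, assuming the `parsec` package. `ReaderOptions` here is a hypothetical two-field record, and `WikiState` and `tabStop` are invented for illustration; only `extractReaderOptions` and `getOption` are names from the commit message.

```haskell
import Text.Parsec

-- Hypothetical, simplified options record.
data ReaderOptions = ReaderOptions { readerSmart :: Bool, readerTabStop :: Int }

-- A typeclass of states (not of monads): instances only say where options live.
class HasReaderOptions st where
  extractReaderOptions :: st -> ReaderOptions
  getOption :: Monad m => (ReaderOptions -> b) -> ParsecT s st m b
  -- Default definition, shared by every instance.
  getOption f = (f . extractReaderOptions) <$> getState

-- Two tailored state types.
data ParserState = ParserState { stateOptions :: ReaderOptions }
data WikiState = WikiState { wikiOptions :: ReaderOptions, wikiDepth :: Int }

instance HasReaderOptions ParserState where
  extractReaderOptions = stateOptions

instance HasReaderOptions WikiState where
  extractReaderOptions = wikiOptions

-- A shared parser that works with either state type.
tabStop :: (Monad m, HasReaderOptions st) => ParsecT s st m Int
tabStop = getOption readerTabStop
```

An instance only has to say where its options live; every shared parser built on `getOption` then works with that state type unchanged.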
2014-03-24 | Parsing: Make F an instance of Applicative. Closes #1138. | John MacFarlane | 1 | -2/+2
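As a rough illustration, assume `F` is a newtype over a `Reader` computation that is run once the final parser state is known. Deriving `Applicative` alongside `Monad` then lets deferred values be combined with `<*>` and `traverse`. The `Env` type and `lookupNote` helper are hypothetical, and the sketch assumes the `transformers` package.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.Trans.Reader (Reader, asks, runReader)

-- Hypothetical environment, standing in for the final parser state.
newtype Env = Env { envNotes :: [(String, String)] }

-- A deferred computation: it produces a value only once Env is supplied.
newtype F a = F { unF :: Reader Env a }
  deriving (Functor, Applicative, Monad)

runF :: F a -> Env -> a
runF = runReader . unF

-- Look up a note that may only be defined later in the document.
lookupNote :: String -> F (Maybe String)
lookupNote key = F (asks (lookup key . envNotes))
```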
2014-02-15 | Clarified field values in RstCustomRoles. | Merijn Verstraaten | 1 | -0/+4
2014-02-15 | Enhanced Pandoc's support for rST roles. | Merijn Verstraaten | 1 | -0/+2
rST parser now supports:
- All built-in rST roles
- New role definition
- Role inheritance
Issues/TODO:
- Silently ignores illegal fields on roles
- Silently drops class annotations for roles
- Only supports :format: fields with a single format for :raw: roles; supporting multiple formats requires a change to Text.Pandoc.Definition.Format.
- Allows direct use of the :raw: role; rST only allows indirect (i.e., inherited) use of :raw:.
2013-12-19 | HLint: use `elem` and `notElem` | Henry de Valence | 1 | -2/+2
Replaces long conditional chains with calls to `elem` and `notElem`.
2013-12-06 | HTML reader: Parse LaTeX math if appropriate options are set. | John MacFarlane | 1 | -0/+35
* Moved inlineMath, displayMath from Markdown reader to Parsing.
* Export them from Parsing. (API change.)
* Generalize their types.
2013-11-17 | Parsing: Generalized type of registerHeader, using new typeclasses. | John MacFarlane | 1 | -12/+42
New type classes HasReaderOptions, HasIdentifierList, HasHeaderMap. These allow certain common functions to be reused even in parsers that use custom state (instead of ParserState), such as the MediaWiki reader. Minor API bump.
2013-09-01 | Factored out registerHeader from markdown reader, added to Parsing. | John MacFarlane | 1 | -0/+32
Text.Pandoc.Parsing now exports registerHeader, which can be used in other readers.
2013-08-18 | Parsing: Added stateMeta' to ParserState. | John MacFarlane | 1 | -0/+2
2013-08-08 | Added Text.Pandoc.Compat.TagSoupEntity. | John MacFarlane | 1 | -1/+1
This allows pandoc to compile with tagsoup 0.13.x. Thanks to Dirk Ullrich for the patch.
2013-08-03 | Removed comment that chokes recent cpp. | John MacFarlane | 1 | -1/+0
Closes #933.
2013-07-02 | Markdown reader: Better error messages for yaml headers. | John MacFarlane | 1 | -0/+2
2013-06-24 | Use new flexible metadata type. | John MacFarlane | 1 | -7/+9
* Depend on pandoc 1.12.
* Added yaml dependency.
* `Text.Pandoc.XML`: Removed `stripTags`. (API change.)
* `Text.Pandoc.Shared`: Added `metaToJSON`. This will be used in writers to create a JSON object for use in the templates from the pandoc metadata.
* Revised readers and writers to use the new Meta type.
* `Text.Pandoc.Options`: Added `Ext_yaml_title_block`.
* Markdown reader: Added support for YAML metadata block. Note that it must come at the beginning of the document.
* `Text.Pandoc.Parsing.ParserState`: Replace `stateTitle`, `stateAuthors`, `stateDate` with `stateMeta`.
* RST reader: Improved metadata. Treat initial field list as metadata when standalone is specified. Previously ALL fields "title", "author", "date" in field lists were treated as metadata, even if not at the beginning. Use `subtitle` metadata field for subtitle.
* `Text.Pandoc.Templates`: Export `renderTemplate'` that takes a string instead of a compiled template.
* OPML template: Use 'for' loop for authors.
* Org template: '#+TITLE:' is inserted before the title. Previously the writer did this.
2013-06-24 | Parsing: Generalized state type on readWith. | John MacFarlane | 1 | -3/+3
2013-03-28 | Parsing: Better error reporting in readWith. | John MacFarlane | 1 | -4/+11
- Specialize readWith to String input.
- On error have it print the line in which the error occurred, with a caret pointing to the column.
- This should help diagnose parsing problems in LaTeX especially.
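The caret-style report can be sketched on top of Parsec's `ParseError`, whose `errorPos` supplies the line and column. `readWithCaret` is a hypothetical name; unlike the `readWith` described here, it returns the message as a `Left` instead of printing it, and it assumes the `parsec` package.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run a parser; on failure, append the offending input line and a caret
-- under the error column to Parsec's own error message.
readWithCaret :: Parser a -> String -> Either String a
readWithCaret p input =
  case parse p "input" input of
    Right x -> Right x
    Left err ->
      let pos = errorPos err
          errLine = case drop (sourceLine pos - 1) (lines input) of
            (l : _) -> l
            []      -> ""  -- e.g. an error at the end of input after a final newline
          caret = replicate (sourceColumn pos - 1) ' ' ++ "^"
      in Left (show err ++ "\n" ++ errLine ++ "\n" ++ caret)
```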
2013-03-28 | Parsing: Further improvements to uri parser. | John MacFarlane | 1 | -2/+4
Don't treat punctuation before percent-encoding as final punctuation. Don't treat '+' as final punctuation.
2013-03-28 | Mediawiki: Fixed regression for `<ref>URL</ref>`. | John MacFarlane | 1 | -1/+1
`<` is no longer allowed in URLs, according to the uri parser in Text.Pandoc.Parsing. Added a test case.
2013-02-21 | Make `implicit_header_references` work with explicit header ids. | John MacFarlane | 1 | -2/+2
(Markdown reader.)
2013-02-15 | Allow `&` in emails (for entities). | John MacFarlane | 1 | -1/+1
Added tests for entities in titles and links. Closes #723.
2013-02-15 | Parsing: uri, email: resolve entities. | John MacFarlane | 1 | -2/+3
A markdown link `<http://g&ouml;ogle.com>` should be a link to http://göogle.com.
2013-02-02 | Optimized oneOfStringsCI. | John MacFarlane | 1 | -3/+9
The call to toLower in ciMatch was very expensive (and very often used), because toLower from Data.Char calls a fully unicode aware function. This optimization avoids the call to toLower for the most common, ASCII cases. This dramatically reduces the speed penalty that comes from enabling the `autolink_bare_uris` extension. The penalty is still substantial (in one test, from 0.33s to 0.44s), but nowhere near what it used to be.
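The ASCII fast path can be sketched as follows: map `A` to `Z` with a fixed offset, leave other ASCII characters alone, and call the Unicode-aware `toLower` only for non-ASCII input. `asciiFirstLower` is a hypothetical helper and this `ciMatch` is a simplified stand-in; pandoc's actual code may be structured differently.

```haskell
import Data.Char (chr, isAscii, ord, toLower)

-- Lowercase a character, taking a cheap path for ASCII and falling back to
-- the Unicode-aware 'toLower' only when needed.
asciiFirstLower :: Char -> Char
asciiFirstLower c
  | c >= 'A' && c <= 'Z' = chr (ord c + 32)  -- ASCII upper case: fixed offset
  | isAscii c            = c                 -- other ASCII: nothing to change
  | otherwise            = toLower c         -- non-ASCII: full Unicode mapping

-- Case-insensitive character comparison (simplified stand-in for ciMatch).
ciMatch :: Char -> Char -> Bool
ciMatch x y = asciiFirstLower x == asciiFirstLower y
```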
2013-01-28 | Fixed latex macro parsing. | John MacFarlane | 1 | -4/+4
Now latex macro definitions are preserved when output is latex, and applied when it is another format, as originally intended. Partially addresses #730. \providecommand is still not supported. For this we need changes to texmath.
2013-01-25 | Parsing: More improvements of anyLine parser. | John MacFarlane | 1 | -6/+8
2013-01-25 | More anyLine tweaks: Use incSourceLine. | John MacFarlane | 1 | -1/+1
2013-01-25 | anyLine: Set position properly. | John MacFarlane | 1 | -0/+3
2013-01-25 | Parsing: Much faster new version of anyLine. | John MacFarlane | 1 | -1/+8
Not only faster but uses less memory.
2013-01-20 | Fixed bug in uri parser. | John MacFarlane | 1 | -1/+1
The bug prevented an autolink at the end of a string (e.g. at the end of a line block line) from counting as a link. Closes #711.
2013-01-15 | Changed Ext_autolink_urls -> Ext_autolink_bare_uris. | John MacFarlane | 1 | -2/+5
Added tests.
2013-01-15 | Case-insensitive parsing of URI schemes. | John MacFarlane | 1 | -1/+1
2013-01-15 | Parsing: Improve oneOfStrings, export oneOfStringsCI. | John MacFarlane | 1 | -7/+20
oneOfStrings will now take the longest match it can in a list of strings, so if 'foo' and 'foobar' are both included, 'foobar' will match even if 'foo' is first in the list.
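One simple way to obtain longest-match behavior, assuming the `parsec` package, is to try candidates from longest to shortest, each wrapped in `try` so a partial match backtracks. This is a swapped-in technique for illustration, not necessarily how pandoc's `oneOfStrings` works, and `oneOfStringsLongest` is a hypothetical name.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))
import Text.Parsec
import Text.Parsec.String (Parser)

-- Try longer candidates first, so the first success is the longest match.
-- 'try' lets a partial match (e.g. "foob" of "foobar") backtrack so the
-- next, shorter candidate is attempted from the same position.
oneOfStringsLongest :: [String] -> Parser String
oneOfStringsLongest strs =
  choice [try (string s) | s <- sortOn (Down . length) strs]
```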
2013-01-15 | Revised URI parser. | John MacFarlane | 1 | -27/+50
* It no longer uses Network.URI's URI parser, which is too restrictive (not allowing unicode URIs unless encoded).
* It allows many more schemes.
* It better handles punctuation so as to avoid capturing trailing punctuation in bare URLs.
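The trailing-punctuation rule can be approximated by peeling a run of sentence punctuation off the end of a bare-URL candidate. The punctuation set is an assumption chosen for illustration, and `splitTrailingPunct` is a hypothetical helper; pandoc's uri parser handles more cases than this.

```haskell
-- Characters treated as sentence punctuation when they end a bare URL
-- (an assumed set for illustration; '+' is deliberately not included).
trailingPunct :: String
trailingPunct = ".,;:!?"

-- Split a bare-URL candidate into the URL proper and any trailing
-- punctuation that most likely belongs to the surrounding sentence.
splitTrailingPunct :: String -> (String, String)
splitTrailingPunct s = (reverse rest, reverse trailing)
  where
    (trailing, rest) = span (`elem` trailingPunct) (reverse s)
```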
2013-01-14 | Parsing: Fixed uri -- escape unicode URLs. | John MacFarlane | 1 | -2/+2
Otherwise Network.URI.parseURI fails on e.g. Chinese URLs. Changed an incorrect test in markdown-reader-more.
2013-01-14 | Parsing: Simplified and improved singleQuoteStart. | John MacFarlane | 1 | -8/+2
This makes 's', 'l', etc. parse properly. Formerly we had some English-centric heuristics, but they are no longer needed now that we keep track of the last 'Str' position in state. Closes #698.
2013-01-13 | Moved lineBlockLines to Parsing. | John MacFarlane | 1 | -0/+18
This will be used by both RST and markdown readers.
2013-01-09 | More improvements in emailAddress parser. | John MacFarlane | 1 | -23/+17