aboutsummaryrefslogtreecommitdiff
path: root/src/Text/Pandoc/Parsing.hs
AgeCommit message (Collapse)AuthorFilesLines
2021-07-29parseFromString: preserve at least the source directory.John MacFarlane1-1/+1
Previously we just set the source name to "chunk" when parsing from strings, to avoid misleading source positions. This had the side effect that `rebase_relative_paths` would break inside sections that were parsed as strings. So, now we use "ORIGINAL_SOURCE_PATH_chunk" instead of just "chunk". Closes #7464.
2021-06-22Fix unneeded importJohn MacFarlane1-1/+1
2021-06-21Improve emailAddress in Text.Pandoc.Parsing.John MacFarlane1-4/+3
Previously the parser would accept characters in domains that are illegal in domains, and this sometimes caused it to gobble bits of the following text. Closes #7398. Note that this change, by itself, caused some txt2tag reader tests to fail. txt2tags allows bare email addresses with a following form query. So, in addition to the change to emailAddress, we modify the txt2tags parser so it can still handle these cases.
2021-05-13Implement curly-brace syntax for Markdown citation keys.John MacFarlane1-5/+13
The change provides a way to use citation keys that contain special characters not usable with the standard citation key syntax. Example: `@{foo_bar{x}'}` for the key `foo_bar{x}`. Closes #6026. The change requires adding a new parameter to the `citeKey` parser from Text.Pandoc.Parsing [API change]. Markdown reader: recognize @{..} syntax for citatinos. Markdown writer: use @{..} syntax for citations when needed. Update manual with curly-brace syntax for citations. Closes #6026.
2021-05-09T.P.Parsing: improve include file functions.John MacFarlane1-30/+31
Remove old `insertIncludedFileF`. [API change] Give `insertIncludedFile` a more general type, allowing it to be used where `insertIncludedFileF` was.
2021-05-09Change reader types, allowing better tracking of source positions.John MacFarlane1-187/+287
Previously, when multiple file arguments were provided, pandoc simply concatenated them and passed the contents to the readers, which took a Text argument. As a result, the readers had no way of knowing which file was the source of any particular bit of text. This meant that we couldn't report accurate source positions on errors or include accurate source positions as attributes in the AST. More seriously, it meant that we couldn't resolve resource paths relative to the files containing them (see e.g. #5501, #6632, #6384, #3752). Add Text.Pandoc.Sources (exported module), with a `Sources` type and a `ToSources` class. A `Sources` wraps a list of `(SourcePos, Text)` pairs. [API change] A parsec `Stream` instance is provided for `Sources`. The module also exports versions of parsec's `satisfy` and other Char parsers that track source positions accurately from a `Sources` stream (or any instance of the new `UpdateSourcePos` class). Text.Pandoc.Parsing now exports these modified Char parsers instead of the ones parsec provides. Modified parsers to use a `Sources` as stream [API change]. The readers that previously took a `Text` argument have been modified to take any instance of `ToSources`. So, they may still be used with a `Text`, but they can also be used with a `Sources` object. In Text.Pandoc.Error, modified the constructor PandocParsecError to take a `Sources` rather than a `Text` as first argument, so parse error locations can be accurately reported. T.P.Error: showPos, do not print "-" as source name.
2021-04-29Further improvements in smart quotes.John MacFarlane1-2/+2
Improves heuristic for detection of an "open double quote." Closes #2103.
2021-04-28Smarter smart quotes.John MacFarlane1-16/+29
Treat a leading " with no closing " as a left curly quote. This supports the practice, in fiction, of continuing paragraphs quoting the same speaker without an end quote. It also helps with quotes that break over lines in line blocks. Closes #7216.
2021-03-21Simplify T.P.Asciify and export toAsciiText [API change].John MacFarlane1-3/+3
Instead of encoding a giant (and incomplete) map, we now just use unicode-transforms to normalize the text to a canonical decomposition, and manipulate the result. The new `toAsciiText` is equivalent to the old `T.pack . mapMaybe toAsciiChar . T.unpack` but should be faster.
2021-03-20Support `yaml_metadata_block` extension form commonmark, gfm.John MacFarlane1-1/+1
This is a bit more limited than with markdown, as documented in the manual: - The YAML block must be the first thing in the input. - The leaf notes are parsed in isolation from the rest of the document. So, for example, you can't use reference links if the references are defined later in the document. Closes #6537.
2021-03-20Text.Pandoc.Parsing: remove F type synonym.John MacFarlane1-5/+2
Muse and Org were defining their own F anyway, with their own state. We therefore move this definition to the Markdown reader.
2021-03-19T.P.Shared: Remove ToString, ToText typeclasses [API change].John MacFarlane1-4/+4
T.P.Parsing: revise type of readWithM so that it takes a Text rather than a polymorphic ToText value. These typeclasses were there to ease the transition from String to Text. They are no longer needed, and they may clash with more useful versions under the same name. This will require a bump to 2.13.
2021-02-22Text.Pandoc.UTF8: change IO functions to return Text, not String.John MacFarlane1-1/+1
[API change] This affects `readFile`, `getContents`, `writeFileWith`, `writeFile`, `putStrWith`, `putStr`, `putStrLnWith`, `putStrLn`. `hPutStrWith`, `hPutStr`, `hPutStrLnWith`, `hPutStrLn`, `hGetContents`. This avoids the need to uselessly create a linked list of characters when emiting output.
2021-01-08Update copyright notices for 2021 (#7012)Albert Krewinkel1-1/+1
2021-01-07T.P.Parsing: modify gridTableWith' for headerless tables.John MacFarlane1-11/+11
If the table lacks a header, the header row should be an empty list. Previously we got a list of empty cells, which caused an empty header to be emitted instead of no header. In LaTeX/PDF output that meant we got a double top line with space between. @tarleb @despres - please let me know if this is problematic for some reason I'm not grasping.
2020-12-07Parsing: Small code improvements.John MacFarlane1-3/+4
2020-12-07Parsing: More minor performance improvements.John MacFarlane1-10/+13
2020-12-07Small efficiency improvement in uri parserJohn MacFarlane1-1/+14
2020-12-07Parsing: in nonspaceChar use satisfy instead of oneOf.John MacFarlane1-1/+7
For efficiency.
2020-11-17Markdown reader: fix regression with example list references.John MacFarlane1-1/+5
This affects example list references followed by dashes. Introduced by commit b8d17f7. Closes #6855.
2020-10-08Export ParseError from T.P.Parsing.John MacFarlane1-1/+2
2020-10-07Raise informative errors when YAML metadata parsing fails.John MacFarlane1-0/+2
Closes #6730. Previously the command would succeed, returning empty metadata, with no errors or warnings. API changes: - Remove now unused CouldNotParseYamlMetadata constructor for LogMessage (T.P.Logging). - Add 'Maybe FilePath' parameter to yamlToMeta in T.P.Readers.Markdown.
2020-09-21Markdown reader: Set citationNoteNum accurately in citations.John MacFarlane1-2/+2
This also changes stateLastNoteNumber -> stateNoteNumber.
2020-09-21Parsing: add stateInNote and stateLastNoteNumber to ParserState.John MacFarlane1-0/+4
These will be used to populate note numbers for citations.
2020-09-13Fix hlint suggestions, update hlint.yaml (#6680)Christian Despres1-9/+7
* Fix hlint suggestions, update hlint.yaml Most suggestions were redundant brackets. Some required LambdaCase. The .hlint.yaml file had a small typo, and didn't ignore camelCase suggestions in certain modules.
2020-06-23Markdown reader: Don't require blank line after grid table.John MacFarlane1-2/+2
This fixes #6481, allowing grid tables to be enclosed in fenced divs with no intervening blank lines.
2020-04-15Use the new builders, modify readers to preserve empty headersdespresc1-3/+8
The Builder.simpleTable now only adds a row to the TableHead when the given header row is not null. This uncovered an inconsistency in the readers: some would unconditionally emit a header filled with empty cells, even if the header was not present. Now every reader has the conditional behaviour. Only the XWiki writer depended on the header row being always present; it now pads its head as necessary.
2020-04-15Adapt to the newest Table type, fix some previous adaptation issuesdespresc1-2/+2
- Writers.Native is now adapted to the new Table type. - Inline captions should now be conditionally wrapped in a Plain, not a Para block. - The toLegacyTable function now lives in Writers.Shared.
2020-04-15Implement the new Table typedespresc1-1/+5
2020-03-22Finer grained imports of Text.Pandoc.Class submodules (#6203)Albert Krewinkel1-1/+1
This should speed-up recompilation after changes in `Text.Pandoc.Class`, as the number of modules affected by a change will be smaller in general. It also offers faster insights into the parts of `T.P.Class` used within a module.
2020-03-15Use implicit Prelude (#6187)Albert Krewinkel1-2/+0
* Use implicit Prelude The previous behavior was introduced as a fix for #4464. It seems that this change alone did not fix the issue, and `stack ghci` and `cabal repl` only work with GHC 8.4.1 or newer, as no custom Prelude is loaded for these versions. Given this, it seems cleaner to revert to the implicit Prelude. * PandocMonad: remove outdated check for base version Only base versions 4.9 and later are supported, the check for `MIN_VERSION_base(4,8,0)` is therefore unnecessary. * Always use custom prelude Previously, the custom prelude was used only with older GHC versions, as a workaround for problems with ghci. The ghci problems are resolved by replacing package `base` with `base-noprelude`, allowing for consistent use of the custom prelude across all GHC versions.
2020-03-13Update copyright year (#6186)Albert Krewinkel1-1/+1
* Update copyright year * Copyright: add notes for Lua and Jira modules
2020-02-13Add highlight directive to the rST reader (#6140)Lucas Escot1-0/+2
2020-02-07Apply linter suggestions. Add fix_spacing to lint target in Makefile.John MacFarlane1-1/+1
2020-02-07Resolve HLint warningsAlbert Krewinkel1-4/+1
All warnings are either fixed or, if more appropriate, HLint is configured to ignore them. HLint suggestions remain. * Ignore "Use camelCase" warnings in Lua and legacy code * Fix or ignore remaining HLint warnings * Remove redundant brackets * Remove redundant `return`s * Remove redundant as-pattern * Fuse mapM_/map * Use `.` to shorten code * Remove redundant `fmap` * Remove unused LANGUAGE pragmas * Hoist `not` in Text.Pandoc.App * Use fewer imports for `Text.DocTemplates` * Remove redundant `do`s * Remove redundant `$`s * Jira reader: remove unnecessary parentheses
2019-11-14Parsing: Rename takeWhileP -> take1WhileP and clean it up.John MacFarlane1-9/+11
(It doesn't match the empty sequence.)
2019-11-12Switch to new pandoc-types and use Text instead of String [API change].despresc1-150/+226
PR #5884. + Use pandoc-types 1.20 and texmath 0.12. + Text is now used instead of String, with a few exceptions. + In the MediaBag module, some of the types using Strings were switched to use FilePath instead (not Text). + In the Parsing module, new parsers `manyChar`, `many1Char`, `manyTillChar`, `many1TillChar`, `many1Till`, `manyUntil`, `mantyUntilChar` have been added: these are like their unsuffixed counterparts but pack some or all of their output. + `glob` in Text.Pandoc.Class still takes String since it seems to be intended as an interface to Glob, which uses strings. It seems to be used only once in the package, in the EPUB writer, so that is not hard to change.
2019-11-11Fix typos (#5896)Brian Wignall1-1/+1
2019-09-28Use Prelude.fail to avoid ambiguity with fail from GHC.Base.John MacFarlane1-6/+6
2019-08-27Add stateAllowLineBreaks to ParserState. [API change]John MacFarlane1-0/+2
2019-08-26parseFromString': reset stateLastStrPos to Nothing before parse.John MacFarlane1-0/+1
2019-08-26Fix inline parsing in grid table cells.John MacFarlane1-14/+16
* T.P.Parsing: Change type of `setLastStrPos` so it takes a `Maybe SourcePos` rather than a `SourcePos`. [API change] * T.P.Parsing: Make `parseFromString'` and `gridTableWith` and `gridTableWith'` polymorphic in the parser state, constraining it with `HasLastStrPosition`. [API change] Closes #5708.
2019-07-02Fix redundant constraint warnings. (#5625)Pete Ryland1-8/+6
2019-03-01Remove license boilerplate.John MacFarlane1-18/+0
The haddock module header contains essentially the same information, so the boilerplate is redundant and just one more thing to get out of sync.
2019-02-04Add missing copyright notices and remove license boilerplate (#5112)Albert Krewinkel1-2/+2
Quite a few modules were missing copyright notices. This commit adds copyright notices everywhere via haddock module headers. The old license boilerplate comment is redundant with this and has been removed. Update copyright years to 2019. Closes #4592.
2018-12-31Remove unused HasHeaderMap (#5175)Alexander1-16/+1
It is updated by some readers, but never actually used.
2018-12-17Parsing: use safeRead instead of read.John MacFarlane1-1/+1
2018-11-11Text.Pandoc.Shared: add parameter to uniqueIdent, inlineListToIdentifier.John MacFarlane1-1/+1
The parameter is Extensions. This allows these functions to be sensitive to the settings of `Ext_gfm_auto_identifiers` and `Ext_ascii_identifiers`. This allows us to use `uniqueIdent` in the CommonMark reader, replacing some custom code. It also means that `gfm_auto_identifiers` can now be used in all formats. Semantically, `gfm_auto_identifiers` is now a modifier of `auto_identifiers`; for identifiers to be set, `auto_identifiers` must be turned on, and then the type of identifier produced depends on `gfm_auto_identifiers` and `ascii_identifiers` are set. Closes #5057.
2018-11-08Remove Functor and Applicative constraints where Monad already existsAlexander Krotov1-14/+7
2018-11-03Make readWithM accept Text input as well as String (API change)Alexander Krotov1-12/+6