aboutsummaryrefslogtreecommitdiff
path: root/src/Text/Pandoc/Readers/Txt2Tags.hs
AgeCommit message (Collapse)AuthorFilesLines
2021-06-21Improve emailAddress in Text.Pandoc.Parsing.John MacFarlane1-1/+21
Previously the parser would accept characters in domains that are illegal in domains, and this sometimes caused it to gobble bits of the following text. Closes #7398. Note that this change, by itself, caused some txt2tag reader tests to fail. txt2tags allows bare email addresses with a following form query. So, in addition to the change to emailAddress, we modify the txt2tags parser so it can still handle these cases.
2021-05-09Change reader types, allowing better tracking of source positions.John MacFarlane1-6/+6
Previously, when multiple file arguments were provided, pandoc simply concatenated them and passed the contents to the readers, which took a Text argument. As a result, the readers had no way of knowing which file was the source of any particular bit of text. This meant that we couldn't report accurate source positions on errors or include accurate source positions as attributes in the AST. More seriously, it meant that we couldn't resolve resource paths relative to the files containing them (see e.g. #5501, #6632, #6384, #3752). Add Text.Pandoc.Sources (exported module), with a `Sources` type and a `ToSources` class. A `Sources` wraps a list of `(SourcePos, Text)` pairs. [API change] A parsec `Stream` instance is provided for `Sources`. The module also exports versions of parsec's `satisfy` and other Char parsers that track source positions accurately from a `Sources` stream (or any instance of the new `UpdateSourcePos` class). Text.Pandoc.Parsing now exports these modified Char parsers instead of the ones parsec provides. Modified parsers to use a `Sources` as stream [API change]. The readers that previously took a `Text` argument have been modified to take any instance of `ToSources`. So, they may still be used with a `Text`, but they can also be used with a `Sources` object. In Text.Pandoc.Error, modified the constructor PandocParsecError to take a `Sources` rather than a `Text` as first argument, so parse error locations can be accurately reported. T.P.Error: showPos, do not print "-" as source name.
2021-03-19Hlint suggestion.John MacFarlane1-2/+3
2021-03-19Protect partial uses of maximum with NonEmpty.John MacFarlane1-9/+13
2021-03-18Remove another foldr1 partial function use.John MacFarlane1-5/+6
2020-12-30Hlint fixesJohn MacFarlane1-1/+1
2020-11-07Lint code in PRs and when committing to master (#6790)Albert Krewinkel1-1/+1
* Remove unused LANGUAGE pragmata * Apply HLint suggestions * Configure HLint to ignore some warnings * Lint code when committing to master
2020-09-13Fix hlint suggestions, update hlint.yaml (#6680)Christian Despres1-1/+1
* Fix hlint suggestions, update hlint.yaml Most suggestions were redundant brackets. Some required LambdaCase. The .hlint.yaml file had a small typo, and didn't ignore camelCase suggestions in certain modules.
2020-04-28Support new Underline element in readers and writers (#6277)Vaibhav Sagar1-3/+2
Deprecate `underlineSpan` in Shared in favor of `Text.Pandoc.Builder.underline`.
2020-04-15Use the new builders, modify readers to preserve empty headersdespresc1-2/+6
The Builder.simpleTable now only adds a row to the TableHead when the given header row is not null. This uncovered an inconsistency in the readers: some would unconditionally emit a header filled with empty cells, even if the header was not present. Now every reader has the conditional behaviour. Only the XWiki writer depended on the header row being always present; it now pads its head as necessary.
2020-04-15Adapt to the newest Table type, fix some previous adaptation issuesdespresc1-1/+1
- Writers.Native is now adapted to the new Table type. - Inline captions should now be conditionally wrapped in a Plain, not a Para block. - The toLegacyTable function now lives in Writers.Shared.
2020-04-15Implement the new Table typedespresc1-1/+1
2020-03-22Finer grained imports of Text.Pandoc.Class submodules (#6203)Albert Krewinkel1-2/+2
This should speed-up recompilation after changes in `Text.Pandoc.Class`, as the number of modules affected by a change will be smaller in general. It also offers faster insights into the parts of `T.P.Class` used within a module.
2020-03-15Use implicit Prelude (#6187)Albert Krewinkel1-2/+0
* Use implicit Prelude The previous behavior was introduced as a fix for #4464. It seems that this change alone did not fix the issue, and `stack ghci` and `cabal repl` only work with GHC 8.4.1 or newer, as no custom Prelude is loaded for these versions. Given this, it seems cleaner to revert to the implicit Prelude. * PandocMonad: remove outdated check for base version Only base versions 4.9 and later are supported, the check for `MIN_VERSION_base(4,8,0)` is therefore unnecessary. * Always use custom prelude Previously, the custom prelude was used only with older GHC versions, as a workaround for problems with ghci. The ghci problems are resolved by replacing package `base` with `base-noprelude`, allowing for consistent use of the custom prelude across all GHC versions.
2019-11-12Switch to new pandoc-types and use Text instead of String [API change].despresc1-66/+66
PR #5884. + Use pandoc-types 1.20 and texmath 0.12. + Text is now used instead of String, with a few exceptions. + In the MediaBag module, some of the types using Strings were switched to use FilePath instead (not Text). + In the Parsing module, new parsers `manyChar`, `many1Char`, `manyTillChar`, `many1TillChar`, `many1Till`, `manyUntil`, `mantyUntilChar` have been added: these are like their unsuffixed counterparts but pack some or all of their output. + `glob` in Text.Pandoc.Class still takes String since it seems to be intended as an interface to Glob, which uses strings. It seems to be used only once in the package, in the EPUB writer, so that is not hard to change.
2019-03-01Remove license boilerplate.John MacFarlane1-18/+0
The haddock module header contains essentially the same information, so the boilerplate is redundant and just one more thing to get out of sync.
2018-08-10Avoid non-exhaustive pattern match.John MacFarlane1-2/+4
2018-07-02Spellcheck commentsAlexander Krotov1-1/+1
2018-03-18Removed old-locale flag and Text.Pandoc.Compat.Time.John MacFarlane1-1/+1
This is no longer necessary since we no longer support ghc 7.8.
2018-03-18Use NoImplicitPrelude and explicitly import Prelude.John MacFarlane1-0/+2
This seems to be necessary if we are to use our custom Prelude with ghci. Closes #4464.
2018-03-16Monoid/Semiground cleanup relying on custom Prelude.John MacFarlane1-2/+1
2018-02-19Move manyUntil to Text.Pandoc.Parsing and use it in Txt2Tags readerAlexander Krotov1-2/+1
2018-01-19hlint code improvements.John MacFarlane1-2/+2
2017-11-10Txt2Tags reader: hlintAlexander Krotov1-27/+25
2017-11-02hlintAlexander Krotov1-1/+1
2017-10-27Automatic reformating by stylish-haskell.John MacFarlane1-10/+11
2017-10-27Consistent underline for Readers (#2270)hftf1-2/+2
* Added underlineSpan builder function. This can be easily updated if needed. The purpose is for Readers to transform underlines consistently. * Docx Reader: Use underlineSpan and update test * Org Reader: Use underlineSpan and add test * Textile Reader: Use underlineSpan and add test case * Txt2Tags Reader: Use underlineSpan and update test * HTML Reader: Use underlineSpan and add test case
2017-09-30Removed writerSourceURL, add source URL to common state.John MacFarlane1-8/+2
Removed `writerSourceURL` from `WriterOptions` (API change). Added `stSourceURL` to `CommonState`. It is set automatically by `setInputFiles`. Text.Pandoc.Class now exports `setInputFiles`, `setOutputFile`. The type of `getInputFiles` has changed; it now returns `[FilePath]` instead of `Maybe [FilePath]`. Functions in Class that formerly took the source URL as a parameter now have one fewer parameter (`fetchItem`, `downloadOrRead`, `setMediaResource`, `fillMediaBag`). Removed `WriterOptions` parameter from `makeSelfContained` in `SelfContained`.
2017-07-07Rewrote LaTeX reader with proper tokenization.John MacFarlane1-1/+1
This rewrite is primarily motivated by the need to get macros working properly. A side benefit is that the reader is significantly faster (27s -> 19s in one benchmark, and there is a lot of room for further optimization). We now tokenize the input text, then parse the token stream. Macros modify the token stream, so they should now be effective in any context, including math. Thus, we no longer need the clunky macro processing capacities of texmath. A custom state LaTeXState is used instead of ParserState. This, plus the tokenization, will require some rewriting of the exported functions rawLaTeXInline, inlineCommand, rawLaTeXBlock. * Added Text.Pandoc.Readers.LaTeX.Types (new exported module). Exports Macro, Tok, TokType, Line, Column. [API change] * Text.Pandoc.Parsing: adjusted type of `insertIncludedFile` so it can be used with token parser. * Removed old texmath macro stuff from Parsing. Use Macro from Text.Pandoc.Readers.LaTeX.Types instead. * Removed texmath macro material from Markdown reader. * Changed types for Text.Pandoc.Readers.LaTeX's rawLaTeXInline and rawLaTeXBlock. (Both now return a String, and they are polymorphic in state.) * Added orgMacros field to OrgState. [API change] * Removed readerApplyMacros from ReaderOptions. Now we just check the `latex_macros` reader extension. * Allow `\newcommand\foo{blah}` without braces. Fixes #1390. Fixes #2118. Fixes #3236. Fixes #3779. Fixes #934. Fixes #982.
2017-06-20Move CR filtering from tabFilter to the readers.John MacFarlane1-2/+4
The readers previously assumed that CRs had been filtered from the input. Now we strip the CRs in the readers themselves, before parsing. (The point of this is just to simplify the parsers.) Shared now exports a new function `crFilter`. [API change] And `tabFilter` no longer filters CRs.
2017-06-10Changed all readers to take Text instead of String.John MacFarlane1-3/+4
Readers: Renamed StringReader -> TextReader. Updated tests. API change.
2017-05-24Parsing: Provide parseFromString'.John MacFarlane1-3/+3
This is a verison of parseFromString specialied to ParserState, which resets stateLastStrPos at the end. This is almost always what we want. This fixes a bug where `_hi_` wasn't treated as emphasis in the following, because pandoc got confused about the position of the last word: - [o] _hi_ Closes #3690.
2017-05-23Shared: Provide custom isURI that rejects unknown schemes [isURI]Albert Krewinkel1-1/+0
We also export the set of known `schemes`. The new function replaces the function of the same name from `Network.URI`, as the latter did not check whether a scheme is well-known. E.g. MediaWiki wikis frequently feature pages with names like `User:John`. These links were interpreted as URIs, thus turning internal links into global links. This is prevented by also checking whether the scheme of a URI is frequently used (i.e. is IANA registered or an otherwise well-known scheme). Fixes: #2713 Update set of well-known URIs from IANA list All official IANA schemes (as of 2017-05-22) are included in the set of known schemes. The four non-official schemes doi, isbn, javascript, and pmid are kept.
2017-05-22Move indentWith to Text.Pandoc.Parsing (#3687)Alexander Krotov1-3/+0
2017-05-17Merge pull request #3676 from labdsf/space-charJohn MacFarlane1-1/+1
Txt2Tags parser: newline is not indentation
2017-05-17Move anyLineNewline to Parsing.hsAlexander Krotov1-3/+0
2017-05-17Txt2Tags parser: newline is not indentationAlexander Krotov1-1/+1
space parses '\n', while spaceChar parses only ' ' and '\t'
2017-03-04Stylish-haskell automatic formatting changes.John MacFarlane1-16/+16
2017-01-27Shared: rename compactify', compactify'DL -> compactify, compactifyDL.John MacFarlane1-4/+4
2017-01-25Removed `--normalize` option and normalization functions from Shared.John MacFarlane1-2/+6
* Removed normalize, normalizeInlines, normalizeBlocks from Text.Pandoc.Shared. These shouldn't now be necessary, since normalization is handled automatically by the Builder monoid instance. * Remove `--normalize` command-line option. * Don't use normalize in tests. * A few revisions to readers so they work well without normalize.
2017-01-25Readers: pass errors straight up to PandocMonad.Jesse Rosenthal1-2/+1
Since we've unified error types, we can just throw the same error at the toplevel.
2017-01-25Unify Errors.Jesse Rosenthal1-1/+2
2017-01-25Add Text2Tags to Text.PandocJesse Rosenthal1-3/+3
2017-01-25Working on readers.Jesse Rosenthal1-15/+30
2016-09-02Remove directory compatJesse Rosenthal1-1/+1
directory 1.1 depends on base 4.5 (ghc 7.4) which we are no longer supporting. So we don't have to use a compatibility layer for it.
2016-09-02Remove Compat.MonoidJesse Rosenthal1-1/+1
This was only necessary for GHC versions with base below 4.5 (i.e., ghc < 7.4).
2016-07-14Fixed compiler warnings.John MacFarlane1-1/+1
2015-12-12Modified readers to emit SoftBreak when appropriate.John MacFarlane1-1/+1
2015-11-09Restored Text.Pandoc.Compat.Monoid.John MacFarlane1-0/+1
Don't use custom prelude for latest ghc. This is a better approach to making 'stack ghci' and 'cabal repl' work. Instead of using NoImplicitPrelude, we only use the custom prelude for older ghc versions. The custom prelude presents a uniform API that matches the current base version's prelude. So, when developing (presumably with latest ghc), we don't use a custom prelude at all and hence have no trouble with ghci. The custom prelude no longer exports (<>): we now want to match the base 4.8 prelude behavior.
2015-11-09Revert "Use -XNoImplicitPrelude and 'import Prelude' explicitly."John MacFarlane1-1/+0
This reverts commit c423dbb5a34c2d1195020e0f0ca3aae883d0749b.