path: root/src/Text/Pandoc/Readers/LaTeX/Parsing.hs
Age | Commit message | Author | Files | Lines
2021-05-09 | Change reader types, allowing better tracking of source positions. | John MacFarlane | 1 | -3/+9
Previously, when multiple file arguments were provided, pandoc simply concatenated them and passed the contents to the readers, which took a Text argument. As a result, the readers had no way of knowing which file was the source of any particular bit of text. This meant that we couldn't report accurate source positions on errors or include accurate source positions as attributes in the AST. More seriously, it meant that we couldn't resolve resource paths relative to the files containing them (see e.g. #5501, #6632, #6384, #3752).
Add Text.Pandoc.Sources (exported module), with a `Sources` type and a `ToSources` class. A `Sources` wraps a list of `(SourcePos, Text)` pairs. [API change] A parsec `Stream` instance is provided for `Sources`. The module also exports versions of parsec's `satisfy` and other Char parsers that track source positions accurately from a `Sources` stream (or any instance of the new `UpdateSourcePos` class). Text.Pandoc.Parsing now exports these modified Char parsers instead of the ones parsec provides.
Modified parsers to use a `Sources` as stream [API change].
The readers that previously took a `Text` argument have been modified to take any instance of `ToSources`. So, they may still be used with a `Text`, but they can also be used with a `Sources` object.
In Text.Pandoc.Error, modified the constructor PandocParsecError to take a `Sources` rather than a `Text` as first argument, so parse error locations can be accurately reported.
T.P.Error: showPos, do not print "-" as source name.
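The idea of keeping each input file as its own chunk can be illustrated with a minimal sketch. This is not pandoc's `Text.Pandoc.Sources` code: the names `Chunks`, `unconsChunks` and `labelled` are hypothetical, `FilePath` stands in for `SourcePos`, and `String` stands in for `Text`.

```haskell
import Data.List (unfoldr)

-- Simplified model (an assumption, not pandoc's type): a list of
-- (origin, remaining text) pairs, one pair per input file.
type Chunks = [(FilePath, String)]

-- Pull off the next character together with the file it came from.
-- Exhausted chunks are skipped, so reading flows from file to file.
unconsChunks :: Chunks -> Maybe ((FilePath, Char), Chunks)
unconsChunks [] = Nothing
unconsChunks ((_, []) : rest) = unconsChunks rest
unconsChunks ((src, c : cs) : rest) = Just ((src, c), (src, cs) : rest)

-- Label every character of the combined input with its source file.
labelled :: Chunks -> [(FilePath, Char)]
labelled = unfoldr unconsChunks
```

With plain concatenation the origin of each character is lost; here `labelled [("a.md", "ab"), ("b.md", "c")]` still attributes `'c'` to `b.md`.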
2021-02-28 | T.P.Readers.LaTeX: Don't export tokenize, untokenize. | John MacFarlane | 1 | -0/+9
[API change] These were only exported for testing, which seems the wrong thing to do. They don't belong in the public API and are not really usable as they are, without access to the Tok type which is not exported. Removed the tokenize/untokenize roundtrip test. We put a quickcheck property in the comments which may be used when this code is touched (if it is).
2021-02-28 | Factor out T.P.Readers.LaTeX.Citation. | John MacFarlane | 1 | -0/+5
2021-02-27 | Factor out T.P.Readers.LaTeX.Table. | John MacFarlane | 1 | -0/+33
2021-02-21 | LaTeX reader: further optimizations in satisfyTok. | John MacFarlane | 1 | -5/+5
Benchmarks show 2/3 of the run time and 2/3 of the allocation of the Feb. 10 benchmarks.
2021-02-21 | LaTeX reader: removed sExpanded in state. | John MacFarlane | 1 | -7/+2
This isn't actually needed and checking it doesn't change anything. Also remove an unnecessary `doMacros` before `satisfyTok`, which does it anyway.
2021-02-21 | LaTeX reader: further performance optimization. | John MacFarlane | 1 | -23/+19
Avoid unnecessary 'doMacros'.
2021-02-20 | LaTeX reader: Another small improvement to macro handling. | John MacFarlane | 1 | -4/+3
2021-02-20 | LaTeX reader: avoid macro resolution code if no macros defined. | John MacFarlane | 1 | -16/+19
2021-02-20 | T.P.Readers.LaTeX.Parsing: improve braced'. | John MacFarlane | 1 | -16/+13
Remove the parameter, have it parse the opening brace, and make it more efficient.
2021-02-12 | LaTeX reader improvements. | John MacFarlane | 1 | -18/+66
* Rewrote `withRaw` so it doesn't rely on fragile assumptions about token positions (which break when macros are expanded). This requires the addition of `sEnableWithRaw` and `sRawTokens` in `LaTeXState`, and a new combinator `disablingWithRaw` to disable collecting of raw tokens in certain contexts.
* Add `parseFromToks` to T.P.Readers.LaTeX.Parsing.
* Fix parsing of single character tokens so it doesn't mess up the new raw token collecting.
* These changes slightly increase allocations and have a small performance impact, but it's minor.
Closes #7092.
2021-01-08 | Update copyright notices for 2021 (#7012) | Albert Krewinkel | 1 | -1/+1
2021-01-04 | LaTeX reader: handle filecontents environment. | John MacFarlane | 1 | -0/+2
Closes #7003.
2020-11-16 | Move getNextNumber from Readers.LaTeX to Readers.LaTeX.Parsing. | John MacFarlane | 1 | -0/+26
2020-11-02 | LaTeX reader: fix bug parsing macro arguments. | John MacFarlane | 1 | -1/+5
If `\cL` is defined as `\mathcal{L}`, and `\til` as `\tilde{#1}`, then `\til\cL` should expand to `\tilde{\mathcal{L}}`, but pandoc was expanding it to `\tilde\mathcal{L}`. This is fixed by parsing the arguments in "verbatim mode" when the macro expands arguments at the point of use. Closes #6796.
2020-10-08 | LaTeX reader: Fix parsing of "show name" in newtheorem. | John MacFarlane | 1 | -1/+1
Previously we were just treating it as a string and ignoring accents and formatting. See #6734.
2020-09-13 | Fix hlint suggestions, update hlint.yaml (#6680) | Christian Despres | 1 | -8/+6
Most suggestions were redundant brackets. Some required LambdaCase. The .hlint.yaml file had a small typo, and didn't ignore camelCase suggestions in certain modules.
2020-07-22 | LaTeX reader: Support ams `\theoremstyle`. | John MacFarlane | 1 | -2/+10
2020-07-22 | LaTeX reader: support theorem environments and `\newtheorem`. | John MacFarlane | 1 | -0/+2
Includes numbering and labels and refs. Note that numbering support is not complete; we don't reset numbers with sections for example.
2020-07-22 | LaTeX reader: support ams proof environment. | John MacFarlane | 1 | -0/+10
2020-07-22 | Moved more from LaTeX reader to LaTeX.Parsing. | John MacFarlane | 1 | -0/+67
2020-07-20 | Move some code from T.P.R.LaTeX to T.P.R.LaTeX.Parsing. | John MacFarlane | 1 | -0/+64
We need to reduce the size of the LaTeX reader to ease compilation on resource-limited systems. More can be done in this vein.
2020-03-22 | Finer grained imports of Text.Pandoc.Class submodules (#6203) | Albert Krewinkel | 1 | -1/+1
This should speed-up recompilation after changes in `Text.Pandoc.Class`, as the number of modules affected by a change will be smaller in general. It also offers faster insights into the parts of `T.P.Class` used within a module.
2020-03-15 | Use implicit Prelude (#6187) | Albert Krewinkel | 1 | -2/+0
* Use implicit Prelude
  The previous behavior was introduced as a fix for #4464. It seems that this change alone did not fix the issue, and `stack ghci` and `cabal repl` only work with GHC 8.4.1 or newer, as no custom Prelude is loaded for these versions. Given this, it seems cleaner to revert to the implicit Prelude.
* PandocMonad: remove outdated check for base version
  Only base versions 4.9 and later are supported, the check for `MIN_VERSION_base(4,8,0)` is therefore unnecessary.
* Always use custom prelude
  Previously, the custom prelude was used only with older GHC versions, as a workaround for problems with ghci. The ghci problems are resolved by replacing package `base` with `base-noprelude`, allowing for consistent use of the custom prelude across all GHC versions.
2020-03-13 | Update copyright year (#6186) | Albert Krewinkel | 1 | -1/+1
* Update copyright year
* Copyright: add notes for Lua and Jira modules
2020-02-12 | LaTeX reader: improve caption and label parsing. | John MacFarlane | 1 | -2/+4
- Don't emit empty Span elements for labels.
- Put tables with labels in a surrounding Div.
2020-02-11 | LaTeX reader: resolve `\ref` to table numbers. | John MacFarlane | 1 | -0/+2
Closes #6137.
2020-02-07 | Resolve HLint warnings | Albert Krewinkel | 1 | -2/+2
All warnings are either fixed or, if more appropriate, HLint is configured to ignore them. HLint suggestions remain.
* Ignore "Use camelCase" warnings in Lua and legacy code
* Fix or ignore remaining HLint warnings
* Remove redundant brackets
* Remove redundant `return`s
* Remove redundant as-pattern
* Fuse mapM_/map
* Use `.` to shorten code
* Remove redundant `fmap`
* Remove unused LANGUAGE pragmas
* Hoist `not` in Text.Pandoc.App
* Use fewer imports for `Text.DocTemplates`
* Remove redundant `do`s
* Remove redundant `$`s
* Jira reader: remove unnecessary parentheses
2020-02-05 | LaTeX reader: skip comments in more places where this is needed. | John MacFarlane | 1 | -2/+4
Closes #6114.
2019-11-12 | Switch to new pandoc-types and use Text instead of String [API change]. | despresc | 1 | -16/+16
PR #5884.
+ Use pandoc-types 1.20 and texmath 0.12.
+ Text is now used instead of String, with a few exceptions.
+ In the MediaBag module, some of the types using Strings were switched to use FilePath instead (not Text).
+ In the Parsing module, new parsers `manyChar`, `many1Char`, `manyTillChar`, `many1TillChar`, `many1Till`, `manyUntil`, `manyUntilChar` have been added: these are like their unsuffixed counterparts but pack some or all of their output.
+ `glob` in Text.Pandoc.Class still takes String since it seems to be intended as an interface to Glob, which uses strings. It seems to be used only once in the package, in the EPUB writer, so that is not hard to change.
2019-11-02 | LaTeX untokenize: Ensure space between control sequence and following letter. | John MacFarlane | 1 | -2/+14
Closes #5836.
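Why the space matters can be shown with a toy model. The sketch below is an assumption-laden illustration, not the reader's code: the `Tok` type here is a hypothetical three-constructor stand-in, and `render`, `endsWithLetter` and `startsWithLetter` are made-up helper names.

```haskell
import Data.Char (isLetter)

-- Toy token type (hypothetical stand-in for the reader's tokens).
data Tok
  = CtrlSeq String   -- CtrlSeq "alpha" renders as \alpha
  | Word String
  | Symbol Char
  deriving (Eq, Show)

render :: Tok -> String
render (CtrlSeq name) = '\\' : name
render (Word w)       = w
render (Symbol c)     = [c]

endsWithLetter, startsWithLetter :: String -> Bool
endsWithLetter s   = not (null s) && isLetter (last s)
startsWithLetter s = not (null s) && isLetter (head s)

-- Concatenate tokens, inserting a space only where a control sequence
-- would otherwise fuse with a following letter (\alpha + x -> \alphax).
untokenize :: [Tok] -> String
untokenize [] = ""
untokenize (t@(CtrlSeq name) : rest@(next : _))
  | endsWithLetter name && startsWithLetter (render next) =
      render t ++ " " ++ untokenize rest
untokenize (t : rest) = render t ++ untokenize rest
```

In this model `untokenize [CtrlSeq "alpha", Word "x"]` yields `\alpha x`, while `untokenize [CtrlSeq "alpha", Symbol '{']` yields `\alpha{` with no added space.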
2019-10-23 | T.P.Readers.LaTeX.Parsing: add `[Tok]` parameter to rawLaTeXParser. | John MacFarlane | 1 | -4/+3
This allows us to avoid retokenizing multiple times in e.g. rawLaTeXBlock. (Unexported module, so not an API change.)
2019-09-28 | Use Prelude.fail to avoid ambiguity with fail from GHC.Base. | John MacFarlane | 1 | -5/+5
2019-09-09 | LaTeX reader: Fix parsing of optional arguments that contain braced text. | John MacFarlane | 1 | -4/+3
Closes #5740.
2019-09-02 | LaTeX reader: properly handle optional arguments for macros. | John MacFarlane | 1 | -1/+1
Closes #5682.
2019-08-14 | LaTeX reader: improve withRaw so it can handle cases where... | John MacFarlane | 1 | -2/+3
the token string is modified by a parser (e.g. accent when it only takes part of a Word token). Closes #5686. Still not ideal, because we get the whole `\t0BAR` and not just `\t0` as a raw latex inline command. But I'm willing to let this be an edge case, since you can easily work around this by inserting a space, braces, or raw attribute. The important thing is that we no longer drop the rest of the document after a raw latex inline command that gobbles only part of a Word token!
2019-07-19 | Markdown: Ensure that expanded latex macros end with space if original did. | John MacFarlane | 1 | -1/+10
Closes #4442.
2019-07-16 | LaTeX reader: handle \looseness command values better. | John MacFarlane | 1 | -5/+4
Closes #4439.
2019-03-01 | Remove license boilerplate. | John MacFarlane | 1 | -18/+0
The haddock module header contains essentially the same information, so the boilerplate is redundant and just one more thing to get out of sync.
2019-02-04 | Add missing copyright notices and remove license boilerplate (#5112) | Albert Krewinkel | 1 | -2/+2
Quite a few modules were missing copyright notices. This commit adds copyright notices everywhere via haddock module headers. The old license boilerplate comment is redundant with this and has been removed. Update copyright years to 2019. Closes #4592.
2019-01-31 | LaTeX reader: don't let `\egroup` match `{`. | John MacFarlane | 1 | -3/+3
`braced` now actually requires nested braces. Otherwise some legitimate command and environment definitions can break (see test/command/tex-group.md).
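A rough sketch of what "requires nested braces" means, using plain strings as tokens. This is a simplified model under stated assumptions, not the reader's `braced`: the function below works on `[String]` rather than real tokens, and it only ever counts literal `{` and `}`.

```haskell
-- Parse a brace-delimited group from the front of a token list.
-- Returns the tokens inside the outer braces and the remaining input.
-- Only literal "{" and "}" change the nesting depth; a token such as
-- "\\egroup" is treated as ordinary content and never closes a "{".
braced :: [String] -> Maybe ([String], [String])
braced ("{" : ts) = go (1 :: Int) [] ts
  where
    go _ _ [] = Nothing                         -- ran out of input: unbalanced
    go depth acc (t : rest)
      | t == "}" && depth == 1 = Just (reverse acc, rest)
      | t == "}"               = go (depth - 1) (t : acc) rest
      | t == "{"               = go (depth + 1) (t : acc) rest
      | otherwise              = go depth (t : acc) rest
braced _ = Nothing
```

Here `braced ["{", "a", "{", "b", "}", "}", "c"]` gives `Just (["a", "{", "b", "}"], ["c"])`, whereas `braced ["{", "a", "\\egroup"]` gives `Nothing` because `\egroup` does not close the `{`.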
2018-12-31 | Remove unused HasHeaderMap (#5175) | Alexander | 1 | -6/+0
It is updated by some readers, but never actually used.
2018-11-19 | LaTeX reader: cleaned up handling of dimension arguments. | John MacFarlane | 1 | -5/+11
Allow decimal points, preceding space. Also require text 1.1+.
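As an illustration of accepting decimal points and preceding space, here is a small standalone parser written with `ReadP` from base. It is only a sketch: `dimen` and `parseDimen` are hypothetical names, the unit list is a handful of standard TeX units chosen for the example, and the reader's actual grammar may differ.

```haskell
import Control.Monad (when)
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP
  (ReadP, char, choice, eof, munch, munch1, option, pfail, readP_to_S,
   skipSpaces, string)

-- Optional leading spaces, optional minus sign, digits with an optional
-- decimal part (either side of the point may be missing, but not both),
-- then a unit.
dimen :: ReadP (Double, String)
dimen = do
  skipSpaces
  sign  <- option "" (string "-")
  whole <- munch isDigit
  frac  <- option "" (char '.' *> munch1 isDigit)
  when (null whole && null frac) pfail
  unit  <- choice (map string ["pt", "pc", "in", "bp", "cm", "mm", "dd", "cc", "sp"])
  let digits = (if null whole then "0" else whole) ++ "."
               ++ (if null frac then "0" else frac)
      value  = read digits :: Double
  pure (if sign == "-" then negate value else value, unit)

-- Accept the input only if the whole string is one dimension.
parseDimen :: String -> Maybe (Double, String)
parseDimen s = case readP_to_S (dimen <* eof) s of
  [(result, "")] -> Just result
  _              -> Nothing
```

For example, `parseDimen " 1.5cm"` returns `Just (1.5, "cm")` and `parseDimen ".5in"` returns `Just (0.5, "in")`, while `parseDimen "pt"` returns `Nothing`.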
2018-10-15 | LaTeX reader: withVerbatimMode now does nothing if already in | John MacFarlane | 1 | -4/+8
verbatim mode. Previously nested uses wouldn't work properly.
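The nesting problem can be reproduced in a tiny state-passing model. Everything below is a hypothetical sketch: `St`, `P`, `inVerbatim` and `afterInner` are invented for illustration, and this `withVerbatimMode` only mirrors the behaviour described in the commit message, not the reader's monadic code.

```haskell
-- Toy parser state with a single flag.
newtype St = St { verbatimMode :: Bool } deriving (Eq, Show)

-- A "parser" in this model is just a state-passing function.
type P a = St -> (a, St)

-- Run a parser with the flag on. If the flag is already on, run the
-- parser unchanged, so an inner use cannot switch the flag off for
-- the rest of an enclosing use.
withVerbatimMode :: P a -> P a
withVerbatimMode p st
  | verbatimMode st = p st
  | otherwise =
      let (x, st') = p st { verbatimMode = True }
      in  (x, st' { verbatimMode = False })

-- Report whether the flag is on at this point.
inVerbatim :: P Bool
inVerbatim st = (verbatimMode st, st)

-- Nested use: after the inner call returns, is the flag still on?
afterInner :: P Bool
afterInner = withVerbatimMode $ \st0 ->
  let (_, st1) = withVerbatimMode inVerbatim st0
  in  inVerbatim st1
```

In this model `afterInner (St False)` evaluates to `(True, St False)`: the flag stays on until the outer call finishes. Without the "already on" guard, the inner call would reset the flag and the first component would be `False`.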
2018-10-15 | LaTeX reader: simplified type on doMacros'. | John MacFarlane | 1 | -11/+8
2018-10-15 | LaTeX reader: small efficiency improvement. | John MacFarlane | 1 | -1/+2
2018-10-15 | LaTeX reader: tokenize before pulling tokens, | John MacFarlane | 1 | -11/+14
rather than after. This has some performance penalty but is more reliable. Closes #4408.
2018-10-15 | LaTeX reader: improved parsing of `\def`, `\let`. | John MacFarlane | 1 | -16/+23
We now correctly parse:
```
\def\bar{hello}
\let\fooi\bar
\def\fooii{\bar}
\fooi +\fooii
\def\bar{goodbye}
\fooi +\fooii
```
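The difference between the two commands in that example is that `\let` copies the current meaning of a macro, while `\def` stores a body that is expanded only when used. A minimal model of this, with hypothetical names throughout (`Piece`, `Macros`, `def`, `letM`, `expand`, `demo`) and no claim to match the reader's macro table:

```haskell
import Data.Maybe (fromMaybe)

-- A macro body is a list of pieces: literal text or a call to a macro.
data Piece = Lit String | Call String deriving (Eq, Show)

-- Macro table as an association list, newest definition first.
type Macros = [(String, [Piece])]

-- \def\name{body}: store the body unexpanded.
def :: String -> [Piece] -> Macros -> Macros
def name body ms = (name, body) : ms

-- \let\new\old: copy whatever \old means right now.
letM :: String -> String -> Macros -> Macros
letM new old ms = (new, fromMaybe [Call old] (lookup old ms)) : ms

-- Expand a macro by name against the current table.
-- Undefined names expand to nothing in this toy model.
expand :: Macros -> String -> String
expand ms name = concatMap piece (fromMaybe [] (lookup name ms))
  where
    piece (Lit s)  = s
    piece (Call n) = expand ms n

-- The sequence from the commit message, before and after redefining \bar.
before, after :: Macros
before = def "fooii" [Call "bar"] (letM "fooi" "bar" (def "bar" [Lit "hello"] []))
after  = def "bar" [Lit "goodbye"] before

demo :: Macros -> String
demo ms = expand ms "fooi" ++ " +" ++ expand ms "fooii"
```

In this model `demo before` is `"hello +hello"` and `demo after` is `"hello +goodbye"`: `\fooi` keeps the old meaning of `\bar`, and `\fooii` picks up the new one.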
2018-10-15 | LaTeX reader: Fix small regression in pattern arguments... | John MacFarlane | 1 | -1/+2
introduced in last commit.
2018-10-15 | More refactoring of LaTeX reader code. | John MacFarlane | 1 | -33/+36