aboutsummaryrefslogtreecommitdiff
path: root/src/Text/Pandoc/Readers
AgeCommit message (Collapse)AuthorFilesLines
2019-11-13Fix regression introduced by last commit.John MacFarlane1-1/+2
2019-11-13Markdown reader: don't parse footnote body unless extension enabled.John MacFarlane1-18/+20
2019-11-12Switch to new pandoc-types and use Text instead of String [API change].despresc47-2307/+2427
PR #5884. + Use pandoc-types 1.20 and texmath 0.12. + Text is now used instead of String, with a few exceptions. + In the MediaBag module, some of the types using Strings were switched to use FilePath instead (not Text). + In the Parsing module, new parsers `manyChar`, `many1Char`, `manyTillChar`, `many1TillChar`, `many1Till`, `manyUntil`, `mantyUntilChar` have been added: these are like their unsuffixed counterparts but pack some or all of their output. + `glob` in Text.Pandoc.Class still takes String since it seems to be intended as an interface to Glob, which uses strings. It seems to be used only once in the package, in the EPUB writer, so that is not hard to change.
2019-11-11Markdown reader: fix small super/subscript issue.John MacFarlane1-2/+6
Superscripts and subscripts cannot contain spaces, but newlines were previously allowed (unintentionally). This led to bad interactions in some cases with footnotes. E.g. ``` foo^[note] bar^[note] ``` With this change newlines are also not allowed inside super/subscripts. Closes #5878.
2019-11-11Change the implementation of `htmlSpanLikeElements` and implement `<dfn>` ↵Florian Beeres1-4/+11
(#5882) * Add HTML Reader support for `<dfn>`, parsing this as a Span with class `dfn`. * Change `htmlSpanLikeElements` implementation to retain classes, attributes and inline content.
2019-11-07DocBook reader: Fix bug with entities in mathphrase element.John MacFarlane1-4/+2
Closes #5885.
2019-11-07Change merge behavior for metadata.John MacFarlane1-1/+3
Previously, if a document contained two YAML metadata blocks that set the same field, the conflict would be resolved in favor of the first. Now it is resolved in favor of the second (due to a change in pandoc-types). This makes the behavior more uniform with other things in pandoc (such as reference links and `--metadata-file`).
2019-11-04Removed an unnecessary unpack.John MacFarlane1-1/+1
2019-11-04HTML Reader/Writer - Add support for <var> and <samp> (#5861)Amogh Rathore1-5/+7
Closes #5799
2019-11-03Docx reader: Only use LTR when it is overriding BiDi settingJesse Rosenthal3-2/+14
The left-to-right direction setting in docx is used in the spec only for overriding an explicit right-to-left setting. We only process it when it happens in a paragraph set with BiDi. This is especially important for docs exported from Google Docs, which explicitly (and unnecessarily) set "rtl=0" for every paragraph. Closes: #5723
2019-11-03Docx reader: fix list number resumption for sublists. Closes #4324.John MacFarlane1-1/+8
The first list item of a sublist should not resume numbering from the number of the last sublist item of the same level, if that sublist was a sublist of a different list item. That is, we should not get: ``` 1. one 1. sub one 2. sub two 2. two 3. sub one ```
2019-11-02RST reader: avoid spurious warning...John MacFarlane1-1/+1
when resolving links to internal anchors ending with `_`. Closes #5763.
2019-11-02LaTeX reader: Fixed dollar-math parsing...John MacFarlane1-9/+9
...to ensure that space is left between a control seq and a following word that would otherwise change its meaning. Closes #5836.
2019-11-02LaTeX untokenize: Ensure space between control sequence and following letter.John MacFarlane2-2/+15
Closes #5836.
2019-11-02LaTeX reader: Don't omit macro definitions defined in the preamble.John MacFarlane1-6/+7
These were formerly omitted (though they still affected macro resolution if `latex_macros` was set). Now they are included in the document.
2019-11-02LaTeX reader: parse macro defs as raw latex...John MacFarlane1-8/+13
when `latex_macros` is disabled. (When `latex_macros` is enabled, we omit them, since pandoc is applying the macros itself.) Previously, it was documented that the macro definitions got passed through as raw latex regardless of whether `latex_macros` was set -- but in fact they never got passed through.
2019-11-02LaTeX reader: fixed a hang/memory leak in certain circumstances.John MacFarlane1-3/+3
We were using `grouped blocks` instead of `grouped block`. This caused the reader to hang in an infinite loop (with a memory leak) on e.g. `\parbox{1em}{#1}`. Closes #5845.
2019-10-30docbook reader: fix nesting of chapters and sections (#5864)Florian Klink1-1/+1
* Set dbBook to true when traversing a chapter too. Currently, a `<title/>` in a chapter and in a `<section/>` below that chapter have the same level if they're not inside a `<book/>`. This can happen in a multi-file book project. Also see the example at https://tdg.docbook.org/tdg/4.5/chapter.html Co-authored-by: Félix Baylac-Jacqué <felix@alternativebit.fr> * Add docbook-chapter test This tests nested `<section/>` and makes sure `<title/>` in the first `<section/>` below `<chapter/>` is one level deeper than the `<chapter/>`'s `<title/>`, also when not inside a `<book/>`. Co-authored-by: Félix Baylac-Jacqué <felix@alternativebit.fr>
2019-10-27Org reader: fix parsing of empty comment linesAlbert Krewinkel1-1/+3
Comment lines in Org-mode can be completely empty; both of these line should produce no output: # a comment # The reader used to produce a wrong result for the latter, but ignores that line as well now. Fixes: #5856
2019-10-24HTML reader/writer: Better handling of <q> with cite attribute (#5837)Ole Martin Ruud1-23/+34
* HTML reader: Handle cite attribute for quotes. If a `<q>` tag has a `cite` attribute, we interpret it as a Quoted element with an inner Span. Closes #5798 * Refactor url canonicalization into a helper function * Modify HTML writer to handle quote with cite. [0]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/q
2019-10-23T.P.Readers.LaTeX.Parsing: add `[Tok]` parameter to rawLaTeXParser.John MacFarlane2-10/+16
This allows us to avoid retokenizing multiple times in e.g. rawLaTeXBlock. (Unexported module, so not an API change.)
2019-10-23Add Reader support for HTML <samp> element (#5843)Amogh Rathore1-0/+9
The `<samp>` element is parsed as a Span with class `sample`. Closes #5792.
2019-10-15Add support for reading and writing <kbd> elementsDaniele D'Orazio1-1/+9
* Text.Pandoc.Shared: export `htmlSpanLikeElements` [API change] This commit also introduces a mapping of HTML span like elements that are internally represented as a Span with a single class, but that are converted back to the original element by the html writer. As of now, only the kbd element is handled this way. Ideally these elements should be handled as plain AST values, but since that would be a breaking change with a large impact, we revert to this stop-gap solution. Fixes https://github.com/jgm/pandoc/issues/5796.
2019-10-15Muse reader: do not allow closing asterisks to be followed by "*"Alexander Krotov1-2/+7
2019-10-15Muse reader: do not split series of asterisks into symbols and emphasisAlexander Krotov1-0/+7
Fixes #5821
2019-10-15Muse reader: do not terminate emphasis on "*" not followed by spaceAlexander Krotov1-2/+1
2019-10-04hlint FB2 readerAlexander Krotov1-1/+1
2019-10-04Fix all hlint warnings in Muse readerAlexander Krotov1-2/+2
2019-10-03Minor ghc 8.8 fixups.John MacFarlane1-2/+6
2019-09-29RST reader: don't strip final underscore from absolute URI.John MacFarlane1-3/+7
Partially addresses #5763.
2019-09-28Use throwError instead of fail when appropriate.John MacFarlane1-2/+3
2019-09-28Use Prelude.fail to avoid ambiguity with fail from GHC.Base.John MacFarlane9-20/+20
2019-09-24LaTeX reader: Add 'tikzcd' to list of special environments.Eigil Rischel1-0/+1
This allows it to be processed by filters, in the same way that one can do for 'tikzpicture'
2019-09-22RST reader: Fixed parsing of indented blocks.John MacFarlane1-6/+9
We were requiring consistent indentation, but this isn't required by RST, as long as each nonblank line of the block has *some* indentation. Closes #5753.
2019-09-22[Docx Writer] Re-use Readers.Docx.Parse for StyleMap (#5766)Nikolay Yakimov3-380/+307
* [Docx Parser] Move style-parsing-specific code to a new module * [Docx Writer] Re-use Readers.Docx.Parse.Styles for StyleMap * [Docx Writer] Move Readers.Docx.StyleMap to Writers.Docx.StyleMap It's never used outside of writer code, so it makes more sense to scope it under writers really.
2019-09-22Use HsYAML-0.2.0.0John MacFarlane1-11/+12
Most of this is due to @vijayphoenix (#5704), but it needed some revisions to integrate with current master, and to use the released HsYAML. Closes #5704.
2019-09-21[Docx Reader] Use style names, not ids, for assigning semantic meaningNikolay Yakimov3-183/+287
Motivating issues: #5523, #5052, #5074 Style name comparisons are case-insensitive, since those are case-insensitive in Word. w:styleId will be used as style name if w:name is missing (this should only happen for malformed docx and is kept as a fallback to avoid failing altogether on malformed documents) Block quote detection code moved from Docx.Parser to Readers.Docx Code styles, i.e. "Source Code" and "Verbatim Char" now honor style inheritance Docx Reader now honours "Compact" style (used in Pandoc-generated docx). The side-effect is that "Compact" style no longer shows up in docx+styles output. Styles inherited from "Compact" will still show up. Removed obsolete list-item style from divsToKeep. That didn't really do anything for a while now. Add newtypes to differentiate between style names, ids, and different style types (that is, paragraph and character styles) Since docx style names can have spaces in them, and pandoc-markdown classes can't, anywhere when style name is used as a class name, spaces are replaced with ASCII dashes `-`. Get rid of extraneous intermediate types, carrying styleId information. Instead, styleId is saved with other style data. Use RunStyle for inline style definitions only (lacking styleId and styleName); for Character Styles use CharStyle type (which is basicaly RunStyle with styleId and StyleName bolted onto it).
2019-09-21[Docx Reader] Code clean-upNikolay Yakimov2-63/+39
Reduce code duplication, remove redundant brackets, use newtype instead of data where appropriate
2019-09-19MediaWiki: skip optional {{table}} template.John MacFarlane1-0/+1
See https://en.wikipedia.org/wiki/Template:Table Closes #5757.
2019-09-09LaTeX reader: Fix parsing of optional arguments that contain braced text.John MacFarlane1-4/+3
Closes #5740.
2019-09-08Org reader: modify handling of example blocks. (#5717)Brian Leung2-14/+43
* Org reader: allow the `-i` switch to ignore leading spaces. * Org reader: handle awkwardly-aligned code blocks within lists. Code blocks in Org lists must have their #+BEGIN_ aligned in a reasonable way, but their other components can be positioned otherwise.
2019-09-05Roff reader: Better support for 'while'.John MacFarlane1-0/+3
2019-09-05Roff reader: improve handling of groups.John MacFarlane1-4/+2
2019-09-04Roff reader: Fix problem parsing comments before macro.John MacFarlane1-2/+0
2019-09-04Roff reader: more improvements in parsing conditionals.John MacFarlane1-3/+4
2019-09-04Roff readers: better parsing of groups.John MacFarlane1-9/+5
We now allow groups where the closing `\\}` isn't at the beginning of a line. Closes #5410.
2019-09-02LaTeX reader: don't try to parse includes if raw_tex is set.John MacFarlane1-5/+13
When the `raw_tex` extension is set, we just carry through `\usepackage`, `\input`, etc. verbatim as raw LaTeX. Closes #5673.
2019-09-02LaTeX reader: properly handle optional arguments for macros.John MacFarlane2-2/+2
Closes #5682.
2019-08-27LaTeX reader: fix `\\` in `\parbox` inside a table cell.John MacFarlane1-3/+18
Closes #5711.
2019-08-27Markdown reader: Headers: don't parse content over newline boundary.John MacFarlane1-4/+15
Closes #5714.