path: root/src/Text/Pandoc/Readers/Markdown.hs
Age  Commit message  Author  Files  Lines
2021-10-27Switch back from HsYAML to yaml.John MacFarlane1-6/+4
Reasons:
- Performance: HsYAML is around 20 times slower in parsing large YAML bibliographies (#6084).
- An issue was submitted to HsYAML, but it hasn't gotten any attention. HsYAML seems borderline unmaintained; it hasn't had a commit in over a year.
- Unfortunately this goes back on our attempts to free ourselves from C dependencies (#4535). But I don't see a better alternative until a better pure Haskell parser is available.
Closes #6084.
Notes:
- We've removed the FromYAML instances for all types that had them, since this is a HsYAML-specific typeclass [API change]. (The yaml package just uses From/ToJSON.)
- Unlike HsYAML (in the configuration we were using), yaml parses 'Y', 'N', 'Yes', 'No', 'On', 'Off' as boolean values. Users may need to quote these when they are meant to be interpreted as strings. Similarly, 'null' is parsed as a YAML null value (and will be treated as an empty string by pandoc rather than the string 'null'). Quoting it will force it to be interpreted as a string.
- Some tests had to be adjusted accordingly.
- Pandoc now behaves better when the YAML metadata contains escaping errors: instead of just falling back on treating the section as a table, it raises a YAML parsing error.
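The quoting advice in the notes above can be illustrated with a small sketch. It is not taken from pandoc or from the yaml package: `ambiguousScalars`, `needsQuoting` and `quoteIfNeeded` are hypothetical names, the list contains only the scalars named in this commit message, and the case-insensitive comparison is a conservative assumption (the set the yaml package really recognises may differ).

```haskell
import Data.Char (toLower)

-- Scalars that this commit message says are read as booleans or null.
-- Assumption: the list is limited to the values named in the message.
ambiguousScalars :: [String]
ambiguousScalars = ["y", "n", "yes", "no", "on", "off", "null"]

-- True when an unquoted metadata value could be read as a boolean or null.
-- The comparison ignores case, which may flag more values than necessary.
needsQuoting :: String -> Bool
needsQuoting s = map toLower s `elem` ambiguousScalars

-- Wrap a risky value in single quotes so it is read as a string.
quoteIfNeeded :: String -> String
quoteIfNeeded s
  | needsQuoting s = "'" ++ s ++ "'"
  | otherwise      = s
```

Loaded in GHCi, `quoteIfNeeded "No"` gives `'No'`, while a value such as `Norway` is left alone.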
2021-10-22Use simpleFigure in Readers.Aner Lucero1-14/+13
2021-10-20Markdown reader: don't parse links or bracketed spans as citations.John MacFarlane1-2/+4
Previously pandoc would parse [link to (@a)](url) as a citation; similarly [(@a)]{#ident} This is undesirable. One should be able to use example references in citations, and even if `@a` is not defined as an example reference, `[@a](url)` should be a link containing an author-in-text citation rather than a normal citation followed by literal `(url)`. Closes #7632.
2021-10-13Fix markdown parsing bug for math in bracketed spans and links.John MacFarlane1-0/+1
This affects math with unbalanced brackets (e.g. `$(0,1]$`) inside links, images, bracketed spans. Closes #7623.
2021-09-17Fix linter warning.John MacFarlane1-4/+3
2021-09-16Fix code blocks using `--preserve-tabs`.John MacFarlane1-1/+7
Previously they did not behave as the equivalent input with spaces would. Closes #7573.
2021-08-23Markdown reader: fix interaction of --strip-comments and listJohn MacFarlane1-1/+1
parsing. Use of `--strip-comments` was causing tight lists to be rendered as loose (as if the comment were a blank line). Closes #7521.
2021-08-16Fix bug in last commit due to removal of take1WhileP.John MacFarlane1-2/+2
2021-08-15Multimarkdown sub- and superscripts (#5512) (#7188)OCzarnecki1-8/+16
Added an extension `short_subsuperscripts` which modifies the behavior of `subscript` and `superscript`, allowing subscripts or superscripts containing only alphanumerics to end with a space character (eg. `x^2 = 4` or `H~2 is combustible`). This improves support for multimarkdown. Closes #5512. Add `Ext_short_subsuperscripts` constructor to `Extension` [API change]. This is enabled by default for `markdown_mmd`.
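As a rough illustration of the rule described above (not the reader's actual parser), the sketch below recognises a marker followed by a run of alphanumerics that ends at a space. `shortScript` is a hypothetical helper working on plain strings; how the real extension treats other terminators is not covered here.

```haskell
import Data.Char (isAlphaNum)

-- Try to read a short sub/superscript at the start of the input.
-- The first argument is the marker ('^' or '~').
-- On success, returns the script body and the remaining input,
-- which still begins with the terminating space.
shortScript :: Char -> String -> Maybe (String, String)
shortScript marker (c:rest)
  | c == marker =
      case span isAlphaNum rest of
        ([], _)            -> Nothing                 -- empty body
        (body, ' ' : more) -> Just (body, ' ' : more) -- ended by a space
        _                  -> Nothing                 -- other terminator
shortScript _ _ = Nothing
```

For the examples in the commit message, `shortScript '^' "^2 = 4"` yields the body `2` and `shortScript '~' "~2 is combustible"` yields the body `2`.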
2021-07-06Markdown reader: don't try to read contents in self-closing HTML tag.John MacFarlane1-1/+4
Previously we had problems parsing raw HTML with self-closing tags like `<col/>`. The problem was that pandoc would look for a closing tag to close the markdown contents, but the closing tag had, in effect, already been parsed by `htmlTag`. This fixes the issue described in <https://groups.google.com/d/msgid/pandoc-discuss/297bc662-7841-4423-bcbb-534e99bbba09n%40googlegroups.com>.
2021-06-01Markdown reader: fix pipe table regression in 2.11.4.John MacFarlane1-1/+1
Previously pipe tables with empty headers (that is, a header line with all empty cells) would be rendered as headerless tables. This broke in 2.11.4. The fix here is to produce an AST with an empty table head when a pipe table has all empty header cells. Closes #7343.
2021-05-29Markdown reader: in rebasePaths, check for both Windows and PosixJohn MacFarlane1-4/+5
absolute paths. Previously Windows pandoc was treating `/foo/bar.jpg` as non-absolute.
2021-05-29In rebasePath, check for absolute paths two ways.John MacFarlane1-1/+4
isAbsolute from FilePath doesn't return True on Windows for paths beginning with `/`, so we check that separately.
2021-05-27rebase_relative_paths: leave empty paths unchanged.John MacFarlane1-1/+1
2021-05-27rebase_relative_paths extension: don't change fragment paths.John MacFarlane1-1/+2
We don't want a pure fragment path to be rewritten, since these are used for cross-referencing.
2021-05-27Modify rebase_reference_links treatment of reference links/images.John MacFarlane1-5/+4
The directory is based on the file containing the link reference, not the file containing the link, if these differ.
2021-05-27Add `rebase_relative_paths` extension.John MacFarlane1-7/+29
- Add manual entry for (non-default) extension `rebase_relative_paths`.
- Add constructor `Ext_rebase_relative_paths` to `Extensions` in Text.Pandoc.Extensions [API change]. When enabled, this extension rewrites relative image and link paths by prepending the (relative) directory of the containing file.
- Make Markdown reader sensitive to the new extension.
- Add tests for #3752.
Closes #3752.
NB. currently the extension applies to markdown and associated readers but not commonmark/gfm.
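Taken together with the follow-up commits listed above it (empty paths, fragment paths, and absolute paths on both Windows and Posix), the rebasing rule might look roughly like the sketch below. It is a minimal illustration, not the reader's code: `rebasePath` here takes plain strings, and the `"://"` test is a crude stand-in for URI detection that this sketch assumes. It uses the `filepath` package that ships with GHC.

```haskell
import Data.List (isInfixOf)
import qualified System.FilePath.Posix as Posix
import qualified System.FilePath.Windows as Windows

-- Hypothetical stand-in for the rebasing step.
-- First argument: the file that contains the link or image.
-- Second argument: the path as written in that file.
rebasePath :: FilePath -> String -> String
rebasePath containingFile path
  | null path              = path  -- empty paths are left unchanged
  | take 1 path == "#"     = path  -- pure fragments are cross-references
  | isAbsolutePath         = path  -- absolute on either platform
  | "://" `isInfixOf` path = path  -- assumption: crude URI check
  | otherwise =
      case Posix.takeDirectory containingFile of
        ""  -> path
        "." -> path
        dir -> dir ++ "/" ++ path
  where
    -- Checked two ways, because the Windows variant of isAbsolute
    -- does not accept paths that merely begin with '/'.
    isAbsolutePath = Posix.isAbsolute path || Windows.isAbsolute path
```

With this sketch, `rebasePath "chap1/text.md" "fig.png"` gives `chap1/fig.png`, while `#sec`, the empty path and `/foo/bar.jpg` come back unchanged.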
2021-05-13Implement curly-brace syntax for Markdown citation keys.John MacFarlane1-3/+3
The change provides a way to use citation keys that contain special characters not usable with the standard citation key syntax. Example: `@{foo_bar{x}'}` for the key `foo_bar{x}`. Closes #6026. The change requires adding a new parameter to the `citeKey` parser from Text.Pandoc.Parsing [API change]. Markdown reader: recognize @{..} syntax for citations. Markdown writer: use @{..} syntax for citations when needed. Update manual with curly-brace syntax for citations.
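A minimal sketch of how a braced key could be read, assuming that braces inside the key are balanced. `bracedCiteKey` is a hypothetical helper on plain strings; the real `citeKey` parser lives in Text.Pandoc.Parsing and is not reproduced here.

```haskell
-- Read "@{...}" at the start of the input.
-- Returns the key (without the outer braces) and the remaining input.
-- Assumption: nested braces inside the key are balanced.
bracedCiteKey :: String -> Maybe (String, String)
bracedCiteKey ('@' : '{' : rest) = go (1 :: Int) "" rest
  where
    go _ _ [] = Nothing                       -- closing brace never found
    go depth acc (c : cs)
      | c == '}' && depth == 1 =
          if null acc then Nothing            -- reject an empty key
          else Just (reverse acc, cs)
      | c == '}'  = go (depth - 1) (c : acc) cs
      | c == '{'  = go (depth + 1) (c : acc) cs
      | otherwise = go depth (c : acc) cs
bracedCiteKey _ = Nothing
```

For instance, `bracedCiteKey "@{key{1}*} rest"` returns the key `key{1}*` and the remainder ` rest`, and an unterminated `@{open` is rejected.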
2021-05-12Fix source position reporting for YAML bibliographies.John MacFarlane1-2/+0
Closes #7273.
2021-05-09Change reader types, allowing better tracking of source positions.John MacFarlane1-22/+27
Previously, when multiple file arguments were provided, pandoc simply concatenated them and passed the contents to the readers, which took a Text argument. As a result, the readers had no way of knowing which file was the source of any particular bit of text. This meant that we couldn't report accurate source positions on errors or include accurate source positions as attributes in the AST. More seriously, it meant that we couldn't resolve resource paths relative to the files containing them (see e.g. #5501, #6632, #6384, #3752).
Add Text.Pandoc.Sources (exported module), with a `Sources` type and a `ToSources` class. A `Sources` wraps a list of `(SourcePos, Text)` pairs. [API change] A parsec `Stream` instance is provided for `Sources`. The module also exports versions of parsec's `satisfy` and other Char parsers that track source positions accurately from a `Sources` stream (or any instance of the new `UpdateSourcePos` class). Text.Pandoc.Parsing now exports these modified Char parsers instead of the ones parsec provides.
Modified parsers to use a `Sources` as stream [API change].
The readers that previously took a `Text` argument have been modified to take any instance of `ToSources`. So, they may still be used with a `Text`, but they can also be used with a `Sources` object.
In Text.Pandoc.Error, modified the constructor PandocParsecError to take a `Sources` rather than a `Text` as first argument, so parse error locations can be accurately reported.
T.P.Error: showPos, do not print "-" as source name.
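The idea of a stream that remembers which file each character came from can be sketched with base-only types. Everything below is a simplified stand-in: `Pos` replaces parsec's `SourcePos`, `String` replaces `Text`, and `fromFiles`, `nextChar`, `advance` and `positions` are hypothetical names rather than the Text.Pandoc.Sources API.

```haskell
-- Simplified position: file name, line, column.
data Pos = Pos FilePath Int Int
  deriving (Eq, Show)

-- A list of chunks, each tagged with the position where it starts.
newtype Sources = Sources [(Pos, String)]
  deriving (Eq, Show)

-- Build a Sources from (file name, contents) pairs.
fromFiles :: [(FilePath, String)] -> Sources
fromFiles files = Sources [ (Pos name 1 1, txt) | (name, txt) <- files ]

-- Take one character together with the position it came from.
-- Exhausted chunks are skipped, so the file name changes at file boundaries.
nextChar :: Sources -> Maybe (Pos, Char, Sources)
nextChar (Sources []) = Nothing
nextChar (Sources ((_, []) : more)) = nextChar (Sources more)
nextChar (Sources ((pos, c : cs) : more)) =
  Just (pos, c, Sources ((advance c pos, cs) : more))

-- Update a position after consuming one character.
advance :: Char -> Pos -> Pos
advance '\n' (Pos f l _) = Pos f (l + 1) 1
advance _    (Pos f l c) = Pos f l (c + 1)

-- Every character paired with its position, for demonstration.
positions :: Sources -> [(Pos, Char)]
positions s = case nextChar s of
  Nothing         -> []
  Just (p, c, s') -> (p, c) : positions s'
```

Because each chunk carries its own starting position, an error found in the second file can be reported against that file's name and line rather than against an offset in the concatenated input.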
2021-04-28Smarter smart quotes.John MacFarlane1-8/+10
Treat a leading " with no closing " as a left curly quote. This supports the practice, in fiction, of continuing paragraphs quoting the same speaker without an end quote. It also helps with quotes that break over lines in line blocks. Closes #7216.
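A very small sketch of the rule just described, working on a raw paragraph string. The real reader operates on parsed inlines, so `leadingQuote` is only a hypothetical illustration of the decision being made.

```haskell
-- If the text starts with a straight double quote and contains no
-- other double quote, turn that quote into a left curly quote (U+201C).
-- Anything else is returned unchanged.
leadingQuote :: String -> String
leadingQuote ('"' : rest)
  | '"' `notElem` rest = '\8220' : rest
leadingQuote s = s
```

A paragraph such as `"And then he left.` (no closing quote) gets an opening curly quote, whereas `"hi" there` is left for the ordinary quote-pairing logic.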
2021-04-18Use MetaInlines not MetaBlocks for multimarkdown metadata fields.John MacFarlane1-1/+1
This gives better results in converting to e.g. pandoc markdown. Ref: <https://groups.google.com/d/msgid/pandoc-discuss/9728d1f4-040e-4392-aa04-148f648a8dfdn%40googlegroups.com>
2021-03-20Move yamlMetaBlock from Markdown reader to T.P.Readers.Metadata.John MacFarlane1-22/+2
2021-03-20Markdown reader: export `yamlMetaBlock`.John MacFarlane1-17/+23
[API change] This will allow us to parse YAML metadata blocks in other readers, potentially.
2021-03-20Text.Pandoc.Parsing: remove F type synonym.John MacFarlane1-0/+2
Muse and Org were defining their own F anyway, with their own state. We therefore move this definition to the Markdown reader.
2021-03-19Protect partial uses of maximum with NonEmpty.John MacFarlane1-1/+2
2021-03-17Fix regression with `tex_math_backslash` in Markdown reader.John MacFarlane1-1/+1
Added regression test. Closes #7155.
2021-03-15Use foldl' instead of foldl everywhere.John MacFarlane1-3/+3
2021-03-04Revert "Revert "Relax `--abbreviations` rules so that a period isn't required.John MacFarlane1-1/+1
This reverts commit 916ce4d51121e0529b938fda71f37e947882abe5. I was confused in thinking it wouldn't work.
2021-03-04Revert "Relax `--abbreviations` rules so that a period isn't required."John MacFarlane1-1/+1
This reverts commit e461b7dd45f717f3317216c7d3207a1d24bf1c85. Ill-advised change. This doesn't work because we parse strings in chunks.
2021-03-04Relax `--abbreviations` rules so that a period isn't required.John MacFarlane1-1/+1
Partially addresses #7124.
2021-02-28Fix bug in last commit.John MacFarlane1-1/+1
2021-02-28Markdown reader efficiency improvements.John MacFarlane1-182/+208
Benchmarks show that these make the reader 13-17% faster, depending on extensions.
2021-02-06Markdown reader: improved handling of mmd link attributes in references.John MacFarlane1-0/+2
Previously they only worked for links that had titles. Closes #7080.
2021-01-16Revert "Markdown reader: support GitHub wiki's internal links (#2923) (#6458)"John MacFarlane1-25/+0
This reverts commit 6efd3460a776620fdb93812daa4f6831e6c332ce. Since this extension is designed to be used with GitHub markdown (gfm), we need to implement the parser as a commonmark extension (commonmark-extensions), rather than in pandoc's markdown reader. When that is done, we can add it here.
2021-01-16Markdown reader: support GitHub wiki's internal links (#2923) (#6458)Gautier DI FOLCO1-0/+25
Changes overview:
* Add a `Ext_markdown_github_wikilink` constructor to `Extension` [API change].
* Add the parser `githubWikiLink` in `Text.Pandoc.Readers.Markdown`
* Add tests.
2021-01-08Update copyright notices for 2021 (#7012)Albert Krewinkel1-1/+1
2020-11-17Markdown reader: fix regression with example list references.John MacFarlane1-1/+5
This affects example list references followed by dashes. Introduced by commit b8d17f7. Closes #6855.
2020-11-15Markdown reader: fix detection of locators following in-text citations.John MacFarlane1-27/+30
Previously, if we had `@foo [p. 33; @bar]`, the `p. 33` would be incorrectly parsed as a prefix of `@bar` rather than a suffix of `@foo`.
2020-11-14Markdown reader: don't increment stateNoteNumber for example refs.John MacFarlane1-0/+12
Background: syntactically, references to example list items can't be distinguished from citations; we only know which they are after we've parsed the whole document (and this is resolved in the `runF` stage). This means that pandoc's calculation of `citationNoteNum` can sometimes be wrong when there are example list references. This commit partially addresses #6836, but only for the case where the example list references refer to list items defined previously in the document.
2020-11-07Lint code in PRs and when committing to master (#6790)Albert Krewinkel1-1/+1
* Remove unused LANGUAGE pragmata
* Apply HLint suggestions
* Configure HLint to ignore some warnings
* Lint code when committing to master
2020-10-07Raise informative errors when YAML metadata parsing fails.John MacFarlane1-2/+14
Closes #6730. Previously the command would succeed, returning empty metadata, with no errors or warnings.
API changes:
- Remove now unused CouldNotParseYamlMetadata constructor for LogMessage (T.P.Logging).
- Add 'Maybe FilePath' parameter to yamlToMeta in T.P.Readers.Markdown.
2020-10-05Fixed regression in last commit.John MacFarlane1-1/+1
Parsing of YAML bibliographies was broken; this fixes it.
2020-10-05Add yamlToRefs, yamlBsToRefs.John MacFarlane1-2/+25
T.P.Readers.Markdown now exports yamlToRefs. [API change] T.P.Readers.Metadata exports yamlBsToRefs. [API change] These allow specifying an id filter so we parse only references that are used in the document. Improves timing with a 3M yaml references file from 36s to 17s.
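The benefit of an id filter is that entries nobody cites are never converted. A rough sketch of that idea follows, with hypothetical names and types: entries are modelled as (id, raw text) pairs, and `convert` stands in for whatever costly conversion the real functions perform. The actual signatures of `yamlToRefs` and `yamlBsToRefs` are not reproduced here.

```haskell
-- Convert only the entries whose id passes the filter.
-- Entries that fail the filter are skipped before any conversion work.
refsWithFilter :: (String -> Bool)      -- id filter
               -> (String -> a)         -- conversion of one raw entry
               -> [(String, String)]    -- (id, raw entry) pairs
               -> [a]
refsWithFilter keep convert entries =
  [ convert raw | (ident, raw) <- entries, keep ident ]
```

A caller that knows the cited keys could pass a filter such as ``(`elem` citedKeys)``, where `citedKeys` is an assumed list of ids collected from the document.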
2020-09-21Add built-in citation support using new citeproc library.John MacFarlane1-0/+1
This deprecates the use of the external pandoc-citeproc filter; citation processing is now built in to pandoc.
* Add dependency on citeproc library.
* Add Text.Pandoc.Citeproc module (and some associated unexported modules under Text.Pandoc.Citeproc). Exports `processCitations`. [API change]
* Add data files needed for Text.Pandoc.Citeproc: default.csl in the data directory, and a citeproc directory that is just used at compile-time. Note that we've added file-embed as a mandatory rather than a conditional dependency, because of the biblatex localization files. We might eventually want to use readDataFile for this, but it would take some code reorganization.
* Text.Pandoc.Logging: Add `CiteprocWarning` to `LogMessage` and use it in `processCitations`. [API change]
* Add tests from the pandoc-citeproc package as command tests (including some tests pandoc-citeproc did not pass).
* Remove instructions for building pandoc-citeproc from CI and release binary build instructions. We will no longer distribute pandoc-citeproc.
* Markdown reader: tweak abbreviation support. Don't insert a nonbreaking space after a potential abbreviation if it comes right before a note or citation. This messes up several things, including citeproc's moving of note citations.
* Add `csljson` as an input and output format. This allows pandoc to convert between `csljson` and other bibliography formats, and to generate formatted versions of CSL JSON bibliographies.
* Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API change]
* Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API change]
* Added `bibtex`, `biblatex` as input formats. This allows pandoc to convert between BibLaTeX and BibTeX and other bibliography formats, and to generate formatted versions of BibTeX/BibLaTeX bibliographies.
* Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and `readBibLaTeX`. [API change]
* Make "standalone" implicit if output format is a bibliography format. This is needed because pandoc readers for bibliography formats put the bibliographic information in the `references` field of metadata; and unless standalone is specified, metadata gets ignored. (TODO: This needs improvement. We should trigger standalone for the reader when the input format is bibliographic, and for the writer when the output format is markdown.)
* Carry over `citationNoteNum` to `citationNoteNumber`. This was just ignored in pandoc-citeproc.
* Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter. [API change] This runs the processCitations transformation. We need to treat it like a filter so it can be placed in the sequence of filter runs (after some, before others). In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`, so this special filter may be specified either way in a defaults file (or by `citeproc: true`, though this gives no control of positioning relative to other filters). TODO: we need to add something to the manual section on defaults files for this.
* Add deprecation warning if `pandoc-citeproc` filter is used.
* Add `--citeproc/-C` option to trigger citation processing. This behaves like a filter and will be positioned relative to filters as they appear on the command line.
* Rewrote the manual on citations, adding a dedicated Citations section which also includes some information formerly found in the pandoc-citeproc man page.
* Look for CSL styles in the `csl` subdirectory of the pandoc user data directory. This changes the old pandoc-citeproc behavior, which looked in `~/.csl`. Users can simply symlink `~/.csl` to the `csl` subdirectory of their pandoc user data directory if they want the old behavior.
* Add support for CSL bibliography entry formatting to LaTeX, HTML, Ms writers. Added CSL-related CSS to styles.html.
2020-09-21Markdown reader: Set citationNoteNum accurately in citations.John MacFarlane1-5/+26
This also changes stateLastNoteNumber -> stateNoteNumber.
2020-09-19Change deprecated Builder.isNull to null.John MacFarlane1-2/+2
2020-09-13Fix hlint suggestions, update hlint.yaml (#6680)Christian Despres1-6/+6
* Fix hlint suggestions, update hlint.yaml Most suggestions were redundant brackets. Some required LambdaCase. The .hlint.yaml file had a small typo, and didn't ignore camelCase suggestions in certain modules.
2020-06-29Clean up T.P.R.MetadataNikolay Yakimov1-5/+2
2020-06-29Handle errors in yamlToMetaNikolay Yakimov1-3/+1