path: root/src/Text/Pandoc/Readers
Age | Commit message | Author | Files | Lines
2020-06-30 | Ipynb: allow lossless round-tripping of markdown cell content. | John MacFarlane | 1 | -1/+2
The reader now parses the contents of the markdown cell to a Pandoc structure, but *also* stores the raw markdown in a `source` attribute on the cell Div. When we convert back to markdown, this attribute is stripped off and the original source is used. When we convert to other formats, the attribute is usually ignored (though it will come through in HTML as a `data-source` attribute, not unhelpfully).
I'll note some potential drawbacks of this approach:
- It makes it impossible to use pandoc to clean up or change the contents of markdown cells, e.g. going from `+smart` to `-smart`.
- There may be formats where the addition of the `source` attribute is problematic. I can't think of any, though.
Closes #5408.
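The round-trip idea described above can be sketched in a few lines. This is only an illustration in Python, not pandoc's Haskell implementation; the `Div` class and the `parse_markdown`, `read_cell`, and `write_markdown_cell` names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Div:
    """Hypothetical stand-in for a cell Div: attributes plus parsed content."""
    attributes: dict = field(default_factory=dict)
    content: list = field(default_factory=list)


def parse_markdown(text):
    # Placeholder parser (assumption): split into words instead of building an AST.
    return text.split()


def read_cell(raw_markdown):
    # Keep the parsed structure *and* the raw source side by side.
    return Div(attributes={"source": raw_markdown},
               content=parse_markdown(raw_markdown))


def write_markdown_cell(div):
    # When writing markdown, prefer the stored source over re-rendering,
    # so the original text comes back unchanged.
    source = div.attributes.get("source")
    if source is not None:
        return source
    return " ".join(div.content)


raw = "Some  *oddly*   spaced markdown"
roundtrip = write_markdown_cell(read_cell(raw))
```

The sketch also makes the first drawback visible: because the stored source always wins, any change made to the parsed content is ignored when writing markdown.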
2020-06-30 | Org reader: respect export setting which disables entities | Albert Krewinkel | 3 | -6/+16
MathML-like entities, e.g., `\alpha`, can be disabled with the `#+OPTIONS: e:nil` export setting.
2020-06-29 | Merge pull request #6328 from lierdakil/defaults-meta-parse | John MacFarlane | 2 | -43/+30
Unify defaults metadata and markdown metadata parsers
2020-06-29 | Org reader: keep unknown keyword lines as raw org | Albert Krewinkel | 2 | -2/+13
The lines of unknown keywords, like `#+SOMEWORD: value` are no longer read as metadata, but kept as raw `org` blocks. This ensures that more information is retained when round-tripping org-mode files; additionally, this change makes it possible to support non-standard org extensions via filters.
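A rough sketch of this classification, in Python for illustration only. The set of known keywords, the tuple shapes, and the `classify_line` name are assumptions made for the example, not pandoc's actual data structures.

```python
import re

# Assumption: a small illustrative set of known keywords, not pandoc's real list.
KNOWN_KEYWORDS = {"TITLE", "AUTHOR", "DATE", "SUBTITLE"}

KEYWORD_LINE = re.compile(r"^#\+(\w+):\s*(.*)$")


def classify_line(line):
    """Return ("meta", key, value) for known keywords,
    ("raw", "org", line) for unknown ones, and ("text", None, line) otherwise."""
    match = KEYWORD_LINE.match(line)
    if match is None:
        return ("text", None, line)
    key, value = match.group(1).upper(), match.group(2)
    if key in KNOWN_KEYWORDS:
        return ("meta", key.lower(), value)
    # Unknown keyword: keep the whole line verbatim as a raw org block,
    # so that it survives a round trip and stays visible to filters.
    return ("raw", "org", line)
```

A filter that understands a non-standard keyword could then look for the raw `org` entries and handle them itself.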
2020-06-29 | Org reader: unify keyword handling | Albert Krewinkel | 1 | -75/+67
Handling of export settings and other keywords (like `#+LINK`) has been combined and unified.
2020-06-29 | Org reader: support LATEX_HEADER_EXTRA and HTML_HEAD_EXTRA settings | Albert Krewinkel | 1 | -5/+9
These export settings are treated like their non-extra counterparts, i.e., the values are added to the `header-includes` metadata list.
2020-06-29 | Org reader: allow multiple #+SUBTITLE export settings | Albert Krewinkel | 1 | -0/+1
The values of all lines are read as inlines and collected in the `subtitle` metadata field.
2020-06-29 | Clean up T.P.R.Metadata | Nikolay Yakimov | 2 | -41/+25
2020-06-29 | Handle errors in yamlToMeta | Nikolay Yakimov | 1 | -3/+1
2020-06-29 | Unify defaults and markdown metadata parsers | Nikolay Yakimov | 2 | -15/+20
2020-06-28 | Remove obsolete RelaxedPolyRec extension (#6487) | Nikolay Yakimov | 5 | -7/+0
2020-06-28 | JATS reader: parse abstract element into metadata field of same name (#6482) | Albert Krewinkel | 1 | -0/+9
Closes: #6480
2020-06-28 | Org reader: read `#+INSTITUTE` values as text with markup | Albert Krewinkel | 1 | -7/+13
The value is stored in the `institute` metadata field and used in the default beamer presentation template.
2020-06-28 | Org reader: update behavior of author, keywords export settings | Albert Krewinkel | 1 | -19/+9
The behavior of the `#+AUTHOR` and `#+KEYWORDS` export settings has changed: Org now allows multiple such lines and adds a space between the contents of each line. Pandoc now always parses these settings as meta inlines; setting values are no longer treated as comma-separated lists. Note that a Lua filter can be used to restore the previous behavior.
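The new behavior amounts to joining the values of repeated lines with a space and leaving commas alone. A minimal Python sketch follows; the `collect_setting` and `split_on_commas` helpers are hypothetical, and the second one only hints at the kind of post-processing a filter might do to get list values back.

```python
def collect_setting(lines, keyword):
    """Join the values of all `#+KEYWORD:` lines with a single space."""
    prefix = "#+" + keyword.upper() + ":"
    values = [line[len(prefix):].strip()
              for line in lines
              if line[:len(prefix)].upper() == prefix]
    return " ".join(values)


def split_on_commas(value):
    # What a filter could do to turn the joined text back into a list.
    return [part.strip() for part in value.split(",") if part.strip()]


doc = [
    "#+TITLE: Example",
    "#+KEYWORDS: pandoc, org",
    "#+KEYWORDS: readers",
]
keywords = collect_setting(doc, "keywords")
```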
2020-06-28 | Org reader: refactor export setting handling | Albert Krewinkel | 1 | -79/+67
2020-06-27 | Org reader: read description lines as inlines | Albert Krewinkel | 1 | -10/+46
`#+DESCRIPTION` lines are treated as text with markup. If multiple such lines are given, then all lines are read and separated by soft linebreaks. Closes: #6485
2020-06-25 | Org reader: honor tex export option | Albert Krewinkel | 4 | -30/+75
The `tex` export option can be set with `#+OPTIONS: tex:nil` and allows three settings:
- `t` causes LaTeX fragments to be parsed as TeX or added as raw TeX,
- `nil` removes all LaTeX fragments from the document, and
- `verbatim` treats LaTeX as text.
The default is `t`. Closes: #4070
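The three settings can be read as a small dispatch on the option value. The following Python sketch is illustrative only; the `apply_tex_option` name and the tagged-tuple results are assumptions for the example rather than pandoc's types.

```python
def apply_tex_option(fragment, setting="t"):
    """Decide what happens to a LaTeX fragment under a given `tex` setting."""
    if setting == "t":
        # Default: keep the fragment as TeX (parsed or raw).
        return ("tex", fragment)
    if setting == "nil":
        # Drop the fragment from the document entirely.
        return None
    if setting == "verbatim":
        # Keep the characters, but as ordinary text.
        return ("text", fragment)
    raise ValueError("unknown tex setting: " + repr(setting))
```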
2020-06-23 | LaTeX reader: Retain the Div around tables with attributes. | John MacFarlane | 1 | -1/+8
We'll need this to store table attributes until all writers are adjusted to react to attributes on the Table element.
2020-06-22 | Use native Underline instead of Span in Jira | John MacFarlane | 1 | -1/+1
2020-06-20 | Recognize images with uppercase extensions | Albert Krewinkel | 1 | -1/+2
Fixes: #6472
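Recognizing uppercase extensions usually comes down to normalizing case before the comparison. A small Python sketch of that idea; the extension list and the `is_image_path` name are assumptions for illustration, not the list the reader uses.

```python
import os

# Assumption: illustrative extension list only.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg"}


def is_image_path(path):
    """Case-insensitive check of the file extension."""
    _, ext = os.path.splitext(path)
    return ext.lower() in IMAGE_EXTENSIONS
```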
2020-06-17 | RST reader: pass arbitrary attributes through in code blocks. | John MacFarlane | 1 | -12/+12
Exceptions: name (which becomes the id), class (which becomes the classes), and number-lines (which is treated specially to fit with pandoc highlighting). Closes #6465.
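The mapping from directive options to code block attributes might look roughly like this. It is a Python sketch under assumptions: the `code_block_attr` name is hypothetical, and translating `number-lines` into a `numberLines` class plus a `startFrom` attribute is a guess at the special treatment, based on pandoc's documented highlighting conventions.

```python
def code_block_attr(options):
    """Turn directive options into an (identifier, classes, key-values) triple."""
    identifier = ""
    classes = []
    keyvals = []
    for key, value in options.items():
        if key == "name":
            identifier = value
        elif key == "class":
            classes.extend(value.split())
        elif key == "number-lines":
            # Assumed mapping onto pandoc's line-numbering conventions.
            classes.append("numberLines")
            if value:
                keyvals.append(("startFrom", value))
        else:
            # Everything else passes through unchanged.
            keyvals.append((key, value))
    return identifier, classes, keyvals


attr = code_block_attr({
    "name": "example",
    "class": "python extra",
    "number-lines": "5",
    "emphasize-lines": "2",
})
```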
2020-06-14 | Docbook reader: implement <procedure> (#6442) | Mathieu Boespflug | 1 | -4/+6
A `<procedure>` contains a sequence of `<step>` elements, or `<substeps>` elements that themselves contain `<step>` elements.
2020-06-14 | Docbook reader: implement <phrase> (#6438) | Mathieu Boespflug | 1 | -1/+7
A `<phrase>` has no semantic meaning. It is only useful for attaching an `id` or other attributes to a piece of text.
2020-06-14 | Docbook reader: treat envar and systemitem like code (#6435) | Mathieu Boespflug | 1 | -2/+4
2020-06-14 | Docbook: implement <replaceable> (#6437) | Mathieu Boespflug | 1 | -1/+3
A `<replaceable>` is a placeholder that a user is instructed to replace with a value of their own, like `<replaceable>prefix</replaceable>/bin/foo`. In the standard Docbook toolchain, this typically appears emphasized, with no other adornment. But a `<replaceable>` is nearly always in a code element, where emphasis won't work. So we do the same thing as for `<optional>`: decorate the content with brackets.
2020-06-14 | Docbook: map <simplesect> to unnumbered section (#6436) | Mathieu Boespflug | 1 | -15/+19
A `<simplesect>` is a section like any other, except that it never contains a subsection and is typically rendered unnumbered.
2020-06-13 | Textile reader: support "pre." for code blocks. | John MacFarlane | 1 | -8/+8
Closes #6454.
2020-06-09 | Ipynb reader: handle application/pdf output as image. | John MacFarlane | 1 | -1/+1
Closes #6430.
2020-06-09 | Ipynb reader: properly handle image/svg+xml as an image. | John MacFarlane | 1 | -3/+5
Partially addresses #6430.
2020-05-20 | Add "summary" to list of block-level HTML tags. | John MacFarlane | 1 | -1/+1
Closes #6385. (The summary element needs to be the first child of details and should not be enclosed by p tags.) NOTE: you need to include a blank line before the closing `</details>`, if you want the last part of the content to be parsed as a paragraph.
2020-05-19 | LaTeX reader: don't parse beyond `\end{document}`. | John MacFarlane | 1 | -13/+25
This required some internal changes to `\subfile` handling. Closes #6380.
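At the string level, the effect is that anything after the first `\end{document}` is ignored. The Python sketch below shows only that effect. It is a deliberate simplification with a hypothetical `truncate_at_end_document` helper: it does not model how the reader does this, and it ignores cases such as the marker appearing inside a comment or verbatim environment.

```python
END_DOCUMENT = r"\end{document}"


def truncate_at_end_document(source):
    """Drop everything after the first \\end{document}, if there is one."""
    index = source.find(END_DOCUMENT)
    if index == -1:
        return source
    return source[:index + len(END_DOCUMENT)]
```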
2020-05-14 | DocBook writer: add id of figure to enclosed image. | John MacFarlane | 1 | -4/+12
2020-05-08 | Implement implicit_figures extension for commonmark reader. | John MacFarlane | 1 | -1/+6
Closes #6350.
2020-05-05 | Avoid unnecessary guard (#6340) | Joseph C. Sible | 1 | -1/+1
2020-05-04 | Fix mediawiki reader with gfm_auto_identifiers. | John MacFarlane | 1 | -1/+4
Previously the `-` was being replaced by `_`. Closes #6335.
2020-04-28 | Support new Underline element in readers and writers (#6277) | Vaibhav Sagar | 11 | -23/+32
Deprecate `underlineSpan` in Shared in favor of `Text.Pandoc.Builder.underline`.
2020-04-18 | HTML reader: parse attributes into table attributes. | John MacFarlane | 1 | -14/+18
2020-04-17 | LaTeX reader: don't put surrounding Div around Table. | John MacFarlane | 1 | -2/+5
This reverts a change in the last release; the Div is no longer needed, because we can now put the id right in the Table's attributes. However, writers may still need to be modified to do something with the id in a Table (e.g. create an anchor), so in the short term we may lose the ability to link to tables in some writers.
2020-04-15 | Markdown reader: Remove unnecessary qualification | despresc | 1 | -8/+8
2020-04-15 | Use the new builders, modify readers to preserve empty headers | despresc | 18 | -60/+154
The Builder.simpleTable now only adds a row to the TableHead when the given header row is not null. This uncovered an inconsistency in the readers: some would unconditionally emit a header filled with empty cells, even if the header was not present. Now every reader has the conditional behaviour. Only the XWiki writer depended on the header row being always present; it now pads its head as necessary.
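The conditional header behavior can be sketched as follows. This is Python for illustration, not the Haskell `simpleTable` itself; the dictionary layout and the `header_or_empty`, `simple_table`, and `pad_head` names are hypothetical.

```python
def header_or_empty(cells):
    # Reader side: a header made only of empty cells counts as "no header".
    return cells if any(cell.strip() for cell in cells) else []


def simple_table(header, rows):
    # Builder side: add a head row only when a header was actually given.
    head_rows = [header] if header else []
    return {"head": head_rows, "body": rows}


def pad_head(table, width):
    # Writer side: a writer that needs a header row can pad one itself.
    if table["head"]:
        return table["head"]
    return [[""] * width]


headless = simple_table(header_or_empty(["", ""]), [["a", "b"]])
headed = simple_table(header_or_empty(["H1", "H2"]), [["a", "b"]])
```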
2020-04-15 | Adapt to the removal of the RowSpan, ColSpan, RowHeadColumns accessors | despresc | 1 | -1/+1
2020-04-15 | Adapt to the newest Table type, fix some previous adaptation issues | despresc | 19 | -72/+74
- Writers.Native is now adapted to the new Table type.
- Inline captions should now be conditionally wrapped in a Plain, not a Para block.
- The toLegacyTable function now lives in Writers.Shared.
2020-04-15 | Remove the onlySimpleCellBodies function from Shared | despresc | 1 | -2/+2
2020-04-15 | Implement the new Table type | despresc | 20 | -126/+150
2020-04-15 | Markdown Reader: Fix inline code in lists (#6284) | Nikolay Yakimov | 1 | -6/+11
Closes #6284. Previously inline code containing list markers was sometimes parsed incorrectly.
2020-04-15 | JATS reader: handle "label" element in section title. | John MacFarlane | 1 | -1/+7
Closes #6288.
2020-04-12 | RST reader: handle "date::" directive. | John MacFarlane | 1 | -1/+10
Closes #6276.
2020-04-11 | HTML reader: support <bdo> (#6271) | Tristan de Cacqueray | 1 | -0/+13
See https://developer.mozilla.org/en-US/docs/Web/HTML/Element/bdo Closes #5794
2020-04-09 | Jira reader: improve icon conversion | Albert Krewinkel | 1 | -12/+12
Icons are now converted as follows: `(/)` to ✔, `(x)` to ❌, `(!)` to ❗, `(+)` to ➕, `(-)` to ➖, `(off)` to 🌙, and `(*)` to ☆. The new icons render well in most fonts. Furthermore, the UTF-8 encodings of these characters all fit into 4 bytes. Closes: #6264
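The icon table and the byte-length claim can both be checked with a short Python sketch. The `ICONS` dictionary mirrors the mapping listed above, written with escape sequences so that each icon is a single code point; the `convert_icons` helper is a hypothetical, naive string replacement and not how the Jira reader parses markup.

```python
# Mapping taken from the commit message; escapes avoid hidden variation selectors.
ICONS = {
    "(/)": "\u2714",        # check mark
    "(x)": "\u274c",        # cross mark
    "(!)": "\u2757",        # exclamation mark
    "(+)": "\u2795",        # plus sign
    "(-)": "\u2796",        # minus sign
    "(off)": "\U0001f319",  # crescent moon
    "(*)": "\u2606",        # white star
}

# UTF-8 length of each icon, for checking the 4-byte bound.
utf8_lengths = {markup: len(icon.encode("utf-8"))
                for markup, icon in ICONS.items()}


def convert_icons(text):
    # Naive replacement of every icon markup occurrence.
    for markup, icon in ICONS.items():
        text = text.replace(markup, icon)
    return text
```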
2020-04-07 | LaTeX reader: better handling of `\lettrine`. | John MacFarlane | 1 | -1/+8
- SmallCaps instead of Span for the part after the initial capital.
- Ensure that both arguments are parsed, so that in Markdown both are treated as raw LaTeX. (Closes #6258.)