path: root/src/Text/Pandoc
Age | Commit message | Author | Files | Lines
2021-11-05 | Add interface for custom readers written in Lua. (#7671) | John MacFarlane | 2 | -5/+63
New module Text.Pandoc.Readers.Custom, exporting readCustom [API change]. Users can now do `-f myreader.lua` and pandoc will treat the script myreader.lua as a custom reader, which parses an input string to a pandoc AST, using the pandoc module defined for Lua filters. A sample custom reader can be found in data/reader.lua. Closes #7669.
2021-11-05 | Support for <indexterm>s when reading DocBook (#7607) | Rowan Rodrik van der Molen | 1 | -4/+37
* Support for <indexterm>s when reading DocBook
* Update implementation status of `<n-ary>` tags
* Remove non-idiomatic parentheses
* More complete `<indexterm>` support, with tests
Co-authored-by: Rowan Rodrik van der Molen <rowan@ytec.nl>
2021-11-05 | T.P.Error: sort errors in handleError by exit code | Albert Krewinkel | 1 | -15/+15
2021-11-05 | Lua: display Pandoc values using their native Haskell representation | Albert Krewinkel | 1 | -0/+4
2021-11-05 | Lua: always load lpeg as global module | Albert Krewinkel | 2 | -5/+27
2021-11-04 | Lua: include lpeg module (#7649) | Albert Krewinkel | 1 | -0/+4
Compiles the 'lpeg' library (Parsing Expression Grammars For Lua) into the program. Package maintainers may choose to rely on package dependencies to make lpeg available, in which case they can compile the program with the constraint `lpeg +rely-on-shared-lpeg-library`.
2021-11-04 | Allow `plain` to be used in raw attribute syntax. | John MacFarlane | 2 | -2/+4
2021-11-03 | Lua: add missing space in "package not found" message | Albert Krewinkel | 1 | -1/+1
Closes: #7658
2021-11-02 | Markdown reader: Improve inlinesInBalancedBrackets. | John MacFarlane | 1 | -20/+12
This is just a small improvement in terms of performance, but it's simpler and more direct code. Also, we avoid parsing interparagraph spaces in balanced brackets, as the original did.
2021-11-02 | Docx reader: don't let first line indents trigger block quotes. | John MacFarlane | 1 | -3/+2
This fixes a regression introduced in pandoc 2.15 by PR #7606. Closes #7655.
2021-11-02 | Lua: fix typo in SoftBreak constructor | Albert Krewinkel | 1 | -1/+1
2021-11-02 | Lua: re-add `content` property to Strikeout elements | Albert Krewinkel | 1 | -0/+2
Fixes a regression introduced in 2.15.
2021-11-02 | Lua: be more forgiving when retrieving the Image `caption` property | Albert Krewinkel | 1 | -1/+1
Fixes a regression introduced in 2.15.
2021-11-02 | Docx writer: use getTimestamp for modification times in reference.docx. | John MacFarlane | 1 | -1/+1
This ensures that when `SOURCE_DATE_EPOCH` is set, the modification times of files taken from the reference.docx will be set deterministically, allowing for reproducible builds. Closes #7654.
2021-11-02 | Lua: display Attr values using their native Haskell representation | Albert Krewinkel | 1 | -0/+4
2021-11-02 | Lua: allow omitting the 2nd parameter in pandoc.Code constructor | Albert Krewinkel | 1 | -2/+2
Fixes a regression introduced in 2.15 which required users to always specify an Attr value when constructing a Code element.
2021-11-02 | Lua: allow to compare, show Citation values | Albert Krewinkel | 1 | -1/+12
Comparisons of Citation values are performed in Haskell; values are equal if they represent the same Haskell value. Converting a Citation value to a string now yields its native Haskell string representation.
2021-11-01 | Lua: restore `content` property on Header elements | Albert Krewinkel | 1 | -0/+2
2021-10-31 | Lua: re-add `content` property to Link elements | Albert Krewinkel | 1 | -0/+2
This was a regression introduced in version 2.15. Fixes: #7647
2021-10-30 | Fix build on GHC 9.2 | Joseph C. Sible | 1 | -0/+1
2021-10-29 | Docx writer: move ": " out of the caption bookmark. | Tristan Stenner | 2 | -6/+4
This is needed so that native references to the figure are included as "As seen in Figure X, it is..." instead of "As seen in [Figure: , it is..."
2021-10-29 | Lua: use hslua module abstraction where possible | Albert Krewinkel | 11 | -297/+374
This will make it easier to generate module documentation in the future.
2021-10-28 | Lua: increase strictness when getting attribute keys | Albert Krewinkel | 1 | -2/+2
2021-10-27 | Lua: re-add `t` and `tag` property to Attr values | Albert Krewinkel | 1 | -0/+4
Removal of these properties from Attr values was a regression.
2021-10-27 | Markdown writer: Be sure to quote special values in YAML metadata. | John MacFarlane | 1 | -3/+13
E.g. "Y", "yes", which are now (with the yaml library) considered boolean values, as well as "null". This fixes a bug with roundtripping markdown -> markdown:
```
---
foo: "true"
...
```
2021-10-27 | Change JSON encodings of some types. | John MacFarlane | 3 | -44/+56
- For LineEnding use lowercase constructors, e.g. `crlf`, `native`. This was the original intent, but there was a bug in the implementation.
- For HTMLSlideVariant use lowercase constructors.
- For ReaderOptions use e.g. `default-image-extension` instead of `readerDefaultImageExtension` for field names.
- For Extension, use e.g. `tex_math_dollars` instead of `Ext_tex_math_dollars` as constructor.
- For Extensions, use an array of Extensions, instead of an object wrapping the tag `Extensions` and an integer. (The representation is not supposed to be part of the public API.)
- For Opt, use field names like `tab-stop` instead of `optTabStop`.
2021-10-27 | Switch back from HsYAML to yaml. | John MacFarlane | 9 | -419/+334
Reasons:
- Performance: HsYAML is around 20 times slower in parsing large YAML bibliographies (#6084).
- An issue was submitted to HsYAML, but it hasn't gotten any attention. HsYAML seems borderline unmaintained; it hasn't had a commit in over a year.
- Unfortunately this goes back on our attempts to free ourselves from C dependencies (#4535). But I don't see a better alternative until a better pure Haskell parser is available.
Closes #6084.
Notes:
- We've removed the FromYAML instances for all types that had them, since this is a HsYAML-specific typeclass [API change]. (The yaml package just uses From/ToJSON.)
- Unlike HsYAML (in the configuration we were using), yaml parses 'Y', 'N', 'Yes', 'No', 'On', 'Off' as boolean values. Users may need to quote these when they are meant to be interpreted as strings. Similarly, 'null' is parsed as a YAML null value (and will be treated as an empty string by pandoc rather than the string 'null'). Quoting it will force it to be interpreted as a string.
- Some tests had to be adjusted accordingly.
- Pandoc now behaves better when the YAML metadata contains escaping errors: instead of just falling back on treating the section as a table, it raises a YAML parsing error.
2021-10-27 | Lua: fix `pandoc.utils.stringify` regression | Albert Krewinkel | 1 | -1/+1
The `pandoc.utils.stringify` function returned empty strings when called with a string argument.
2021-10-26 | Fix a copy/paste bug in Lua marshalling code. | John MacFarlane | 1 | -1/+1
This caused Lua filters that changed link properties to turn those links into images! Closes #7639.
2021-10-26 | Lua: marshal SimpleTable values as userdata objects | Albert Krewinkel | 3 | -46/+58
2021-10-26 | Lua: generate constants in module pandoc programmatically | Albert Krewinkel | 1 | -0/+17
2021-10-26 | Lua: marshal ListAttributes values as userdata objects | Albert Krewinkel | 6 | -15/+80
2021-10-26 | Lua: marshal Block values as userdata objects | Albert Krewinkel | 4 | -146/+461
Properties of Block values are marshalled lazily, which generally improves performance considerably. Script users may also notice the following differences:
- Block element properties can no longer be accessed by numerical indexing of the `.c` field. The `.c` property now serves as an alias for `.content`, so some filters that used this undocumented method for property access may continue to work, while others will need to be updated to use proper property names.
- The marshalled Block elements now have a `show` method and a `__tostring` metamethod. Both return the Haskell string representation of the element.
- Block values now have the Lua type `userdata` instead of `table`.
2021-10-25 | Lua: marshal Citation values as userdata objects | Albert Krewinkel | 3 | -16/+53
2021-10-23 | Lua: convert IOErrors to PandocErrors in pandoc.pipe function | Albert Krewinkel | 1 | -0/+2
Fixes: #7523
2021-10-22 | Org reader: allow an initial :PROPERTIES: drawer to add to metadata. | John MacFarlane | 1 | -2/+10
Closes #7520.
2021-10-22 | Use simpleFigure in Readers. | Aner Lucero | 25 | -110/+93
2021-10-22 | Lua: marshal Version values as userdata | Albert Krewinkel | 6 | -125/+12
2021-10-22 | Lua: marshal Inline elements as userdata | Albert Krewinkel | 2 | -63/+345
This includes the following user-facing changes:
- Deprecated inline constructors are removed. These are `DoubleQuoted`, `SingleQuoted`, `DisplayMath`, and `InlineMath`.
- Attr values are no longer normalized when assigned to an Inline element property.
- It's no longer possible to access parts of Inline elements via numerical indexes. E.g., `pandoc.Span('test')[2]` used to give `pandoc.Str 'test'`, but now yields `nil`. This was undocumented behavior not intended to be used in user scripts. Use named properties instead.
- Accessing `.c` to get a JSON-like tuple of all components no longer works. This was undocumented behavior.
- Only known properties can be set on an element value. Trying to set a different property will now raise an error.
2021-10-22 | Lua: marshal Attr values as userdata | Albert Krewinkel | 4 | -14/+233
- Adds a new `pandoc.AttributeList()` constructor, which creates the associative attribute list that is used as the third component of `Attr` values. Values of this type can often be passed to constructors instead of `Attr` values.
- `AttributeList` values can no longer be indexed numerically.
2021-10-22 | Lua: marshal Pandoc values as userdata | Albert Krewinkel | 2 | -11/+36
2021-10-22 | Switch to hslua-2.0 | Albert Krewinkel | 24 | -1187/+1095
The new HsLua version takes a somewhat different approach to marshalling and unmarshalling, relying less on typeclasses and more on specialized types. This allows for better performance and improved error messages. Furthermore, new abstractions make it possible to document the code and the exposed functions.
2021-10-21 | Move splitStrWhen to T.P.Citeproc.Util. | John MacFarlane | 3 | -23/+15
Previously there were two copies, in BibTeX and Locator.
2021-10-21 | SelfContained: fix bug that caused everything to be made a data uri. | John MacFarlane | 1 | -12/+12
All the code we needed to put most styles and scripts into inline style and script tags was there, but because of the order of pattern matching, it was never being called. Putting the catch-all clause at the end fixes the bug. Closes #7635, closes #7367. See also #3423.
2021-10-20 | Markdown reader: don't parse links or bracketed spans as citations. | John MacFarlane | 1 | -2/+4
Previously pandoc would parse `[link to (@a)](url)` as a citation; similarly `[(@a)]{#ident}`. This is undesirable. One should be able to use example references in citations, and even if `@a` is not defined as an example reference, `[@a](url)` should be a link containing an author-in-text citation rather than a normal citation followed by literal `(url)`. Closes #7632.
2021-10-19 | FormatHeuristics: remove `.tei.xml` extension for TEI. | John MacFarlane | 1 | -1/+0
As noted in #7630, this never worked, because `takeExtension` only returns `.xml`. So it won't be missed if we remove it. Closes #7630.
2021-10-18 | Docx reader: fix handling of empty fields | Milan Bracke | 1 | -0/+4
Some fields have only an instrText and no content. Pandoc didn't understand these, which caused other fields to be misunderstood, because it seemed that a field was still open when it wasn't.
2021-10-18 | Docx parser: implement PAGEREF fields | Milan Bracke | 2 | -0/+26
These fields, often used in tables of contents, can be a hyperlink.
2021-10-18 | Docx reader: fix handling of nested fields | Milan Bracke | 2 | -115/+150
Fields delimited by fldChar elements can contain other fields. Before, the nested fields would be ignored, except for their end, which would be taken as the end of the parent field. To fix this issue, fields had to be treated as containing ParParts instead of Runs, since a Run can't represent sufficiently complex structures. This also affected Hyperlinks, since they can originate from a field.
2021-10-17 | pptx: Line up continuation paragraphs | Emily Bourke | 2 | -10/+93
This commit changes the `marL` and `indent` values used for plain paragraphs and numbered lists, and changes the spacing defined in the reference doc master for bulleted lists.
For paragraphs, there is now a left-indent taken from the `otherStyle` in the master.
For numbered lists, the number is positioned where the text would be if this were a plain paragraph, and the text is indented to the next level. This means that continuation paragraphs line up nicely with numbered lists. It also /mostly/ matches the observed PowerPoint behaviour when inserting paragraphs and numbered lists: the only difference is that PowerPoint was using a different margin value for the first-level numbered lists; I’ve changed this to match the other levels, as I don’t think it makes the spacing unappealing and it allows continuation paragraphs at any level to line up.
With bulleted lists, I’m keeping the observed PowerPoint behaviour of specifying only a level, letting `marL` and `indent` be automatically taken from `bodyStyle`. To that end, this commit changes the `bodyStyle` spacing in the master of the default reference doc, to:
- line up the text of the first paragraph in each bullet with any continuation paragraphs
- line up nested bullet markers in any continuation paragraphs with the first paragraph, matching lists and plain paragraphs
This does mean the continuation paragraphs still won’t line up for anyone using their own reference doc where they haven’t matched the `otherStyle` and `bodyStyle` indent levels, but I think people in that situation will be able to troubleshoot.