aboutsummaryrefslogtreecommitdiff
path: root/src/Text
AgeCommit message (Collapse)AuthorFilesLines
2021-02-21LaTeX reader: removed sExpanded in state.John MacFarlane1-7/+2
This isn't actually needed and checking it doesn't change anything. Also remove an unnecessary `doMacros` before `satisfyTok`, which does it anyway.
2021-02-21LaTeX reader: further performance optimization.John MacFarlane1-23/+19
Avoid unnecessary 'doMacros'.
2021-02-20HTML reader: small performance tweak.John MacFarlane1-9/+5
2021-02-20T.P.Shared: remove some obsolete functions [API change].John MacFarlane1-43/+1
Removed: - `splitByIndices` - `splitStringByIndicies` - `substitute` - `underlineSpan` None of these are used elsewhere in the code base.
2021-02-20HTML reader: small efficiency improvements.John MacFarlane1-25/+18
Also, remove exported class NamedTag(..) [API change]. This was just intended to smooth over the transition from String to Text and is no longer needed. The functions isInlineTag and isBlockTag are no longer polymorphic.
2021-02-20LaTeX reader: Another small improvement to macro handling.John MacFarlane1-4/+3
2021-02-20LaTeX reader: avoid macro resolution code if no macros defined.John MacFarlane1-16/+19
2021-02-20T.P.Readers.LaTeX.Parsing: improve braced'.John MacFarlane1-16/+13
Remove the parameter, have it parse the opening brace, and make it more efficient.
2021-02-20HTML reader: efficiency improvements.John MacFarlane1-81/+129
Do a lookahead to find the right parser to use. Benchmarks from 34ms to 23ms, with less allocation. Also speeds up the epub reader.
2021-02-18DocBook, JATS, OPML readers: performance optimization.John MacFarlane3-64/+8
With the new XML parser, we can avoid the expensive tree normalization step we used to do. This gives a significant speed boost in docbook and JATS parsing (e.g. 9.7 to 6 ms).
2021-02-18T.P.XML Improve fromEntities.John MacFarlane1-17/+13
2021-02-18T.P.PDF: disable `smart` when building PDF via LaTeX.John MacFarlane1-1/+5
This is to prevent accidental creation of ligatures like `` ?` `` and `` !` `` (especially in languages with quotations like German), and similar ligature issues. See jgm/citeproc#54.
2021-02-18LaTeX writer: adjust hypertargets to beginnings of paragraphs.John MacFarlane1-2/+3
Use `\vadjust pre` so that the hypertarget takes you to the beginning of the paragraph rather than one line down. Closes #7078. This makes a particular difference for links to citations using `--citeproc` and `link-citations: true`.
2021-02-18T.P.Shared: cleanup.John MacFarlane1-11/+26
Cleanup up some functions and added deprecation pragmas to funtions no longer used in the code base.
2021-02-18Org reader: fix bug in org-ref citation parsing.Albert Krewinkel1-1/+1
The org-ref syntax allows to list multiple citations separated by comma. This fixes a bug that accepted commas as part of the citation id, so all citation lists were parsed as one single citation. Fixes: #7101
2021-02-17Docx reader: use Map instead of list for Namespaces.John MacFarlane2-20/+20
This gives a speedup of about 5-10%. The reader is now approximately twice as fast as in the last release.
2021-02-16Revert "Add T.P.XML.Light.Cursor."John MacFarlane1-346/+0
This reverts commit d8fc4971868104274881570ce9bc3d9edf0d2506.
2021-02-16Add T.P.XML.Light.Cursor.John MacFarlane1-0/+346
2021-02-16Add orig copyright/license info for code derived from xml-light.John MacFarlane3-3/+12
2021-02-16Split up T.P.XML.Light into submodules.John MacFarlane4-504/+565
2021-02-16Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...John MacFarlane24-928/+1384
..and add new definitions isomorphic to xml-light's, but with Text instead of String. This allows us to keep most of the code in existing readers that use xml-light, but avoid lots of unnecessary allocation. We also add versions of the functions from xml-light's Text.XML.Light.Output and Text.XML.Light.Proc that operate on our modified XML types, and functions that convert xml-light types to our types (since some of our dependencies, like texmath, use xml-light). Update golden tests for docx and pptx. OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`. Docx: Do a manual traversal to unwrap sdt and smartTag. This is faster, and needed to pass the tests. Benchmarks: A = prior to 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8) B = as of 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8) C = this commit | Reader | A | B | C | | ------- | ----- | ------ | ----- | | docbook | 18 ms | 12 ms | 10 ms | | opml | 65 ms | 62 ms | 35 ms | | jats | 15 ms | 11 ms | 9 ms | | docx | 72 ms | 69 ms | 44 ms | | odt | 78 ms | 41 ms | 28 ms | | epub | 64 ms | 61 ms | 56 ms | | fb2 | 14 ms | 5 ms | 4 ms |
2021-02-14T.P.Error: remove unused variablesAlbert Krewinkel1-2/+2
2021-02-13HTML reader: fix bad handling of empty src attribute in iframe.John MacFarlane1-6/+12
- If src is empty, we simply skip the iframe. - If src is invalid or cannot be fetched, we issue a warning and skip instead of failing with an error. - Closes #7099.
2021-02-13T.P.Error: export `renderError`.John MacFarlane1-33/+72
Refactor `handleError` to use `renderError`. This allows us render error messages without exiting.
2021-02-13Org: support task_lists extensionAlbert Krewinkel3-5/+54
The tasks lists extension is now supported by the org reader and writer; the extension is turned on by default. Closes: #6336
2021-02-13T.P.Shared: export `handleTaskListItem`. [API change]Albert Krewinkel1-0/+1
2021-02-13LaTeX reader: remove unnecessary lineJohn MacFarlane1-1/+0
2021-02-13Remove Ext_fenced_code_attributes from allowed commonmark attributes.John MacFarlane1-2/+0
This attribute was listed as allowed, but it didn't actually do anything. Use `attributes` for code attributes and more. Closes #7097.
2021-02-12Avoid an unnecessary withRaw.John MacFarlane1-1/+4
2021-02-12LaTeX reader improvements.John MacFarlane2-22/+68
* Rewrote `withRaw` so it doesn't rely on fragile assumptions about token positions (which break when macros are expanded). This requires the addition of `sEnableWithRaw` and `sRawTokens` in `LaTeXState`, and a new combinator `disablingWithRaw` to disable collecting of raw tokens in certain contexts. * Add `parseFromToks` to T.P.Readers.LaTeX.Parsing. * Fix parsing of single character tokens so it doesn't mess up the new raw token collecting. * These changes slightly increase allocations and have a small performance impact, but it's minor. Closes #7092.
2021-02-11Use getTimestamp instead of getCurrentTime in writers.John MacFarlane5-7/+7
Setting SOURCE_DATE_EPOCH will allow reproducible builds. Partially addresses #7093. This does not suffice to fully enable reproducible in EPUB, since a unique id is being generated for each build.
2021-02-11T.P.Class: Add getTimestamp [API change].John MacFarlane1-2/+19
This attempts to read the SOURCE_DATE_EPOCH environment variable and parse a UTC time from it (treating it as a unix date stamp, see https://reproducible-builds.org/specs/source-date-epoch/). If the variable is not set or can't be parsed as a unix date stamp, then the function returns the current date.
2021-02-11Correctly parse "raw" date value in markdown references metadata.John MacFarlane1-3/+5
See jgm/citeproc#53.
2021-02-10Add new unexported module T.P.XMLParser.John MacFarlane16-87/+224
This exports functions that uses xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2). Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module. The new parser also reports errors, which we report when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus. These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes. Add entity defs to docbook-reader.docbook. Update golden tests for docx.
2021-02-08ODT reader: finer-grained errors on parse failure.John MacFarlane1-21/+18
See #7091.
2021-02-08ODT reader: give more information if zip can't be unpacked.John MacFarlane1-1/+4
2021-02-08DocBook reader: Support informalfigure (#7079)Nils Carlson1-1/+3
Add support for informalfigure.
2021-02-07Avoid unnecessary use of NoImplicitPrelude pragma (#7089)Albert Krewinkel1-1/+0
2021-02-06Markdown reader: improved handling of mmd link attributes in references.John MacFarlane1-0/+2
Previously they only worked for links that had titles. Closes #7080.
2021-02-04Lua filters: use same function names in Haskell and LuaAlbert Krewinkel2-28/+30
2021-02-03ePub writer: `belongs-to-collection` metadata (#7063)Nick Berendsen1-41/+58
2021-02-02Lua: add module "pandoc.path"Albert Krewinkel1-0/+2
The module allows to work with file paths in a convenient and platform-independent manner. Closes: #6001 Closes: #6565
2021-02-02Add parseOptionsFromArgs [API change, addition].John MacFarlane2-2/+9
Exported by Text.Pandoc.App.
2021-02-01BibTeX writer: use doclayout and doctemplate.John MacFarlane3-23/+39
This change allows bibtex/biblatex output to wrap as other formats do, depending on the settings of `--wrap` and `--columns`. It also introduces default templates for bibtex and biblatex, which allow for using the variables `header-include`, `include-before` or `include-after` (or alternatively the command line options `--include-in-header`, `--include-before-body`, `--include-after-body`) to insert content into the generated bibtex/biblatex. This change requires a change in the return type of the unexported `T.P.Citeproc.writeBibTeXString` from `Text` to `Doc Text`. Closes #7068.
2021-02-01BibTeX writer fixes. Closes #7067.John MacFarlane1-6/+15
+ Require citeproc 0.3.0.7, which correctly titlecases when titles contain non-ASCII characters. + Correctly handle 'pages' (= 'page' in CSL). + Correctly handle BibLaTeX 'langid' (= 'language' in CSL). + In BibTeX output, protect foreign titles since there's no language field.
2021-01-31RST reader: fix handling of header in CSV tables.John MacFarlane1-4/+5
The interpretation of this line is not affected by the delim option. Closes #7064.
2021-01-31CslJson writer: fix compiler warningAlbert Krewinkel1-1/+1
2021-01-30CslJson writer: output `[]` if no references in input,John MacFarlane1-5/+5
instead of raising a PandocAppError as before.
2021-01-29Markdown writer: handle math right before digit.John MacFarlane1-1/+5
We insert an HTML comment to avoid a `$` right before a digit, which pandoc will not recognize as a math delimiter.
2021-01-29JATS writer: escape special chars in reference elements.Albert Krewinkel1-3/+6
Prevents the generation of invalid markup if a citation element contains an ampersand or another character with a special meaning in XML.