path: root/src/Text/Pandoc
Age Commit message Author Files Lines
2021-02-28Removed unnecessary pragmas.John MacFarlane1-2/+0
2021-02-28Change T.P.Readers.LaTeX.SIunitx to export a command map...John MacFarlane2-16/+16
instead of individual commands.
2021-02-28T.P.Readers.LaTeX: Don't export tokenize, untokenize.John MacFarlane2-2/+9
[API change] These were only exported for testing, which seems the wrong thing to do. They don't belong in the public API and are not really usable as they are, without access to the Tok type which is not exported. Removed the tokenize/untokenize roundtrip test. We put a quickcheck property in the comments which may be used when this code is touched (if it is).
2021-02-28LaTeX writer: use function instead of map for accent lookup.John MacFarlane1-27/+25
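The commit above does not say why a function replaced the map. One plausible reading is that a plain pattern match avoids building and searching a `Map` at run time. The following is a minimal sketch of that shape only; `accentCommand` and the handful of accents it covers are hypothetical and are not taken from the pandoc source.

```haskell
module Main where

-- | Hypothetical lookup from a Unicode combining character to the
-- LaTeX accent command that produces it. Written as a function with a
-- case expression rather than as a Data.Map value.
accentCommand :: Char -> Maybe String
accentCommand c = case c of
  '\x0300' -> Just "\\`"   -- combining grave accent
  '\x0301' -> Just "\\'"   -- combining acute accent
  '\x0302' -> Just "\\^"   -- combining circumflex accent
  '\x0303' -> Just "\\~"   -- combining tilde
  '\x0308' -> Just "\\\""  -- combining diaeresis
  _        -> Nothing      -- not an accent this sketch knows about

main :: IO ()
main = mapM_ (print . accentCommand) ['\x0301', '\x0308', 'x']
```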
2021-02-28Factor out T.P.Readers.LaTeX.Math.John MacFarlane2-193/+229
2021-02-28Fix bug in last commit.John MacFarlane1-1/+1
2021-02-28Markdown reader efficiency improvements.John MacFarlane1-182/+208
Benchmarks show that these make the reader 13-17% faster, depending on extensions.
2021-02-28LaTeX reader: another small efficiency improvement.John MacFarlane1-6/+12
2021-02-28LaTeX reader efficiency improvements.John MacFarlane1-31/+42
In conjunction with other changes this makes the reader almost twice as fast on our benchmark as it was on Feb. 10.
2021-02-28Move setDefaultLanguage to T.P.Readers.LaTeX.Lang.John MacFarlane2-16/+22
2021-02-28LaTeX reader: remove two unnecessary parsers in inline.John MacFarlane1-2/+0
These are handled anyway by regularSymbol.
2021-02-28Factor out T.P.Readers.LaTeX.Citation.John MacFarlane3-186/+231
2021-02-27Factor out T.P.Readers.LaTeX.Table.John MacFarlane3-363/+411
2021-02-27Split off T.P.Readers.LaTeX.Accent.John MacFarlane2-60/+86
To help reduce memory demands compiling the main LaTeX reader.
2021-02-27Lua: use strict evaluation when retrieving AST value from the stackAlbert Krewinkel1-79/+77
Fixes: #6674
2021-02-26Fix/update URLs and use HTTP**S** where possible (#7122)Salim B4-7/+7
2021-02-22T.P.CSV: fix parsing of unquoted values.John MacFarlane1-2/+1
Previously we didn't allow unescaped quotes in unquoted values, but they are allowed. Closes #7112.
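To illustrate the rule described above, here is a small sketch of a CSV record splitter in which a quote character is special only at the start of a field. It is an assumption-laden illustration, not the T.P.CSV parser: `splitRecord`, `field`, and `quoted` are hypothetical names, only a single record is handled, and malformed input after a closing quote is simply dropped.

```haskell
module Main where

-- | Split one CSV record into fields (hypothetical sketch).
splitRecord :: String -> [String]
splitRecord s = case field s of
  (f, ',' : rest) -> f : splitRecord rest
  (f, _)          -> [f]

-- | Parse one field, returning it together with the remaining input.
field :: String -> (String, String)
field ('"' : rest) = quoted rest
field s            = break (== ',') s  -- unquoted: quotes are ordinary characters

-- | Parse the remainder of a quoted field; a doubled quote is an escaped quote.
quoted :: String -> (String, String)
quoted ('"' : '"' : rest) = let (f, r) = quoted rest in ('"' : f, r)
quoted ('"' : rest)       = ("", rest)  -- closing quote
quoted (c : rest)         = let (f, r) = quoted rest in (c : f, r)
quoted []                 = ("", [])    -- unterminated field: take what we have

main :: IO ()
main = do
  print (splitRecord "a,b\"c\",d")
  print (splitRecord "\"x,\"\"y\"\"\",z")
```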
2021-02-22Fall back to latin1 if UTF-8 decoding fails...John MacFarlane1-1/+7
...when handling URL argument served with no charset in the mime type. The assumption is that most pages that don't specify a charset in the mime type are either UTF-8 or latin1. I think that's a good assumption, though I'm not sure.
2021-02-22When downloading content from URL arguments, be sensitive to...John MacFarlane1-1/+9
the character encoding. We can properly handle UTF-8 and latin1 (ISO-8859-1); for others we raise an error. See #5600.
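The two commits above can be read together as a small decision procedure: honour a declared UTF-8 or latin1 charset, reject any other declared charset, and when none is declared try UTF-8 before falling back to latin1. The sketch below shows that logic with the `text` and `bytestring` packages. `decodeBody` is a hypothetical helper, and the error is a plain `Text` message rather than pandoc's `PandocUnsupportedCharsetError`; the charset spellings it accepts are also an assumption.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString as B
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Hypothetical helper: decode a downloaded body, given the charset
-- from the MIME type if one was supplied.
decodeBody :: Maybe Text -> B.ByteString -> Either Text Text
decodeBody (Just cs) bs =
  case T.toLower cs of
    "utf-8"      -> either (Left . T.pack . show) Right (TE.decodeUtf8' bs)
    "iso-8859-1" -> Right (TE.decodeLatin1 bs)
    other        -> Left ("unsupported charset: " <> other)
decodeBody Nothing bs =
  -- No charset declared: try UTF-8 first, then fall back to latin1,
  -- which accepts any byte sequence.
  Right (either (const (TE.decodeLatin1 bs)) id (TE.decodeUtf8' bs))

main :: IO ()
main = do
  -- "caf" followed by a lone 0xE9 byte: invalid UTF-8, valid latin1.
  print (decodeBody Nothing (B.pack [0x63, 0x61, 0x66, 0xE9]))
  -- The same word encoded as UTF-8, with the charset declared.
  print (decodeBody (Just "UTF-8") (B.pack [0x63, 0x61, 0x66, 0xC3, 0xA9]))
  -- A declared charset that this sketch does not handle.
  print (decodeBody (Just "shift_jis") B.empty)
```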
2021-02-22T.P.Error: Add PandocUnsupportedCharsetError constructor...John MacFarlane1-0/+4
...for PandocError. [API change]
2021-02-22Text.Pandoc.MIME: add exported function getCharset.John MacFarlane1-2/+15
[API change]
2021-02-22Text.Pandoc.UTF8: change IO functions to return Text, not String.John MacFarlane8-64/+65
[API change] This affects `readFile`, `getContents`, `writeFileWith`, `writeFile`, `putStrWith`, `putStr`, `putStrLnWith`, `putStrLn`, `hPutStrWith`, `hPutStr`, `hPutStrLnWith`, `hPutStrLn`, `hGetContents`. This avoids the need to uselessly create a linked list of characters when emitting output.
2021-02-21LaTeX reader: further optimizations in satisfyTok.John MacFarlane1-5/+5
Benchmarks show 2/3 of the run time and 2/3 of the allocation of the Feb. 10 benchmarks.
2021-02-21LaTeX reader: removed sExpanded in state.John MacFarlane1-7/+2
This isn't actually needed and checking it doesn't change anything. Also remove an unnecessary `doMacros` before `satisfyTok`, which does it anyway.
2021-02-21LaTeX reader: further performance optimization.John MacFarlane1-23/+19
Avoid unnecessary 'doMacros'.
2021-02-20HTML reader: small performance tweak.John MacFarlane1-9/+5
2021-02-20T.P.Shared: remove some obsolete functions [API change].John MacFarlane1-43/+1
Removed:
- `splitByIndices`
- `splitStringByIndicies`
- `substitute`
- `underlineSpan`
None of these are used elsewhere in the code base.
2021-02-20HTML reader: small efficiency improvements.John MacFarlane1-25/+18
Also, remove exported class NamedTag(..) [API change]. This was just intended to smooth over the transition from String to Text and is no longer needed. The functions isInlineTag and isBlockTag are no longer polymorphic.
2021-02-20LaTeX reader: Another small improvement to macro handling.John MacFarlane1-4/+3
2021-02-20LaTeX reader: avoid macro resolution code if no macros defined.John MacFarlane1-16/+19
2021-02-20T.P.Readers.LaTeX.Parsing: improve braced'.John MacFarlane1-16/+13
Remove the parameter, have it parse the opening brace, and make it more efficient.
2021-02-20HTML reader: efficiency improvements.John MacFarlane1-81/+129
Do a lookahead to find the right parser to use. Benchmarks from 34ms to 23ms, with less allocation. Also speeds up the epub reader.
2021-02-18DocBook, JATS, OPML readers: performance optimization.John MacFarlane3-64/+8
With the new XML parser, we can avoid the expensive tree normalization step we used to do. This gives a significant speed boost in docbook and JATS parsing (e.g. 9.7 to 6 ms).
2021-02-18T.P.XML Improve fromEntities.John MacFarlane1-17/+13
2021-02-18T.P.PDF: disable `smart` when building PDF via LaTeX.John MacFarlane1-1/+5
This is to prevent accidental creation of ligatures like `` ?` `` and `` !` `` (especially in languages with quotations like German), and similar ligature issues. See jgm/citeproc#54.
2021-02-18LaTeX writer: adjust hypertargets to beginnings of paragraphs.John MacFarlane1-2/+3
Use `\vadjust pre` so that the hypertarget takes you to the beginning of the paragraph rather than one line down. Closes #7078. This makes a particular difference for links to citations using `--citeproc` and `link-citations: true`.
2021-02-18T.P.Shared: cleanup.John MacFarlane1-11/+26
Cleaned up some functions and added deprecation pragmas to functions no longer used in the code base.
2021-02-18Org reader: fix bug in org-ref citation parsing.Albert Krewinkel1-1/+1
The org-ref syntax allows multiple citations to be listed, separated by commas. This fixes a bug that accepted commas as part of the citation id, so that every citation list was parsed as one single citation. Fixes: #7101
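The nature of the bug can be shown with a tiny key parser: as long as the comma is not a valid key character, a comma-separated list splits into separate ids. This is a sketch under assumptions, not the Org reader's code; `isKeyChar`, its character set, and `parseKeys` are hypothetical.

```haskell
module Main where

import Data.Char (isAlphaNum)

-- | Hypothetical set of characters allowed in a citation id.
-- The important point is that ',' is NOT included.
isKeyChar :: Char -> Bool
isKeyChar c = isAlphaNum c || c `elem` "-_:."

-- | Parse a comma-separated list of citation ids.
parseKeys :: String -> [String]
parseKeys s = case span isKeyChar s of
  ("", _)         -> []                  -- no key at this position
  (k, ',' : rest) -> k : parseKeys rest  -- comma separates two keys
  (k, _)          -> [k]                 -- last key in the list

main :: IO ()
main = print (parseKeys "doe2020,smith_2019")
```

If `','` were added to the set accepted by `isKeyChar`, the same input would come back as a single id, which is the behaviour the commit describes as the bug.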
2021-02-17Docx reader: use Map instead of list for Namespaces.John MacFarlane2-20/+20
This gives a speedup of about 5-10%. The reader is now approximately twice as fast as in the last release.
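As a rough illustration of the change above, a prefix-to-URI table stored in a `Data.Map` is searched in logarithmic time, whereas `lookup` on an association list scans it linearly. The sketch below assumes `String` keys and a hypothetical `resolvePrefix` helper; the actual types and names in the Docx reader may differ.

```haskell
module Main where

import qualified Data.Map as M

-- | Namespace prefix -> namespace URI (assumed representation).
type NameSpaces = M.Map String String

-- | Two well-known OOXML namespaces, used here as sample data.
docxNamespaces :: NameSpaces
docxNamespaces = M.fromList
  [ ("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main")
  , ("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships")
  ]

-- | Hypothetical helper: resolve a prefix to its URI, if it is declared.
resolvePrefix :: NameSpaces -> String -> Maybe String
resolvePrefix ns prefix = M.lookup prefix ns

main :: IO ()
main = do
  print (resolvePrefix docxNamespaces "w")
  print (resolvePrefix docxNamespaces "missing")
```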
2021-02-16Revert "Add T.P.XML.Light.Cursor."John MacFarlane1-346/+0
This reverts commit d8fc4971868104274881570ce9bc3d9edf0d2506.
2021-02-16Add T.P.XML.Light.Cursor.John MacFarlane1-0/+346
2021-02-16Add orig copyright/license info for code derived from xml-light.John MacFarlane3-3/+12
2021-02-16Split up T.P.XML.Light into submodules.John MacFarlane4-504/+565
2021-02-16Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...John MacFarlane24-928/+1384
...and add new definitions isomorphic to xml-light's, but with Text instead of String. This allows us to keep most of the code in existing readers that use xml-light, but avoid lots of unnecessary allocation.

We also add versions of the functions from xml-light's Text.XML.Light.Output and Text.XML.Light.Proc that operate on our modified XML types, and functions that convert xml-light types to our types (since some of our dependencies, like texmath, use xml-light).

Update golden tests for docx and pptx.

OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.

Docx: Do a manual traversal to unwrap sdt and smartTag. This is faster, and needed to pass the tests.

Benchmarks:

A = prior to 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8)
B = as of 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8)
C = this commit

| Reader  | A     | B     | C     |
| ------- | ----- | ----- | ----- |
| docbook | 18 ms | 12 ms | 10 ms |
| opml    | 65 ms | 62 ms | 35 ms |
| jats    | 15 ms | 11 ms | 9 ms  |
| docx    | 72 ms | 69 ms | 44 ms |
| odt     | 78 ms | 41 ms | 28 ms |
| epub    | 64 ms | 61 ms | 56 ms |
| fb2     | 14 ms | 5 ms  | 4 ms  |
2021-02-14T.P.Error: remove unused variablesAlbert Krewinkel1-2/+2
2021-02-13HTML reader: fix bad handling of empty src attribute in iframe.John MacFarlane1-6/+12
- If src is empty, we simply skip the iframe.
- If src is invalid or cannot be fetched, we issue a warning and skip instead of failing with an error.
- Closes #7099.
2021-02-13T.P.Error: export `renderError`.John MacFarlane1-33/+72
Refactor `handleError` to use `renderError`. This allows us to render error messages without exiting.
2021-02-13Org: support task_lists extensionAlbert Krewinkel3-5/+54
The task lists extension is now supported by the org reader and writer; the extension is turned on by default. Closes: #6336
2021-02-13T.P.Shared: export `handleTaskListItem`. [API change]Albert Krewinkel1-0/+1
2021-02-13LaTeX reader: remove unnecessary lineJohn MacFarlane1-1/+0