pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2021-03-04	Revert "Relax `--abbreviations` rules so that a period isn't required."	John MacFarlane	1	-1/+1
	This reverts commit e461b7dd45f717f3317216c7d3207a1d24bf1c85. Ill-advised change. This doesn't work because we parse strings in chunks.
2021-03-04	Relax `--abbreviations` rules so that a period isn't required.	John MacFarlane	1	-1/+1
	Partially addresses #7124.
2021-03-03	Revert "Add T.P.Readers.LaTeX.Include."	John MacFarlane	3	-86/+52
	This reverts commit b569b0226d4bd5e0699077089d54fb03d4394b7d. Memory usage improvement in compilation wasn't very significant.
2021-03-03	Add T.P.Readers.LaTeX.Include.	John MacFarlane	3	-52/+86

2021-03-03	Remove T.P.Readers.LaTeX.Accent.	John MacFarlane	3	-82/+69
	Incorporate accentCommands into T.P.Readers.LaTeX.Inline.
2021-03-03	Move enquote commands to T.P.LaTeX.Lang.	John MacFarlane	3	-24/+34

2021-03-03	Moved more into T.P.Readers.LaTeX.Lang.	John MacFarlane	3	-82/+97

2021-03-03	Split out T.P.Readers.LaTeX.Inline.	John MacFarlane	2	-336/+413

2021-03-01	Make T.P.Readers.LaTeX.Types an unexported module.	John MacFarlane	1	-1/+1
	[API change] This is really an implementation detail that shouldn't be exposed in the public API.
2021-03-01	Factor out T.P.Readers.LaTeX.Macro.	John MacFarlane	2	-139/+155

2021-02-28	Removed unnecessary pragmas.	John MacFarlane	1	-2/+0

2021-02-28	Change T.P.Readers.LaTeX.SIunitx to export a command map...	John MacFarlane	2	-16/+16
	instead of individual commands.
2021-02-28	T.P.Readers.LaTeX: Don't export tokenize, untokenize.	John MacFarlane	2	-2/+9
	[API change] These were only exported for testing, which seems the wrong thing to do. They don't belong in the public API and are not really usable as they are, without access to the Tok type which is not exported. Removed the tokenize/untokenize roundtrip test. We put a quickcheck property in the comments which may be used when this code is touched (if it is).
2021-02-28	Factor out T.P.Readers.LaTeX.Math.	John MacFarlane	2	-193/+229

2021-02-28	Fix bug in last commit.	John MacFarlane	1	-1/+1

2021-02-28	Markdown reader efficiency improvements.	John MacFarlane	1	-182/+208
	Benchmarks show that these make the reader 13-17% faster, depending on extensions.
2021-02-28	LaTeX reader: another small efficiency improvement.	John MacFarlane	1	-6/+12

2021-02-28	LaTeX reader efficiency improvements.	John MacFarlane	1	-31/+42
	In conjunction with other changes this makes the reader almost twice as fast on our benchmark as it was on Feb. 10.
2021-02-28	Move setDefaultLanguage to T.P.Readers.LaTeX.Lang.	John MacFarlane	2	-16/+22

2021-02-28	LaTeX reader: remove two unnecessary parsers in inline.	John MacFarlane	1	-2/+0
	These are handled anyway by regularSymbol.
2021-02-28	Factor out T.P.Readers.LaTeX.Citation.	John MacFarlane	3	-186/+231

2021-02-27	Factor out T.P.Readers.LaTeX.Table.	John MacFarlane	3	-363/+411

2021-02-27	Split off T.P.Readers.LaTeX.Accent.	John MacFarlane	2	-60/+86
	To help reduce memory demands compiling the main LaTeX reader.
2021-02-26	Fix/update URLs and use HTTPS where possible (#7122)	Salim B	1	-1/+1

2021-02-21	LaTeX reader: further optimizations in satisfyTok.	John MacFarlane	1	-5/+5
	Benchmarks show 2/3 of the run time and 2/3 of the allocation of the Feb. 10 benchmarks.
2021-02-21	LaTeX reader: removed sExpanded in state.	John MacFarlane	1	-7/+2
	This isn't actually needed and checking it doesn't change anything. Also remove an unnecessary `doMacros` before `satisfyTok`, which does it anyway.
2021-02-21	LaTeX reader: further performance optimization.	John MacFarlane	1	-23/+19
	Avoid unnecessary 'doMacros'.
2021-02-20	HTML reader: small performance tweak.	John MacFarlane	1	-9/+5

2021-02-20	HTML reader: small efficiency improvements.	John MacFarlane	1	-25/+18
	Also, remove exported class NamedTag(..) [API change]. This was just intended to smooth over the transition from String to Text and is no longer needed. The functions isInlineTag and isBlockTag are no longer polymorphic.
2021-02-20	LaTeX reader: Another small improvement to macro handling.	John MacFarlane	1	-4/+3

2021-02-20	LaTeX reader: avoid macro resolution code if no macros defined.	John MacFarlane	1	-16/+19

2021-02-20	T.P.Readers.LaTeX.Parsing: improve braced'.	John MacFarlane	1	-16/+13
	Remove the parameter, have it parse the opening brace, and make it more efficient.
2021-02-20	HTML reader: efficiency improvements.	John MacFarlane	1	-81/+129
	Do a lookahead to find the right parser to use. Benchmark time drops from 34 ms to 23 ms, with less allocation. Also speeds up the epub reader.
2021-02-18	DocBook, JATS, OPML readers: performance optimization.	John MacFarlane	3	-64/+8
	With the new XML parser, we can avoid the expensive tree normalization step we used to do. This gives a significant speed boost in docbook and JATS parsing (e.g. from 9.7 ms to 6 ms).
2021-02-18	Org reader: fix bug in org-ref citation parsing.	Albert Krewinkel	1	-1/+1
	The org-ref syntax allows listing multiple citations separated by commas. This fixes a bug that accepted commas as part of the citation id, so all citation lists were parsed as one single citation. Fixes: #7101
2021-02-17	Docx reader: use Map instead of list for Namespaces.	John MacFarlane	2	-20/+20
	This gives a speedup of about 5-10%. The reader is now approximately twice as fast as in the last release.
2021-02-16	Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...	John MacFarlane	15	-344/+309
	..and add new definitions isomorphic to xml-light's, but with Text instead of String. This allows us to keep most of the code in existing readers that use xml-light, but avoid lots of unnecessary allocation. We also add versions of the functions from xml-light's Text.XML.Light.Output and Text.XML.Light.Proc that operate on our modified XML types, and functions that convert xml-light types to our types (since some of our dependencies, like texmath, use xml-light).
	Update golden tests for docx and pptx.
	OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.
	Docx: Do a manual traversal to unwrap sdt and smartTag. This is faster, and needed to pass the tests.
	Benchmarks:
	A = prior to 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8)
	B = as of 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8)
	C = this commit
	| Reader  | A     | B     | C     |
	| ------- | ----- | ----- | ----- |
	| docbook | 18 ms | 12 ms | 10 ms |
	| opml    | 65 ms | 62 ms | 35 ms |
	| jats    | 15 ms | 11 ms | 9 ms  |
	| docx    | 72 ms | 69 ms | 44 ms |
	| odt     | 78 ms | 41 ms | 28 ms |
	| epub    | 64 ms | 61 ms | 56 ms |
	| fb2     | 14 ms | 5 ms  | 4 ms  |
2021-02-13	HTML reader: fix bad handling of empty src attribute in iframe.	John MacFarlane	1	-6/+12
	- If src is empty, we simply skip the iframe.
	- If src is invalid or cannot be fetched, we issue a warning and skip instead of failing with an error.
	- Closes #7099.
2021-02-13	Org: support task_lists extension	Albert Krewinkel	1	-2/+39
	The task lists extension is now supported by the org reader and writer; the extension is turned on by default. Closes: #6336
2021-02-13	LaTeX reader: remove unnecessary line	John MacFarlane	1	-1/+0

2021-02-12	Avoid an unnecessary withRaw.	John MacFarlane	1	-1/+4

2021-02-12	LaTeX reader improvements.	John MacFarlane	2	-22/+68
	* Rewrote `withRaw` so it doesn't rely on fragile assumptions about token positions (which break when macros are expanded). This requires the addition of `sEnableWithRaw` and `sRawTokens` in `LaTeXState`, and a new combinator `disablingWithRaw` to disable collecting of raw tokens in certain contexts.
	* Add `parseFromToks` to T.P.Readers.LaTeX.Parsing.
	* Fix parsing of single character tokens so it doesn't mess up the new raw token collecting.
	* These changes slightly increase allocations and have a small performance impact, but it's minor.
	Closes #7092.
2021-02-11	Use getTimestamp instead of getCurrentTime in writers.	John MacFarlane	1	-2/+2
	Setting SOURCE_DATE_EPOCH will allow reproducible builds. Partially addresses #7093. This does not suffice to fully enable reproducible builds in EPUB, since a unique id is being generated for each build.
2021-02-10	Add new unexported module T.P.XMLParser.	John MacFarlane	8	-62/+109
	This exports functions that use xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2). Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module. The new parser also reports errors, which we report when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus. These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes. Add entity defs to docbook-reader.docbook. Update golden tests for docx.
2021-02-08	ODT reader: finer-grained errors on parse failure.	John MacFarlane	1	-21/+18
	See #7091.
2021-02-08	ODT reader: give more information if zip can't be unpacked.	John MacFarlane	1	-1/+4

2021-02-08	DocBook reader: Support informalfigure (#7079)	Nils Carlson	1	-1/+3
	Add support for informalfigure.
2021-02-06	Markdown reader: improved handling of mmd link attributes in references.	John MacFarlane	1	-0/+2
	Previously they only worked for links that had titles. Closes #7080.
2021-01-31	RST reader: fix handling of header in CSV tables.	John MacFarlane	1	-4/+5
	The interpretation of the header line is not affected by the delim option. Closes #7064.
2021-01-26	Clean up BibTeX parsing.	John MacFarlane	1	-0/+18
	Previously there was a messy code path that gave strange results in some cases, not passing through raw tex but trying to extract string content. This was an artefact of trying to handle some special bibtex-specific commands in the BibTeX reader. Now we just handle these in the LaTeX reader and simplify parsing in the BibTeX reader. This does mean that more raw tex will be passed through (and currently this is not sensitive to the `raw_tex` extension; this should be fixed). Closes #7049.