From 8ca191604dcd13af27c11d2da225da646ebce6fc Mon Sep 17 00:00:00 2001
From: John MacFarlane
Date: Mon, 8 Feb 2021 23:35:19 -0800
Subject: Add new unexported module T.P.XMLParser.

This exports functions that use xml-conduit's parser to produce an
xml-light Element or [Content]. This allows existing pandoc code to use
a better parser without much modification.

The new parser is used in all places where xml-light's parser was
previously used. Benchmarks show a significant performance improvement
in parsing XML-based formats (especially ODT and FB2).

Note that the xml-light types use String, so the conversion from
xml-conduit types involves a lot of extra allocation. It would be
desirable to avoid that in the future by gradually switching to using
xml-conduit directly. This can be done module by module.

The new parser also reports errors, which we report when possible.

A new constructor PandocXMLError has been added to PandocError in
T.P.Error [API change].

Closes #7091, which was the main stimulus.

These changes revealed the need for some changes in the tests. The
docbook-reader.docbook test lacked definitions for the entities it used;
these have been added. And the docx golden tests have been updated,
because the new parser does not preserve the order of attributes.

Add entity defs to docbook-reader.docbook.

Update golden tests for docx.
---
 test/docx/golden/lists.docx | Bin 10352 -> 10358 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)

(limited to 'test/docx/golden/lists.docx')

diff --git a/test/docx/golden/lists.docx b/test/docx/golden/lists.docx
index 5dbe298b7..07046f223 100644
Binary files a/test/docx/golden/lists.docx and b/test/docx/golden/lists.docx differ
-- 
cgit v1.2.3