path: root/test/docx
Age  Commit message  Author  Files  Lines
2021-03-13Use integral values for w:tblW in docx.John MacFarlane3-0/+0
Closes #7141.
2021-02-16Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...John MacFarlane33-0/+0
..and add new definitions isomorphic to xml-light's, but with Text instead of String. This allows us to keep most of the code in existing readers that use xml-light, but avoid lots of unnecessary allocation.

We also add versions of the functions from xml-light's Text.XML.Light.Output and Text.XML.Light.Proc that operate on our modified XML types, and functions that convert xml-light types to our types (since some of our dependencies, like texmath, use xml-light).

Update golden tests for docx and pptx.

OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.

Docx: Do a manual traversal to unwrap sdt and smartTag. This is faster, and needed to pass the tests.

Benchmarks:

A = prior to 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8)
B = as of 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8)
C = this commit

| Reader  | A     | B     | C     |
| ------- | ----- | ----- | ----- |
| docbook | 18 ms | 12 ms | 10 ms |
| opml    | 65 ms | 62 ms | 35 ms |
| jats    | 15 ms | 11 ms | 9 ms  |
| docx    | 72 ms | 69 ms | 44 ms |
| odt     | 78 ms | 41 ms | 28 ms |
| epub    | 64 ms | 61 ms | 56 ms |
| fb2     | 14 ms | 5 ms  | 4 ms  |
2021-02-10Add new unexported module T.P.XMLParser.John MacFarlane33-0/+0
This exports functions that use xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification.

The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2).

Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module.

The new parser also reports errors, which we report when possible.

A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change].

Closes #7091, which was the main stimulus.

These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes.

Add entity defs to docbook-reader.docbook.

Update golden tests for docx.
2021-01-12Docx writer: handle table header using styles.John MacFarlane32-0/+0
Instead of hard-coding the border and header cell vertical alignment, we now let this be determined by the Table style, making use of Word's "conditional formatting" for the table's first row. For headerless tables, we use the tblLook element to tell Word not to apply conditional first-row formatting. Closes #7008.
2020-12-13Docx writer: keep raw openxml strings verbatim.Albert Krewinkel4-0/+9
Closes: #6933
2020-12-03Docx writer: Support bold and italic in "complex script."John MacFarlane3-0/+0
Previously bold and italics didn't work properly in RTL text. This commit causes the w:bCs and w:iCs attributes to be used, in addition to w:b and w:i, for bold and italics respectively. Closes #6911.
2020-11-26Docx writer: Fix bullets/lists indentationcholonam31-0/+0
Fix appearance of bullets/numbered lists (the first level is slightly indented to the right instead of right on the margin). New golden files have been tested using Word 2010 on Windows 10.
2020-10-06DOCX reader: Allow empty dates in comments and tracked changes (#6726)Diego Balseiro3-0/+9
For security reasons, some legal firms delete the date from comments and tracked changes.

* Make date optional (Maybe) in tracked changes and comments datatypes
* Add tests
2020-10-02Docx writer: better handle list items whose contents are lists (#6522)Michael Hoffmann2-0/+8
If the first element of a bulleted or ordered list is another list, then that first item will disappear if the target format is docx. This changes the docx writer so that it prepends an empty string for those cases. With this, no items will disappear. Closes #5948.
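The workaround described above can be pictured with a small Python sketch. It is an illustration only, not pandoc's Haskell code: the tuple-based block model and the name `pad_list_items` are assumptions made for the example.

```python
# Hypothetical block model (not pandoc's real AST):
#   ("Plain", text) or ("BulletList" / "OrderedList", [items]),
# where each list item is itself a list of blocks.

LIST_TAGS = ("BulletList", "OrderedList")


def pad_list_items(items):
    """Prepend an empty Plain block to every item whose first block is a list."""
    padded = []
    for item in items:
        if item and item[0][0] in LIST_TAGS:
            # The item would otherwise start directly with a nested list,
            # which is the case the commit message says made it disappear.
            item = [("Plain", "")] + item
        padded.append(item)
    return padded


nested = [
    [("BulletList", [[("Plain", "inner")]])],  # first block is a list
    [("Plain", "second")],                      # ordinary item
]
result = pad_list_items(nested)
```

Items that already start with text are passed through unchanged, so only the problematic case is touched.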
2020-08-24Docx writer: separate adjacent tables.John MacFarlane1-0/+0
Word combines adjacent tables, so to prevent this we insert an empty paragraph between two adjacent tables. Closes #4315.
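A minimal Python sketch of this idea follows. It is not the writer's actual code; the `(tag, payload)` block tuples and the function name `separate_adjacent_tables` are hypothetical.

```python
# Hypothetical block model: (tag, payload) tuples such as ("Table", ...)
# or ("Para", text). Not pandoc's real AST.

def separate_adjacent_tables(blocks):
    """Insert an empty paragraph between any two directly adjacent tables."""
    out = []
    for block in blocks:
        if out and out[-1][0] == "Table" and block[0] == "Table":
            out.append(("Para", ""))  # spacer so Word does not merge the tables
        out.append(block)
    return out


blocks = [("Table", "t1"), ("Table", "t2"), ("Para", "text"), ("Table", "t3")]
separated = separate_adjacent_tables(blocks)
```

Tables that are already separated by another block are left alone.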
2020-07-22Docx writer: support --number-sections.John MacFarlane28-0/+0
Closes #1413.
2020-07-07[Docx Reader] Refactor/update smushInlinesNikolay Yakimov2-1/+1
2020-05-16Docx writer: enable column and row bands for tables.John MacFarlane28-0/+0
This change will not have any effect with the default style. However, it enables users to use a style (via a reference.docx) that turns on row and/or column bands. Closes #6371.
2020-04-28Support new Underline element in readers and writers (#6277)Vaibhav Sagar3-2/+4
Deprecate `underlineSpan` in Shared in favor of `Text.Pandoc.Builder.underline`.
2020-04-15Use the new builders, modify readers to preserve empty headersdespresc4-27/+5
The Builder.simpleTable now only adds a row to the TableHead when the given header row is not null. This uncovered an inconsistency in the readers: some would unconditionally emit a header filled with empty cells, even if the header was not present. Now every reader has the conditional behaviour. Only the XWiki writer depended on the header row being always present; it now pads its head as necessary.
2020-04-15Adapt to the removal of the RowSpan, ColSpan, RowHeadColumns accessorsdespresc6-81/+81
2020-04-15Adapt to the newest Table type, fix some previous adaptation issuesdespresc6-183/+236
- Writers.Native is now adapted to the new Table type.
- Inline captions should now be conditionally wrapped in a Plain, not a Para block.
- The toLegacyTable function now lives in Writers.Shared.
2020-04-15Implement the new Table typedespresc6-83/+204
2019-11-16Change styles in reference.docx.John MacFarlane28-0/+0
All headings now have a uniform color. Level-1 headings no longer set `w:themeShade="B5"`. Level-2 headings are now 14 point rather than 16 point. Level-3 headings are now 12 point rather than 14 point. Level-4 headings are italic rather than bold. Closes #5820.
2019-11-14Change reference.docx to use more normal block quotes.John MacFarlane28-0/+0
Indented left and right, same font and size. Previously it was unindented, smaller font and different typeface. See #5820.
2019-11-03Docx reader: fix list number resumption for sublists. Closes #4324.John MacFarlane2-0/+8
The first list item of a sublist should not resume numbering from the number of the last sublist item of the same level, if that sublist was a sublist of a different list item. That is, we should not get:

```
1. one
    1. sub one
    2. sub two
2. two
    3. sub one
```
2019-09-21[Docx Reader] Update testsNikolay Yakimov6-7/+7
Notice this commit updates lists.docx. The old test file contained references to "ListParagraph" style, which should never leak outside of pandoc, so I'm not sure what that was supposed to test for exactly.
2019-09-21[Docx Reader] Use style names, not ids, for assigning semantic meaningNikolay Yakimov4-0/+10
Motivating issues: #5523, #5052, #5074

* Style name comparisons are case-insensitive, since those are case-insensitive in Word.
* w:styleId will be used as style name if w:name is missing (this should only happen for malformed docx and is kept as a fallback to avoid failing altogether on malformed documents).
* Block quote detection code moved from Docx.Parser to Readers.Docx.
* Code styles, i.e. "Source Code" and "Verbatim Char", now honor style inheritance.
* Docx Reader now honors "Compact" style (used in Pandoc-generated docx). The side effect is that "Compact" style no longer shows up in docx+styles output. Styles inherited from "Compact" will still show up.
* Removed obsolete list-item style from divsToKeep. That didn't really do anything for a while now.
* Add newtypes to differentiate between style names, ids, and different style types (that is, paragraph and character styles).
* Since docx style names can have spaces in them, and pandoc-markdown classes can't, wherever a style name is used as a class name, spaces are replaced with ASCII dashes `-`.
* Get rid of extraneous intermediate types carrying styleId information. Instead, styleId is saved with other style data.
* Use RunStyle for inline style definitions only (lacking styleId and styleName); for Character Styles use CharStyle type (which is basically RunStyle with styleId and StyleName bolted onto it).
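Three of the rules above (the w:styleId fallback, case-insensitive comparison, and dash substitution for class names) can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical and the real reader is written in Haskell with dedicated newtypes.

```python
# Hypothetical helpers illustrating the naming rules; not pandoc's API.

def style_name(style_id, name=None):
    """Prefer w:name; fall back to w:styleId when the name is missing."""
    return name if name else style_id


def same_style(a, b):
    """Compare style names case-insensitively, as Word does."""
    return a.lower() == b.lower()


def class_name(name):
    """Turn a style name into a class name by replacing spaces with dashes."""
    return name.replace(" ", "-")


fallback = style_name("BlockText")                    # malformed docx: no w:name
preferred = style_name("BlockText", "Block Text")
is_code = same_style("source code", "Source Code")
css_like = class_name("Verbatim Char")
```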
2019-09-20Preserve built-in styles in DOCX with custom style (#5670)Ben Steinberg3-0/+15
This commit prevents custom styles on divs and spans from overriding styles on certain elements inside them, like headings, blockquotes, and links. On those elements, the "native" style is required for the element to display correctly. This change also allows nesting of custom styles; in order to do so, it removes the default "Compact" style applied to Plain blocks, except when inside a table.
2019-07-19Change order of ilvl and numId in document.xml (#5647)Agustín Martín Barbero4-0/+0
Workaround for Word Online shortcoming. Fixes #5645.

Also, make list para properties go first. This reordering of properties shouldn't be necessary but it seems Word Online does not understand the docx correctly otherwise.
2019-03-21Docx writer: Use w:br without attributes for line breaks.John MacFarlane2-0/+0
We previously added the attribute `type="textWrapping"`, but this causes problems on Word Online. Closes #5377.
2019-03-11docx writer: avoid extra copy of abstractNum and num elements...John MacFarlane27-0/+0
...in numbering.xml. This caused pandoc-produced docx files to be uneditable using Word Online. The problem was that recent versions of reference.docx include samples of various kinds of text, including lists. The numbering elements for these were getting copied over to the new docx, where they clashed with the autogenerated elements produced by pandoc. This didn't confuse Desktop Word, but it did confuse Word Online. Closes #5358.
2019-02-18Docx reader tests: fix test file with trailing space.Jesse Rosenthal1-1/+1
This failed due to the fix of #5273.
2019-02-18Docx reader: add tests for trimming last inline.Jesse Rosenthal2-0/+2
2019-02-12Docx reader: Add test for reading sdts in footnotes.Jesse Rosenthal2-0/+1
2019-02-06Docx reader: Tests for alternate document.xmlJesse Rosenthal2-0/+2
2019-01-26Improve writing metadata for docx, pptx and odt (#5252)Agustín Martín Barbero4-0/+4
* docx writer: support custom properties. Solves the writer part of #3024. Also supports additional core properties: `subject`, `lang`, `category`, `description`.
* odt writer: improve standard properties, including the following core properties: `generator` (Pandoc/VERSION), `description`, `subject`, `keywords`, `initial-creator` (from authors), `creation-date` (actual creation date). Also fix date.
* pptx writer: support custom properties. Also supports additional core properties: `subject`, `category`, `description`.
* Includes golden tests.
* MANUAL: document metadata support for docx, odt, pptx writers
2018-12-10Docx: add test for lists with level overrides.Jesse Rosenthal2-0/+37
2018-11-20Docx writer: Fix bookmarks to headers with long titles.John MacFarlane1-0/+0
Word has a 40 character limit for bookmark names. In addition, bookmarks must begin with a letter. Since pandoc's auto-generated identifiers may not respect these constraints, some internal links did not work. With this change, pandoc uses a bookmark name based on the SHA1 hash of the identifier when the identifier isn't a legal bookmark name. Closes #5091.
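The rule can be sketched in Python as follows. The legality check only covers the two constraints named above, and the exact shape of the hashed name (a leading `X` replacing the first hex digit so the result stays at 40 characters) is an assumption for illustration, not a statement of what pandoc emits.

```python
import hashlib

MAX_BOOKMARK_LEN = 40  # Word's limit, per the commit message


def is_legal_bookmark(ident):
    """Check only the two constraints named above: length and leading letter."""
    return 0 < len(ident) <= MAX_BOOKMARK_LEN and ident[0].isalpha()


def bookmark_name(ident):
    """Return ident if it is a legal bookmark name, else a SHA1-based name."""
    if is_legal_bookmark(ident):
        return ident
    digest = hashlib.sha1(ident.encode("utf-8")).hexdigest()  # 40 hex digits
    # Assumption: swap the first digit for a letter so the name starts with
    # a letter and still fits in 40 characters.
    return "X" + digest[1:]


short = bookmark_name("introduction")
hashed = bookmark_name("a-very-long-heading-title-that-exceeds-the-forty-character-limit")
numeric = bookmark_name("2018-results")
```

Because the hash is deterministic, a link and its target derive the same bookmark name from the same identifier.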
2018-10-09Docx writer: added framework for custom properties.John MacFarlane26-0/+0
So far, we don't actually write any custom properties, but we have the infrastructure to add this. See #3034.
2018-05-08Support underline in docx writer.John MacFarlane1-0/+0
Updated golden test and confirmed validity of file. Closes #4633.
2018-04-25Remove nonfree ICC profiles from thumbnails in test docx files.John MacFarlane18-0/+0
Closes #4588.
2018-04-17Docx reader tests: Test for combining adjacent code blocks.Jesse Rosenthal2-0/+6
2018-04-05Changes to tests to accommodate changes in pandoc-types.John MacFarlane1-0/+3
In https://github.com/jgm/pandoc-types/pull/36 we changed the table builder to pad cells. This commit changes tests (and two readers) to accord with this behavior.
2018-03-13Docx reader: add tests for nested smart tags.Jesse Rosenthal2-0/+7
2018-02-28Docx reader: Handle nested sdt tags.Jesse Rosenthal2-0/+3
Previously we had only unwrapped one level of sdt tags. Now we recurse if we find them. Closes: #4415
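A rough Python sketch of recursive unwrapping, using the standard library's ElementTree, is shown below. It is not the reader's Haskell code; the function name `unwrap_sdt` and the choice to splice in the children of `w:sdtContent` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def unwrap_sdt(elem):
    """Return the elements that replace `elem` once every w:sdt is unwrapped."""
    if elem.tag == f"{{{W}}}sdt":
        content = elem.find(f"{{{W}}}sdtContent")
        children = list(content) if content is not None else []
        # Recurse, so an sdt nested inside this one is unwrapped as well.
        return [e for child in children for e in unwrap_sdt(child)]
    # Ordinary element: keep it, but unwrap any sdt among its descendants.
    elem[:] = [e for child in list(elem) for e in unwrap_sdt(child)]
    return [elem]


doc = ET.fromstring(
    f'<w:body xmlns:w="{W}">'
    "<w:sdt><w:sdtContent>"
    "<w:sdt><w:sdtContent><w:p/></w:sdtContent></w:sdt>"
    "</w:sdtContent></w:sdt>"
    "</w:body>"
)
body = unwrap_sdt(doc)[0]
tags = [child.tag for child in body]
```

In this example the paragraph sits two sdt levels deep and ends up as a direct child of the body.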
2018-02-23Docx reader: Don't look up dependent run styles if +styles is enabled.Jesse Rosenthal1-1/+1
It makes more sense not to interpret -- otherwise using the original document as the reference-doc would produce two of everything: the interpreted version and the uninterpreted style version.
2018-02-23Docx test: adjust test for fix of buglaptop1\Andrew3-5/+8
This commit adjusts the test cases for the Docx writer after the fix of #3930.

- Adjusted test cases with inline images. The inline images now have the correct sizing, title and description.
- Modified the test case to include an image multiple times with different sizing each time.
- Tested on Windows 8.1 with Word 2007 (12.0.6705.5000).

The files are not corrupted and display exactly what is expected.
2018-02-22Docx reader: Move pandoc inline styling inside custom-style spanJesse Rosenthal1-1/+1
Previously Emph, Strong, etc were outside the custom-style span. This moves them inside in order to make it easier to write filters that act on the formatting in these contents. Tests and MANUAL example are changed to match.
2018-02-22Docx reader: Avoid repeated spans in custom styles.Jesse Rosenthal1-1/+1
The previous commit had a bug where custom-style spans would be read with every recursion. This fixes that, and changes the example given in the manual.
2018-02-22Docx reader tests: test custom style extension.Jesse Rosenthal2-0/+11
2018-02-15Docx reader: Pick table width from the longest row or headerdanse2-0/+13
This change is intended to preserve as much of the table content as possible. Closes #4360.
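The width rule can be sketched in Python as below. The list-of-strings row model and both function names are hypothetical, and padding shorter rows with empty cells is an assumption added to show why no content needs to be dropped.

```python
# Hypothetical table model: a header row and body rows, each a list of
# cell strings. Not pandoc's real table representation.

def table_width(header, rows):
    """Width is the length of the longest row, header included."""
    return max(len(row) for row in [header] + rows)


def pad_rows(header, rows):
    """Pad every row with empty cells up to the table width (assumption)."""
    width = table_width(header, rows)

    def pad(row):
        return row + [""] * (width - len(row))

    return pad(header), [pad(row) for row in rows]


header = ["Name", "Value"]
rows = [["a"], ["b", "1", "extra"]]
width = table_width(header, rows)
padded_header, padded_rows = pad_rows(header, rows)
```

Taking the maximum rather than the header length keeps the third cell of the longest row instead of truncating it.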
2018-01-27Docx writer tests: Add tests for custom stylesJesse Rosenthal3-0/+0
2018-01-27Docx writer tests: Use new golden frameworkJesse Rosenthal26-0/+7
These are based off the reader tests, with some removed (where the reader output was identical, based on different docx inputs). There are still more to be added. In particular, tests for custom-styles need to be added.

All golden docx files have been checked in MS Word 2013 (Windows). There is no corruption. There is questionable output in the `tables` test: the three tables seemed to be joined. This will be addressed in a future commit, and the golden docx file will be changed.
2018-01-16Docx reader: Add test for hyperlinks in instrText tagJesse Rosenthal2-0/+1
This is difficult to recreate with a modern version of Word, so I'm using the file submitted with the bug report. It would be preferable to find a smaller example with Latin characters, though, so as not to confuse the issue being tested.