path: root/test/Tests/Readers/Docx.hs
Age | Commit message | Author | Files | Lines
2021-10-10 | Avoid blockquote when parent style has more indent | Milan Bracke | 1 | -0/+4
When a paragraph has an indentation different from the parent (named) style, it used to be considered a blockquote. But this only makes sense when the paragraph has more indentation. So this commit adds a check for the indentation of the parent style.
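The check described above can be sketched roughly as follows. This is only an illustration, not the reader's actual code: the `Indent` type, its `leftIndent` field, and `isBlockQuote` are hypothetical names, and indentation is modelled as a single optional left-indent value.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical, simplified model of a paragraph's left indentation.
newtype Indent = Indent { leftIndent :: Maybe Integer }
  deriving (Show, Eq)

-- Treat a paragraph as a block quote only when it is indented strictly
-- more than its parent (named) style. A missing indentation is read as 0,
-- which is an assumption of this sketch.
isBlockQuote :: Indent -> Indent -> Bool
isBlockQuote parent para =
  fromMaybe 0 (leftIndent para) > fromMaybe 0 (leftIndent parent)
```

With this comparison, a paragraph that is indented less than its parent style is no longer classified as a block quote.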
2021-05-28 | Docx reader: Support new table features. | Emily Bourke | 1 | -0/+16
* Column spans
* Row spans
  - The spec says that if the `val` attribute is omitted, its value should be assumed to be `continue`, and that its values are restricted to {`restart`, `continue`}. If the value has any other value, I think it seems reasonable to default it to `continue`. It might cause problems if the spec is extended in the future by adding a third possible value, in which case this would probably give incorrect behaviour, and wouldn't error.
* Allow multiple header rows
* Include table description in simple caption
  - The table description element is like alt text for a table (along with the table caption element). It seems like we should include this somewhere, but I’m not 100% sure how – I’m pairing it with the simple caption for the moment. (Should it maybe go in the block caption instead?)
* Detect table captions
  - Check for caption paragraph style /and/ either the simple or complex table field. This means the caption detection fails for captions which don’t contain a field, as in an example doc I added as a test. However, I think it’s better to be too conservative: a missed table caption will still show up as a paragraph next to the table, whereas if I incorrectly classify something else as a table caption it could cause havoc by pairing it up with a table it’s not at all related to, or dropping it entirely.
* Update tests and add new ones

Partially fixes: #6316
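The defaulting rule for the row-span `val` attribute can be sketched as below. The `VMerge` type and `parseVMerge` function are hypothetical names chosen for illustration; only the two values `restart` and `continue` come from the commit message.

```haskell
-- Hypothetical model of the two values the commit message says the spec allows.
data VMerge = Restart | Continue
  deriving (Show, Eq)

-- Interpret the optional `val` attribute of a row-span element.
-- An omitted attribute and any unrecognised value both fall back to Continue.
parseVMerge :: Maybe String -> VMerge
parseVMerge (Just "restart") = Restart
parseVMerge _                = Continue
```

The catch-all case is what makes a future third value silently behave like `continue`, which is the trade-off the message points out.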
2021-05-24 | MediaBag improvements. | John MacFarlane | 1 | -5/+5
In the current dev version, we will sometimes add a version of an image with a hashed name, keeping the original version with the original name, which would lead to undesirable duplication. This change separates the media's filename from the media's canonical name (which is the path of the link in the document itself). Filenames are based on SHA1 hashes and assigned automatically.

In Text.Pandoc.MediaBag:
- Export MediaItem type [API change].
- Change MediaBag type to a map from Text to MediaItem [API change].
- `lookupMedia` now returns a `MediaItem` [API change].
- Change `insertMedia` so it sets the `mediaPath` to a filename based on the SHA1 hash of the contents. This will be used when contents are extracted.

In Text.Pandoc.Class.PandocMonad:
- Remove `fetchMediaResource` [API change].

Lua MediaBag module has been changed minimally. In the future it would be better, probably, to give Lua access to the full MediaItem type.
2021-04-29 | Docx reader: add handling of vml image objects (jgm#4735) (#7257) | mbrackeantidot | 1 | -0/+4
They represent images, the same way as other images in vml format.
2021-02-07 | Avoid unnecessary use of NoImplicitPrelude pragma (#7089) | Albert Krewinkel | 1 | -2/+0
2020-10-06 | DOCX reader: Allow empty dates in comments and tracked changes (#6726) | Diego Balseiro | 1 | -0/+4
For security reasons, some legal firms delete the date from comments and tracked changes.
* Make date optional (Maybe) in tracked changes and comments datatypes
* Add tests
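Making the date optional might look like the following sketch. The `ChangeInfo` record, its field names, and `changeAttrs` are hypothetical and simplified; they are not the reader's real datatypes.

```haskell
-- Hypothetical, simplified tracked-change record; field names are illustrative.
data ChangeInfo = ChangeInfo
  { changeAuthor :: String
  , changeDate   :: Maybe String  -- Nothing when the date was stripped
  } deriving (Show, Eq)

-- Build key/value attributes, emitting a date only when one is present.
changeAttrs :: ChangeInfo -> [(String, String)]
changeAttrs c =
  ("author", changeAuthor c) : maybe [] (\d -> [("date", d)]) (changeDate c)
```

Using `Maybe` lets a document without dates parse successfully instead of failing on the missing attribute.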
2020-09-13 | Fix hlint suggestions, update hlint.yaml (#6680) | Christian Despres | 1 | -1/+1
* Fix hlint suggestions, update hlint.yaml

Most suggestions were redundant brackets. Some required LambdaCase. The .hlint.yaml file had a small typo, and didn't ignore camelCase suggestions in certain modules.
2020-03-28 | More cleanup (#6209) | Joseph C. Sible | 1 | -3/+2
* Simplify by collapsing a do block into a single <$>
* Remove an unnecessary variable: `all` takes any Foldable, so only blocksToInlines needs toList.
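Both cleanups can be illustrated with a small generic sketch. The function names here are hypothetical and unrelated to the test file's actual code.

```haskell
import Data.List.NonEmpty (NonEmpty (..))

-- Before: a do block that binds a result only to return a function of it.
lineLengthVerbose :: Monad m => m String -> m Int
lineLengthVerbose getL = do
  l <- getL
  return (length l)

-- After: the same computation collapsed into a single <$>.
lineLength :: Functor f => f String -> f Int
lineLength getL = length <$> getL

-- `all` works on any Foldable, so no toList conversion is needed first.
allPositive :: NonEmpty Int -> Bool
allPositive = all (> 0)
```

The `<$>` version also needs only a `Functor` constraint, so it is slightly more general than the do block it replaces.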
2020-03-13 | Update copyright year (#6186) | Albert Krewinkel | 1 | -1/+1
* Update copyright year
* Copyright: add notes for Lua and Jira modules
2020-02-08 | Use <$> instead of >>= and return (#6128) | Joseph C. Sible | 1 | -1/+1
2020-02-07 | Apply linter suggestions. Add fix_spacing to lint target in Makefile. | John MacFarlane | 1 | -1/+1
2019-11-12 | Switch to new pandoc-types and use Text instead of String [API change]. | despresc | 1 | -1/+2
PR #5884.
+ Use pandoc-types 1.20 and texmath 0.12.
+ Text is now used instead of String, with a few exceptions.
+ In the MediaBag module, some of the types using Strings were switched to use FilePath instead (not Text).
+ In the Parsing module, new parsers `manyChar`, `many1Char`, `manyTillChar`, `many1TillChar`, `many1Till`, `manyUntil`, `manyUntilChar` have been added: these are like their unsuffixed counterparts but pack some or all of their output.
+ `glob` in Text.Pandoc.Class still takes String since it seems to be intended as an interface to Glob, which uses strings. It seems to be used only once in the package, in the EPUB writer, so that is not hard to change.
2019-11-03 | Docx reader: fix list number resumption for sublists. Closes #4324. | John MacFarlane | 1 | -0/+4
The first list item of a sublist should not resume numbering from the number of the last sublist item of the same level, if that sublist was a sublist of a different list item. That is, we should not get:

```
1. one
   1. sub one
   2. sub two
2. two
   3. sub one
```
2019-09-21 | [Docx Reader] Use style names, not ids, for assigning semantic meaning | Nikolay Yakimov | 1 | -0/+9
Motivating issues: #5523, #5052, #5074

Style name comparisons are case-insensitive, since those are case-insensitive in Word.

w:styleId will be used as style name if w:name is missing (this should only happen for malformed docx and is kept as a fallback to avoid failing altogether on malformed documents).

Block quote detection code moved from Docx.Parser to Readers.Docx.

Code styles, i.e. "Source Code" and "Verbatim Char", now honor style inheritance.

Docx Reader now honours "Compact" style (used in Pandoc-generated docx). The side-effect is that "Compact" style no longer shows up in docx+styles output. Styles inherited from "Compact" will still show up.

Removed obsolete list-item style from divsToKeep. That didn't really do anything for a while now.

Add newtypes to differentiate between style names, ids, and different style types (that is, paragraph and character styles).

Since docx style names can have spaces in them, and pandoc-markdown classes can't, anywhere a style name is used as a class name, spaces are replaced with ASCII dashes `-`.

Get rid of extraneous intermediate types carrying styleId information. Instead, styleId is saved with other style data.

Use RunStyle for inline style definitions only (lacking styleId and styleName); for Character Styles use CharStyle type (which is basically RunStyle with styleId and StyleName bolted onto it).
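Three of the rules above (case-insensitive name comparison, the styleId fallback, and space-to-dash conversion for class names) can be sketched as follows. The function names are hypothetical, and plain `String` is used instead of the reader's newtypes to keep the sketch dependent on `base` only.

```haskell
import Data.Char (toLower)
import Data.Maybe (fromMaybe)

-- Compare two style names case-insensitively, as the commit message says
-- Word treats them.
sameStyleName :: String -> String -> Bool
sameStyleName a b = map toLower a == map toLower b

-- Use the style name when present, falling back to the style id
-- (expected only for malformed documents).
effectiveName :: Maybe String -> String -> String
effectiveName mName styleId = fromMaybe styleId mName

-- Turn a style name into a class name by replacing spaces with ASCII dashes.
styleToClass :: String -> String
styleToClass = map (\c -> if c == ' ' then '-' else c)
```

For example, under these definitions a style named "Source Code" would become the class `Source-Code`.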
2019-07-28 | Use doctemplates 0.3, change type of writerTemplate. | John MacFarlane | 1 | -1/+1
* Require recent doctemplates. It is more flexible and supports partials.
* Changed type of writerTemplate to Maybe Template instead of Maybe String.
* Remove code from the LaTeX, Docbook, and JATS writers that looked in the template for strings to determine whether it is a book or an article, or whether csquotes is used. This was always kludgy and unreliable. To use csquotes for LaTeX, set `csquotes` in your variables or metadata. It is no longer sufficient to put `\usepackage{csquotes}` in your template or header includes. To specify a book style, use the `documentclass` variable or `--top-level-division`.
* Change template code to use new API for doctemplates.
2019-02-18 | Docx reader: add tests for trimming last inline. | Jesse Rosenthal | 1 | -0/+4
2019-02-12 | Docx reader: Add test for reading sdts in footnotes. | Jesse Rosenthal | 1 | -0/+4
2019-02-06 | Docx reader: Tests for alternate document.xml | Jesse Rosenthal | 1 | -2/+7
2019-02-04 | Add missing copyright notices and remove license boilerplate (#5112) | Albert Krewinkel | 1 | -0/+11
Quite a few modules were missing copyright notices. This commit adds copyright notices everywhere via haddock module headers. The old license boilerplate comment is redundant with this and has been removed. Update copyright years to 2019. Closes #4592.
2018-12-10 | Docx: add test for lists with level overrides. | Jesse Rosenthal | 1 | -0/+4
2018-04-17 | Docx reader tests: Test for combining adjacent code blocks. | Jesse Rosenthal | 1 | -0/+4
2018-03-18 | Use NoImplicitPrelude and explicitly import Prelude. | John MacFarlane | 1 | -0/+2
This seems to be necessary if we are to use our custom Prelude with ghci. Closes #4464.
2018-03-13 | Docx reader: add tests for nested smart tags. | Jesse Rosenthal | 1 | -0/+4
2018-02-28 | Docx reader: Handle nested sdt tags. | Jesse Rosenthal | 1 | -0/+4
Previously we had only unwrapped one level of sdt tags. Now we recurse if we find them. Closes: #4415
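The difference between unwrapping one level and recursing can be sketched on a toy tree. The `Elem` type and `unwrapSdt` are hypothetical stand-ins; the real reader works on parsed XML elements.

```haskell
-- Toy element tree; `Sdt` stands in for a structured document tag wrapper.
data Elem
  = Sdt [Elem]   -- wrapper whose content should be spliced into its parent
  | Leaf String
  deriving (Show, Eq)

-- Recursively replace every Sdt wrapper by its (already unwrapped) content,
-- so wrappers nested to any depth disappear.
unwrapSdt :: [Elem] -> [Elem]
unwrapSdt = concatMap go
  where
    go (Sdt inner) = unwrapSdt inner
    go leaf        = [leaf]
```

A single-level version would call `inner` directly instead of `unwrapSdt inner` and would leave inner wrappers in place.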
2018-02-22 | Docx reader tests: test custom style extension. | Jesse Rosenthal | 1 | -0/+11
2018-02-15 | Docx reader: Pick table width from the longest row or header | danse | 1 | -0/+4
This change is intended to preserve as much of the table content as possible. Closes #4360
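Picking the width from the longest row or header might be sketched like this. `tableWidth` and `padRow` are hypothetical names, and padding short rows with a filler cell is an assumption of this sketch, not something the commit message states.

```haskell
-- Number of columns: the length of the longest row, header included.
-- `header : rows` is never empty, so `maximum` is safe here.
tableWidth :: [cell] -> [[cell]] -> Int
tableWidth header rows = maximum (map length (header : rows))

-- Pad a short row with a filler cell so every row reaches the chosen width
-- (assumed behaviour, for illustration only).
padRow :: cell -> Int -> [cell] -> [cell]
padRow filler width row = row ++ replicate (width - length row) filler
```

Taking the maximum rather than, say, the header length means no row has to be truncated to fit.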
2018-01-19 | hlint code improvements. | John MacFarlane | 1 | -6/+5
2018-01-16 | Docx reader: Add test for hyperlinks in instrText tag | Jesse Rosenthal | 1 | -0/+4
This is difficult to recreate with a modern version of Word, so I'm using the file submitted with the bug report. It would be preferable to find a smaller example with Latin characters, though, so as not to confuse the issue being tested.
2018-01-02 | Docx reader: Add tests for paragraph insertion/deletion. | Jesse Rosenthal | 1 | -0/+12
2017-12-31 | Docx reader: tests for overlapping targets (anchor spans). | Jesse Rosenthal | 1 | -0/+4
2017-12-30 | Docx reader: tests for removing unused anchors. | Jesse Rosenthal | 1 | -0/+4
2017-12-27 | Docx reader: add tests for structured document tags unwrapping. | Jesse Rosenthal | 1 | -0/+4
2017-12-13 | Docx writer: Add tests for list continuation. | Jesse Rosenthal | 1 | -0/+8
2017-12-04 | Add `empty_paragraphs` extension. | John MacFarlane | 1 | -20/+7
* Deprecate `--strip-empty-paragraphs` option. Instead we now use an `empty_paragraphs` extension that can be enabled on the reader or writer. By default, disabled.
* Add `Ext_empty_paragraphs` constructor to `Extension`.
* Revert "Docx reader: don't strip out empty paragraphs." This reverts commit d6c58eb836f033a48955796de4d9ffb3b30e297b.
* Implement `empty_paragraphs` extension in docx reader and writer, opendocument writer, html reader and writer.
* Add tests for `empty_paragraphs` extension.
2017-12-02 | Docx reader: don't strip out empty paragraphs. | John MacFarlane | 1 | -7/+20
We now have the `--strip-empty-paragraphs` option for that, if you want it. Closes #2252. Updated docx reader tests. We use stripEmptyParagraphs to avoid changing too many tests. We should add new tests for empty paragraphs.
2017-10-27 | Automatic reformatting by stylish-haskell. | John MacFarlane | 1 | -4/+4
2017-10-27 | Removed old adjacent_links test for docx reader. | John MacFarlane | 1 | -4/+0
See #2270 for background -- this test blocked the consistent underline change and was hard to revise, so for now we are removing it.
2017-08-06 | Docx reader: Add tests for avoiding zero-level header. | Jesse Rosenthal | 1 | -0/+4
2017-06-11 | Switched Writer types to use Text. | John MacFarlane | 1 | -1/+2
* XML.toEntities: changed type to Text -> Text.
* Shared.tabFilter -- fixed so it strips out CRs as before.
* Modified writers to take Text.
* Updated tests, benchmarks, trypandoc.

[API change] Closes #3731.
2017-06-10 | Changed all readers to take Text instead of String. | John MacFarlane | 1 | -1/+3
Readers: Renamed StringReader -> TextReader. Updated tests. API change.
2017-03-14 | Got rid of distracting warning in test output. | John MacFarlane | 1 | -2/+2
2017-03-14 | Use tasty for tests rather than test-framework. | John MacFarlane | 1 | -15/+15
2017-03-04 | Stylish-haskell automatic formatting changes. | John MacFarlane | 1 | -7/+7
2017-02-11 | Use new warnings throughout the code base. | John MacFarlane | 1 | -1/+1
2017-02-04 | Moved tests/ -> test/. | John MacFarlane | 1 | -0/+344