aboutsummaryrefslogtreecommitdiff
path: root/src/Text/Pandoc/Writers/Docx.hs
AgeCommit message (Collapse)AuthorFilesLines
2021-02-16Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...John MacFarlane1-133/+130
..and add new definitions isomorphic to xml-light's, but with Text instead of String. This allows us to keep most of the code in existing readers that use xml-light, but avoid lots of unnecessary allocation. We also add versions of the functions from xml-light's Text.XML.Light.Output and Text.XML.Light.Proc that operate on our modified XML types, and functions that convert xml-light types to our types (since some of our dependencies, like texmath, use xml-light). Update golden tests for docx and pptx. OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`. Docx: Do a manual traversal to unwrap sdt and smartTag. This is faster, and needed to pass the tests. Benchmarks: A = prior to 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8) B = as of 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8) C = this commit | Reader | A | B | C | | ------- | ----- | ------ | ----- | | docbook | 18 ms | 12 ms | 10 ms | | opml | 65 ms | 62 ms | 35 ms | | jats | 15 ms | 11 ms | 9 ms | | docx | 72 ms | 69 ms | 44 ms | | odt | 78 ms | 41 ms | 28 ms | | epub | 64 ms | 61 ms | 56 ms | | fb2 | 14 ms | 5 ms | 4 ms |
2021-02-11Use getTimestamp instead of getCurrentTime in writers.John MacFarlane1-1/+1
Setting SOURCE_DATE_EPOCH will allow reproducible builds. Partially addresses #7093. This does not suffice to fully enable reproducible in EPUB, since a unique id is being generated for each build.
2021-01-12Docx writer: handle table header using styles.John MacFarlane1-17/+20
Instead of hard-coding the border and header cell vertical alignment, we now let this be determined by the Table style, making use of Word's "conditional formatting" for the table's first row. For headerless tables, we use the tblLook element to tell Word not to apply conditional first-row formatting. Closes #7008.
2021-01-08Update copyright notices for 2021 (#7012)Albert Krewinkel1-1/+1
2020-12-30Undo the "Use fromRight" hlint hint.John MacFarlane1-2/+1
2020-12-30Hlint fixesJohn MacFarlane1-1/+2
2020-12-29Improve fix to #6983.John MacFarlane1-1/+3
If we have a paragraph then a bookmarkEnd, we don't need to insert the empty paragraph (and in fact it alters the spacing). Closes #6983.
2020-12-28Docx writer: fix nested tables with captions.John MacFarlane1-4/+6
Previously we got unreadable content, because docx seems to want a `<w:p>` element (even an empty one) at the end of every table cell. Closes #6983.
2020-12-13Docx writer: keep raw openxml strings verbatim.Albert Krewinkel1-2/+5
Closes: #6933
2020-12-13Docx writer: use Content instead of Element.Albert Krewinkel1-59/+75
2020-12-03Docx writer: Support bold and italic in "complex script."John MacFarlane1-2/+6
Previously bold and italics didn't work properly in LTR text. This commit causes the w:bCs and w:iCs attributes to be used, in addition to w:b and w:i, for bold and italics respectively. Closes #6911.
2020-11-26Docx writer: Fix bullets/lists indentationcholonam1-3/+3
Fix appearance of bullets/numbered lists (the first level is slightly indented to the right instead of right on the margin). New golden files have been tested using Word 2010 on Windows 10.
2020-10-06DOCX reader: Allow empty dates in comments and tracked changes (#6726)Diego Balseiro1-11/+7
For security reasons, some legal firms delete the date from comments and tracked changes. * Make date optional (Maybe) in tracked changes and comments datatypes * Add tests
2020-10-02Docx writer: better handle list items whose contents are lists (#6522)Michael Hoffmann1-3/+13
If the first element of a bulleted or ordered list is another list, then that first item will disappear if the target format is docx. This changes the docx writer so that it prepends an empty string for those cases. With this, no items will disappear. Closes #5948.
2020-09-21Add built-in citation support using new citeproc library.John MacFarlane1-0/+12
This deprecates the use of the external pandoc-citeproc filter; citation processing is now built in to pandoc. * Add dependency on citeproc library. * Add Text.Pandoc.Citeproc module (and some associated unexported modules under Text.Pandoc.Citeproc). Exports `processCitations`. [API change] * Add data files needed for Text.Pandoc.Citeproc: default.csl in the data directory, and a citeproc directory that is just used at compile-time. Note that we've added file-embed as a mandatory rather than a conditional depedency, because of the biblatex localization files. We might eventually want to use readDataFile for this, but it would take some code reorganization. * Text.Pandoc.Loging: Add `CiteprocWarning` to `LogMessage` and use it in `processCitations`. [API change] * Add tests from the pandoc-citeproc package as command tests (including some tests pandoc-citeproc did not pass). * Remove instructions for building pandoc-citeproc from CI and release binary build instructions. We will no longer distribute pandoc-citeproc. * Markdown reader: tweak abbreviation support. Don't insert a nonbreaking space after a potential abbreviation if it comes right before a note or citation. This messes up several things, including citeproc's moving of note citations. * Add `csljson` as and input and output format. This allows pandoc to convert between `csljson` and other bibliography formats, and to generate formatted versions of CSL JSON bibliographies. * Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API change] * Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API change] * Added `bibtex`, `biblatex` as input formats. This allows pandoc to convert between BibLaTeX and BibTeX and other bibliography formats, and to generated formatted versions of BibTeX/BibLaTeX bibliographies. * Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and `readBibLaTeX`. [API change] * Make "standalone" implicit if output format is a bibliography format. This is needed because pandoc readers for bibliography formats put the bibliographic information in the `references` field of metadata; and unless standalone is specified, metadata gets ignored. (TODO: This needs improvement. We should trigger standalone for the reader when the input format is bibliographic, and for the writer when the output format is markdown.) * Carry over `citationNoteNum` to `citationNoteNumber`. This was just ignored in pandoc-citeproc. * Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter. [API change] This runs the processCitations transformation. We need to treat it like a filter so it can be placed in the sequence of filter runs (after some, before others). In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`, so this special filter may be specified either way in a defaults file (or by `citeproc: true`, though this gives no control of positioning relative to other filters). TODO: we need to add something to the manual section on defaults files for this. * Add deprecation warning if `upandoc-citeproc` filter is used. * Add `--citeproc/-C` option to trigger citation processing. This behaves like a filter and will be positioned relative to filters as they appear on the command line. * Rewrote the manual on citatations, adding a dedicated Citations section which also includes some information formerly found in the pandoc-citeproc man page. * Look for CSL styles in the `csl` subdirectory of the pandoc user data directory. This changes the old pandoc-citeproc behavior, which looked in `~/.csl`. Users can simply symlink `~/.csl` to the `csl` subdirectory of their pandoc user data directory if they want the old behavior. * Add support for CSL bibliography entry formatting to LaTeX, HTML, Ms writers. Added CSL-related CSS to styles.html.
2020-09-13Fix hlint suggestions, update hlint.yaml (#6680)Christian Despres1-1/+1
* Fix hlint suggestions, update hlint.yaml Most suggestions were redundant brackets. Some required LambdaCase. The .hlint.yaml file had a small typo, and didn't ignore camelCase suggestions in certain modules.
2020-09-11Use the original tail instead of deconstructing and reconstructing it (#6678)Joseph C. Sible1-2/+2
2020-08-24Docx writer: separate adjacent tables.John MacFarlane1-1/+9
Word combines adjacent tables, so to prevent this we insert an empty paragraph between two adjacent tables. Closes #4315.
2020-07-22Docx writer: support --number-sections.John MacFarlane1-4/+17
Closes #1413.
2020-05-16Docx writer: enable column and row bands for tables.John MacFarlane1-1/+6
This change will not have any effect with the default style. However, it enables users to use a style (via a reference.docx) that turns on row and/or column bands. Closes #6371.
2020-04-28Support new Underline element in readers and writers (#6277)Vaibhav Sagar1-3/+3
Deprecate `underlineSpan` in Shared in favor of `Text.Pandoc.Builder.underline`.
2020-04-15Adapt to the newest Table type, fix some previous adaptation issuesdespresc1-1/+1
- Writers.Native is now adapted to the new Table type. - Inline captions should now be conditionally wrapped in a Plain, not a Para block. - The toLegacyTable function now lives in Writers.Shared.
2020-04-15Implement the new Table typedespresc1-3/+4
2020-03-29Clean up and simplify Text.Pandoc.Writers.Docx (#6229)Joseph C. Sible1-56/+48
* Use <|> to simplify the Semigroup instance * Use map instead of reimplementing it * Simplify isValidChar * Remove an unnecessary nested do block * Simplify pgContentWidth * Simplify addLang * Simplify newStyles * Avoid an unnecessary fmap in headerFooterEntries * Remove unnecessary monadicity from mkNumbering and mkAbstractNum * Use randomRs instead of constantly messing with the RNG state * Lift common functions out of ifs * Hoist not * Clarify withTextPropM and withParaPropM
2020-03-29Clean up some fmaps (#6226)Joseph C. Sible1-3/+3
* Avoid fmapping when we're just binding right after anyway * Clean up unnecessary fmaps in the LaTeX reader
2020-03-22Finer grained imports of Text.Pandoc.Class submodules (#6203)Albert Krewinkel1-2/+2
This should speed-up recompilation after changes in `Text.Pandoc.Class`, as the number of modules affected by a change will be smaller in general. It also offers faster insights into the parts of `T.P.Class` used within a module.
2020-03-15Use implicit Prelude (#6187)Albert Krewinkel1-2/+0
* Use implicit Prelude The previous behavior was introduced as a fix for #4464. It seems that this change alone did not fix the issue, and `stack ghci` and `cabal repl` only work with GHC 8.4.1 or newer, as no custom Prelude is loaded for these versions. Given this, it seems cleaner to revert to the implicit Prelude. * PandocMonad: remove outdated check for base version Only base versions 4.9 and later are supported, the check for `MIN_VERSION_base(4,8,0)` is therefore unnecessary. * Always use custom prelude Previously, the custom prelude was used only with older GHC versions, as a workaround for problems with ghci. The ghci problems are resolved by replacing package `base` with `base-noprelude`, allowing for consistent use of the custom prelude across all GHC versions.
2020-03-13Update copyright year (#6186)Albert Krewinkel1-1/+1
* Update copyright year * Copyright: add notes for Lua and Jira modules
2020-01-19Docx writer: fix regression with Compact style on tight lists. (#6073)John MacFarlane1-1/+9
Starting in 2.8, the docx writer no longer distinguishes between tight and loose lists, since the Compact style is omitted. This is a side-effect of the fix to #5670, as explained in the changelog: + Preserve built-in styles in DOCX with custom style (Ben Steinberg, #5670). This change prevents custom styles on divs and spans from overriding styles on certain elements inside them, like headings, blockquotes, and links. On those elements, the "native" style is required for the element to display correctly. This change also allows nesting of custom styles; in order to do so, it removes the default "Compact" style applied to Plain blocks, except when inside a table. This patch fixes the problem by extending the exception currently offered to Plain blocks inside tables to Plain blocks inside list items. Closes #6072.
2019-11-12Switch to new pandoc-types and use Text instead of String [API change].despresc1-82/+87
PR #5884. + Use pandoc-types 1.20 and texmath 0.12. + Text is now used instead of String, with a few exceptions. + In the MediaBag module, some of the types using Strings were switched to use FilePath instead (not Text). + In the Parsing module, new parsers `manyChar`, `many1Char`, `manyTillChar`, `many1TillChar`, `many1Till`, `manyUntil`, `mantyUntilChar` have been added: these are like their unsuffixed counterparts but pack some or all of their output. + `glob` in Text.Pandoc.Class still takes String since it seems to be intended as an interface to Glob, which uses strings. It seems to be used only once in the package, in the EPUB writer, so that is not hard to change.
2019-11-11Fix typos (#5896)Brian Wignall1-1/+1
2019-09-28More throwError in place of fail.John MacFarlane1-0/+1
2019-09-28Replace some more fails with throwErrors.John MacFarlane1-2/+5
2019-09-28Use Prelude.fail to avoid ambiguity with fail from GHC.Base.John MacFarlane1-1/+1
2019-09-22[Docx Writer] Re-use Readers.Docx.Parse for StyleMap (#5766)Nikolay Yakimov1-31/+30
* [Docx Parser] Move style-parsing-specific code to a new module * [Docx Writer] Re-use Readers.Docx.Parse.Styles for StyleMap * [Docx Writer] Move Readers.Docx.StyleMap to Writers.Docx.StyleMap It's never used outside of writer code, so it makes more sense to scope it under writers really.
2019-09-21[Docx Writer] Consistently use style names, not style idsNikolay Yakimov1-27/+25
Styles that this change affects: paragraph styles: Author, Abstract, Compact, Figure, Captioned Figure, Image Caption, First Paragraph, Source Code, Table Caption, Definition, Definition Term; character styles: Verbatim Char, token styles (those with names ending in Tok)
2019-09-21[Docx Writer] Code clean-upNikolay Yakimov1-40/+37
Reduce code duplication, remove redundant brackets
2019-09-20Preserve built-in styles in DOCX with custom style (#5670)Ben Steinberg1-22/+55
This commit prevents custom styles on divs and spans from overriding styles on certain elements inside them, like headings, blockquotes, and links. On those elements, the "native" style is required for the element to display correctly. This change also allows nesting of custom styles; in order to do so, it removes the default "Compact" style applied to Plain blocks, except when inside a table.
2019-09-08Replace Element and makeHierarchical with makeSections.John MacFarlane1-1/+1
Text.Pandoc.Shared: + Remove `Element` type [API change] + Remove `makeHierarchicalize` [API change] + Add `makeSections` [API change] + Export `deLink` [API change] Now that we have Divs, we can use them to represent the structure of sections, and we don't need a special Element type. `makeSections` reorganizes a block list, adding Divs with class `section` around sections, and adding numbering if needed. This change also fixes some longstanding issues recognizing section structure when the document contains Divs. Closes #3057, see also #997. All writers have been changed to use `makeSections`. Note that in the process we have reverted the change c1d058aeb1c6a331a2cc22786ffaab17f7118ccd made in response to #5168, which I'm not completely sure was a good idea. Lua modules have also been adjusted accordingly. Existing lua filters that use `hierarchicalize` will need to be rewritten to use `make_sections`.
2019-08-23add proofState to settingsList (#5703)Krystof Beuermann1-0/+1
2019-07-19Change order of ilvl and numId in document.xml (#5647)Agustín Martín Barbero1-3/+3
Workaround for Word Online shortcomming. Fixes #5645 Also, make list para properties go first. This reordering of properties shouldn't be necessary but it seems Word Online does not understand the docx correctly otherwise.
2019-03-21Docx writer: Use w:br without attributes for line breaks.John MacFarlane1-4/+1
We previously added the attribute `type="textWrapping"`, but this causes problems on Word Online. Closes #5377.
2019-03-11docx writer: avoid extra copy of abstractNum and num elements...John MacFarlane1-1/+9
...in numbering.xml. This caused pandoc-produced docx files to be uneditable using Word Online. The problem was that recent versions of reference.docx include samples of various kinds of text, including lists. The numering elements for these were getting copied over to the new docx, where they clashed with the autogenerated elements produced by pandoc. This didn't confuse Desktop Word, but it did confuse Word Online. Closes #5358.
2019-03-01Remove license boilerplate.John MacFarlane1-18/+0
The haddock module header contains essentially the same information, so the boilerplate is redundant and just one more thing to get out of sync.
2019-02-04Add missing copyright notices and remove license boilerplate (#5112)Albert Krewinkel1-2/+2
Quite a few modules were missing copyright notices. This commit adds copyright notices everywhere via haddock module headers. The old license boilerplate comment is redundant with this and has been removed. Update copyright years to 2019. Closes #4592.
2019-01-26Improve writing metadata for docx, pptx and odt (#5252)Agustín Martín Barbero1-2/+17
* docx writer: support custom properties. Solves the writer part of #3024. Also supports additional core properties: `subject`, `lang`, `category`, `description`. * odt writer: improve standard properties, including the following core properties: `generator` (Pandoc/VERSION), `description`, `subject`, `keywords`, `initial-creator` (from authors), `creation-date` (actual creation date). Also fix date. * pptx writer: support custom properties. Also supports additional core properties: `subject`, `category`, `description`. * Includes golden tests. * MANUAL: document metadata support for docx, odt, pptx writers
2018-12-31Replace read with safeRead (#5186)Mauro Bieg1-5/+5
closes #5180
2018-11-20Docx writer: Fix bookmarks to headers with long titles.John MacFarlane1-4/+18
Word has a 40 character limit for bookmark names. In addition, bookmarks must begin with a letter. Since pandoc's auto-generated identifiers may not respect these constraints, some internal links did not work. With this change, pandoc uses a bookmark name based on the SHA1 hash of the identifier when the identifier isn't a legal bookmark name. Closes #5091.
2018-11-19Fix compiler warning.John MacFarlane1-1/+1
2018-11-19For bibliography match Div with id 'refs', not class 'references'.John MacFarlane1-2/+2
This was a mismatch between pandoc's docx, epub, latex, and markdown writers and the behavior of pandoc-citeproc, which actually looks for a div with id 'refs' rather than one with class 'references'.