pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2021-10-18	Docx reader: fix handling of empty fields	Milan Bracke	1	-0/+4
	Some fields have only an instrText and no content. Pandoc didn't understand these: it treated such a field as still open when it wasn't, which caused other fields to be misread.
2021-10-18	Docx parser: implement PAGEREF fields	Milan Bracke	1	-0/+4
	These fields, often used in tables of contents, can be a hyperlink.
2021-10-18	Docx reader: fix handling of nested fields	Milan Bracke	1	-0/+4
	Fields delimited by fldChar elements can contain other fields. Before, the nested fields would be ignored, except for their end marker, which would be taken as the end of the parent field. To fix this issue, fields needed to be treated as containing ParParts instead of Runs, since a Run can't represent sufficiently complex structures. This also impacted Hyperlinks, since they can originate from a field.
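	A minimal sketch of the nesting problem described above, assuming a simplified token stream. The types `Token` and `Part` and the function `parseParts` are hypothetical and are not pandoc's actual Docx parser types; the point is only that a recursive parse pairs each end marker with its own begin marker.

```haskell
-- Hypothetical token stream: field begin/end markers and plain text runs.
data Token = Begin String | End | Txt String
  deriving (Eq, Show)

-- A part is either plain text or a field that may contain further parts.
data Part = PlainText String | Field String [Part]
  deriving (Eq, Show)

-- Parse parts until the stream ends or an unmatched End is reached.
-- Returns the parsed parts and the remaining tokens (starting at that End).
parseParts :: [Token] -> ([Part], [Token])
parseParts [] = ([], [])
parseParts (End : rest) = ([], End : rest)
parseParts (Txt s : rest) =
  let (ps, rest') = parseParts rest
  in (PlainText s : ps, rest')
parseParts (Begin instr : rest) =
  let (inner, afterInner) = parseParts rest
      -- Drop the End that closes this field, if present.
      afterEnd = case afterInner of
        (End : more) -> more
        other        -> other
      (ps, rest') = parseParts afterEnd
  in (Field instr inner : ps, rest')
```

	With this shape, an inner field's end marker closes only the inner field, so the parent field keeps the content that follows it.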
2021-10-10	Avoid blockquote when parent style has more indent	Milan Bracke	1	-0/+4
	When a paragraph has an indentation different from the parent (named) style, it used to be considered a blockquote. But this only makes sense when the paragraph has more indentation. So this commit adds a check for the indentation of the parent style.
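	The check described above might look roughly like the following sketch. The record `Para`, its fields, and `isBlockQuote` are hypothetical names, and treating a missing parent indent as zero is an assumption; pandoc's real code works on its own paragraph-style types.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical paragraph record: its own left indent and that of its
-- parent (named) style; Nothing means "not specified".
data Para = Para
  { paraIndent        :: Maybe Int
  , parentStyleIndent :: Maybe Int
  } deriving (Eq, Show)

-- Treat a paragraph as a block quote only when it is indented further
-- than its parent style, not merely differently.
isBlockQuote :: Para -> Bool
isBlockQuote p =
  case paraIndent p of
    Nothing  -> False
    Just ind -> ind > fromMaybe 0 (parentStyleIndent p)
```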
2021-05-28	Docx reader: Support new table features.	Emily Bourke	1	-0/+16
	* Column spans
	* Row spans
	  - The spec says that if the `val` attribute is omitted, its value should be assumed to be `continue`, and that its values are restricted to {`restart`, `continue`}. If the attribute has any other value, I think it seems reasonable to default it to `continue`. It might cause problems if the spec is extended in the future by adding a third possible value, in which case this would probably give incorrect behaviour, and wouldn't error.
	* Allow multiple header rows
	* Include table description in simple caption
	  - The table description element is like alt text for a table (along with the table caption element). It seems like we should include this somewhere, but I’m not 100% sure how; I’m pairing it with the simple caption for the moment. (Should it maybe go in the block caption instead?)
	* Detect table captions
	  - Check for caption paragraph style /and/ either the simple or complex table field. This means the caption detection fails for captions which don’t contain a field, as in an example doc I added as a test. However, I think it’s better to be too conservative: a missed table caption will still show up as a paragraph next to the table, whereas if I incorrectly classify something else as a table caption it could cause havoc by pairing it up with a table it’s not at all related to, or dropping it entirely.
	* Update tests and add new ones
	Partially fixes: #6316
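	The row-span defaulting rule above can be sketched as follows. The type `VMerge` and the functions `parseVMerge` and `rowSpan` are hypothetical illustrations, not the names used in pandoc's Docx parser.

```haskell
-- Hypothetical vertical-merge state for a table cell.
data VMerge = Restart | Continue
  deriving (Eq, Show)

-- Interpret the optional `val` attribute. A missing attribute means
-- `continue`; any unrecognised value also falls back to `continue`.
parseVMerge :: Maybe String -> VMerge
parseVMerge (Just "restart") = Restart
parseVMerge _                = Continue

-- Row span of the cell at the head of a column: the cell itself plus
-- every immediately following Continue cell.
rowSpan :: [VMerge] -> Int
rowSpan []       = 0
rowSpan (_ : vs) = 1 + length (takeWhile (== Continue) vs)
```

	The fallback keeps parsing total, at the cost noted in the commit message: a future third value would be silently read as `continue`.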
2021-05-24	MediaBag improvements.	John MacFarlane	1	-5/+5
	In the current dev version, we will sometimes add a version of an image with a hashed name, keeping the original version with the original name, which would lead to undesirable duplication.
	This change separates the media's filename from the media's canonical name (which is the path of the link in the document itself). Filenames are based on SHA1 hashes and assigned automatically.
	In Text.Pandoc.MediaBag:
	- Export MediaItem type [API change].
	- Change MediaBag type to a map from Text to MediaItem [API change].
	- `lookupMedia` now returns a `MediaItem` [API change].
	- Change `insertMedia` so it sets the `mediaPath` to a filename based on the SHA1 hash of the contents. This will be used when contents are extracted.
	In Text.Pandoc.Class.PandocMonad:
	- Remove `fetchMediaResource` [API change].
	Lua MediaBag module has been changed minimally. In the future it would probably be better to give Lua access to the full MediaItem type.
2021-04-29	Docx reader: add handling of vml image objects (jgm#4735) (#7257)	mbrackeantidot	1	-0/+4
	They represent images, the same way as other images in vml format.
2021-02-07	Avoid unnecessary use of NoImplicitPrelude pragma (#7089)	Albert Krewinkel	1	-2/+0

2020-10-06	DOCX reader: Allow empty dates in comments and tracked changes (#6726)	Diego Balseiro	1	-0/+4
	For security reasons, some legal firms delete the date from comments and tracked changes.
	* Make date optional (Maybe) in tracked changes and comments datatypes
	* Add tests
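	A minimal sketch of what an optional date looks like in a datatype and how a consumer can omit it. `ChangeInfo`, its fields, and `changeAttrs` are hypothetical names chosen for illustration, not pandoc's actual definitions.

```haskell
-- Hypothetical tracked-change record; the date may be absent.
data ChangeInfo = ChangeInfo
  { changeId     :: String
  , changeAuthor :: String
  , changeDate   :: Maybe String
  } deriving (Eq, Show)

-- Build attribute pairs, leaving out the date when it is missing.
changeAttrs :: ChangeInfo -> [(String, String)]
changeAttrs c =
  [("id", changeId c), ("author", changeAuthor c)]
    ++ maybe [] (\d -> [("date", d)]) (changeDate c)
```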
2020-09-13	Fix hlint suggestions, update hlint.yaml (#6680)	Christian Despres	1	-1/+1
	Most suggestions were redundant brackets. Some required LambdaCase. The .hlint.yaml file had a small typo, and didn't ignore camelCase suggestions in certain modules.
2020-03-28	More cleanup (#6209)	Joseph C. Sible	1	-3/+2
	* Simplify by collapsing a do block into a single <$>
	* Remove an unnecessary variable: `all` takes any Foldable, so only blocksToInlines needs toList.
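	The two cleanups above follow a general pattern, illustrated here with hypothetical functions (`shoutDo`, `shoutFmap`, `allPositive`) that are not taken from the pandoc source.

```haskell
import Data.Char (toUpper)

-- Before: a do block that binds a result only to return a function of it.
shoutDo :: IO String -> IO String
shoutDo act = do
  s <- act
  return (map toUpper s)

-- After: the same behaviour as a single <$>.
shoutFmap :: IO String -> IO String
shoutFmap act = map toUpper <$> act

-- `all` works on any Foldable, so no conversion to a list is needed.
allPositive :: Maybe Int -> Bool
allPositive = all (> 0)
```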
2020-03-13	Update copyright year (#6186)	Albert Krewinkel	1	-1/+1
	* Update copyright year
	* Copyright: add notes for Lua and Jira modules
2020-02-08	Use <$> instead of >>= and return (#6128)	Joseph C. Sible	1	-1/+1

2020-02-07	Apply linter suggestions. Add fix_spacing to lint target in Makefile.	John MacFarlane	1	-1/+1

2019-11-12	Switch to new pandoc-types and use Text instead of String [API change].	despresc	1	-1/+2
	PR #5884.
	+ Use pandoc-types 1.20 and texmath 0.12.
	+ Text is now used instead of String, with a few exceptions.
	+ In the MediaBag module, some of the types using Strings were switched to use FilePath instead (not Text).
	+ In the Parsing module, new parsers `manyChar`, `many1Char`, `manyTillChar`, `many1TillChar`, `many1Till`, `manyUntil`, `manyUntilChar` have been added: these are like their unsuffixed counterparts but pack some or all of their output.
	+ `glob` in Text.Pandoc.Class still takes String since it seems to be intended as an interface to Glob, which uses strings. It seems to be used only once in the package, in the EPUB writer, so that is not hard to change.
2019-11-03	Docx reader: fix list number resumption for sublists. Closes #4324.	John MacFarlane	1	-0/+4
	The first list item of a sublist should not resume numbering from the number of the last sublist item of the same level, if that sublist was a sublist of a different list item. That is, we should not get:
	```
	1. one
	    1. sub one
	    2. sub two
	2. two
	    3. sub one
	```
2019-09-21	[Docx Reader] Use style names, not ids, for assigning semantic meaning	Nikolay Yakimov	1	-0/+9
	Motivating issues: #5523, #5052, #5074
	Style name comparisons are case-insensitive, since those are case-insensitive in Word.
	w:styleId will be used as style name if w:name is missing (this should only happen for malformed docx and is kept as a fallback to avoid failing altogether on malformed documents).
	Block quote detection code moved from Docx.Parser to Readers.Docx.
	Code styles, i.e. "Source Code" and "Verbatim Char", now honor style inheritance.
	Docx Reader now honors "Compact" style (used in Pandoc-generated docx). The side effect is that "Compact" style no longer shows up in docx+styles output. Styles inherited from "Compact" will still show up.
	Removed obsolete list-item style from divsToKeep. That didn't really do anything for a while now.
	Add newtypes to differentiate between style names, ids, and different style types (that is, paragraph and character styles).
	Since docx style names can have spaces in them, and pandoc-markdown classes can't, wherever a style name is used as a class name, spaces are replaced with ASCII dashes `-`.
	Get rid of extraneous intermediate types carrying styleId information. Instead, styleId is saved with other style data.
	Use RunStyle for inline style definitions only (lacking styleId and styleName); for Character Styles use CharStyle type (which is basically RunStyle with styleId and StyleName bolted onto it).
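	A rough sketch of three of the rules above: case-insensitive name comparison, the id fallback, and space-to-dash class names. The newtype `StyleName` and the helpers `styleNameOrId` and `toClassName` are hypothetical stand-ins; pandoc's actual newtypes and functions may differ.

```haskell
import Data.Char (toLower)
import Data.Maybe (fromMaybe)

-- Hypothetical newtype that keeps style names apart from style ids.
newtype StyleName = StyleName String
  deriving (Show)

-- Style names compare case-insensitively, as in Word.
instance Eq StyleName where
  StyleName a == StyleName b = map toLower a == map toLower b

-- Prefer the style's name; fall back to its id when the name is missing.
styleNameOrId :: Maybe String -> String -> StyleName
styleNameOrId mName styleId = StyleName (fromMaybe styleId mName)

-- Turn a style name into a class name by replacing spaces with dashes.
toClassName :: StyleName -> String
toClassName (StyleName n) = map (\c -> if c == ' ' then '-' else c) n
```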
2019-07-28	Use doctemplates 0.3, change type of writerTemplate.	John MacFarlane	1	-1/+1
	* Require recent doctemplates. It is more flexible and supports partials.
	* Changed type of writerTemplate to Maybe Template instead of Maybe String.
	* Remove code from the LaTeX, Docbook, and JATS writers that looked in the template for strings to determine whether it is a book or an article, or whether csquotes is used. This was always kludgy and unreliable. To use csquotes for LaTeX, set `csquotes` in your variables or metadata. It is no longer sufficient to put `\usepackage{csquotes}` in your template or header includes. To specify a book style, use the `documentclass` variable or `--top-level-division`.
	* Change template code to use new API for doctemplates.
2019-02-18	Docx reader: add tests for trimming last inline.	Jesse Rosenthal	1	-0/+4

2019-02-12	Docx reader: Add test for reading sdts in footnotes.	Jesse Rosenthal	1	-0/+4

2019-02-06	Docx reader: Tests for alternate document.xml	Jesse Rosenthal	1	-2/+7

2019-02-04	Add missing copyright notices and remove license boilerplate (#5112)	Albert Krewinkel	1	-0/+11
	Quite a few modules were missing copyright notices. This commit adds copyright notices everywhere via haddock module headers. The old license boilerplate comment is redundant with this and has been removed. Update copyright years to 2019. Closes #4592.
2018-12-10	Docx: add test for lists with level overrides.	Jesse Rosenthal	1	-0/+4

2018-04-17	Docx reader tests: Test for combining adjacent code blocks.	Jesse Rosenthal	1	-0/+4

2018-03-18	Use NoImplicitPrelude and explicitly import Prelude.	John MacFarlane	1	-0/+2
	This seems to be necessary if we are to use our custom Prelude with ghci. Closes #4464.
2018-03-13	Docx reader: add tests for nested smart tags.	Jesse Rosenthal	1	-0/+4

2018-02-28	Docx reader: Handle nested sdt tags.	Jesse Rosenthal	1	-0/+4
	Previously we had only unwrapped one level of sdt tags. Now we recurse if we find them. Closes: #4415
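	The recursion described above can be sketched on a simplified element tree. `Elem` and `unwrapSdt` are hypothetical, and the sketch treats the wrapper's children as its content directly, which is a simplification of the real docx XML.

```haskell
-- Hypothetical, simplified element tree: a tag name and child elements.
data Elem = Elem String [Elem]
  deriving (Eq, Show)

-- Replace every sdt wrapper by its children, recursing so that wrappers
-- nested at any depth are removed as well.
unwrapSdt :: Elem -> [Elem]
unwrapSdt (Elem "sdt" children) = concatMap unwrapSdt children
unwrapSdt (Elem name children)  = [Elem name (concatMap unwrapSdt children)]
```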
2018-02-22	Docx reader tests: test custom style extension.	Jesse Rosenthal	1	-0/+11

2018-02-15	Docx reader: Pick table width from the longest row or header	danse	1	-0/+4
	This change is intended to preserve as much of the table content as possible. Closes #4360
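	One way to read the rule above, as a sketch: take the column count from the longest row (header included) and pad shorter rows instead of truncating longer ones. `tableWidth` and `padRows` are hypothetical helpers, and the padding step is an assumption about how the extra width is used, not a description of pandoc's code.

```haskell
-- Number of columns: the length of the longest row, header included.
tableWidth :: [a] -> [[a]] -> Int
tableWidth header rows = maximum (map length (header : rows))

-- Pad every row to the given width with a filler cell, so no cell of a
-- longer row has to be dropped.
padRows :: a -> Int -> [[a]] -> [[a]]
padRows filler w = map (\r -> r ++ replicate (w - length r) filler)
```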
2018-01-19	hlint code improvements.	John MacFarlane	1	-6/+5

2018-01-16	Docx reader: Add test for hyperlinks in instrText tag	Jesse Rosenthal	1	-0/+4
	This is difficult to recreate with a modern version of Word, so I'm using the file submitted with the bug report. It would be preferable to find a smaller example with Latin characters, though, so as not to confuse the issue being tested.
2018-01-02	Docx reader: Add tests for paragraph insertion/deletion.	Jesse Rosenthal	1	-0/+12

2017-12-31	Docx reader: tests for overlapping targets (anchor spans).	Jesse Rosenthal	1	-0/+4

2017-12-30	Docx reader: tests for removing unused anchors.	Jesse Rosenthal	1	-0/+4

2017-12-27	Docx reader: add tests for structured document tags unwrapping.	Jesse Rosenthal	1	-0/+4

2017-12-13	Docx writer: Add tests for list continuation.	Jesse Rosenthal	1	-0/+8

2017-12-04	Add `empty_paragraphs` extension.	John MacFarlane	1	-20/+7
	* Deprecate `--strip-empty-paragraphs` option. Instead we now use an `empty_paragraphs` extension that can be enabled on the reader or writer. By default, disabled.
	* Add `Ext_empty_paragraphs` constructor to `Extension`.
	* Revert "Docx reader: don't strip out empty paragraphs." This reverts commit d6c58eb836f033a48955796de4d9ffb3b30e297b.
	* Implement `empty_paragraphs` extension in docx reader and writer, opendocument writer, html reader and writer.
	* Add tests for `empty_paragraphs` extension.
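	The behaviour of such an extension can be sketched on a simplified block type. `Block` and `filterEmptyParas` are hypothetical; in pandoc the decision depends on whether `Ext_empty_paragraphs` is enabled for the reader or writer, which is modelled here as a plain Bool.

```haskell
-- Hypothetical, simplified block type.
data Block = Para [String] | CodeBlock String
  deriving (Eq, Show)

-- Keep empty paragraphs only when the extension is enabled.
filterEmptyParas :: Bool -> [Block] -> [Block]
filterEmptyParas extEnabled blocks
  | extEnabled = blocks
  | otherwise  = filter (/= Para []) blocks
```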
2017-12-02	Docx reader: don't strip out empty paragraphs.	John MacFarlane	1	-7/+20
	We now have the `--strip-empty-paragraphs` option for that, if you want it. Closes #2252. Updated docx reader tests. We use stripEmptyParagraphs to avoid changing too many tests. We should add new tests for empty paragraphs.
2017-10-27	Automatic reformatting by stylish-haskell.	John MacFarlane	1	-4/+4

2017-10-27	Removed old adjacent_links test for docx reader.	John MacFarlane	1	-4/+0
	See #2270 for background -- this test blocked the consistent underline change and was hard to revise, so for now we are removing it.
2017-08-06	Docx reader: Add tests for avoiding zero-level header.	Jesse Rosenthal	1	-0/+4

2017-06-11	Switched Writer types to use Text.	John MacFarlane	1	-1/+2
	* XML.toEntities: changed type to Text -> Text.
	* Shared.tabFilter -- fixed so it strips out CRs as before.
	* Modified writers to take Text.
	* Updated tests, benchmarks, trypandoc.
	[API change] Closes #3731.
2017-06-10	Changed all readers to take Text instead of String.	John MacFarlane	1	-1/+3
	Readers: Renamed StringReader -> TextReader. Updated tests. API change.
2017-03-14	Got rid of distracting warning in test output.	John MacFarlane	1	-2/+2

2017-03-14	Use tasty for tests rather than test-framework.	John MacFarlane	1	-15/+15

2017-03-04	Stylish-haskell automatic formatting changes.	John MacFarlane	1	-7/+7

2017-02-11	Use new warnings throughout the code base.	John MacFarlane	1	-1/+1

2017-02-04	Moved tests/ -> test/.	John MacFarlane	1	-0/+344