pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2021-02-16	Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...	John MacFarlane	1	-7/+9
	..and add new definitions isomorphic to xml-light's, but with Text instead of String. This allows us to keep most of the code in existing readers that use xml-light, but avoid lots of unnecessary allocation.
	We also add versions of the functions from xml-light's Text.XML.Light.Output and Text.XML.Light.Proc that operate on our modified XML types, and functions that convert xml-light types to our types (since some of our dependencies, like texmath, use xml-light).
	Update golden tests for docx and pptx.
	OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.
	Docx: Do a manual traversal to unwrap sdt and smartTag. This is faster, and needed to pass the tests.
	Benchmarks:
	A = prior to 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8)
	B = as of 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8)
	C = this commit
	| Reader  | A     | B     | C     |
	| ------- | ----- | ----- | ----- |
	| docbook | 18 ms | 12 ms | 10 ms |
	| opml    | 65 ms | 62 ms | 35 ms |
	| jats    | 15 ms | 11 ms | 9 ms  |
	| docx    | 72 ms | 69 ms | 44 ms |
	| odt     | 78 ms | 41 ms | 28 ms |
	| epub    | 64 ms | 61 ms | 56 ms |
	| fb2     | 14 ms | 5 ms  | 4 ms  |
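A minimal sketch of the idea behind this commit, not the actual module: two tree types of the same shape, one holding String and one holding Text, with a conversion between them. All type and function names here (`SElement`, `fromStringElement`, and so on) are hypothetical and heavily simplified; the real xml-light types also carry qualified names and line numbers.

```haskell
import           Data.Text (Text)
import qualified Data.Text as T

-- String-based tree, standing in for a (simplified) xml-light Element.
data SElement = SElement
  { sName    :: String
  , sAttribs :: [(String, String)]
  , sContent :: [SContent]
  } deriving (Eq, Show)

data SContent = SElem SElement | SText String
  deriving (Eq, Show)

-- The same shape with Text in place of String.
data Element = Element
  { elName    :: Text
  , elAttribs :: [(Text, Text)]
  , elContent :: [Content]
  } deriving (Eq, Show)

data Content = Elem Element | CText Text
  deriving (Eq, Show)

-- Convert a String-based tree (e.g. one produced by a dependency)
-- into the Text-based representation.
fromStringElement :: SElement -> Element
fromStringElement (SElement n as cs) =
  Element (T.pack n)
          [(T.pack k, T.pack v) | (k, v) <- as]
          (map fromStringContent cs)

fromStringContent :: SContent -> Content
fromStringContent (SElem e) = Elem (fromStringElement e)
fromStringContent (SText s) = CText (T.pack s)

-- Collect all character data below an element, in document order.
strContent :: Element -> Text
strContent = T.concat . map go . elContent
  where
    go (Elem e)  = strContent e
    go (CText t) = t
```

Because the two types are isomorphic, code written against the String version can usually be ported by changing imports and string literals rather than restructuring the traversal.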
2021-02-11	Use getTimestamp instead of getCurrentTime in writers.	John MacFarlane	1	-1/+1
	Setting SOURCE_DATE_EPOCH will allow reproducible builds. Partially addresses #7093. This does not suffice to fully enable reproducible builds in EPUB, since a unique id is generated for each build.
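A sketch of how a timestamp source can honor SOURCE_DATE_EPOCH, assuming the variable holds an integer number of seconds since the Unix epoch. This is not pandoc's `getTimestamp`; `timestampFrom` and `reproducibleTimestamp` are hypothetical names used only for illustration.

```haskell
import Data.Time.Clock (UTCTime, getCurrentTime)
import Data.Time.Clock.POSIX (posixSecondsToUTCTime)
import System.Environment (lookupEnv)
import Text.Read (readMaybe)

-- Pure core: prefer the SOURCE_DATE_EPOCH value when it parses as an
-- integer, otherwise fall back to the supplied clock reading.
timestampFrom :: Maybe String -> UTCTime -> UTCTime
timestampFrom mbEpoch now =
  case mbEpoch >>= readMaybe of
    Just secs -> posixSecondsToUTCTime (fromInteger secs)
    Nothing   -> now

-- IO wrapper (hypothetical): reads the environment and the system clock.
reproducibleTimestamp :: IO UTCTime
reproducibleTimestamp =
  timestampFrom <$> lookupEnv "SOURCE_DATE_EPOCH" <*> getCurrentTime
```

Keeping the decision in a pure function makes the fallback behavior easy to test without touching the environment.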
2021-02-10	Add new unexported module T.P.XMLParser.	John MacFarlane	1	-13/+16
	This exports functions that use xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2). Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module. The new parser also reports errors, which we report when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus. These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes. Add entity defs to docbook-reader.docbook. Update golden tests for docx.
2021-01-08	Update copyright notices for 2021 (#7012)	Albert Krewinkel	1	-1/+1

2020-09-13	Fix hlint suggestions, update hlint.yaml (#6680)	Christian Despres	1	-1/+1
	* Fix hlint suggestions, update hlint.yaml Most suggestions were redundant brackets. Some required LambdaCase. The .hlint.yaml file had a small typo, and didn't ignore camelCase suggestions in certain modules.
2020-03-22	Finer grained imports of Text.Pandoc.Class submodules (#6203)	Albert Krewinkel	1	-2/+2
	This should speed up recompilation after changes in `Text.Pandoc.Class`, as the number of modules affected by a change will be smaller in general. It also offers faster insights into the parts of `T.P.Class` used within a module.
2020-03-15	Use implicit Prelude (#6187)	Albert Krewinkel	1	-2/+0
	* Use implicit Prelude: The previous behavior was introduced as a fix for #4464. It seems that this change alone did not fix the issue, and `stack ghci` and `cabal repl` only work with GHC 8.4.1 or newer, as no custom Prelude is loaded for these versions. Given this, it seems cleaner to revert to the implicit Prelude.
	* PandocMonad: remove outdated check for base version: Only base versions 4.9 and later are supported; the check for `MIN_VERSION_base(4,8,0)` is therefore unnecessary.
	* Always use custom prelude: Previously, the custom prelude was used only with older GHC versions, as a workaround for problems with ghci. The ghci problems are resolved by replacing package `base` with `base-noprelude`, allowing for consistent use of the custom prelude across all GHC versions.
2020-03-13	Update copyright year (#6186)	Albert Krewinkel	1	-1/+1
	* Update copyright year
	* Copyright: add notes for Lua and Jira modules
2020-02-07	Apply linter suggestions. Add fix_spacing to lint target in Makefile.	John MacFarlane	1	-7/+5

2020-02-07	Resolve HLint warnings	Albert Krewinkel	1	-1/+1
	All warnings are either fixed or, if more appropriate, HLint is configured to ignore them. HLint suggestions remain.
	* Ignore "Use camelCase" warnings in Lua and legacy code
	* Fix or ignore remaining HLint warnings
	* Remove redundant brackets
	* Remove redundant `return`s
	* Remove redundant as-pattern
	* Fuse mapM_/map
	* Use `.` to shorten code
	* Remove redundant `fmap`
	* Remove unused LANGUAGE pragmas
	* Hoist `not` in Text.Pandoc.App
	* Use fewer imports for `Text.DocTemplates`
	* Remove redundant `do`s
	* Remove redundant `$`s
	* Jira reader: remove unnecessary parentheses
2019-11-12	Switch to new pandoc-types and use Text instead of String [API change].	despresc	1	-23/+24
	PR #5884.
	+ Use pandoc-types 1.20 and texmath 0.12.
	+ Text is now used instead of String, with a few exceptions.
	+ In the MediaBag module, some of the types using Strings were switched to use FilePath instead (not Text).
	+ In the Parsing module, new parsers `manyChar`, `many1Char`, `manyTillChar`, `many1TillChar`, `many1Till`, `manyUntil`, `manyUntilChar` have been added: these are like their unsuffixed counterparts but pack some or all of their output.
	+ `glob` in Text.Pandoc.Class still takes String since it seems to be intended as an interface to Glob, which uses strings. It seems to be used only once in the package, in the EPUB writer, so that is not hard to change.
2019-08-25	Use new doctemplates, doclayout.	John MacFarlane	1	-1/+1
	+ Remove Text.Pandoc.Pretty; use doclayout instead. [API change]
	+ Text.Pandoc.Writers.Shared: remove metaToJSON, metaToJSON' [API change].
	+ Text.Pandoc.Writers.Shared: modify `addVariablesToContext`, `defField`, `setField`, `getField`, `resetField` to work with Context rather than JSON values. [API change]
	+ Text.Pandoc.Writers.Shared: export new function `endsWithPlain` [API change].
	+ Use new templates and doclayout in writers.
	+ Use Doc-based templates in all writers.
	+ Adjust three tests for minor template rendering differences.
	+ Added indentation to body in docbook4, docbook5 templates.
	The main impact of this change is better reflowing of content interpolated into templates. Previously, interpolated variables were rendered independently and interpolated as strings, which could lead to overly long lines. Now the templates are interpolated as Doc values, which may include breaking spaces, and reflowing occurs after template interpolation rather than before.
2019-03-01	Remove license boilerplate.	John MacFarlane	1	-18/+0
	The haddock module header contains essentially the same information, so the boilerplate is redundant and just one more thing to get out of sync.
2019-02-04	Add missing copyright notices and remove license boilerplate (#5112)	Albert Krewinkel	1	-2/+2
	Quite a few modules were missing copyright notices. This commit adds copyright notices everywhere via haddock module headers. The old license boilerplate comment is redundant with this and has been removed. Update copyright years to 2019. Closes #4592.
2019-01-26	Improve writing metadata for docx, pptx and odt (#5252)	Agustín Martín Barbero	1	-9/+27
	* docx writer: support custom properties. Solves the writer part of #3024. Also supports additional core properties: `subject`, `lang`, `category`, `description`.
	* odt writer: improve standard properties, including the following core properties: `generator` (Pandoc/VERSION), `description`, `subject`, `keywords`, `initial-creator` (from authors), `creation-date` (actual creation date). Also fix date.
	* pptx writer: support custom properties. Also supports additional core properties: `subject`, `category`, `description`.
	* Includes golden tests.
	* MANUAL: document metadata support for docx, odt, pptx writers
2019-01-17	odt writer: fix typo in custom properties (#5231)	Agustín Martín Barbero	1	-2/+2
	fixes #2839
2018-11-22	Hlint suggestions.	John MacFarlane	1	-2/+1

2018-10-08	ODT writer: improve metadata.	John MacFarlane	1	-7/+26
	- Author, date added to metadata.
	- Remaining metadata properties (besides author, date, title, lang) are added as meta:user-defined tags.
2018-09-07	Fix percentage image scaling in ODT (#4881)	Nils Carlson	1	-2/+2
	Image scaling in ODT was broken when a width was set to a percentage. The width was passed to the svg:width field as a percentage, which is not correct according to the ODT standard. Instead, the real dimensions should be passed as width and height, the style:rel-width attribute should be set to the percentage, and the style:rel-height attribute should be set to "scale". The converse is true if a percentage height is given. This is now fixed and documents produced are now properly scaled.
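The attribute scheme described above can be sketched as follows. The attribute names come from the commit message; the function name `frameAttrs`, the `PercentDim` type, the use of inches and the three-decimal formatting are assumptions made for this illustration, not the ODT writer's actual code.

```haskell
import Numeric (showFFloat)

-- A percentage can be given for the width or for the height.
data PercentDim = PercentWidth Double | PercentHeight Double

-- Build frame attributes from the image's real size (in inches, an
-- assumption of this sketch) and the requested percentage.
frameAttrs :: (Double, Double) -> PercentDim -> [(String, String)]
frameAttrs (w, h) dim =
  [ ("svg:width",  inches w)
  , ("svg:height", inches h)
  ] ++ rel
  where
    inches :: Double -> String
    inches x = showFFloat (Just 3) x "in"

    percent :: Double -> String
    percent p = show (round p :: Int) ++ "%"

    -- The percentage goes on one axis; the other axis is told to scale.
    rel = case dim of
      PercentWidth p  -> [ ("style:rel-width",  percent p)
                         , ("style:rel-height", "scale") ]
      PercentHeight p -> [ ("style:rel-height", percent p)
                         , ("style:rel-width",  "scale") ]
```

The key point is that `svg:width` and `svg:height` always carry absolute lengths, while the percentage lives only in the `style:rel-*` attributes.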
2018-03-18	Use NoImplicitPrelude and explicitly import Prelude.	John MacFarlane	1	-0/+2
	This seems to be necessary if we are to use our custom Prelude with ghci. Closes #4464.
2018-01-05	Update copyright notices to include 2018	Albert Krewinkel	1	-2/+2

2017-12-28	improve formatting of formulas in OpenDocument	oltolm	1	-8/+30

2017-11-01	hlint	Alexander Krotov	1	-2/+2

2017-10-29	hlint suggestions.	John MacFarlane	1	-11/+9

2017-10-27	Automatic reformatting by stylish-haskell.	John MacFarlane	1	-1/+1

2017-09-30	Removed writerSourceURL, add source URL to common state.	John MacFarlane	1	-1/+1
	Removed `writerSourceURL` from `WriterOptions` (API change). Added `stSourceURL` to `CommonState`. It is set automatically by `setInputFiles`. Text.Pandoc.Class now exports `setInputFiles`, `setOutputFile`. The type of `getInputFiles` has changed; it now returns `[FilePath]` instead of `Maybe [FilePath]`. Functions in Class that formerly took the source URL as a parameter now have one fewer parameter (`fetchItem`, `downloadOrRead`, `setMediaResource`, `fillMediaBag`). Removed `WriterOptions` parameter from `makeSelfContained` in `SelfContained`.
2017-08-11	Added support for translations (localization) (see #3559).	John MacFarlane	1	-2/+2
	* readDataFile, readDefaultDataFile, getReferenceDocx, getReferenceODT have been removed from Shared and moved into Class. They are now defined in terms of PandocMonad primitives, rather than being primitive methods of the class.
	* toLang has been moved from BCP47 to Class.
	* NoTranslation and CouldNotLoadTranslations have been added to LogMessage.
	* New module, Text.Pandoc.Translations, exporting Term, Translations, readTranslations.
	* New functions in Class: translateTerm, setTranslations. Note that nothing is loaded from data files until translateTerm is used; setTranslations just sets the language to be used.
	* Added two translation data files in data/translations.
	* LaTeX reader: Support `\setmainlanguage` or `\setdefaultlanguage` (polyglossia) and `\figurename`.
2017-08-10	Removed datadir param from readDataFile and getDefaultTemplate.	John MacFarlane	1	-2/+1
	In Text.Pandoc.Class and Text.Pandoc.Template, resp. We now get the datadir from CommonState.
2017-06-25	BCP47: split toLang from getLang, rearranged types.	John MacFarlane	1	-2/+2

2017-06-25	Moved BCP47 specific functions from Writers.Shared to new module.	John MacFarlane	1	-2/+2
	Text.Pandoc.BCP47 (unexported, internal module). `getLang`, `Lang(..)`, `parseBCP47`.
2017-06-25	Writers.Shared: improve type of Lang and bcp47 parser.	John MacFarlane	1	-3/+3
	Use a real parsec parser for BCP47, include variants.
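A rough sketch of what a parsec-based parser for a subset of BCP47 can look like: language, optional script, optional region, and any number of variants. It is not the parser from this commit. The `Lang` field names, the helper names and the decision to skip extensions, private-use and grandfathered tags are all assumptions of the sketch, and no case normalization is done.

```haskell
import Control.Monad (void)
import Data.Char (isAlpha, isAlphaNum, isAscii, isDigit)
import Text.Parsec

data Lang = Lang
  { langLanguage :: String
  , langScript   :: String
  , langRegion   :: String
  , langVariants :: [String]
  } deriving (Eq, Show)

type P = Parsec String ()

asciiAlpha, asciiDigit, asciiAlnum :: P Char
asciiAlpha = satisfy (\c -> isAscii c && isAlpha c)
asciiDigit = satisfy (\c -> isAscii c && isDigit c)
asciiAlnum = satisfy (\c -> isAscii c && isAlphaNum c)

-- A subtag must be followed by a separator or the end of input.
subtag :: P String -> P String
subtag p = p <* lookAhead (void (char '-') <|> eof)

-- A further subtag: separator plus body, backtracking if the body does not fit.
next :: P String -> P String
next p = try (char '-' *> subtag p)

-- Variants: 5 to 8 alphanumerics, or 4 characters starting with a digit.
variant :: P String
variant = do
  v <- many1 asciiAlnum
  let n = length v
      fourWithDigit = n == 4 && all isDigit (take 1 v)
  if (n >= 5 && n <= 8) || fourWithDigit
    then return v
    else parserFail "not a variant subtag"

langTag :: P Lang
langTag = do
  lang     <- subtag ((++) <$> count 2 asciiAlpha <*> option "" (count 1 asciiAlpha))
  script   <- option "" (next (count 4 asciiAlpha))
  region   <- option "" (next (count 2 asciiAlpha <|> count 3 asciiDigit))
  variants <- many (next variant)
  eof
  return (Lang lang script region variants)

parseLangTag :: String -> Either ParseError Lang
parseLangTag = parse langTag "language tag"
```

Compared with splitting on `-` by hand, the parser makes the ordering of subtags explicit and rejects malformed tags instead of silently misassigning them.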
2017-06-25	Writers.Shared: refactored getLang, splitLang...	John MacFarlane	1	-13/+13
	into `Lang(..)`, `getLang`, `parseBCP47`.
2017-06-25	Support `lang` attribute in OpenDocument and ODT writers.	John MacFarlane	1	-7/+41
	This adds the required attributes to the temporary styles, and also replaces existing language attributes in styles.xml. Support for lang attributes on Div and Span has also been added. Closes #1667.
2017-06-17	Use Control.Monad.State.Strict throughout.	John MacFarlane	1	-1/+1
	This gives 20-30% speedup and reduction of memory usage in most of the writers.
2017-06-11	Switched Writer types to use Text.	John MacFarlane	1	-2/+3
	* XML.toEntities: changed type to Text -> Text.
	* Shared.tabFilter -- fixed so it strips out CRs as before.
	* Modified writers to take Text.
	* Updated tests, benchmarks, trypandoc.
	[API change] Closes #3731.
2017-05-13	Update dates in copyright notices	Albert Krewinkel	1	-2/+2
	This follows the suggestions given by the FSF for GPL licensed software. <https://www.gnu.org/prep/maintain/html_node/Copyright-Notices.html>
2017-05-02	Added PandocResourceNotFound error.	John MacFarlane	1	-5/+1
	Use this instead of PandocIOError when a resource is not found in path. This improves the error message in this case, see #3629.
2017-03-04	Stylish-haskell automatic formatting changes.	John MacFarlane	1	-20/+20

2017-03-01	ODT writer: calculate aspect ratio for percentage-sized images (#3478)	Mauro Bieg	1	-2/+2
	closes #3239
2017-02-24	Use catchError instead of runExceptT.	John MacFarlane	1	-10/+12

2017-02-22	imageSize interface change	mb21	1	-1/+1
	`imageSize img` is now `imageSize opts img`
2017-02-11	Use new warnings throughout the code base.	John MacFarlane	1	-4/+4

2017-01-25	Text.Pandoc.Shared: Removed fetchItem, fetchItem'.	John MacFarlane	1	-3/+4
	Made changes where these are used, so that the version of fetchItem from PandocMonad can be used instead.
2017-01-25	Class: Removed getDefaultReferenceDocx/ODT from PandocMonad.	John MacFarlane	1	-1/+2
	We don't need these, since the default docx and odt can be retrieved using `readDataFile datadir "reference.docx"` (or odt).
2017-01-25	Simplified reference-docx/reference-odt to reference-doc.	John MacFarlane	1	-1/+1
	* Text.Pandoc.Options.WriterOptions: removed writerReferenceDocx and writerReferenceODT, replaced them with writerReferenceDoc. This can hold either an ODT or a Docx. In this way, writerReferenceDoc is like writerTemplate, which can hold templates of different formats. [API change]
	* Removed `--reference-docx` and `--reference-odt` options.
	* Added `--reference-doc` option.
2017-01-25	Class: rename addWarning[WithPos] to warning[WithPos].	John MacFarlane	1	-2/+2
	There's already a function addWarning in Parsing! Maybe we can dispense with that now, but I still like 'warning' better as a name.
2017-01-25	Class: Renamed 'warn' to 'addWarning' and consolidated RTF writer.	John MacFarlane	1	-2/+2
	* Renaming Text.Pandoc.Class.warn to addWarning avoids conflict with Text.Pandoc.Shared.warn.
	* Removed writeRTFWithEmbeddedImages from Text.Pandoc.Writers.RTF. This is no longer needed; we automatically handle embedded images using the PandocM functions. [API change]
2017-01-25	Convert all writers to use PandocMonad.	Jesse Rosenthal	1	-1/+1
	Since PandocMonad is an instance of MonadError, this will allow us, in a future commit, to change all invocations of `error` to `throwError`, which will be preferable for the pure versions. At the moment, we're disabling the lua custom writers (this is temporary). This requires changing the type of the Writer in Text.Pandoc. Right now, we run `runIOorExplode` in pandoc.hs, to make the conversion easier. We can switch it to the safer `runIO` in the future. Note that this required a change to Text.Pandoc.PDF as well. Since running an external program is necessarily IO, we can be clearer about using PandocIO.
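To illustrate why `throwError` is preferable to `error` for pure code, here is a small sketch using a generic `MonadError` constraint (from mtl) rather than PandocMonad itself. `WriterError`, `lookupResource` and `resourceOrEmpty` are hypothetical names invented for the example.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Except (MonadError, catchError, throwError)

-- Hypothetical error type standing in for a writer's failure modes.
data WriterError = MissingResource FilePath | Unsupported String
  deriving (Eq, Show)

-- Works in any MonadError instance, pure (Either) or IO-based.
-- Unlike `error`, the failure is a value the caller can inspect.
lookupResource :: MonadError WriterError m
               => [(FilePath, String)] -> FilePath -> m String
lookupResource bag path =
  case lookup path bag of
    Just contents -> return contents
    Nothing       -> throwError (MissingResource path)

-- Recover from one specific failure and rethrow everything else.
resourceOrEmpty :: MonadError WriterError m
                => [(FilePath, String)] -> FilePath -> m String
resourceOrEmpty bag path =
  lookupResource bag path `catchError` handler
  where
    handler (MissingResource _) = return ""
    handler e                   = throwError e
```

Run in `Either WriterError`, these functions need no IO at all, which is what makes a pure version of a writer testable.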
2017-01-25	Convert writers to use PandocMonad typeclass.	Jesse Rosenthal	1	-15/+12
	Instead of Free Monad with runIO
2017-01-25	ODT Writer: fix compiler complaint.	Jesse Rosenthal	1	-1/+0