Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
HLint's automatic refactoring isn't quite perfect, so some of its
changes were overcomplicated, wrong, or created new findings.
Clean these up.
|
|
|
|
All warnings are either fixed or, if more appropriate, HLint is
configured to ignore them. HLint suggestions remain.
* Ignore "Use camelCase" warnings in Lua and legacy code
* Fix or ignore remaining HLint warnings
* Remove redundant brackets
* Remove redundant `return`s
* Remove redundant as-pattern
* Fuse mapM_/map
* Use `.` to shorten code
* Remove redundant `fmap`
* Remove unused LANGUAGE pragmas
* Hoist `not` in Text.Pandoc.App
* Use fewer imports for `Text.DocTemplates`
* Remove redundant `do`s
* Remove redundant `$`s
* Jira reader: remove unnecessary parentheses
|
|
* Use concatMap instead of reimplementing it
* Replace an unnecessary multi-way if with a regular if
* Use sortOn instead of sortBy and comparing
* Use guards instead of lots of indents for if and else
* Remove redundant do blocks
* Extract common functions from both branches of maybe
Whenever both the Nothing and the Just branch of maybe do the same
function, do that function on the result of maybe instead.
* Use fmap instead of reimplementing it from maybe
* Use negative forms instead of negating the positive forms
* Use mapMaybe instead of mapping and then using catMaybes
* Use zipWith instead of mapping over the result of zip
* Use unwords instead of reimplementing it
* Use <$ instead of <$> and const
* Replace case of Bool with if and else
* Use find instead of listToMaybe and filter
* Use zipWithM instead of mapM and zip
* Inline lambda wrappers into the real functions
* We get zipWithM from Text.Pandoc.Writers.Shared
* Use maybe instead of fromMaybe and fmap
I'm not sure how this one slipped past me.
* Increase a bit of indentation
|
|
PR #5884.
+ Use pandoc-types 1.20 and texmath 0.12.
+ Text is now used instead of String, with a few exceptions.
+ In the MediaBag module, some of the types using Strings
were switched to use FilePath instead (not Text).
+ In the Parsing module, new parsers `manyChar`, `many1Char`,
`manyTillChar`, `many1TillChar`, `many1Till`, `manyUntil`,
`mantyUntilChar` have been added: these are like their
unsuffixed counterparts but pack some or all of their output.
+ `glob` in Text.Pandoc.Class still takes String since it seems
to be intended as an interface to Glob, which uses strings.
It seems to be used only once in the package, in the EPUB writer,
so that is not hard to change.
|
|
The left-to-right direction setting in docx is used in the spec only
for overriding an explicit right-to-left setting. We only process it
when it happens in a paragraph set with BiDi.
This is especially important for docs exported from Google Docs, which
explicitly (and unnecessarily) set "rtl=0" for every paragraph.
Closes: #5723
|
|
* [Docx Parser] Move style-parsing-specific code to a new module
* [Docx Writer] Re-use Readers.Docx.Parse.Styles for StyleMap
* [Docx Writer] Move Readers.Docx.StyleMap to Writers.Docx.StyleMap
It's never used outside of writer code, so it makes more sense to scope it under writers really.
|
|
Motivating issues: #5523, #5052, #5074
Style name comparisons are case-insensitive, since those are
case-insensitive in Word.
w:styleId will be used as style name if w:name is missing (this should
only happen for malformed docx and is kept as a fallback to avoid
failing altogether on malformed documents)
Block quote detection code moved from Docx.Parser to Readers.Docx
Code styles, i.e. "Source Code" and "Verbatim Char" now honor style
inheritance
Docx Reader now honours "Compact" style (used in Pandoc-generated docx).
The side-effect is that "Compact" style no longer shows up in
docx+styles output. Styles inherited from "Compact" will still
show up.
Removed obsolete list-item style from divsToKeep. That didn't
really do anything for a while now.
Add newtypes to differentiate between style names, ids, and
different style types (that is, paragraph and character styles)
Since docx style names can have spaces in them, and pandoc-markdown
classes can't, anywhere when style name is used as a class name,
spaces are replaced with ASCII dashes `-`.
Get rid of extraneous intermediate types, carrying styleId information.
Instead, styleId is saved with other style data.
Use RunStyle for inline style definitions only (lacking styleId and styleName);
for Character Styles use CharStyle type (which is basicaly RunStyle with styleId
and StyleName bolted onto it).
|
|
Reduce code duplication, remove redundant brackets, use newtype instead of data where appropriate
|
|
Closes #5545.
|
|
The haddock module header contains essentially the
same information, so the boilerplate is redundant and
just one more thing to get out of sync.
|
|
This module is one of the most opaque parts of the docx reader: it
deals with the fact that runs have non-nesting formatting, so we have
to figure out the nesting on the fly as we combine them.
We start adding commenting, so new developers can understand and, if
necessary, modify this module. Specific function comments will be
added in the future, but this offers a global description of the
purpose of the module.
|
|
We have to add one final mempty when we're combining in order to trim
inlines appropriately. (We need to use our own trimming routines here
due to the way that formatted inlines are smushed together when
converting from docx.)
Closes #5273
|
|
We had previously walked the document to unwrap sdt/sdtContent and
smartTag tags in `word/document.xml`, but not in the
`word/{foot/end}note.xml` and `word/comments.xml`.
Closes #5302
|
|
Some paths in archives are absolute (have an opening slash) which, for
reasons unknown, produces a failure in the test suite on MS
Windows. This fixes that by removing the leading slash if it exists.
Closes #5277 (previously closed with 4cce0ef but reopened due to this bug).
|
|
This reverts commit 2142bbe572cea00b7bb5ad3e10a3afb26845a1f7.
|
|
Try fixing a parsing error on windows by insisting that the parser use
a Posix filepath library for splitting doc paths in a zipfile. (It
might default on Windows to using a backslash as a separator, while
it's always a forward-slash in zip archives.)
|
|
* clarify function name. We had previously used `getDocumentPath`,
but `Document` is an overdetermined term here. Use
`getDocumentXmlPath` to make clear what we're doing.
* Use field notation for setting ReaderEnv. As we've added (and
continue to add) fields, the assignment by position has gotten
harder to read.
* figure out document.xml path once at the beginning of parsing, and
add it to the environment, so we can avoid repeated lookups.
|
|
Getting the location used to depend on a hard-coded .rels file based
on "word/document.xml". We now dynamically detect that file based on
the document.xml file specified in "_rels/.rels"
|
|
The desktop Word program places the main document file in
"word/document.xml", but the online word places it in
"word/document2.xml". This file path is actually stated in the root
"_rels/.rels" file, in the "Relationship" element with an
"http://../officedocument" type.
Closes #5277
|
|
For some reason, Word in Office 365 Online uses `document2.xml`
for the content, instead of `document.xml`. This causes pandoc
not to be able to parse docx.
This quick fix has the parser check for both `document.xml`
and `document2.xml`.
Addresses #5277, but a more robust solution would be to
get the name of the main document dynamically (who knows
whether it might change again?).
|
|
Quite a few modules were missing copyright notices.
This commit adds copyright notices everywhere via haddock module
headers. The old license boilerplate comment is redundant with this and has
been removed.
Update copyright years to 2019.
Closes #4592.
|
|
|
|
There can be overrides for the definitions of certain levels in
numbering definitions. This implements that behavior.
Closes: #5134
|
|
|
|
It had previously been an alias for a tuple.
|
|
These are variants for "complex scripts" like Arabic and
are now treated just like b, i (bold, italic).
Colses #4947.
|
|
|
|
|
|
|
|
This prevents a multiline codeblock in word from being read as
different paragraphs. This takes place in the Combine module to occur
during the normal combining of divs during conversion.
Note that this specifies that the attributes of the `CodeBlock`s must
be the same. The docx reader creates codeBlocks without attrs, so this
is trivially satisified.
|
|
This seems to be necessary if we are to use our custom Prelude
with ghci.
Closes #4464.
|
|
|
|
Make unwrapSDT into a general `unwrap` function that can unwrap both
nested SDT tags and smartTags. This makes the SmartTags constructor in
the Docx type unnecessary, so we remove it.
Closes #4446
|
|
Previously we had only unwrapped one level of sdt tags. Now we recurse
if we find them.
Closes: #4415
|
|
This fixes an import error in the last commit.
|
|
Previously, unquoted string required a space at the end of the
line (and consumed it). Now we either take a space (and don't consume
it), or end of input.
|
|
|
|
We introduce a new module, Text.Pandoc.Readers.Docx.Fields which
contains a simple parsec parser. At the moment, only simple hyperlink
fields are accepted, but that can be extended in the future.
|
|
This will allow us to parse instrTxt inside fldChar tags.
|
|
|
|
This will tell us whether a paragraph break was inserted or
deleted. We add a generalized track-changes parsing function, and use
it in `elemToParPart` as well.
|
|
We're going to want to use it elsewhere as well, in upcoming tracking
of paragraph insertion/deletion.
|
|
Previously we had only read the first child of an sdtContents tag. Now
we replace sdt with all children of the sdtContents tag.
This changes the expected test result of our nested_anchors test,
since now we read docx's generated TOCs.
|
|
We walk through the document (using the zipper in
Text.XML.Light.Cursor) to unwrap the sdt tags before doing the rest of
the parsing of the document. Note that the function is generically
named `walkDocument` in case we need to do any further preprocessing
in the future.
Closes #4190
|
|
|
|
|
|
|
|
|