Age | Commit message | Author | Files | Lines |
|
The functions `isElem` and `elemName` (defined in Docx/Util.hs) make the
code a lot cleaner than the original XML.Light functions, but they had
been used inconsistently. This puts them in wherever applicable.
|
|
This adds simple track-changes comment parsing to the docx reader. It is
turned on with `--track-changes=all`. All comments are converted to
inlines, which can list some information. In the future a warning will
be added for comments with formatting that seems like it will be
excessively denatured.
Note that comments can extend across blocks. For that reason there are
two spans: `comment-start` and `comment-end`. `comment-start` will
contain the comment. `comment-end` will always be empty. The two will be
associated by a numeric id.
|
|
`moveTo` and `moveFrom` are track-changes tags that are used when a
block of text is moved in the document. We now recognize these tags and
treat them the same as `insert` and `delete`, respectively. So,
`--track-changes=accept` will show the moved version, while
`--track-changes=reject` will show the original version.
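
A minimal sketch of the mapping described above, using hypothetical types (`TrackedChange`, `TrackChangesMode`) that are assumptions for illustration, not the reader's actual definitions:

```haskell
-- Hypothetical types for illustration; the real reader's names may differ.
data TrackedChange = Insertion | Deletion | MoveTo | MoveFrom
  deriving (Eq, Show)

data TrackChangesMode = AcceptChanges | RejectChanges
  deriving (Eq, Show)

-- Fold the move tags into the insert/delete cases.
normalize :: TrackedChange -> TrackedChange
normalize MoveTo   = Insertion
normalize MoveFrom = Deletion
normalize c        = c

-- Decide whether the text wrapped by a tracked change is kept.
keepText :: TrackChangesMode -> TrackedChange -> Bool
keepText mode change =
  case (mode, normalize change) of
    (AcceptChanges, Insertion) -> True
    (AcceptChanges, Deletion)  -> False
    (RejectChanges, Insertion) -> False
    (RejectChanges, Deletion)  -> True
    -- Not reachable after normalize; present only for exhaustiveness.
    _                          -> True
```

With this shape, `keepText AcceptChanges MoveTo` is `True` (the moved version is shown) and `keepText RejectChanges MoveFrom` is `True` (the original version is shown).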
|
|
Some Word functions -- especially graphics -- give various choices for
content so there can be backwards compatibility. This follows the
largely undocumented feature by working through the choices until we
find one that works.
Note that we had to split out the processing of child elems of runs into
a separate function so we can recurse properly. Any processing of an
element *within* a run (other than a plain run) should go into
`childElemToRun`.
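
The "work through the choices until one works" idea can be sketched with `asum` over `Maybe`. The `Choice` type and `convertChoice` are hypothetical stand-ins, not the reader's real element handling:

```haskell
import Data.Foldable (asum)

-- Hypothetical stand-in for one alternative offered by the document.
data Choice = Drawing String | Fallback String | Unsupported
  deriving (Eq, Show)

-- Hypothetical converter: Nothing means "we cannot handle this choice".
convertChoice :: Choice -> Maybe String
convertChoice (Drawing path)  = Just ("image: " ++ path)
convertChoice (Fallback text) = Just ("text: " ++ text)
convertChoice Unsupported     = Nothing

-- Try each choice in document order and keep the first that converts.
firstWorking :: [Choice] -> Maybe String
firstWorking = asum . map convertChoice
```

Because `Maybe` is lazy in its alternatives, later choices are not converted once an earlier one succeeds.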
|
|
Word uses list numbering styles to number its headings. We only call
something a numbered list if it does not also have a heading style.
|
|
In order to be able to collect warnings during parsing, we add a state
monad transformer to the D monad. At the moment, this only includes a
list of warning strings (nothing currently triggers them, however). We
use StateT instead of WriterT to correspond more closely with the
warnings behavior in T.P.Parsing.
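
As a rough illustration of collecting warnings in state, here is a simplified stand-in for the D monad. The stack shown (`StateT` over `Either`) and the helper names `addWarning` and `runD` are assumptions for the sketch; the real monad carries more than this:

```haskell
import Control.Monad.Trans.State (StateT, modify, runStateT)

-- Hypothetical, simplified stand-in for the reader's D monad:
-- parse errors live in Either, warnings accumulate in the state.
type D = StateT [String] (Either String)

-- Record a warning without aborting the parse.
addWarning :: String -> D ()
addWarning msg = modify (++ [msg])

-- Run a parser action, returning its result with the collected warnings.
runD :: D a -> Either String (a, [String])
runD action = runStateT action []
```

Appending with `++` keeps warnings in the order they were raised, which is simple but quadratic; a real implementation might prepend and reverse at the end.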
|
|
|
|
Change 5527465c introduced a `DummyListItem` type in Docx/Parse.hs. In
retrospect, this seems like it mixes parsing and interpretation
excessively. What's *really* going on is that we have a list item
without an associated level or numeric info. We can decide what to do
with that in Docx.hs (treat it like a list paragraph), but the parser
shouldn't make that decision.
This commit makes what is going on a bit more explicit. `LevelInfo` is
now a Maybe value in the `ListItem` type. If it's a Nothing, we treat
it as a ListParagraph. If it's a Just, it's a normal list item.
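
A small sketch of that shape, with simplified hypothetical types (the field names and the `describe` function are assumptions, not the parser's real definitions):

```haskell
-- Hypothetical, simplified types; field names are assumptions.
data LevelInfo = LevelInfo { levelNumber :: Int, levelFormat :: String }
  deriving (Eq, Show)

data BodyPart = ListItem (Maybe LevelInfo) String
  deriving (Eq, Show)

-- The interpretation step, kept out of the parser: the parser only
-- reports whether level info was present.
describe :: BodyPart -> String
describe (ListItem Nothing txt) = "ListParagraph: " ++ txt
describe (ListItem (Just lvl) txt) =
  "list item (level " ++ show (levelNumber lvl) ++ "): " ++ txt
```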
|
|
A residue of a recent change was left around in the form of a
commented-out function. Let's clean that up.
|
|
These come up when people create a list item and then delete the
bullet. It doesn't refer to any real list item, and we used to ignore
it.
We handle it with a DummyListItem type, which, in Docx.hs, is turned
into a normal paragraph with a "ListParagraph" class. If it follows
another list item, it is folded as another paragraph into that item. If
it doesn't, it's just its own (usually indented, and therefore
block-quoted) paragraph.
|
|
There are separate relationship (link) files for foot and
endnotes. These had previously been grouped together which led to
links not working correctly in notes. This should finally fix that.
|
|
This reverts commit c423dbb5a34c2d1195020e0f0ca3aae883d0749b.
|
|
This is needed for ghci to work with pandoc, given that we
now use a custom prelude.
Closes #2503.
|
|
- The (non-exported) prelude is in prelude/Prelude.hs.
- It exports Monoid and Applicative, like base 4.8 prelude,
but works with older base versions.
- It exports (<>) for mappend.
- It hides 'catch' on older base versions.
This allows us to remove many imports of Data.Monoid
and Control.Applicative, and remove Text.Pandoc.Compat.Monoid.
It should allow us to use -Wall again for ghc 7.10.
|
|
|
|
|
|
This allows inherited styles with numbering (lists). It works like this:
1. check to see if the style has numbering info.
2. if the paragraph has explicit numbering info in the doc that takes
precedence.
3. if not we use the numbering info in the style, if it's there.
4. otherwise normal paragraph.
We no longer assume it's not a numbering element if it doesn't have an
explicit level---we just set that level to 1. (In the style files, the
examples I've seen don't have that explicit level.)
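
The precedence in the steps above can be sketched with `<|>` on `Maybe`. `NumInfo`, `mkNumInfo`, and `resolveNumbering` are hypothetical names used only for this illustration:

```haskell
import Control.Applicative ((<|>))
import Data.Maybe (fromMaybe)

-- Hypothetical numbering record: a numbering id and a level.
data NumInfo = NumInfo { numId :: String, level :: Int }
  deriving (Eq, Show)

-- A missing explicit level defaults to 1, as described above.
mkNumInfo :: String -> Maybe Int -> NumInfo
mkNumInfo nid mLevel = NumInfo nid (fromMaybe 1 mLevel)

-- Explicit numbering on the paragraph wins; otherwise fall back to the
-- numbering inherited from the style. Nothing means a normal paragraph.
resolveNumbering :: Maybe NumInfo -> Maybe NumInfo -> Maybe NumInfo
resolveNumbering explicit fromStyle = explicit <|> fromStyle
```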
|
|
Some older versions of Word use vml (vector markup language) and put
their images in a "v:imagedata" tag inside a "w:pict". We read those as
we read the more modern "blip" inside a "w:drawing".
Note that this does not mean the reader knows anything about vml. It
just looks for a `v:imagedata`. It's possible that, with more complicated
uses of images in vml, it won't do the right thing.
|
|
Previously, if a URL had an anchor, such as
http://johnmacfarlane.net/pandoc/README.html#synopsis
the reader would incorrectly identify it as an internal link
and return "#synopsis" for the link in output.
|
|
This patch builds a tree of paragraph styles, then checks whether a
paragraph has style.styleId or style/name.val matching predetermined patterns.
Works with "Heading#" (name.val="heading #") for headings and
"Quote"|"BlockQuote"|"BlockQuotation" (name.val="Quote"|"Block Text")
for block quotes.
|
|
Don't use os-sensitive "combine", since we always want the paths in our
zip-archive to use forward-slashes.
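
A minimal sketch of the idea: join the path components with a literal `/` instead of a platform-dependent separator. The helper name `archivePath` is an assumption for illustration:

```haskell
import Data.List (intercalate)

-- Join path components with '/', regardless of the host platform,
-- since entries in a zip archive always use forward slashes.
archivePath :: [String] -> FilePath
archivePath = intercalate "/"
```

For example, `archivePath ["word", "media", "image1.png"]` gives `"word/media/image1.png"` on every platform.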
|
|
Two points here: (1) We're going bottom-up, from styles not based on
anything, to avoid circular dependencies or any other sort of
maliciousness/incompetence. And (2) each style points to its
parent. That way, we don't need the whole tree to pass a style over to
Docx.hs.
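
One way to picture the bottom-up construction is the sketch below. It is an illustration under assumed types (`RawStyle`, `Style`, `buildStyles` are hypothetical), not the parser's actual code. Styles whose parent never gets built, for example because of a cycle, are simply left out:

```haskell
import Data.List (foldl', partition)
import qualified Data.Map as M

-- Hypothetical raw style as read from the styles file.
data RawStyle = RawStyle { rawId :: String, rawBasedOn :: Maybe String }

-- Hypothetical resolved style: each style points directly to its parent.
data Style = Style { styleId :: String, parent :: Maybe Style }

-- Start from styles based on nothing, then repeatedly add styles whose
-- parent has already been built. Stops when no further style is ready.
buildStyles :: [RawStyle] -> M.Map String Style
buildStyles = go M.empty
  where
    go built pending =
      let (ready, rest) = partition (isReady built) pending
          built' = foldl' (\m r -> M.insert (rawId r) (mk m r) m) built ready
      in if null ready then built else go built' rest
    isReady built r = maybe True (`M.member` built) (rawBasedOn r)
    mk m r = Style (rawId r) (rawBasedOn r >>= (`M.lookup` m))
```

Since every `Style` carries its own parent chain, a single value can be handed to a consumer without the whole map.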
|
|
|
|
This will make it easier to build the style map from the bottom up (to
avoid any infinite references).
|
|
We want to be able to read user-defined styles. Eventually we'll be able
to figure out styles in terms of inheritance as well. The actual
cascading will happen in the docx reader.
|
|
In docx, super- and subscript are attributes of Vertalign. It makes more
sense to follow this, and have different possible values of Vertalign in
runStyle. This is mainly a preparatory step for real style parsing,
since it can distinguish between vertical align being explicitly turned
off and it not being set.
In addition, it makes parsing a bit clearer, and makes sure we don't do
docx-impossible things like being simultaneously super and sub.
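
A sketch of what such a type might look like. The constructor and field names are assumptions; the attribute values `superscript`, `subscript`, and `baseline` are the ones `w:vertAlign` uses:

```haskell
-- Hypothetical names; one value at a time, so "super and sub at once"
-- cannot be represented.
data VertAlign = Baseline | Superscript | Subscript
  deriving (Eq, Show)

-- Nothing means vertical alignment was not set on this run, which is
-- distinct from it being explicitly set to Baseline.
newtype RunStyle = RunStyle { rVertAlign :: Maybe VertAlign }
  deriving (Eq, Show)

-- Map the w:val attribute of w:vertAlign to a value.
-- Unrecognized values are treated as unset in this sketch.
parseVertAlign :: String -> Maybe VertAlign
parseVertAlign "superscript" = Just Superscript
parseVertAlign "subscript"   = Just Subscript
parseVertAlign "baseline"    = Just Baseline
parseVertAlign _             = Nothing
```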
|
|
Note that "Italic" can be on, and, from the last commit, `<w:i>` can be
present, but be turned off. In that case, the turned-off tag takes
precedence. So, we have to distinguish between something being off and
something not being there. Hence, isItalic, isBold, isStrike, and
isSmallCaps have become Maybes.
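
Why a plain `Bool` is not enough can be seen in a small sketch. The function name `effective` is hypothetical; it resolves a run's own setting against an inherited one:

```haskell
import Control.Applicative ((<|>))
import Data.Maybe (fromMaybe)

-- Just True:  explicitly on
-- Just False: explicitly off
-- Nothing:    not mentioned at all
--
-- The run's own setting wins if present; otherwise the inherited
-- setting applies; if neither is set, the property is off.
effective :: Maybe Bool -> Maybe Bool -> Bool
effective own inherited = fromMaybe False (own <|> inherited)
```

With an inherited `Just True` and an own `Just False`, `effective` returns `False`, which combining two plain `Bool`s with `||` could not express.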
|
|
Before we just checked for the existence of a tag. Now, we make sure to
check for its on/off value.
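
A sketch of the on/off check, assuming the toggle's `w:val` attribute has already been looked up as a `Maybe String` (the function name `toggleValue` is hypothetical). A toggle element with no value attribute counts as on:

```haskell
-- Interpret a toggle element such as <w:i/> or <w:b w:val="0"/>,
-- given its optional w:val attribute.
-- Treating unrecognized values as "on" is an assumption of this sketch.
toggleValue :: Maybe String -> Bool
toggleValue Nothing  = True
toggleValue (Just v) = v `notElem` ["0", "false", "off"]
```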
|
|
|
|
|
|
|
|
range
|
|
|
|
This changes the signature of the exported `readOMML` to `String ->
Either String [Exp]`, so it can now, in theory, be slotted into
TeXMath. It doesn't have any real error reporting yet, but that might
make more sense once I put it in a branch, and understand how it works
in the other readers.
It also now reads strings that parse to either oMath or oMathPara
elements. Note that the distinction is lost in the output. It's up to
the caller to remember the display type.
|
|
This gets rid of commented-out functions, cleans up whitespace errors,
and exports and imports the correct functions.
|
|
We still need to test against prefixes, but this is only going to look
at oMath fragments, so we're not going to be worried about looking up
the real namespace.
|
|
Math module
|
|
Previously, drawings that were under some other toplevel run (e.g., a
hyperlink) wouldn't be properly handled. This should fix that.
|
|
Could use some cleanup, but this is the first step for getting
an OMML reader into TeXMath.
|
|
|
|
The new version of TeXMath can translate from its type system into
LaTeX. So instead of writing the LaTeX ourselves, we write to the TeXMath
`Exp` type, and let TeXMath do the rest.
|
|
The parser had been changing footnotes and endnotes into footnotes. This
isn't a problem, because pandoc collapses them, but the parser should
maintain as much of the docx structure as possible, and let the
toplevel reader worry about how to translate it into Pandoc. (This would
be an issue when, as is planned, the docx parser spins off into its
own module.)
The output is the same, so no test change is required.
|
|
Image data will now be put in a media bag map, which will be output
along with the pandoc output.
|
|
|
|
mtl switched from ErrorT to ExceptT, but we're not sure which mtl we'll
be dealing with. This should make errors work with both.
The main difference (besides the name of the module and the monad
transformer) is that Except doesn't require an instance of an Error
Typeclass. So we define that for compatibility. When we switch to a
later mtl, using Control.Monad.Except, we can just erase the instance
declaration, and all should work fine.
|
|
This modifies the Docx type in the parser to avoid all the extra files
(Notes, numbering, etc). A reader monad keeps track of these, and applies
them at the end. The reader monad is stacked with ErrorT to enable better
error-handling than the old Maybes. (Note that the better error handling
isn't really there yet, but it is now possible.)
One long-term goal of these changes is to make it easier to write the Docx
type. This should make it easier to develop a standalone docx package in the
future.
|
|
|
|
This lets us keep more information about the indentation, and act
accordingly in the reader.
|
|
Remove some redundant ways of dealing with Maybe.
|
|
mapMaybe does the filtering for us.
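
For illustration, `mapMaybe f` maps and drops the `Nothing` results in one pass, so a separate filtering step is not needed. The example function `parseInts` is hypothetical:

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Keep only the strings that parse as Int. This is equivalent to
-- catMaybes . map readMaybe, without the intermediate list of Maybes.
parseInts :: [String] -> [Int]
parseInts = mapMaybe readMaybe
```

For example, `parseInts ["1", "x", "3"]` evaluates to `[1,3]`.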
|