Age | Commit message (Collapse) | Author | Files | Lines |
|
This will allow us to parse instrTxt inside fldChar tags.
|
|
|
|
This will tell us whether a paragraph break was inserted or
deleted. We add a generalized track-changes parsing function, and use
it in `elemToParPart` as well.
|
|
We're going to want to use it elsewhere as well, in upcoming tracking
of paragraph insertion/deletion.
|
|
Previously we had only read the first child of an sdtContents tag. Now
we replace sdt with all children of the sdtContents tag.
This changes the expected test result of our nested_anchors test,
since now we read docx's generated TOCs.
|
|
We walk through the document (using the zipper in
Text.XML.Light.Cursor) to unwrap the sdt tags before doing the rest of
the parsing of the document. Note that the function is generically
named `walkDocument` in case we need to do any further preprocessing
in the future.
Closes #4190
|
|
|
|
|
|
|
|
|
|
We used to parse paragraphs styled with "HeadingN" as "nth-level
header." But if a document has a custom style named "Heading0", this
will produce a 0-level header, which shouldn't exist. We only parse
this style if N>0. Otherwise we treat it as a normal style name, and
follow its dependencies, if any.
Closes #3830.
|
|
This gives 20-30% speedup and reduction of memory
usage in most of the writers.
|
|
This follows the suggestions given by the FSF for GPL licensed software.
<https://www.gnu.org/prep/maintain/html_node/Copyright-Notices.html>
|
|
|
|
Previously we didn't recognize math, for example, when
the xmlns declaration occured on the element and not the root.
Now we recognize either.
Closes #3365.
This patch defines findChildByName, findChildrenByName,
and findAttrByName in Util, and uses these in Parse.
|
|
This just parses inside smartTags and yields their contents,
ignoring the attributes of the smartTag. @jkr, you may want
to adjust this, but I wanted to get a fix in as fast as possible
for the dropped content.
Closes #2242; see also #3412.
|
|
|
|
+ Removed Text.Pandoc.Readers.Docx.Fonts
+ Moved its code to texmath; we now use (from texmath 0.9)
Text.TeXMath.Unicode.Fonts
+ Use texmath 0.9 (currently from git).
+ Updated epub tests because texmath now handles more mathml.
|
|
We wrap `[CHART]` in a `<span class="chart">`. Note that it maps to
inlines because, in docx, anything in a drawing tag can be part of a
larger paragraph.
|
|
We not only want "w:drawing", because that could also include
charts. Now we specify "w:drawing"//"pic:pic". This shouldn't change
behavior at all, but it's a first step toward allowing other sorts of
drawing data as well.
|
|
|
|
We use the "description" field as alt text and the "title" field as
title. These can be accessed through the "Format Picture" dialog in
Word.
|
|
Previously, if given an empty namespace:
(elemName ns "" "foo")
`elemName` would output a QName with a `Just ""` namespace. This is
never what we want. Now we output a `Nothing`. If someone *does* want a
`Just ""` in the namespace, they can enter the QName value explicitly.
|
|
Highly influenced by the docx support, refactored
some code to avoid DRY.
|
|
|
|
|
|
We're going to want `getMap` in the Docx Writer.
|
|
The functions `isElem` and `elemName` (defined in Docx/Util.hs) make the
code a lot cleaner than the original XML.Light functions, but they had
been used inconsistently. This puts them in wherever applicable.
|
|
This adds simple track-changes comment parsing to the docx reader. It is
turned on with `--track-changes=all`. All comments are converted to
inlines, which can list some information. In the future a warning will
be added for comments with formatting that seems like it will be
excessively denatured.
Note that comments can extend across blocks. For that reason there are
two spans: `comment-start` and `comment-end`. `comment-start` will
contain the comment. `comment-end` will always be empty. The two will be
associated by a numeric id.
|
|
`moveTo` and `moveFrom` are track-changes tags that are used when a
block of text is moved in the document. We now recognize these tags and
treat them the same as `insert` and `delete`, respectively. So,
`--track-changes=accept` will show the moved version, while
`--track-changes=reject` will show the original version.
|
|
Some word functions -- especially graphics -- give various choices for
content so there can be backwards compatibility. This follows the
largely undocumented feature by working through the choices until we
find one that works.
Note that we had to split out the processing of child elems of runs into
a separate function so we can recurse properly. Any processing of an
element *within* a run (other than a plain run) should go into
`childElemToRun`.
|
|
Word uses list numbering styles to number its headings. We only call
something a numbered list if it does not also heave a heading style.
|
|
In order to be able to collect warnings during parsing, we add a state
monad transformer to the D monad. At the moment, this only includes a
list of warning strings (nothing currently triggers them, however). We
use StateT instead of WriterT to correspond more closely with the
warnings behavior in T.P.Parsing.
|
|
The docx reader used to use a Modifiable typeclass to combine both
Blocks and Inlines. But all the work was in the inlines. So most of the
generality was wasted, at the expense of making the code harder to
understand. This gets rid of the generality, and adds functions for
Blocks and Inlines. It should be a bit easier to work with going forward.
|
|
We want to make sure that links have their spaces removed, and are
appropriately smushed together.
This closes #2689
|
|
|
|
Change 5527465c introduced a `DummyListItem` type in Docx/Parse.hs. In
retrospect, this seems like it mixes parsing and iterpretation
excessively. What's *really* going on is that we have a list item
without and associate level or numeric info. We can decide what to do
what that in Docx.hs (treat it like a list paragraph), but the parser
shouldn't make that decision.
This commit makes what is going on a bit more explicit. `LevelInfo` is
now a Maybe value in the `ListItem` type. If it's a Nothing, we treat
it as a ListParagraph. If it's a Just, it's a normal list item.
|
|
A residue of a recent change was left around in the form of a
commented-out function. Let's clean that up.
|
|
These come up when people create a list item and then delete the
bullet. It doesn't refer to any real list item, and we used to ignore
it.
We handle it with a DummyListItem type, which, in Docx.hs, is turned
into a normal paragraph with a "ListParagraph" class. If it follow
another list item, it is folded as another paragraph into that item. If
it doesn't, it's just its own (usually indented, and therefore
block-quoted) paragraph.
|
|
There are separate relationship (link) files for foot and
endnotes. These had previously been grouped together which led to
links not working correctly in notes. This should finally fix that.
|
|
This reverts commit c423dbb5a34c2d1195020e0f0ca3aae883d0749b.
|
|
This is needed for ghci to work with pandoc, given that we
now use a custom prelude.
Closes #2503.
|
|
- The (non-exported) prelude is in prelude/Prelude.hs.
- It exports Monoid and Applicative, like base 4.8 prelude,
but works with older base versions.
- It exports (<>) for mappend.
- It hides 'catch' on older base versions.
This allows us to remove many imports of Data.Monoid
and Control.Applicative, and remove Text.Pandoc.Compat.Monoid.
It should allow us to use -Wall again for ghc 7.10.
|
|
|
|
I've messed up badly with it, so it didn't work properly most of the
time. At the plus side, fallback mechanic is working wonderfully.
|
|
|
|
|
|
This allows inherited styles with numbering (lists). It works like this:
1. check to see if the style has numbering info.
2. if the paragraph has explicit numbering info in the doc that takes
precedence.
3. if not we use the numbering info in the style, if it's there.
4. otherwise normal paragraph.
We no longer assume it's not a numbering element if it doesn't have an
explicit level---we just set that level to 1. (In the style files, the
examples I've seen don't have that explicit level.)
|
|
Some older versions of word use vml (vector markup language) and put
their images in a "v:imagedata" tag inside a "w:pict". We read those as
we read the more modern "blip" inside a "w:drawing".
Note that this does not mean the reader knows anything about vml. It
just looks for a `v:imagdata`. It's possible that, with more complicated
uses of images in vml, it won't do the right thing.
|
|
Previously, if a URL had an anchor, such as
http://johnmacfarlane.net/pandoc/README.html#synopsis
the reader would incorrectly identify it as an internal link
and return "#synopsis" for the link in output.
|