author: Jesse Rosenthal <jrosenthal@jhu.edu> 2019-02-21 08:32:57 -0500
committer: Jesse Rosenthal <jrosenthal@jhu.edu> 2019-02-21 08:32:57 -0500
commit: 69d433d37a2b50b2d07f588603a6fbc03041c0af
tree: 4a0707345765a5d25a62b08cf7f5701379a9e6fa /src/Text/Pandoc/Readers
parent: ba065cb7f4826244cc4f088ddaa4b72efa2ad6ca
Docx reader: Start adding comment to combine module
This module is one of the most opaque parts of the docx reader: it deals with the fact that runs have non-nesting formatting, so we have to figure out the nesting on the fly as we combine them. This commit starts adding comments so that new developers can understand and, if necessary, modify the module. Comments on specific functions will be added in the future; for now, this gives a global description of the module's purpose.
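To make "figuring out the nesting on the fly" concrete, here is a minimal sketch of combining two adjacent runs. It is not the code in `Combine.hs`: the types `Fmt`, `Inline`, and `Run` and the functions `wrap`, `wrapAll`, and `combineTwo` are hypothetical, simplified stand-ins, and the sketch assumes that formatting shared by both runs should end up outermost. It handles only the two-run case, not the fold over a whole paragraph.

```haskell
import Data.List (intersect, (\\))

-- Hypothetical formatting flags, standing in for <w:b /> and <w:i />.
data Fmt = Bold | Italic
  deriving (Eq, Show)

-- Simplified stand-in for pandoc's Inline type (assumption: String text).
data Inline
  = Str String
  | Strong [Inline]
  | Emph [Inline]
  deriving (Eq, Show)

-- A run as docx presents it: a flat list of formats plus some text.
data Run = Run [Fmt] String
  deriving (Eq, Show)

-- Wrap inlines in a single formatting constructor.
wrap :: Fmt -> [Inline] -> [Inline]
wrap Bold   xs = [Strong xs]
wrap Italic xs = [Emph xs]

-- Wrap in several formats; the first format in the list ends up outermost.
wrapAll :: [Fmt] -> [Inline] -> [Inline]
wrapAll fs xs = foldr wrap xs fs

-- Combine two adjacent runs: formatting shared by both goes on the
-- outside, formatting unique to one run stays around that run's text.
combineTwo :: Run -> Run -> [Inline]
combineTwo (Run fs s) (Run gs t) =
  wrapAll shared (wrapAll (fs \\ shared) [Str s] ++ wrapAll (gs \\ shared) [Str t])
  where
    shared = fs `intersect` gs
```

With these definitions, `combineTwo (Run [Bold, Italic] "Foo") (Run [Italic] " Bar")` evaluates to `[Emph [Strong [Str "Foo"], Str " Bar"]]`: italics is shared, so it becomes the outer wrapper, and bold stays around the first run only.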
Diffstat (limited to 'src/Text/Pandoc/Readers')
-rw-r--r-- src/Text/Pandoc/Readers/Docx/Combine.hs 40
1 files changed, 40 insertions, 0 deletions
diff --git a/src/Text/Pandoc/Readers/Docx/Combine.hs b/src/Text/Pandoc/Readers/Docx/Combine.hs
index 2fba3394b..da40a80ea 100644
--- a/src/Text/Pandoc/Readers/Docx/Combine.hs
+++ b/src/Text/Pandoc/Readers/Docx/Combine.hs
@@ -14,6 +14,46 @@
Flatten sequences of elements.
-}
+
+{-
+The purpose of this module is to combine the formatting of separate
+runs, which have *non-nesting* formatting. Because the formatting
+doesn't nest, you can't actually tell the nesting order until you
+combine with the runs that follow.
+
+For example, say you have something like
+`<em><strong>Foo</strong> Bar</em>`. In OOXML, you'll get these two runs:
+
+~~~
+<w:r>
+  <w:rPr>
+    <w:b />
+    <w:i />
+  </w:rPr>
+  <w:t>Foo</w:t>
+</w:r>
+<w:r>
+  <w:rPr>
+    <w:i />
+  </w:rPr>
+  <w:t> Bar</w:t>
+</w:r>
+~~~
+
+Note that this is the ideal case. In practice there will probably be
+more runs than this, for example if the user turned italics off and
+then back on.
+
+So, when you get the first run, which is marked as both bold and italic,
+you have no idea whether it's `Strong [Emph [Str "Foo"]]` or `Emph
+[Strong [Str "Foo"]]`.
+
+We combine two runs, then, by stripping the formatting that wraps each
+inline, seeing what is shared between them, and rebuilding the inlines
+with the shared formatting outermost. We fold this over the list of runs.
+
+-}
+
module Text.Pandoc.Readers.Docx.Combine ( smushInlines
, smushBlocks
)