author | Jesse Rosenthal <jrosenthal@jhu.edu> | 2019-02-21 08:32:57 -0500 |
---|---|---|
committer | Jesse Rosenthal <jrosenthal@jhu.edu> | 2019-02-21 08:32:57 -0500 |
commit | 69d433d37a2b50b2d07f588603a6fbc03041c0af (patch) | |
tree | 4a0707345765a5d25a62b08cf7f5701379a9e6fa /src/Text/Pandoc/Readers | |
parent | ba065cb7f4826244cc4f088ddaa4b72efa2ad6ca (diff) | |
Docx reader: Start adding comment to combine module
This module is one of the most opaque parts of the docx reader: it
deals with the fact that runs have non-nesting formatting, so we have
to figure out the nesting on the fly as we combine them.
We start adding comments so that new developers can understand and, if
necessary, modify this module. Comments on specific functions will be
added in the future; this one offers a global description of the
purpose of the module.
Diffstat (limited to 'src/Text/Pandoc/Readers')
-rw-r--r-- | src/Text/Pandoc/Readers/Docx/Combine.hs | 40 |
1 files changed, 40 insertions, 0 deletions
diff --git a/src/Text/Pandoc/Readers/Docx/Combine.hs b/src/Text/Pandoc/Readers/Docx/Combine.hs
index 2fba3394b..da40a80ea 100644
--- a/src/Text/Pandoc/Readers/Docx/Combine.hs
+++ b/src/Text/Pandoc/Readers/Docx/Combine.hs
@@ -14,6 +14,46 @@
 
 Flatten sequences of elements.
 -}
+
+{-
+The purpose of this module is to combine the formatting of separate
+runs, which have *non-nesting* formatting. Because the formatting
+doesn't nest, you can't actually tell the nesting order until you
+combine with the runs that follow.
+
+For example, say you have a something like `<em><strong>foo</strong>
+bar</em>`. Then in ooxml, you'll get these two runs:
+
+~~~
+<w:r>
+  <w:rPr>
+    <w:b />
+    <w:i />
+  </w:rPr>
+  <w:t>Foo</w:t>
+</w:r>
+<w:r>
+  <w:rPr>
+    <w:i />
+  </w:rPr>
+  <w:t> Bar</w:t>
+</w:r>
+~~~
+
+Note that this is an ideal situation. In practice, it will probably be
+more---if, for example, the user turned italics
+off and then on.
+
+So, when you get the first run, which is marked as both bold and italic,
+you have no idea whether it's `Strong [Emph [Str "Foo"]]` or `Emph
+[Strong [Str "Foo"]]`.
+
+We combine two runs, then, by taking off the formatting that modifies an
+inline, seeing what is shared between them, and rebuilding an inline. We
+fold this to combine the inlines.
+
+-}
+
 module Text.Pandoc.Readers.Docx.Combine ( smushInlines
                                         , smushBlocks
                                         )
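The combining step described in the new module comment can be illustrated with a small, self-contained sketch. This is not the code in `Combine.hs`. The types `Fmt`, `Inline`, and `Run` and the functions `wrap`, `wrapAll`, and `combineTwo` are hypothetical stand-ins invented for this illustration, and the sketch handles only two adjacent runs rather than folding over a whole sequence. It assumes that formatting shared by both runs belongs on the outside and that formatting unique to one run belongs on the inside.

```haskell
import Data.List (intersect, (\\))

-- Hypothetical, simplified stand-ins; pandoc's real Inline type has
-- many more constructors.
data Fmt = Bold | Italic
  deriving (Eq, Show)

data Inline
  = Str String
  | Emph [Inline]
  | Strong [Inline]
  deriving (Eq, Show)

-- A flat run: a set of formatting flags plus text, mirroring <w:r>.
data Run = Run [Fmt] String
  deriving (Eq, Show)

-- Wrap a list of inlines in a single formatting constructor.
wrap :: Fmt -> [Inline] -> [Inline]
wrap Bold   xs = [Strong xs]
wrap Italic xs = [Emph xs]

-- Apply several formats; the first format in the list ends up outermost.
wrapAll :: [Fmt] -> [Inline] -> [Inline]
wrapAll fs xs = foldr wrap xs fs

-- Combine two adjacent runs: take the formatting off both, find what is
-- shared, and rebuild with the shared part as the outer wrapper.
combineTwo :: Run -> Run -> [Inline]
combineTwo (Run fs s) (Run gs t) =
  wrapAll shared (left ++ right)
  where
    shared = fs `intersect` gs
    left   = wrapAll (fs \\ shared) [Str s]
    right  = wrapAll (gs \\ shared) [Str t]
```

Under these assumptions, `combineTwo (Run [Bold, Italic] "Foo") (Run [Italic] " Bar")` evaluates to `[Emph [Strong [Str "Foo"], Str " Bar"]]`: the italic formatting is shared, so it becomes the outer wrapper, which resolves the ambiguity the comment describes for the first run taken alone.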