path: root/src/Text/Pandoc/Readers
Age  Commit message  Author  Files  Lines
2014-08-17Docx Style parser: Basic one now just takes a parent style.Jesse Rosenthal1-13/+15
This will make it easier to build the style map from the bottom up (to avoid any infinite references).
2014-08-17Docx reader: work with new rStyle.Jesse Rosenthal1-4/+4
Just discards info at the moment, so at least it works the same.
2014-08-17Parser: Framework for parsing styles.Jesse Rosenthal1-11/+44
We want to be able to read user-defined styles. Eventually we'll be able to figure out styles in terms of inheritance as well. The actual cascading will happen in the docx reader.
2014-08-17Docx reader: Change behavior of Super/SubscriptJesse Rosenthal2-16/+17
In docx, super- and subscript are attributes of Vertalign. It makes more sense to follow this, and have different possible values of Vertalign in runStyle. This is mainly a preparatory step for real style parsing, since it can distinguish between vertical align being explicitly turned off and it not being set. In addition, it makes parsing a bit clearer, and makes sure we don't do docx-impossible things like being simultaneously super and sub.
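The distinction described above (explicitly turned off versus not set) can be sketched with a small sum type wrapped in `Maybe`. This is only an illustration: the names `VertAlign`, `readVertAlign` and `runVertAlign` are hypothetical and are not taken from the reader's source. The attribute strings are the usual `w:val` values of `<w:vertAlign>`.

```haskell
-- Hypothetical sketch; not pandoc's actual definitions.
-- A single sum type makes "both super and sub" unrepresentable.
data VertAlign = BaseLn | SupScrpt | SubScrpt
  deriving (Show, Eq)

-- | Map the w:val attribute of <w:vertAlign> to a value.
-- Unrecognized strings yield Nothing.
readVertAlign :: String -> Maybe VertAlign
readVertAlign "superscript" = Just SupScrpt
readVertAlign "subscript"   = Just SubScrpt
readVertAlign "baseline"    = Just BaseLn
readVertAlign _             = Nothing

-- | Nothing means the run does not set vertical alignment at all
-- (so a style could supply it); Just BaseLn means it was explicitly
-- turned off on the run.
runVertAlign :: Maybe String -> Maybe VertAlign
runVertAlign attr = attr >>= readVertAlign
```

With this shape, `runVertAlign Nothing` and `runVertAlign (Just "baseline")` give different results, which is the information a later style cascade would need.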
2014-08-16HTML reader: Parse appropriately styled span as SmallCaps.John MacFarlane1-1/+6
2014-08-16Docx reader: Remove unnecessary plural functionsJesse Rosenthal1-11/+5
Functions like `runElemsToInlines` and `parPartsToInlines` are defined simply by mapping their singular versions (e.g. `runElemToInlines`) and concatenating the results. Having two functions with almost identical names makes it easier to introduce errors. It's easy enough to concat and map inline, and doing so makes it clearer what is going on in the code.
2014-08-16Docx reader: Fix bug in character styles.Jesse Rosenthal1-2/+2
The style-handling cleanup introduced a bug here. There wasn't previously a test to catch it.
2014-08-16Rewrite Docx.hs and Reducible to use Builder.Jesse Rosenthal2-415/+368
The big news here is a rewrite of Docx to use the builder functions. As opposed to previous attempts, we now see a significant speedup -- times are cut in half (or more) in a few informal tests. Reducible has also been rewritten. It can doubtless be simplified and clarified further. We can consider this, at the moment, a reference for correct behavior.
2014-08-14Markdown reader: Better handle quote characters in inline links.John MacFarlane1-2/+4
This was previously failing to be recognized as a link: [Test](http://en.wikipedia.org/wiki/Ward's_method) Closes #1534.
2014-08-13Docx reader: Interpret "Strong" and Emphasis run styles.Jesse Rosenthal1-2/+6
2014-08-13Docx: Reducible forgot about smallcapsJesse Rosenthal1-0/+2
2014-08-12Docx Reader: Trim line breaks from the beginning and end of Section Headers.Jesse Rosenthal1-2/+10
We might also want to do this elsewhere (for paragraphs, for example).
2014-08-12Docx: More robust handling of multiple bookmarks in header.Jesse Rosenthal1-6/+8
2014-08-12Docx reader: Check for null-id'd anchors too.Jesse Rosenthal1-1/+0
Otherwise they get left dangling in the document.
2014-08-12Docx reader: accept explicit "Italic" and "Bold" rStyles.Jesse Rosenthal2-18/+31
Note that "Italic" can be on, and, from the last commit, `<w:i>` can be present, but be turned off. In that case, the turned-off tag takes precedence. So, we have to distinguish between something being off and something not being there. Hence, isItalic, isBold, isStrike, and isSmallCaps have become Maybes.
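The precedence rule above can be sketched as follows. The function names are hypothetical, and the list of "off" strings is an assumption based on the usual on/off values of a `w:val` attribute; the reader's real code may differ.

```haskell
import Data.Maybe (fromMaybe)

-- | Interpret a toggle element such as <w:i>.
-- Outer Maybe: is the element present at all?
-- Inner Maybe: its w:val attribute, if it has one.
toggleState :: Maybe (Maybe String) -> Maybe Bool
toggleState Nothing         = Nothing      -- element absent: not set
toggleState (Just Nothing)  = Just True    -- bare element: on
toggleState (Just (Just v)) = Just (v `notElem` ["0", "false", "off"])

-- | An explicit tag, on or off, takes precedence over what the
-- named run style (e.g. "Italic") implies.
effectiveItalic :: Maybe Bool -> Bool -> Bool
effectiveItalic explicitTag styleSaysItalic =
  fromMaybe styleSaysItalic explicitTag
```

For a run with the "Italic" style and a turned-off `<w:i>`, `effectiveItalic (toggleState (Just (Just "0"))) True` evaluates to `False`, while a run with the style and no tag stays italic.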
2014-08-12Docx reader: Add "BlockQuotation" to divs list.Jesse Rosenthal1-1/+1
2014-08-12Docx Reader: Fix font style parsing.Jesse Rosenthal1-12/+27
Previously we only checked for the existence of a tag. Now we also check its on/off value.
2014-08-12Merge pull request #1528 from mpickering/epubtitlepageJohn MacFarlane1-4/+10
EPUB Reader: Ignores titlepage attribute
2014-08-12EPUB Reader: Ignore title pagesMatthew Pickering1-4/+10
2014-08-12DocBook: Support equations with mathml.John MacFarlane1-4/+16
equation, informalequation, inlineequation and mml:math elements.
2014-08-12Merge pull request #1524 from jkr/dropCap3John MacFarlane2-3/+11
Docx reader: move dropcap combining logic to Reducible
2014-08-12Markdown reader: Improved parsing of indented code in list items.John MacFarlane1-25/+42
Indented code at the beginning of a list item must be indented eight spaces from the margin (or from the edge of the container), or four spaces past the list marker, whichever is farther. Some examples in `tests/markdown-reader-more.txt`.
2014-08-12Docx reader: move combining logic to ReducibleJesse Rosenthal2-3/+11
Introduces a new function in Reducible, concatR. The idea is that if we have two lists of Reducibles (blocks or inlines), we can combine them and perform the reduction only at the join (the last element of the first list and the first element of the second list). This is useful when the two lists are already reduced and we're only worried about the joining elements. It also improves efficiency a bit further, because concatR can be smart about empty lists.
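A minimal sketch of the seam-only idea, using a toy `Chunk` type in place of pandoc's blocks and inlines. `Chunk`, `combine` and `concatSeam` are hypothetical names chosen for illustration; this is not the code of concatR itself.

```haskell
-- Toy element: a run of text that is either emphasized or plain.
data Chunk = Chunk { emphasized :: Bool, chunkText :: String }
  deriving (Show, Eq)

-- | Combine two adjacent chunks: one merged chunk if their
-- formatting matches, otherwise both chunks unchanged.
combine :: Chunk -> Chunk -> [Chunk]
combine (Chunk e1 t1) (Chunk e2 t2)
  | e1 == e2  = [Chunk e1 (t1 ++ t2)]
  | otherwise = [Chunk e1 t1, Chunk e2 t2]

-- | Join two lists that are each assumed to be already reduced,
-- combining only the last element of the first list with the
-- first element of the second. Empty lists need no work at all.
concatSeam :: [Chunk] -> [Chunk] -> [Chunk]
concatSeam [] ys     = ys
concatSeam xs []     = xs
concatSeam xs (y:ys) = init xs ++ combine (last xs) y ++ ys
```

For example, joining `[Chunk False "a", Chunk True "b"]` with `[Chunk True "c", Chunk False "d"]` merges only the two emphasized chunks in the middle and leaves the outer elements untouched.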
2014-08-12Docx reader: Make dropcap combining more efficient.Jesse Rosenthal1-1/+3
Before, we had to run reduceList on the whole combined paragraph, which was redundant, and could take some time for long paragraphs. We only need to combine the drop cap with the first inline of the next paragraph.
2014-08-11Docx reader: combine inlines properly in dropcaps.Jesse Rosenthal1-1/+1
Make sure that adjacent inlines are combined properly in dropcaps. This updates the test results as well.
2014-08-11Docx reader: Use dropcap state.Jesse Rosenthal1-9/+17
If we get to a dropcap, we hold on to its inlines until the next paragraph and combine them there.
2014-08-11Add dropCap to paragraph style.Jesse Rosenthal1-2/+12
2014-08-11EPUB reader: use walk instead of bottomUp.John MacFarlane1-2/+1
This should be more efficient.
2014-08-11Merge pull request #1521 from jkr/emptyEmphJohn MacFarlane1-5/+6
Discard empty formatters
2014-08-11Merge pull request #1519 from mpickering/moreJohn MacFarlane1-1/+1
EPUB Normalisation and anchors for div blocks in tex
2014-08-11Textile reader: list and HTML block parsing improvements.John MacFarlane1-16/+13
Closes #1513. Lists can now start without an intervening blank line. Also, HTML block-level tags that don't start a line are parsed as RawInline and don't interrupt paragraphs, as in RedCloth.
2014-08-11Docx reader: handle empty reducibles.Jesse Rosenthal1-5/+6
2014-08-11EPUB Reader: Fixed another normalisation problem.Matthew Pickering1-1/+1
2014-08-11Merge pull request #1516 from mpickering/epubmetadataJohn MacFarlane1-6/+7
EPUB improvements
2014-08-11Docx Parse: Improved font recognition when specified in rFonts elementMatthew Pickering1-8/+27
2014-08-11Docx Fonts: Derives Show and EqMatthew Pickering1-0/+1
2014-08-11EPUB Reader: Can now parse multiple meta data fieldsMatthew Pickering1-2/+2
2014-08-11EPUB reader: Fixed bug where filepaths weren't sufficiently normalisedMatthew Pickering1-4/+5
2014-08-10Merge pull request #1510 from jkr/spacefixJohn MacFarlane1-10/+12
Docx reader: Fix spacing issue.
2014-08-10Removed OMath module, depend on texmath >= 0.8.John MacFarlane2-439/+1
2014-08-10Change head/tail to pattern guards.Jesse Rosenthal1-7/+8
2014-08-09Docx reader: Fix spacing issue.Jesse Rosenthal1-9/+10
Previously spaces at the beginning of Emph/Strong/etc were kept inside. This makes sure they are moved out.
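The fix described above can be illustrated with a toy inline type. `Inline` here is a local stand-in, not the pandoc-types definition, and `emphHoistingSpaces` is a hypothetical name; the sketch only covers leading spaces under `Emph`.

```haskell
-- Toy stand-in for pandoc's inline elements.
data Inline = Str String | Space | Emph [Inline]
  deriving (Show, Eq)

-- | Wrap inlines in Emph, but keep any leading spaces outside the
-- emphasis. If nothing but spaces remains, no Emph is produced.
emphHoistingSpaces :: [Inline] -> [Inline]
emphHoistingSpaces ils =
  let (spaces, rest) = span (== Space) ils
  in  spaces ++ [Emph rest | not (null rest)]
```

Given `[Space, Str "word"]`, the result is a `Space` followed by `Emph [Str "word"]` rather than an `Emph` that begins with a space.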
2014-08-09Docx Parse: Recognises code points in sym elements which are in the private rangeMatthew Pickering1-1/+4
2014-08-09Added Text.Pandoc.Readers.Docx.FontsMatthew Pickering1-0/+237
2014-08-09Docx Reader: Added recognition of sym element in paragraphsMatthew Pickering1-0/+19
2014-08-10EPUB: Fixed another mediabag related regression.Matthew Pickering1-3/+5
2014-08-09EPUB Reader: Changed image paths to be relative to manifest fileMatthew Pickering1-6/+6
2014-08-08Merge branch 'newbranch' of https://github.com/mpickering/pandoc into mpickering-newbranchJohn MacFarlane1-28/+19
Conflicts: src/Text/Pandoc/Readers/EPUB.hs
2014-08-08Added `native_divs` and `native_spans` extensions.John MacFarlane3-10/+9
This allows users to turn off the default pandoc behavior of parsing contents of div and span tags in markdown and HTML as native pandoc Div blocks and Span inlines. Setting of default epub extensions has been moved from the EPUB reader to Text.Pandoc.
2014-08-08EPUB Reader: Improved robustness of image extractionMatthew Pickering1-7/+9
We now maintain the invariant that when fetchImages is called, all images have absolute paths. This patch fixes several bugs relating to this, as there are three places where images can be introduced: (1) during the HTML parse, (2) as spine elements, and (3) as a cover image. For (1), the paths are corrected by the transformation renameImages. For (2) and (3), we need to append the "root" to the path we parse from the spine