aboutsummaryrefslogtreecommitdiff
path: root/src/Text/Pandoc/Readers
AgeCommit message (Collapse)AuthorFilesLines
2016-10-17Infer caption from the text following the imgHubert Plociniczak1-20/+47
Frame can contain other frames with the text boxes. This is something that has not been considered before and meant that the whole construction of images was broken in those cases. Also the captions were fixed/ignored.
2016-10-17RST reader: skip whitespace before note.Jesse Rosenthal1-2/+3
RST requires a space before a footnote marker. We discard those spaces so that footnotes will be adjacent to the text that comes before it. This is in line with what rst2latex does. rst2html does not discard the space, but its html output is different than pandoc's, so this seems the most semantically correct approach. Closes #3163
2016-10-14Org reader: allow figure with empty captionAlbert Krewinkel1-3/+1
A `#+CAPTION` attribute before an image is enough to turn an image into a figure. This wasn't the case because the `parseFromString` function, which processes the caption value, would fail on empty values. Adding a newline character to the caption value fixes this. Fixes: #3161
2016-10-14Merge pull request #3146 from hubertp-lshift/feature/odt-list-start-valueJohn MacFarlane2-13/+21
[ODT Parser] Include list's starting value
2016-10-14Added tests and a corner case for starting numberHubert Plociniczak1-0/+1
Review revealed that we didn't handle the case when the starting point is an empty string. While this is not a valid .odt file, we simply added a special case to deal with it. Also added tests for the new feature.
2016-10-13Parse line-oriented markup as LineBlockAlbert Krewinkel4-9/+9
Markup-features focusing on lines as distinctive part of the markup are read into `LineBlock` elements. This currently means line blocks in reStructuredText and Markdown (the latter only if the `line_block` extension is enabled), the `linegroup`/`line` combination from the Docbook 5.1 working draft, and Org-mode `VERSE` blocks.
2016-10-12[ODT Parser] Include list's starting valueHubert Plociniczak2-13/+20
Previously the starting value of the lists' items has been hardcoded to 1. In reality ODT's list style definition can provide a new starting value in one of its attributes. Writers already handle the modified start value so no need to change anything in that area.
2016-10-12Basic support for images in ODT documentsHubert Plociniczak3-38/+115
Highly influenced by the docx support, refactored some code to avoid DRY.
2016-10-10Org reader: trim verse lines properlyAlbert Krewinkel1-2/+4
An empty verse line should not result in `Str ""` but in `mempty`.
2016-10-02MediaWiki writer: transform filename with underscores in images.John MacFarlane1-1/+1
`foo bar.jpg` becomes `foo_bar.jpg`. This was already done for internal links, but it also needs to happen for images. Closes #3052.
2016-09-28Markdown reader: added bracket syntax for native spans.John MacFarlane1-0/+8
See #168. Text.Pandoc.Options.Extension has a new constructor `Ext_brackted_spans`, which is enabled by default in pandoc's Markdown.
2016-09-02Remove TagSoup compatJesse Rosenthal2-5/+5
We already lower-bound tagsoup at 0.13.7, which means we were always running the compatibility layer (it was conditional on min value 0.13). Better to just use `lookupEntity` from the library directly, and convert a string to a char if need be.
2016-09-02Remove directory compatJesse Rosenthal1-1/+1
directory 1.1 depends on base 4.5 (ghc 7.4) which we are no longer supporting. So we don't have to use a compatibility layer for it.
2016-09-02Remove Text.Pandoc.Compat.ExceptJesse Rosenthal5-8/+5
2016-09-02Fix grouping of imports.Jesse Rosenthal7-7/+8
Some source files keep imports in tidy groups. Changing `Text.Pandoc.Compat.Monoid` to `Data.Monoid` could upset that. This restores tidiness.
2016-09-02Remove Compat.MonoidJesse Rosenthal14-14/+14
This was only necessary for GHC versions with base below 4.5 (i.e., ghc < 7.4).
2016-08-30Org reader: respect unnumbered header propertyAlbert Krewinkel1-2/+10
Sections the `unnumbered` property should, as the name implies, be excluded from the automatic numbering of section provided by some output formats. The Pandoc convention for this is to add an "unnumbered" class to the header. The reader treats properties as key-value pairs per default, so a special case is added to translate the above property to a class instead. Closes #3095.
2016-08-29Docx reader: make all compilers happy with traversable.Jesse Rosenthal1-1/+3
The last attempt to make 7.8 happy made 7.10 unhappy. So we need some conditional logic to appease all versions.
2016-08-29Docx reader: Import traverse for ghc 7.8Jesse Rosenthal1-0/+1
The GHC 7.8 build was erroring without it.
2016-08-29Docx reader: clean up function with `traverse`Jesse Rosenthal1-6/+1
2016-08-29Merge branch 'org-meta-handling'Albert Krewinkel4-69/+195
2016-08-29Org reader: respect `creator` export optionAlbert Krewinkel3-5/+8
The `creator` option controls whether the creator meta-field should be included in the final markup. Setting `#+OPTIONS: creator:nil` will drop the creator field from the final meta-data output. Org-mode recognizes the special value `comment` for this field, causing the creator to be included in a comment. This is difficult to translate to Pandoc internals and is hence interpreted the same as other truish values (i.e. the meta field is kept if it's present).
2016-08-29Org reader: respect `email` export optionAlbert Krewinkel3-5/+7
The `email` option controls whether the email meta-field should be included in the final markup. Setting `#+OPTIONS: email:nil` will drop the email field from the final meta-data output.
2016-08-29Org reader: respect `author` export optionAlbert Krewinkel4-4/+23
The `author` option controls whether the author should be included in the final markup. Setting `#+OPTIONS: author:nil` will drop the author from the final meta-data output.
2016-08-29Org reader: read HTML_head as header-includesAlbert Krewinkel1-0/+2
HTML-specific head content can be defined in `#+HTML_head` lines. They are parsed as format-specific inlines to ensure that they will only show up in HTML output.
2016-08-29Org reader: set classoption meta from LaTeX_class_optionsAlbert Krewinkel1-1/+8
2016-08-29Org reader: set documentclass meta from LaTeX_classAlbert Krewinkel1-0/+1
2016-08-29Org reader: read LaTeX_header as header-includesAlbert Krewinkel1-9/+31
LaTeX-specific header commands can be defined in `#+LaTeX_header` lines. They are parsed as format-specific inlines to ensure that they will only show up in LaTeX output.
2016-08-29Org reader: give precedence to later meta linesAlbert Krewinkel1-1/+1
The last meta-line of any given type is the significant line. Previously the value of the first line was kept, even if more lines of the same type were encounterd.
2016-08-29Org reader: allow multiple, comma-separated authorsAlbert Krewinkel1-1/+9
Multiple authors can be specified in the `#+AUTHOR` meta line if they are given as a comma-separated list.
2016-08-29Org reader: read markup only for special meta keysAlbert Krewinkel1-5/+20
Most meta-keys should be read as normal string values, only a few are interpreted as marked-up text.
2016-08-29Org reader: extract meta parsing code to moduleAlbert Krewinkel2-64/+111
Parsing of meta-data is well separable from other block parsing tasks. Moving into new module to get small files and clearly arranged code.
2016-08-28Docx reader: update copyright.Jesse Rosenthal3-6/+6
2016-08-28Docx reader: use all anchor spans for header ids.Jesse Rosenthal1-1/+1
Previously we only used the first anchor span to affect header ids. This allows us to use all the anchor spans in a header, whether they're nested or not. Along with 62882f97, this closes #3088.
2016-08-28Docx reader: Let headers use exisiting id.Jesse Rosenthal1-6/+10
Previously we always generated an id for headers (since they wouldn't bring one from Docx). Now we let it use an existing one if possible. This should allow us to recurs through anchor spans.
2016-08-28Docx reader: Handle anchor spans with content in headers.Jesse Rosenthal1-7/+8
Previously, we would only be able to figure out internal links to a header in a docx if the anchor span was empty. We change that to read the inlines out of the first anchor span in a header. This still leaves another problem: what to do if there are multiple anchor spans in a header. That will be addressed in a future commit.
2016-08-15StyleMap: export functions on StyleMap instancesJesse Rosenthal1-0/+2
We're going to want `getMap` in the Docx Writer.
2016-08-13Docx parser: Use xml convenience functionsJesse Rosenthal1-38/+27
The functions `isElem` and `elemName` (defined in Docx/Util.hs) make the code a lot cleaner than the original XML.Light functions, but they had been used inconsistently. This puts them in wherever applicable.
2016-08-11Merge pull request #3048 from tarleb/latex-mini-fixJohn MacFarlane1-1/+1
LaTeX reader: drop duplicate `*` in bibtexKeyChars
2016-08-09Merge pull request #3065 from tarleb/org-verse-indentJohn MacFarlane1-1/+10
Org reader: preserve indentation of verse lines
2016-08-09Org reader: ensure image sources are proper linksAlbert Krewinkel3-39/+53
Image sources as those in plain images, image links, or figures, must be proper URIs or relative file paths to be recognized as images. This restriction is now enforced for all image sources. This also fixes the reader's usage of uncleaned image sources, leading to `file:` prefixes not being deleted from figure images (e.g. `[[file:image.jpg]]` leading to a broken image `<img src="file:image.jpg"/>) Thanks to @bsag for noticing this bug.
2016-08-08Org reader: preserve indentation of verse linesAlbert Krewinkel1-1/+10
Leading spaces in verse lines are converted to non-breaking spaces, so indentation is preserved. This fixes #3064.
2016-08-06MediaWiki reader: properly interpret XML tags in pre environments.John MacFarlane1-3/+2
They are meant to be interpreted as literal text in textile. Closes #3042.
2016-08-06Improved mediawiki reader's treatment of verbatim constructions.John MacFarlane1-7/+13
Previously these yielded strings of alternating Code and Space elements; we now incorporate the spaces into the Code. Emphasis etc. is still possible inside these. Closes #3055.
2016-08-06Fix for unquoted attribute values in mediawiki tables.John MacFarlane1-1/+1
Previously an unquoted attribute value in a table row could cause parsing problems. Fixes #3053 (well, proper rowspans and colspans aren't created, but that's a bigger limitation with the current Pandoc document model for tables).
2016-07-29LaTeX reader: drop duplicate `*` in bibtexKeyCharsAlbert Krewinkel1-1/+1
2016-07-22Textile reader: disallow empty URL in explicit link.John MacFarlane1-1/+1
Closes #3036.
2016-07-22Textile reader: support `bc..` extended code blocks.John MacFarlane1-5/+25
Also, remove trailing newline in code blocks (consistently with Markdown reader).
2016-07-20LaTeX reader: be more forgiving of non-standard characters.John MacFarlane1-1/+1
E.g. `^` outside of math. Some custom environments give these a meaning, so we should try not to fall over when we encounter them.
2016-07-20LaTeX reader: more robust parsing of unknown environments.John MacFarlane1-2/+9
We no longer fail on things like `^` inside options for tikz. Closes #3026.