path: root/src/Text/Pandoc/Readers
Age | Commit message | Author | Files | Lines
2016-11-19 | LaTeX reader: improved parsing of tables. | John MacFarlane | 1 | -5/+13
Reader can now parse simple LaTeX tables such as those generated by pandoc itself. We still can't handle pandoc multiline tables which involve minipages and column widths. Partially addresses #2669.
2016-11-19 | Fixed xref lookup in DocBook reader. Closes #3243. | John MacFarlane | 1 | -4/+6
It previously only worked when the qnames lacked the docbook namespace URI.
2016-11-19 | Org reader: Ensure images in paragraphs are not parsed as figures | Albert Krewinkel | 3 | -15/+32
This fixes a regression introduced in 7e5220b57c5a48fabe6e43ba270db812593d3463.
2016-11-16 | Small caps in Bracketed Spans (#3191) | ickc | 1 | -1/+7
* Markdown reader: modify bracketedSpan to check small caps
* MANUAL.txt: add description of the use of `bracketed_spans` for small caps
* Improve markdown readers: bracketedSpan now functions EXACTLY as spanHtml
2016-11-15 | Allow alignments to be specified in Markdown grid tables. | John MacFarlane | 1 | -17/+23
2016-11-13 | HTML reader: only treat "a" element as link if it has href. | John MacFarlane | 1 | -7/+19
Otherwise treat as span. Closes #3226.
2016-11-10 | Docx reader: add a placeholder value for CHART. | Jesse Rosenthal | 2 | -0/+17
We wrap `[CHART]` in a `<span class="chart">`. Note that it maps to inlines because, in docx, anything in a drawing tag can be part of a larger paragraph.
2016-11-10 | Docx reader: Be more specific in parsing images | Jesse Rosenthal | 1 | -6/+10
Matching on "w:drawing" alone is not enough, because that could also include charts. Now we specify "w:drawing"//"pic:pic". This shouldn't change behavior at all, but it's a first step toward allowing other sorts of drawing data as well.
2016-11-09 | Org reader: allow HTML attribs on non-figure images | Albert Krewinkel | 1 | -6/+8
Images which are the only element in a paragraph can still be given HTML attributes, even if the image does not have a caption and is hence not a figure. The following will set the `width` attribute of the image to `50%`:

    #+ATTR_HTML: :width 50%
    [[file:image.jpg]]

Closes: #3222
2016-11-08 | Inline code when text has a special style | Hubert Plociniczak | 1 | -6/+20
When a piece of text has the text style 'Source_Text', we assume that it represents code that needs to be inlined. Adapted the ODT writer to reflect that change as well; previously it just wrote 'preformatted' text using a non-distinguishable font style. Code blocks are still not recognized by the ODT reader. That's a separate issue.
2016-11-05 | Markdown reader: Allow reference link labels starting with @... | John MacFarlane | 1 | -1/+2
...if citations extension disabled. Example: in

    [link text][@a]

    [@a]: url

`link text` isn't hyperlinked because `[@a]` is parsed as a citation. Previously this happened whether or not the `citations` extension was enabled. Now it happens only if the `citations` extension is enabled. Closes #3209.
2016-11-02 | Docx Reader: abstract out function to avoid code repetition. | Jesse Rosenthal | 1 | -16/+14
2016-11-02 | Docx reader: Handle Alt text and titles in images. | Jesse Rosenthal | 2 | -11/+28
We use the "description" field as alt text and the "title" field as title. These can be accessed through the "Format Picture" dialog in Word.
2016-11-02 | Docx reader utils: handle empty namespace in elemName | Jesse Rosenthal | 1 | -1/+2
Previously, if given an empty namespace:

    (elemName ns "" "foo")

`elemName` would output a QName with a `Just ""` namespace. This is never what we want. Now we output a `Nothing`. If someone *does* want a `Just ""` in the namespace, they can enter the QName value explicitly.
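The behaviour described in this commit can be sketched roughly as follows. This is not pandoc's actual code: the `QName` record is a local stand-in modelled on the one in the xml library, `NameSpaces` is a hypothetical alias for a prefix-to-URI table, and it is an assumption that the "namespace" the message refers to is the prefix field.

```haskell
-- Local stand-in for the QName record of the xml library (an assumption,
-- defined here so that the sketch needs nothing beyond base).
data QName = QName
  { qName   :: String
  , qURI    :: Maybe String
  , qPrefix :: Maybe String
  } deriving (Eq, Show)

-- Hypothetical alias: a table mapping namespace prefixes to URIs.
type NameSpaces = [(String, String)]

-- Build a qualified name. An empty prefix yields Nothing rather than Just "".
elemName :: NameSpaces -> String -> String -> QName
elemName ns prefix name =
  QName name (lookup prefix ns) (if null prefix then Nothing else Just prefix)

main :: IO ()
main = do
  let ns = [("w", "http://example.org/wordml")]  -- made-up URI
  print (elemName ns "" "foo")  -- prefix and URI are both Nothing
  print (elemName ns "w" "p")   -- prefix is Just "w", URI is looked up
```

A caller who really wants `Just ""` can still construct the `QName` value directly, as the message notes.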
2016-11-02 | HTML reader: treat `<math>` as MathML by default... | John MacFarlane | 1 | -8/+11
unless something else is explicitly specified in xmlns. Provided it parses as MathML, of course. Also fixed the default, which should be inline math if no display attribute is used.
2016-11-02 | LaTeX reader: Handle BVerbatim from fancyvrb. Fixes #3203. | John MacFarlane | 1 | -10/+15
2016-11-01 | Handle hungarumlaut in LaTeX reader. Closes #3201. | John MacFarlane | 1 | -0/+16
2016-11-01 | [odt] Infer tables' header props from rows (#3199) | hubertp-lshift | 1 | -2/+9
The ODT reader simply provided an empty header list, which meant that the contents of the whole table, even if not empty, were simply ignored. While we still do not infer headers, we at least have to provide default properties of columns.
2016-10-31 | LaTeX reader: allow for []s inside LaTeX optional args. | John MacFarlane | 1 | -1/+2
Fixes cases like:

    \begin{center}
    \begin{tikzpicture}[baseline={([yshift=+-.5ex]current bounding box.center)}, level distance=24pt]
    \Tree [.{S} [.NP John\index{i} ] [.VP [.V likes ] [.NP himself\index{i,*j} ]]]
    \end{tikzpicture}
    \end{center}
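The idea of accepting brackets inside an optional argument can be illustrated with a small depth-counting scanner. This is only a sketch on plain strings; pandoc's LaTeX reader is built from parser combinators, and the function name `optArg` is hypothetical.

```haskell
-- Split off a leading optional argument "[...]", allowing nested brackets.
-- Returns the argument's contents and the remaining input, or Nothing if
-- the input does not start with '[' or the brackets never balance.
optArg :: String -> Maybe (String, String)
optArg ('[':rest) = go (1 :: Int) "" rest
  where
    go _ _ [] = Nothing                               -- ran out of input
    go depth acc (c:cs)
      | c == '['               = go (depth + 1) (c : acc) cs
      | c == ']' && depth == 1 = Just (reverse acc, cs)  -- outermost close
      | c == ']'               = go (depth - 1) (c : acc) cs
      | otherwise              = go depth (c : acc) cs
optArg _ = Nothing

main :: IO ()
main = do
  print (optArg "[baseline={([yshift=1ex]x)}, level distance=24pt] tail")
  print (optArg "[unclosed [nested]")  -- Nothing: outer bracket never closes
```

A scanner that stopped at the first `]` would cut the `tikzpicture` options short at `yshift=+-.5ex`, which is the kind of input the commit addresses.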
2016-10-30 | Org reader: support `ATTR_HTML` for special blocks | Albert Krewinkel | 1 | -9/+22
Special blocks (i.e. blocks with unrecognized names) can be prefixed with an `ATTR_HTML` block attribute. The attributes defined in that meta-directive are added to the `Div` which is used to represent the special block. Closes: #3182
2016-10-30 | Org reader: support the `todo` export option | Albert Krewinkel | 3 | -2/+7
The `todo` export option allows toggling the inclusion of TODO keywords in the output. Setting this to `nil` causes TODO keywords to be dropped from headlines. The default is to include the keywords.
2016-10-30 | Org reader: add support for todo-markers | Albert Krewinkel | 3 | -5/+98
Headlines can have optional todo-markers which can be controlled via the `#+TODO`, `#+SEQ_TODO`, or `#+TYP_TODO` meta directive. Multiple such directives can be given, each adding a new set of recognized todo-markers. If no custom todo-markers are defined, the default `TODO` and `DONE` markers are used. Todo-markers are conceptually separate from headline text and are hence excluded when autogenerating headline IDs. The markers are rendered as spans and labelled with two classes: one class is the marker's name, the other signals the todo-state of the marker (either `todo` or `done`).
2016-10-26 | Markdown Reader: add attributes for autolink (#3183) | Daniele D'Orazio | 1 | -1/+3
2016-10-24 | Export Text.Pandoc.Error in Text.Pandoc. | John MacFarlane | 1 | -3/+2
[API change]
2016-10-22 | Added `angle_brackets_escapable` extension. | John MacFarlane | 1 | -0/+2
This is needed because GitHub-flavored Markdown has a slightly different set of escapable symbols than original Markdown; it includes angle brackets. Closes #2846.
2016-10-22 | EPUB reader: don't add root path to data: URIs. | John MacFarlane | 1 | -1/+3
Closes #3150. Thanks to @lep for the bug report and patch.
2016-10-19 | Image with a caption needs special formatting | Hubert Plociniczak | 1 | -2/+6
The LaTeX writer only handles captions if the image's title is prefixed with 'fig:'.
2016-10-18 | Merge pull request #3166 from hubertp-lshift/bug/3134 | John MacFarlane | 1 | -3/+2
Issue 3143: Don't duplicate text for anchors
2016-10-18 | Merge pull request #3165 from hubertp-lshift/feature/odt-image | John MacFarlane | 3 | -38/+138
[odt] images parser
2016-10-18 | Better fix for the problem with ghc 7.8. | John MacFarlane | 1 | -1/+3
2016-10-18 | Try to fix build error on ghc 7.8. | John MacFarlane | 1 | -1/+1
@tarleb this is an interesting one, see the build log (https://travis-ci.org/jgm/pandoc/jobs/168612017). It only failed on ghc 7.8; I think this must have to do with the change making Applicative a superclass of Monad, hence this change.
2016-10-18 | Issue 3143: Don't duplicate text for anchors | Hubert Plociniczak | 1 | -3/+2
When creating an anchor element we were adding its representation as well as the original content, leading to text duplication.
2016-10-17 | Minor refactoring | Hubert Plociniczak | 1 | -10/+6
2016-10-17 | Infer caption from the text following the img | Hubert Plociniczak | 1 | -20/+47
A frame can contain other frames with text boxes. This had not been considered before, which meant that the whole construction of images was broken in those cases. Also, the captions were fixed/ignored.
2016-10-17 | RST reader: skip whitespace before note. | Jesse Rosenthal | 1 | -2/+3
RST requires a space before a footnote marker. We discard those spaces so that footnotes will be adjacent to the text that comes before them. This is in line with what rst2latex does. rst2html does not discard the space, but its HTML output is different from pandoc's, so this seems the most semantically correct approach. Closes #3163
2016-10-14 | Org reader: allow figure with empty caption | Albert Krewinkel | 1 | -3/+1
A `#+CAPTION` attribute before an image is enough to turn an image into a figure. This wasn't the case because the `parseFromString` function, which processes the caption value, would fail on empty values. Adding a newline character to the caption value fixes this. Fixes: #3161
2016-10-14 | Merge pull request #3146 from hubertp-lshift/feature/odt-list-start-value | John MacFarlane | 2 | -13/+21
[ODT Parser] Include list's starting value
2016-10-14 | Added tests and a corner case for starting number | Hubert Plociniczak | 1 | -0/+1
Review revealed that we didn't handle the case when the starting point is an empty string. While this is not a valid .odt file, we simply added a special case to deal with it. Also added tests for the new feature.
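The empty-string corner case can be pictured with a small helper that falls back to 1 whenever the start attribute is missing, empty, or not a number. This is a sketch under assumptions: the name `listStart` and the `Maybe String` input are hypothetical and do not come from the ODT reader's code.

```haskell
import Data.Maybe (fromMaybe)
import Text.Read (readMaybe)

-- Interpret an optional start-value attribute of a list style.
-- Missing, empty, or non-numeric values fall back to the default of 1.
listStart :: Maybe String -> Int
listStart attr = fromMaybe 1 (attr >>= readMaybe)

main :: IO ()
main = do
  print (listStart (Just "3"))  -- explicit start value
  print (listStart (Just ""))   -- empty string: invalid, use the default
  print (listStart Nothing)     -- attribute absent: use the default
```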
2016-10-13 | Parse line-oriented markup as LineBlock | Albert Krewinkel | 4 | -9/+9
Markup features that treat lines as a distinctive part of the markup are read into `LineBlock` elements. This currently means line blocks in reStructuredText and Markdown (the latter only if the `line_block` extension is enabled), the `linegroup`/`line` combination from the DocBook 5.1 working draft, and Org-mode `VERSE` blocks.
2016-10-12 | [ODT Parser] Include list's starting value | Hubert Plociniczak | 2 | -13/+20
Previously the starting value of list items was hardcoded to 1. In reality, ODT's list style definition can provide a new starting value in one of its attributes. Writers already handle the modified start value, so there is no need to change anything in that area.
2016-10-12 | Basic support for images in ODT documents | Hubert Plociniczak | 3 | -38/+115
Highly influenced by the docx support; refactored some code to avoid repetition (DRY).
2016-10-10 | Org reader: trim verse lines properly | Albert Krewinkel | 1 | -2/+4
An empty verse line should not result in `Str ""` but in `mempty`.
2016-10-02 | MediaWiki writer: transform filename with underscores in images. | John MacFarlane | 1 | -1/+1
`foo bar.jpg` becomes `foo_bar.jpg`. This was already done for internal links, but it also needs to happen for images. Closes #3052.
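The transformation itself is a one-liner. A minimal sketch, with the hypothetical name `underscoreSpaces` (the writer's real helper may be named and structured differently):

```haskell
-- Replace every space in a file name with an underscore.
underscoreSpaces :: String -> String
underscoreSpaces = map (\c -> if c == ' ' then '_' else c)

main :: IO ()
main = putStrLn (underscoreSpaces "foo bar.jpg")  -- prints foo_bar.jpg
```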
2016-09-28 | Markdown reader: added bracket syntax for native spans. | John MacFarlane | 1 | -0/+8
See #168. Text.Pandoc.Options.Extension has a new constructor `Ext_bracketed_spans`, which is enabled by default in pandoc's Markdown.
2016-09-02 | Remove TagSoup compat | Jesse Rosenthal | 2 | -5/+5
We already lower-bound tagsoup at 0.13.7, which means we were always running the compatibility layer (it was conditional on min value 0.13). Better to just use `lookupEntity` from the library directly, and convert a string to a char if need be.
2016-09-02 | Remove directory compat | Jesse Rosenthal | 1 | -1/+1
directory 1.1 depends on base 4.5 (ghc 7.4) which we are no longer supporting. So we don't have to use a compatibility layer for it.
2016-09-02 | Remove Text.Pandoc.Compat.Except | Jesse Rosenthal | 5 | -8/+5
2016-09-02 | Fix grouping of imports. | Jesse Rosenthal | 7 | -7/+8
Some source files keep imports in tidy groups. Changing `Text.Pandoc.Compat.Monoid` to `Data.Monoid` could upset that. This restores tidiness.
2016-09-02 | Remove Compat.Monoid | Jesse Rosenthal | 14 | -14/+14
This was only necessary for GHC versions with base below 4.5 (i.e., ghc < 7.4).
2016-08-30 | Org reader: respect unnumbered header property | Albert Krewinkel | 1 | -2/+10
Sections with the `unnumbered` property should, as the name implies, be excluded from the automatic numbering of sections provided by some output formats. The Pandoc convention for this is to add an "unnumbered" class to the header. The reader treats properties as key-value pairs by default, so a special case is added to translate the above property to a class instead. Closes #3095.
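The special case might look roughly like the sketch below. It is not the reader's actual implementation: `Attr` is written out as a local triple of identifier, classes, and key-value pairs, the function name is only illustrative, and treating every value other than `nil` as "set" is an assumption.

```haskell
import Data.Char (toLower)

-- Identifier, classes, and key-value pairs (local definition for the sketch).
type Attr = (String, [String], [(String, String)])

-- Turn headline properties into attributes. The "unnumbered" property is
-- translated into a class; all other properties stay key-value pairs.
propertiesToAttr :: [(String, String)] -> Attr
propertiesToAttr props = ("", classes, keyVals)
  where
    isUnnumberedKey k = map toLower k == "unnumbered"
    isSet v = v /= "nil"  -- assumption: any value except "nil" enables it
    classes = ["unnumbered" | any (\(k, v) -> isUnnumberedKey k && isSet v) props]
    keyVals = filter (not . isUnnumberedKey . fst) props

main :: IO ()
main = print (propertiesToAttr [("unnumbered", "t"), ("foo", "bar")])
```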