path: root/src/Text/Pandoc/Readers
AgeCommit messageAuthorFilesLines
2017-01-25Working on readers.Jesse Rosenthal24-1111/+1269
2017-01-25Changed readNative to use PandocMonad.John MacFarlane1-3/+6
2017-01-25Deleted whitespace at end of source lines.John MacFarlane2-3/+3
2017-01-25Added page breaks into Pandoc.Hubert Plociniczak2-10/+36
This requires an updated version of pandoc-types that introduces the PageBreak definition. Note that this initial commit only introduces ODT page breaks and distinguishes between page breaks before the paragraph, after it, or both, as read from the style definition.
2017-01-19Org reader: allow short hand for single-line raw blocksAlbert Krewinkel2-8/+17
Single-line raw blocks can be given via `#+FORMAT: raw line`, where `FORMAT` must be one of `latex`, `beamer`, `html`, or `texinfo`. Closes: #3366
2017-01-19MediaWiki reader: improved handling of display math.John MacFarlane1-2/+3
Sometimes display math is indented with more than one colon. Previously we handled these cases badly, generating definition lists and missing the math. Closes #3362.
2017-01-08Fixed -f markdown_github-hard_line_breaks+escaped_line_breaks.John MacFarlane1-0/+1
Previously this did not properly enable escaped line breaks. Closes #3341.
2017-01-06Remove pipe char irking the haddock coverage toolAlbert Krewinkel1-1/+1
Haddock documentation strings must be associated with functions. Remove pipe char from a comment that was moved into a `do` block in `Readers/Org/Inlines.hs`.
2017-01-06Org reader: accept org-ref citations followed by commasAlbert Krewinkel1-15/+16
Bugfix for an issue which, whenever the citation was immediately followed by a comma, prevented correct parsing of org-ref citations.
2017-01-05Org reader: ensure emphasis markup can be nestedAlbert Krewinkel1-0/+3
Nested emphasis markup (e.g. `/*strong and emphasized*/`) was interpreted incorrectly in that the inner markup was not recognized.
2017-01-05MediaWiki reader: Fix quotation mark parsing (#3336)tgkokk1-6/+3
Change MediaWiki reader's behavior when the smart option is parsed to match other readers' behavior. Fix #2012.
2016-12-24markdown reader: disallow space between inline code and attributes (#3326)Mauro Bieg1-2/+2
closes #3323
2016-12-13Docx reader: Empty header should be list of lists.Jesse Rosenthal1-9/+11
In the past, the docx reader wrote an empty header as an empty list. It should have the same width as a row (and be filled with empty cells). (Note that I've reordered the code here slightly to get rid of a call to `head`. It wasn't unsafe because it tested for null, but it was a bit of a smell.)
2016-12-08Docx reader: Ensure one-row tables don't have header.Jesse Rosenthal1-1/+2
Tables in MS Word are set by default to have special first-row formatting, which pandoc uses to determine whether or not they have a header. This means that one-row tables will, by default, have only a header -- which we imagine is not what people want. This change ensures that a one-row table is not understood to be a header only. Note that this means that it is impossible to produce a header-only table from docx, even though it is legal pandoc. But we believe that in nearly all cases, it will be an accidental (and unwelcome) result. Closes #3285.
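The header handling described in the two Docx reader entries above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the reader's actual code: `splitHeader` is a hypothetical name, the `Bool` stands for Word's first-row formatting flag, and a cell is modeled as a plain list so that an empty list plays the role of an empty cell.

```haskell
-- Hypothetical sketch: decide which row (if any) is the header.
-- A one-row table never yields a header; when there is no header,
-- we return a row of empty cells as wide as the first row.
splitHeader :: Bool -> [[[b]]] -> ([[b]], [[[b]]])
splitHeader firstRowFormatting rows =
  case rows of
    (r : rest@(_ : _)) | firstRowFormatting -> (r, rest)
    _ -> (replicate width [], rows)
  where
    -- Width is taken from the first row; an empty table has width 0.
    width = case rows of
      (r : _) -> length r
      []      -> 0
```

With this shape, a single formatted row stays in the body and the header becomes a list of empty cells, which matches the "same width as a row" requirement.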
2016-12-08Removed debug trace from HTML reader.John MacFarlane1-2/+1
2016-12-07HTML reader: Understand `style=width:` as well as `width` in `col`.John MacFarlane1-2/+7
Closes #3286.
2016-12-07RST reader: print warnings when keys, substitution, notes not found.John MacFarlane1-6/+26
Previously the parsers failed and we got raw text. Now we get a link with an empty URL, or empty inlines in the case of a note or substitution.
2016-12-07RST reader: fix hyperlink aliases.John MacFarlane1-2/+10
    `link <google_>`_

    .. _google: https://google.com

is really a reference link. Closes #3283.
2016-12-06Fixed some bad regressions in HTML table parser.John MacFarlane1-3/+3
This regression leads to the introduction of empty rows in some circumstances. Closes #3280.
2016-11-30Use new module from texmath to lookup MS font codepoints.John MacFarlane2-243/+1
+ Removed Text.Pandoc.Readers.Docx.Fonts
+ Moved its code to texmath; we now use (from texmath 0.9) Text.TeXMath.Unicode.Fonts
+ Use texmath 0.9 (currently from git).
+ Updated epub tests because texmath now handles more mathml.
2016-11-26HTML reader: improved table parsing.John MacFarlane1-11/+24
We now check explicitly for non-1 rowspan or colspan attributes, and fail when we encounter them. Previously we checked that each row had the same number of cells, but that could be true even with rowspans/colspans. And there are cases where it isn't true in tables that we can handle fine -- e.g. when a tr element is empty. So now we just pad rows with empty cells when needed. Closes #3027.
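The padding step can be sketched as follows. This is a minimal illustration rather than the HTML reader's actual code: `padRows` is a hypothetical name, and a cell is modeled as a plain list, with the empty list standing in for an empty cell.

```haskell
-- Hypothetical sketch: pad every row with empty cells so that all rows
-- are as wide as the widest row. An empty row (e.g. from an empty
-- tr element) becomes a full row of empty cells.
padRows :: [[[a]]] -> [[[a]]]
padRows rows = map pad rows
  where
    -- The leading 0 keeps 'maximum' safe on a table with no rows.
    width = maximum (0 : map length rows)
    pad row = row ++ replicate (width - length row) []
```

Rows that already have the full width are returned unchanged, since `replicate` with a count of zero yields no cells.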
2016-11-26[odt] Infer table's caption from the paragraph (#3224)hubertp-lshift1-6/+21
The ODT reader always produced empty captions for parsed tables. This commit
1) checks paragraphs that follow the table definition,
2) treats a paragraph with a style named 'Table' specially, and
3) postprocesses the paragraphs, combining tables with the captions that immediately follow them.
The ODT writer used the 'TableCaption' style name for the caption paragraph. This commit follows the OpenOffice approach, which allows appending captions to tables but uses a built-in style named 'Table' instead of 'TableCaption'. Users of the odt format (both writer and reader) must therefore change the style's name to 'Table' if necessary.
2016-11-26LaTeX reader: don't treat `\vspace` and `\hspace` as block commands.John MacFarlane1-1/+0
Fixed an error which came up, for example, with `\vspace` inside a caption. (Captions expect inlines.) Closes #3256.
2016-11-24Org reader: respect column width settingsAlbert Krewinkel2-28/+48
Table column properties can optionally specify the width with which a column is displayed in the buffer. Some exporters, notably the ODT exporter in org-mode v9.0, use these values to calculate relative column widths. The org reader now implements the same behavior. Note that the org-mode LaTeX and HTML exporters in Emacs don't support this feature yet, which should be kept in mind by users who use the column width parameters. Closes: #3246
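One way to turn per-column width settings into relative widths is sketched below. The function name `relativeWidths` is hypothetical, and two details are assumptions rather than facts taken from the commit: columns without a width count as 1 toward the total, and they receive a relative width of 0 (meaning "unspecified").

```haskell
import Data.Maybe (fromMaybe, isJust)

-- Hypothetical sketch: each column optionally carries a width, such as
-- the 20 in an Org column property like <20>. If no column specifies a
-- width, every relative width is 0.
relativeWidths :: [Maybe Int] -> [Double]
relativeWidths ws
  | any isJust ws = map (maybe 0 (\w -> fromIntegral w / total)) ws
  | otherwise     = map (const 0) ws
  where
    -- Assumption: a column with no explicit width contributes 1.
    total = fromIntegral (sum (map (fromMaybe 1) ws))
```

For instance, widths of 10 and 30 give the two columns one quarter and three quarters of the table width.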
2016-11-20Allow beamer-style <...> options in raw LaTeX (also in Markdown).John MacFarlane1-1/+13
This allows use of things like `\only<2,3>{my content}` in Markdown that is going to be converted to beamer. Closes #3184.
2016-11-19LaTeX reader: improved table handling.John MacFarlane1-4/+13
We can now parse all of the tables emitted by pandoc in our tests. The only thing we don't get yet are alignments and column widths in more complex tables. See #2669.
2016-11-19LaTeX reader: limited support for minipage.John MacFarlane1-0/+2
2016-11-19Un-break Travis buildAlbert Krewinkel1-2/+2
Remove whitespace before function documentation. The extra spaces cause problems with documentation tools, and Travis tests are failing because of this.
2016-11-19LaTeX reader: improved parsing of tables.John MacFarlane1-5/+13
Reader can now parse simple LaTeX tables such as those generated by pandoc itself. We still can't handle pandoc multiline tables which involve minipages and column widths. Partially addresses #2669.
2016-11-19Fixed xref lookup in DocBook reader. Closes #3243.John MacFarlane1-4/+6
It previously only worked when the qnames lacked the docbook namespace URI.
2016-11-19Org reader: Ensure images in paragraphs are not parsed as figuresAlbert Krewinkel3-15/+32
This fixes a regression introduced in 7e5220b57c5a48fabe6e43ba270db812593d3463.
2016-11-16Small caps in Bracketed Spans (#3191)ickc1-1/+7
* Markdown reader: modify bracketedSpan to check small caps
* MANUAL.txt: add description on the use of `bracketed_spans` in small cap
* Improve markdown readers: bracketedSpan function EXACTLY as spanHtml
2016-11-15Allow alignments to be specified in Markdown grid tables.John MacFarlane1-17/+23
2016-11-13HTML reader: only treat "a" element as link if it has href.John MacFarlane1-7/+19
Otherwise treat as span. Closes #3226.
2016-11-10Docx reader: add a placeholder value for CHART.Jesse Rosenthal2-0/+17
We wrap `[CHART]` in a `<span class="chart">`. Note that it maps to inlines because, in docx, anything in a drawing tag can be part of a larger paragraph.
2016-11-10Docx reader: Be more specific in parsing imagesJesse Rosenthal1-6/+10
We don't want just "w:drawing", because that could also include charts. Now we specify "w:drawing"//"pic:pic". This shouldn't change behavior at all, but it's a first step toward allowing other sorts of drawing data as well.
2016-11-09Org reader: allow HTML attribs on non-figure imagesAlbert Krewinkel1-6/+8
Images which are the only element in a paragraph can still be given HTML attributes, even if the image does not have a caption and is hence not a figure. The following will set the `width` attribute of the image to `50%`:

    #+ATTR_HTML: :width 50%
    [[file:image.jpg]]

Closes: #3222
2016-11-08Inline code when text has a special styleHubert Plociniczak1-6/+20
When a piece of text has the text style 'Source_Text', we assume that it represents code that should be inlined. Adapted the ODT writer to reflect that change as well; previously it just wrote 'preformatted' text using a non-distinguishable font style. Code blocks are still not recognized by the ODT reader. That's a separate issue.
2016-11-05Markdown reader: Allow reference link labels starting with @...John MacFarlane1-1/+2
...if citations extension disabled. Example: in

    [link text][@a]

    [@a]: url

`link text` isn't hyperlinked because `[@a]` is parsed as a citation. Previously this happened whether or not the `citations` extension was enabled. Now it happens only if the `citations` extension is enabled. Closes #3209.
2016-11-02Docx Reader: abstract out function to avoid code repetition.Jesse Rosenthal1-16/+14
2016-11-02Docx reader: Handle Alt text and titles in images.Jesse Rosenthal2-11/+28
We use the "description" field as alt text and the "title" field as title. These can be accessed through the "Format Picture" dialog in Word.
2016-11-02Docx reader utils: handle empty namespace in elemNameJesse Rosenthal1-1/+2
Previously, if given an empty namespace: (elemName ns "" "foo") `elemName` would output a QName with a `Just ""` namespace. This is never what we want. Now we output a `Nothing`. If someone *does* want a `Just ""` in the namespace, they can enter the QName value explicitly.
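The behavior after this change can be sketched as follows. To keep the example compilable with `base` alone, `QName` here is a local stand-in modeled on the type of the same name in the `xml` package, and `NameSpaces` is assumed to be a simple prefix-to-URI association list; the actual definitions in the Docx reader may differ.

```haskell
-- Local stand-in for the xml package's QName (name, URI, prefix).
data QName = QName
  { qName   :: String
  , qURI    :: Maybe String
  , qPrefix :: Maybe String
  } deriving (Show, Eq)

-- Assumed representation: a list of (prefix, URI) pairs.
type NameSpaces = [(String, String)]

-- An empty prefix yields Nothing rather than Just "".
elemName :: NameSpaces -> String -> String -> QName
elemName ns prefix name =
  QName name (lookup prefix ns) (if null prefix then Nothing else Just prefix)
```

A caller who really wants `Just ""` can still construct the `QName` value directly, as the commit message notes.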
2016-11-02HTML reader: treat `<math>` as MathML by default...John MacFarlane1-8/+11
unless something else is explicitly specified in xmlns. Provided it parses as MathML, of course. Also fixed default which should be to inline math if no display attribute is used.
2016-11-02LaTeX reader: Handle BVerbatim from fancyvrb. Fixes #3203.John MacFarlane1-10/+15
2016-11-01Handle hungarumlaut in LaTeX reader. Closes #3201.John MacFarlane1-0/+16
2016-11-01[odt] Infer tables' header props from rows (#3199)hubertp-lshift1-2/+9
The ODT reader simply provided an empty header list, which meant that the contents of the whole table, even if not empty, were simply ignored. While we still do not infer headers, we at least have to provide default properties for the columns.
2016-10-31LaTeX reader: allow for []s inside LaTeX optional args.John MacFarlane1-1/+2
Fixes cases like:

    \begin{center}
    \begin{tikzpicture}[baseline={([yshift=+-.5ex]current bounding box.center)}, level distance=24pt]
    \Tree [.{S} [.NP John\index{i} ] [.VP [.V likes ] [.NP himself\index{i,*j} ]]]
    \end{tikzpicture}
    \end{center}
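The general idea of accepting nested `[]`s inside an optional argument can be illustrated with a small bracket-depth scanner. This is a simplified sketch, not the parser the LaTeX reader uses: `optArg` is a hypothetical name, and the sketch ignores braces, escapes, and comments.

```haskell
-- Hypothetical sketch: if the input starts with '[', return the text up
-- to the matching ']' together with the remaining input. Nested
-- brackets are kept as part of the argument.
optArg :: String -> Maybe (String, String)
optArg ('[' : rest) = go (1 :: Int) "" rest
  where
    -- 'depth' counts currently open brackets; 'acc' is the argument
    -- text collected so far, in reverse order.
    go _ _ [] = Nothing -- ran out of input before the closing ']'
    go depth acc (c : cs)
      | c == ']' && depth == 1 = Just (reverse acc, cs)
      | c == ']'               = go (depth - 1) (c : acc) cs
      | c == '['               = go (depth + 1) (c : acc) cs
      | otherwise              = go depth (c : acc) cs
optArg _ = Nothing
```

A scanner that stops at the first `]` would cut the `tikzpicture` options short at the end of `[yshift=+-.5ex]`; counting depth avoids that.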
2016-10-30Org reader: support `ATTR_HTML` for special blocksAlbert Krewinkel1-9/+22
Special blocks (i.e. blocks with unrecognized names) can be prefixed with an `ATTR_HTML` block attribute. The attributes defined in that meta-directive are added to the `Div` which is used to represent the special block. Closes: #3182
2016-10-30Org reader: support the `todo` export optionAlbert Krewinkel3-2/+7
The `todo` export option toggles the inclusion of TODO keywords in the output. Setting this to `nil` causes TODO keywords to be dropped from headlines. The default is to include the keywords.
2016-10-30Org reader: add support for todo-markersAlbert Krewinkel3-5/+98
Headlines can have optional todo-markers which can be controlled via the `#+TODO`, `#+SEQ_TODO`, or `#+TYP_TODO` meta directive. Multiple such directives can be given, each adding a new set of recognized todo-markers. If no custom todo-markers are defined, the default `TODO` and `DONE` markers are used. Todo-markers are conceptually separate from headline text and are hence excluded when autogenerating headline IDs. The markers are rendered as spans and labelled with two classes: one class is the marker's name, the other signals the todo-state of the marker (either `todo` or `done`).