path: root/src/Text/Pandoc/Readers/HTML.hs
Age | Commit message | Author | Files | Lines
2015-07-21 | HTML reader: handle type attribute on ol. | John MacFarlane | 1 | -1/+8
E.g. `<ol type="i">`. Closes #2313.
2015-07-10 | Avoid parsing partial URLs as HTML tags. | John MacFarlane | 1 | -1/+8
Closes #2277.
2015-06-04 | HTML reader: allow `<body>` to close `<head>`. | John MacFarlane | 1 | -0/+1
2015-05-13 | HTML reader: Support base tag. | John MacFarlane | 1 | -7/+28
We only support the href attribute, as there's no place for "target" in the Pandoc document model for links. Added HTML reader test module, with tests for this feature. Closes #1751.
2015-05-11 | HTML reader: Fixed detection of self-closing tags. | John MacFarlane | 1 | -2/+2
Earlier versions had a bug and would wrongly think opening tags containing attributes with slashes in them were self-closing. Closes #2146.
2015-04-29 | HTML reader: Allow multiple colgroups in table. | John MacFarlane | 1 | -1/+1
Closes #2122.
2015-04-26 | Updated copyright notices to -2015. Closes #2111. | John MacFarlane | 1 | -2/+2
2015-04-17 | More principled fix for #1820. | John MacFarlane | 1 | -5/+7
If the tag parses as a comment, we check to see if the input starts with `<!--`. If not, it's bogus comment mode and we fail htmlTag. Includes test case. Closes #1820.
2015-04-17 | Fixed `htmlTag` in HTML reader. | John MacFarlane | 1 | -1/+1
Require that `<!` or `<?` be followed by nonspace. This prevents `</ div>` from being parsed as a comment. Closes #1820.
2015-02-18 | Move utility error functions to Text.Pandoc.Shared | Matthew Pickering | 1 | -1/+1
2015-02-18 | Change return type of HTML reader | Matthew Pickering | 1 | -5/+12
2015-01-25 | fixes #1859 HTML Reader table parsing | mb21 | 1 | -11/+22
2014-11-16 | Make `embed` tag either block or inline. | John MacFarlane | 1 | -2/+2
Closes #1756.
2014-09-25 | HTML Reader: Recognise <br> tags inside <pre> blocks | mpickering | 1 | -1/+6
Closes #1620
2014-08-18 | HTML reader: improved handling of tags that can be block or inline. | John MacFarlane | 1 | -5/+13
Previously a section like this would be enclosed in a paragraph, with RawInline for the video tags (since video is a tag that can be either block or inline):

    <video controls="controls">
      <source src="../videos/test.mp4" type="video/mp4" />
      <source src="../videos/test.webm" type="video/webm" />
      <p>
        The videos can not be played back on your system.<br/>
        Try viewing on Youtube (requires Internet connection):
        <a href="http://youtu.be/etE5urBps_w">Relative Velocity on Youtube</a>.
      </p>
    </video>

This change will cause the video and source tags to be parsed as RawBlock instead, giving better output. The general change is this: when we're parsing a "plain" sequence of inlines, we don't parse anything that COULD be a block-level tag.
2014-08-16 | HTML reader: Parse appropriately styled span as SmallCaps. | John MacFarlane | 1 | -1/+6
2014-08-12 | EPUB Reader: Ignore title pages | Matthew Pickering | 1 | -4/+10
2014-08-08 | Added `native_divs` and `native_spans` extensions. | John MacFarlane | 1 | -1/+4
This allows users to turn off the default pandoc behavior of parsing contents of div and span tags in markdown and HTML as native pandoc Div blocks and Span inlines. Setting of default epub extensions has been moved from the EPUB reader to Text.Pandoc.
2014-08-08 | HTML EPUB exts: switch element can now be in either the inline or block position | Matthew Pickering | 1 | -9/+10
2014-08-07 | HTML reader: Really ignore DOCTYPE and xml declarations. | John MacFarlane | 1 | -2/+2
This actually does what d71b013841f3c9c8c595591e312a31df16a728cb said it did. Revised epub tests to remove the repeated DOCTYPE and xml tags.
2014-08-04 | HTML reader: ignore <?xml..> and <DOCTYPE..> tags. | John MacFarlane | 1 | -1/+1
Previously they were parsed as raw.
2014-08-04 | Use texmath 0.7 interface. | John MacFarlane | 1 | -2/+2
2014-07-31 | HTML Reader: Added ability to read MathML formatted <math> blocks | Matthew Pickering | 1 | -0/+16
2014-07-31 | HTML Reader: Added support for anchors on links and list items | Matthew Pickering | 1 | -4/+22
2014-07-31 | HTML Reader: Extended HTML Reader to recognise EPUB specific elements | Matthew Pickering | 1 | -28/+178
2014-07-26 | Generalised more in Parsing.hs to enable the use of custom state | Matthew Pickering | 1 | -18/+61
2014-07-20 | HTML reader: parse Div and Span elements even without `--parse-raw`. | John MacFarlane | 1 | -2/+0
Closes #1434.
2014-07-11 | Removed (>>~) function | Matthew Pickering | 1 | -3/+3
This function is equivalent to the more general (<*) which is defined in Control.Applicative. This change makes pandoc code easier to understand for those not familiar with the codebase.
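To illustrate the equivalence this commit relies on, here is a minimal sketch using only `base`. The definition of `(>>~)` below is an assumed reconstruction for comparison and is not copied from the pandoc source; `number`, `viaOld`, and `viaNew` are hypothetical names, and `ReadP` stands in for pandoc's Parsec-based parsers.

```haskell
import Control.Applicative ((<*))  -- also exported by Prelude in recent base versions
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP (ReadP, char, eof, munch1, readP_to_S)

-- Assumed reconstruction of the removed operator, for comparison only:
-- run the first action, run the second, return the first result.
(>>~) :: Monad m => m a -> m b -> m a
a >>~ b = a >>= \x -> b >> return x

-- Hypothetical parser: one or more digits.
number :: ReadP String
number = munch1 isDigit

-- Both parsers read a number, require a trailing ';', and keep the number.
viaOld, viaNew :: ReadP String
viaOld = number >>~ char ';'
viaNew = number <* char ';'

main :: IO ()
main = do
  print (readP_to_S (viaOld <* eof) "42;")  -- [("42","")]
  print (readP_to_S (viaNew <* eof) "42;")  -- [("42","")]
```

Both parsers give the same result on this input. `(<*)` needs only an `Applicative` instance, which is why the commit message calls it the more general operator.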
2014-07-07 | HTML reader: adjust `blockTags` and `eitherBlockOrInline`. | John MacFarlane | 1 | -9/+13
- Added `audio` and `source` in `eitherBlockOrInline`.
- Moved `video`, `svg`, `progress`, `script`, `noscript`, `svg` from `blockTags` to `eitherBlockOrInline`.
- `map` and `object` were mistakenly in both lists; they have been removed from `blockTags`.
2014-06-20 | HTML reader: Fix performance issue with malformed HTML tables. | John MacFarlane | 1 | -0/+2
We let a `</table>` tag close an open `<tr>` or `<td>`. Closes #1167.
2014-06-20 | Support --trace in HTML reader. | John MacFarlane | 1 | -1/+10
2014-06-19 | HTML reader: Allow space between `<col>` and `</col>`. | John MacFarlane | 1 | -0/+1
Test case:

```
<table border="1">
  <colgroup>
    <col> </col>
    <col></col>
  </colgroup>
  <tbody>
    <tr>
      <td>X</td>
      <td>Y</td>
    </tr>
    <tr>
      <td>1</td>
      <td>2</td>
    </tr>
  </tbody>
</table>
```
2014-06-16 | HTML reader: Fixed major parsing problem with HTML tables. | John MacFarlane | 1 | -15/+11
Table cells were being combined into one cell. Closes #1341.
2014-06-16 | Moved extractSpaces to Shared.hs | mpickering | 1 | -13/+4
Generalised and moved the extractSpaces function from `HTML.hs` to `Shared.hs` so that the docx reader can also use it.
2014-05-09 | Update copyright notices for 2014, add missing notices | Albert Krewinkel | 1 | -2/+2
2014-04-11 | HTML reader: Treat processing instructions & declarations as block. | John MacFarlane | 1 | -5/+9
Previously these were treated as inline, and included in paragraph tags in HTML or DocBook output, which is generally not what is wanted. Closes #1233.
2014-04-05 | HTML reader: Updated `closes` with rules from HTML5 spec. | John MacFarlane | 1 | -5/+12
2014-04-01 | HTML reader: idiomatic rewriting for clarity. | John MacFarlane | 1 | -5/+4
2014-04-01 | Converted HTML reader to use builder. Fixes #1162. | Matthew Pickering | 1 | -109/+126
2014-01-20 | HTML reader: Fixed bug reading inline math with `$$`. | John MacFarlane | 1 | -2/+2
See #225.
2014-01-01 | HTML reader: Parse name/content pairs from meta tags as metadata. | John MacFarlane | 1 | -1/+10
Closes #1106.
2013-12-19 | HLint: use fromMaybe | Henry de Valence | 1 | -2/+2
Replace uses of `maybe x id` with `fromMaybe x`.
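The rewrite suggested by HLint can be sketched as follows. The function names `widthOld` and `widthNew` and the default value 0 are hypothetical, chosen only for illustration; they are not taken from the pandoc source.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical "before" form: fall back to 0 when no value is present.
widthOld :: Maybe Int -> Int
widthOld = maybe 0 id

-- Equivalent "after" form using fromMaybe.
widthNew :: Maybe Int -> Int
widthNew = fromMaybe 0

main :: IO ()
main = do
  print (map widthOld [Nothing, Just 3])  -- [0,3]
  print (map widthNew [Nothing, Just 3])  -- [0,3]
```

The two forms behave identically; `fromMaybe` states the intent (supply a default) directly.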
2013-12-06 | HTML reader: Parse LaTeX math if appropriate options are set. | John MacFarlane | 1 | -1/+8
* Moved inlineMath, displayMath from Markdown reader to Parsing.
* Export them from Parsing. (API change.)
* Generalize their types.
2013-11-07 | recognize svg tag in HTML Reader | MinRK | 1 | -1/+1
avoids adding lots of `<p>` tags in embedded SVG content, for instance in markdown to HTML.
2013-11-03 | HTML reader: Use pandoc Div and Span for raw "<div>", "<span>". | John MacFarlane | 1 | -10/+25
Only if --parse-raw.
2013-08-10 | Adjustments for new Format newtype. | John MacFarlane | 1 | -2/+2
2013-07-16 | HTML reader: read widths from col tags if present. | John MacFarlane | 1 | -6/+23
Closes #893.
2013-07-16 | HTML reader: Handle non-simple tables (#893). | John MacFarlane | 1 | -3/+9
Column widths are divided equally. TODO: Get column widths from col tags if present.
2013-07-16 | HTML reader: Generalized table parser. | John MacFarlane | 1 | -4/+9
This commit doesn't change the present behavior at all, but it will make it easier to support non-simple tables in the future.
2013-06-24 | Use new flexible metadata type. | John MacFarlane | 1 | -23/+20
* Depend on pandoc 1.12.
* Added yaml dependency.
* `Text.Pandoc.XML`: Removed `stripTags`. (API change.)
* `Text.Pandoc.Shared`: Added `metaToJSON`. This will be used in writers to create a JSON object for use in the templates from the pandoc metadata.
* Revised readers and writers to use the new Meta type.
* `Text.Pandoc.Options`: Added `Ext_yaml_title_block`.
* Markdown reader: Added support for YAML metadata block. Note that it must come at the beginning of the document.
* `Text.Pandoc.Parsing.ParserState`: Replace `stateTitle`, `stateAuthors`, `stateDate` with `stateMeta`.
* RST reader: Improved metadata. Treat initial field list as metadata when standalone specified. Previously ALL fields "title", "author", "date" in field lists were treated as metadata, even if not at the beginning. Use `subtitle` metadata field for subtitle.
* `Text.Pandoc.Templates`: Export `renderTemplate'` that takes a string instead of a compiled template.
* OPML template: Use 'for' loop for authors.
* Org template: '#+TITLE:' is inserted before the title. Previously the writer did this.