path: root/src/Text/Pandoc/Parsing.hs
Age | Commit message | Author | Files | Lines
2011-01-19 | More small parser rewrites for small performance gains. | John MacFarlane | 1 | -9/+11
2011-01-19 | Parsing: Rewrote spaceChar for significant speedup in readers. | John MacFarlane | 1 | -1/+1
2011-01-14 | Parsing: Fixed bug in grid table parser. | John MacFarlane | 1 | -5/+5
Spaces at end of line were not being stripped properly, resulting in unintended LineBreaks.
2011-01-05 | Fixed macro parsing. | John MacFarlane | 1 | -8/+10
2011-01-04 | Moved 'macro' and 'applyMacros'' from markdown reader to Parsing. | John MacFarlane | 1 | -2/+27
2010-12-30 | New HTML reader using tagsoup as a lexer. | John MacFarlane | 1 | -3/+3
* The new reader is faster and more accurate.
* API changes for Text.Pandoc.Readers.HTML:
  - removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag, anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType, htmlBlockElement, htmlComment
  - added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag
* tagsoup is a new dependency.
* Text.Pandoc.Parsing: Generalized type on readWith.
* Benchmark.hs: Added length calculation to force full evaluation.
* Updated HTML reader tests.
* Updated markdown and textile readers to use the functions from the HTML reader.
* Note: The markdown reader now correctly handles some cases it did not before. For example:
  <hr/> is reproduced without adding a space.
  <script> a = '<b>'; </script> is parsed correctly.
2010-12-24 | Use functions from Text.Pandoc.Generic instead of processWith(M). | John MacFarlane | 1 | -1/+2
2010-12-17 | Added new prettyprinting module. | John MacFarlane | 1 | -2/+3
* Added Text.Pandoc.Pretty. This is better suited for pandoc than the 'pretty' package. One advantage is that we now get proper wrapping; Emph [Inline] is no longer treated as a big unwrappable unit. Previously we only got breaks for spaces at the "outer level." We can also more easily avoid doubled blank lines. Performance is significantly better as well.
* Removed Text.Pandoc.Blocks. Text.Pandoc.Pretty allows you to define blocks and concatenate them.
* Modified markdown, RST, org readers to use Text.Pandoc.Pretty instead of Text.PrettyPrint.HughesPJ.
* Text.Pandoc.Shared: Added writerColumns to WriterOptions.
* Markdown, RST, Org writers now break text at writerColumns.
* Added --columns command-line option, which sets stColumns and writerColumns.
* Table parsing: If the size of the header > stColumns, use the header size as 100% for purposes of calculating relative widths of columns.
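The table-width rule in the last item can be illustrated with a minimal sketch. It assumes that "the size of the header" means the sum of the column widths measured on the header line; `relativeWidths` and its arguments are hypothetical names for illustration, not pandoc's actual code.

```haskell
-- Sketch only: 'relativeWidths' is a hypothetical helper, not a pandoc function.
-- First argument: the configured column count (the role played by stColumns).
-- Second argument: the character width of each table column in the header line
-- (an assumption about how the header size is measured).
relativeWidths :: Int -> [Int] -> [Double]
relativeWidths columns colWidths =
  map (\w -> fromIntegral w / fromIntegral total) colWidths
  where
    -- If the header is wider than the configured columns, the header
    -- itself counts as 100%; otherwise the configured width does.
    total = max columns (sum colWidths)
```

With 80 configured columns, `relativeWidths 80 [20, 20]` gives `[0.25, 0.25]`, while an oversized header such as `relativeWidths 80 [60, 60]` gives `[0.5, 0.5]` because the 120-character header is treated as the full width.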
2010-12-10 | Removed HTML sanitization. | John MacFarlane | 1 | -2/+0
This is better done on the resulting HTML; use the xss-sanitize library for this. xss-sanitize is based on pandoc's sanitization, but improves it.
- Removed stateSanitize from ParserState.
- Removed --sanitize-html option.
2010-12-07 | Smart punctuation: recognize entities. | John MacFarlane | 1 | -8/+22
Now &ldquo;Hi&rdquo; gets parsed as a Quoted DoubleQuote inline.
2010-12-07 | Smart punctuation: don't allow ellipses containing spaces. | John MacFarlane | 1 | -1/+1
Previously we allowed '. . .', ' . . . ', etc. This caused too many complications, and removed author's flexibility in combining ellipses with spaces and periods.
2010-12-07 | Moved smartPunctuation from Markdown to Parsing. | John MacFarlane | 1 | -3/+92
+ Parameterized smartPunctuation on an inline parser.
+ Handle smartPunctuation in Textile reader.
2010-12-05 | Fix regression: markdown references should be case-insensitive. | John MacFarlane | 1 | -38/+17
This broke when we added the Key type. We had assumed that the custom case-insensitive Ord instance would ensure case-insensitive matching, but that is not how Data.Map works.
* Added a test case for case-insensitivity in markdown-reader-more
* Removed old refsMatch from Text.Pandoc.Parsing module;
* hid the 'Key' constructor;
* dropped the custom Ord and Eq instances, deriving instead;
* added fromKey and toKey to convert between Keys and Inline lists;
* toKey ensures that keys are case-insensitive, since this is the only way the API provides to construct a Key.
Resolves Issue #272.
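The smart-constructor approach described above can be sketched roughly as follows. This is a simplified illustration, not the module's actual code: the real Key wraps an Inline list, whereas this sketch uses a plain String, and `KeyTable`, `lookupRef`, `exampleTable`, and the example URL are hypothetical.

```haskell
import Data.Char (toLower)
import qualified Data.Map as M

-- Assumption: a String stands in for the Inline list that the real Key wraps.
-- Eq and Ord are derived, as the commit message describes.
newtype Key = Key String deriving (Eq, Ord, Show)

-- The only intended way to build a Key: lowercasing here means every
-- stored or looked-up key is already normalized.
toKey :: String -> Key
toKey = Key . map toLower

-- Recover the (normalized) label from a Key.
fromKey :: Key -> String
fromKey (Key s) = s

-- Hypothetical reference table mapping keys to link targets.
type KeyTable = M.Map Key String

-- Look up a reference label regardless of its case.
lookupRef :: String -> KeyTable -> Maybe String
lookupRef label = M.lookup (toKey label)

-- Hypothetical example data.
exampleTable :: KeyTable
exampleTable = M.fromList [(toKey "Pandoc", "http://example.com/pandoc")]
```

In this sketch, `lookupRef "PANDOC" exampleTable` and `lookupRef "pandoc" exampleTable` both return `Just "http://example.com/pandoc"`, because normalization happens once at construction instead of in the comparison.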
2010-11-06 | Removed CITEPROC CPP conditionals from library code. | John MacFarlane | 1 | -4/+0
By Cabal policy, the API should not change depending on flags.
2010-10-26 | Process LaTeX macros in markdown, and apply to TeX math. | John MacFarlane | 1 | -2/+7
Example:
    \newcommand{\plus}[2]{#1 + #2}
    $\plus{3}{4}$
yields:
    3+4
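The argument substitution behind this example can be sketched as below. This is only an illustration of replacing #1, #2, ... in a macro body; `expandBody` is a hypothetical helper and does not reflect how pandoc's macro code is implemented.

```haskell
import Data.Char (digitToInt, isDigit)

-- Sketch only: 'expandBody' is a hypothetical helper, not pandoc's applyMacros'.
-- Replaces each #n (n = 1..9) in a macro body with the n-th argument.
-- A #n with no matching argument is left untouched.
expandBody :: String -> [String] -> String
expandBody [] _ = []
expandBody ('#' : d : rest) args
  | isDigit d
  , let n = digitToInt d
  , n >= 1
  , n <= length args
  = args !! (n - 1) ++ expandBody rest args
expandBody (c : rest) args = c : expandBody rest args
```

For the macro above, `expandBody "#1 + #2" ["3", "4"]` produces the string `"3 + 4"`; the spaces are insignificant in TeX math, which is consistent with the rendered result 3+4.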
2010-07-13 | Parse \chapter{} in latex. | John MacFarlane | 1 | -2/+4
+ Added stateHasChapters to ParserState.
+ If a \chapter command is encountered, this is set to True and subsequent \section commands (etc.) will be bumped up one level.
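The level-bumping behaviour can be sketched with a cut-down state record. Only the stateHasChapters flag comes from the commit message; the base levels, the choice of level 1 for \chapter, and the names `baseLevel`, `step`, and `levels` are assumptions made for this illustration.

```haskell
import Data.List (mapAccumL)

-- Hypothetical cut-down state: only the flag named in the commit is modeled.
data ParserState = ParserState { stateHasChapters :: Bool } deriving (Show)

-- Assumed header levels when no \chapter has been seen.
baseLevel :: String -> Maybe Int
baseLevel "section"       = Just 1
baseLevel "subsection"    = Just 2
baseLevel "subsubsection" = Just 3
baseLevel _               = Nothing

-- Process one sectioning command: \chapter sets the flag (and is assumed to
-- be level 1); later commands are bumped one level once the flag is set.
step :: ParserState -> String -> (ParserState, Maybe Int)
step st "chapter" = (st { stateHasChapters = True }, Just 1)
step st cmd       = (st, bump <$> baseLevel cmd)
  where
    bump n = if stateHasChapters st then n + 1 else n

-- Thread the state through a sequence of command names.
levels :: [String] -> [Maybe Int]
levels = snd . mapAccumL step (ParserState False)
```

Here `levels ["section", "chapter", "section", "subsection"]` evaluates to `[Just 1, Just 1, Just 2, Just 3]`: the first \section precedes any \chapter and keeps its level, while the later ones are bumped.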
2010-07-11 | Merge branch 'atlists'. Added auto-numbered example lists. | John MacFarlane | 1 | -5/+27
2010-07-06 | Allow language-neutral table captions. | John MacFarlane | 1 | -1/+4
+ Captions may now begin simply with ':', instead of 'Table:'
+ Captions may now appear either above or below the table.
+ Resolves Issue #227.
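The two accepted caption prefixes can be sketched with a small line-level check. This is an illustration on plain strings, not the parser combinator code the module uses; `captionText` is a hypothetical name.

```haskell
import Control.Applicative ((<|>))
import Data.Char (isSpace)
import Data.List (stripPrefix)

-- Sketch only: 'captionText' is a hypothetical helper.
-- Accepts a line starting with "Table:" or just ":" and returns the
-- caption text with leading whitespace removed; other lines yield Nothing.
captionText :: String -> Maybe String
captionText line =
  dropWhile isSpace <$> (stripPrefix "Table:" line <|> stripPrefix ":" line)
```

Both `captionText "Table: Results"` and `captionText ": Results"` return `Just "Results"`, while a line without either prefix returns `Nothing`.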
2010-07-05 | More refactoring of grid table code. | John MacFarlane | 1 | -8/+60
2010-07-05 | Minor reformatting. | John MacFarlane | 1 | -2/+4
2010-07-05 | Moved generic grid table functions from RST reader -> Parsing. | John MacFarlane | 1 | -3/+85
Here they can be used by the Markdown reader as well.
2010-07-05 | Moved parsing functions from Text.Pandoc.Shared to new module. | John MacFarlane | 1 | -0/+537
+ Text.Pandoc.Parsing