path: root/src/Text/Pandoc/Parsing.hs
Age | Commit message | Author | Files | Lines
2012-07-25 | Got rid of stateStandalone, which was hardly used anyway. | John MacFarlane | 1 | -2/+0
The only possible effect will be with rst fragments that begin with an rst title block, which will now cause the header transform.
2012-07-25 | Moved stateOldDashes to readerOldDashes in ReaderOptions. | John MacFarlane | 1 | -5/+1
2012-07-25 | Moved stateTabStop to readerTabStop in ReaderOptions. | John MacFarlane | 1 | -3/+0
2012-07-25 | Moved stateColumns to readerColumns in ReaderOptions. | John MacFarlane | 1 | -3/+1
2012-07-25 | Moved ParseRaw from ParserState to ReaderOptions. | John MacFarlane | 1 | -2/+0
2012-07-25 | Text.Pandoc.Parsing: Added getOption. | John MacFarlane | 1 | -4/+6
2012-07-25 | Options -> ReaderOptions. | John MacFarlane | 1 | -3/+3
Better to keep reader and writer options separate.
2012-07-25 | Put smart, strict in separate options field in state. | John MacFarlane | 1 | -8/+7
This is the beginning of a larger transition that will make Options, not ParserState, the parameter of the read functions. (Options will also be used in writers, in place of WriterOptions.) Next step is to remove strict, replacing it with granular tests for different extensions.
2012-07-24 | Better algorithm for oneOfStrings. | John MacFarlane | 1 | -2/+9
This goes character by character, not backtracking.
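A minimal sketch of the character-by-character idea, written as a pure function over `String` rather than as a Parsec parser. The name `matchOneOf` and its return type are assumptions for illustration, not the code in Parsing.hs. At each step it keeps only the candidates whose next character matches the input, so the input is never re-scanned once per candidate.

```haskell
-- Hypothetical stand-in for oneOfStrings: match one of the candidate
-- strings at the start of the input, narrowing the candidate set one
-- input character at a time. Returns the longest match and the rest.
matchOneOf :: [String] -> String -> Maybe (String, String)
matchOneOf = go
  where
    go cands input =
      let -- A candidate that is already exhausted means we have a match
          -- ending exactly here.
          matchedHere = if "" `elem` cands then Just ("", input) else Nothing
      in case input of
           (c:rest) ->
             -- Keep only the tails of candidates that start with c.
             let cands' = [xs | (x:xs) <- cands, x == c]
             in if null cands'
                  then matchedHere
                  else case go cands' rest of
                         Just (m, r) -> Just (c : m, r)
                         Nothing     -> matchedHere
           [] -> matchedHere
```

For example, with candidates `["foo", "foobar"]` and input `"foobaz"`, the sketch falls back to the shorter match `"foo"` and leaves `"baz"` unconsumed.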
2012-07-24 | Refactored table parsers, captions now not part of core tableWith. | John MacFarlane | 1 | -10/+4
2012-07-22 | Revised code for pipe tables. | John MacFarlane | 1 | -94/+4
* All tables now require at least one body row.
* Renamed from 'extra' to 'pipe' tables.
* Moved functions from Parsing to Readers.Markdown.
* Cleaned up code; revised to parse in one pass rather than parsing a raw string, splitting it, and parsing the components.
* Allow pipe tables without pipes on the ends (as PHP Markdown Extra does).
2012-07-22 | Merge pull request #510 from mytskine/markdown-extra | John MacFarlane | 1 | -1/+97
Markdown extra tables [part of the multi-markdown syntax for tables]
2012-07-20 | Use Parser as type synonym for Parsec. | John MacFarlane | 1 | -1/+3
2012-07-20 | Text.Pandoc.Parsing: Export all Parsec functions used in pandoc code. | John MacFarlane | 1 | -1/+52
No other module directly imports Parsec. This will make it easier to change the parsing backend in the future, if we want to.
2012-07-20 | Use Text.Parsec instead of Text.ParserCombinators.Parsec. | John MacFarlane | 1 | -103/+103
2012-07-19 | Provide Data.Default instances for ParserState and WriterOptions. | John MacFarlane | 1 | -2/+6
Now you can use def (which is re-exported by Text.Pandoc) instead of defaultParserState or defaultWriterOptions. For now, these are still defined too, so existing code need not change. Closes #546.
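The pattern described above can be sketched as follows. To stay self-contained, the sketch defines a local `Default` class as a stand-in for the one from the data-default package, and `SketchState` with its fields is a hypothetical, much-reduced record, not pandoc's actual ParserState.

```haskell
-- Local stand-in for the Default class from the data-default package.
class Default a where
  def :: a

-- Hypothetical, much-reduced parser state; field names are illustrative.
data SketchState = SketchState
  { sketchTabStop :: Int
  , sketchSmart   :: Bool
  } deriving (Eq, Show)

-- The older, explicitly named default value stays available.
defaultSketchState :: SketchState
defaultSketchState = SketchState { sketchTabStop = 4, sketchSmart = False }

-- The instance simply points def at the existing default value,
-- so old and new call sites agree.
instance Default SketchState where
  def = defaultSketchState
```

Callers can then write `def { sketchSmart = True }` to override a single field while existing code that uses the named default keeps working.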
2012-06-29 | Changed macro parser so it returns raw macro if stateApplyMacros false. | John MacFarlane | 1 | -5/+8
Closes #554.
2012-04-24 | textile reader improvements: better conformance to RedCloth Textile inlines | paul.rivier | 1 | -0/+5
2012-03-24 | Add parsing support for the rST default-role directive. | Greg Maslov | 1 | -2/+4
2012-02-21 | Added support for markdown-extra tables in the markdown parser | François Gannaz | 1 | -1/+97
Only tables whose lines begin with a "|" are supported. There are 2 warnings about unused variables when compiling.
2012-02-07 | Limit nesting of strong/emph. | John MacFarlane | 1 | -0/+2
This avoids exponential lookahead in parasitic cases, like a**a*a**a*a**a*a**a*a**a*a**a*a**a*a**. Added stateMaxNestingLevel to ParserState. We set this to 6, so you can still have Emph inside Emph, just not indefinitely.
2012-02-05 | Parsing: Make characterReference fail if entity not found. | John MacFarlane | 1 | -2/+2
2012-02-05 | Removed module Text.Pandoc.CharacterReferences. | John MacFarlane | 1 | -1/+11
Moved characterReference parser to Text.Pandoc.Parsing. decodeCharacterReferences is now replaced by fromEntities in Text.Pandoc.XML.
2012-02-04 | Complete rewrite of LaTeX reader. | John MacFarlane | 1 | -4/+20
* The new reader is more robust, accurate, and extensible. It is still quite incomplete, but it should be easier now to add features.
* Text.Pandoc.Parsing: Added withRaw combinator.
* Markdown reader: do escapedChar before raw latex inline. Otherwise we capture commands like \{.
* Fixed latex citation tests for new citeproc.
* Handle \include{} commands in latex. This is done in pandoc.hs, not the (pure) latex reader. But the reader exports the needed function, handleIncludes.
* Moved err and warn from pandoc.hs to Shared.
* Fixed tests - raw tex should sometimes have trailing space.
* Updated lhs-test for highlighting-kate changes.
2012-01-27 | Fixed table parsing with wide or combining characters. | John MacFarlane | 1 | -1/+1
Closes #348. Closes #108.
2012-01-01 | New treatment of dashes in --smart mode. | John MacFarlane | 1 | -5/+29
* `---` is always em-dash, `--` is always en-dash.
* pandoc no longer tries to guess when `-` should be en-dash.
* A new option, `--old-dashes`, is provided for legacy documents.
Rationale: The rules for en-dash are too complex and language-dependent for a guesser to work reliably. This change gives users greater control. The alternative of using unicode isn't very good, since unicode em- and en- dashes are barely distinguishable in a monospace font.
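The new rule is simple enough to sketch as a plain string rewrite. `smartDashes` is a hypothetical name and this is not the parser in Parsing.hs; it only illustrates that three hyphens are tried before two and that a lone hyphen is left alone.

```haskell
-- Hypothetical illustration of the new dash rule as a string rewrite.
smartDashes :: String -> String
smartDashes ('-':'-':'-':rest) = '\x2014' : smartDashes rest  -- em-dash
smartDashes ('-':'-':rest)     = '\x2013' : smartDashes rest  -- en-dash
smartDashes (c:rest)           = c : smartDashes rest         -- a single '-' is untouched
smartDashes []                 = []
```

Because the three-hyphen case is matched first, a run of four hyphens becomes an em-dash followed by a literal hyphen in this sketch.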
2011-12-29 | Better smart quote parsing. | John MacFarlane | 1 | -1/+7
* Added stateLastStrPos to ParserState. This lets us keep track of whether we're parsing the position immediately after a 'str'. If we encounter a ' in such a location, it must be an apostrophe, and can't be a single quote start.
* Set this in the markdown, textile, html, and rst str parsers.
* Closes #360.
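A rough sketch of the position-tracking idea, assuming a simplified model in which a "str" is a run of alphanumeric characters and positions are plain character offsets. `classifyQuotes` and `QuoteRole` are hypothetical names; the real parsers use Parsec source positions stored in the parser state.

```haskell
import Data.Char (isAlphaNum)

-- Whether a ' at a given position could start a single-quoted span.
data QuoteRole = CannotOpenQuote | MayOpenQuote
  deriving (Eq, Show)

-- Walk the input, remembering the offset just past the last "str".
-- A ' found exactly at that offset directly follows a str, so it is
-- treated as unable to open a quote (an apostrophe, for instance).
classifyQuotes :: String -> [(Int, QuoteRole)]
classifyQuotes = go 0 Nothing
  where
    go _ _ [] = []
    go pos lastStrPos input@(c:rest)
      | isAlphaNum c =
          let (word, rest') = span isAlphaNum input
              pos'          = pos + length word
          in go pos' (Just pos') rest'
      | c == '\'' =
          let role = if lastStrPos == Just pos
                       then CannotOpenQuote
                       else MayOpenQuote
          in (pos, role) : go (pos + 1) lastStrPos rest
      | otherwise = go (pos + 1) lastStrPos rest
```

On the input `don't 'hi'`, only the quote at offset 6, which follows a space, is reported as a possible quote start in this sketch.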
2011-12-27 | Replaced Apostrophe, Ellipses, EmDash, EnDash w/ unicode strings. | John MacFarlane | 1 | -6/+6
2011-12-27 | Pretty: return Str with unicode instead of Apostrophe. | John MacFarlane | 1 | -1/+1
2011-12-05 | Parsing: Removed charsInBalanced', added param to charsInBalanced. | John MacFarlane | 1 | -20/+13
The extra parameter is a character parser. This is needed for proper handling of escapes, etc.
2011-12-05 | Parsing: Changed type of escaped to return Char | John MacFarlane | 1 | -5/+2
2011-07-30 | Added nonspaceChar to Text.Pandoc.Parsing. | John MacFarlane | 1 | -0/+5
2011-07-25 | Smart quotes: handle '...hi' properly. | John MacFarlane | 1 | -1/+2
Also added test case.
2011-07-23 | Properly handle characters in the 128..159 range. | John MacFarlane | 1 | -7/+7
These aren't valid in HTML, but many HTML files produced by Windows tools contain them. We substitute correct unicode characters.
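As an illustration of the substitution, here is a partial sketch that maps a few code points in that range to the characters Windows-1252 assigns to them. `fixC1` is a hypothetical name and the table is deliberately incomplete; it is not the mapping used in Parsing.hs.

```haskell
import Data.Char (ord)

-- Hypothetical helper: replace a handful of 128..159 code points with
-- the characters Windows-1252 assigns to them. Partial table only;
-- anything not listed is returned unchanged.
fixC1 :: Char -> Char
fixC1 c = case ord c of
  128 -> '\x20AC'  -- euro sign
  130 -> '\x201A'  -- single low-9 quotation mark
  133 -> '\x2026'  -- horizontal ellipsis
  145 -> '\x2018'  -- left single quotation mark
  146 -> '\x2019'  -- right single quotation mark
  147 -> '\x201C'  -- left double quotation mark
  148 -> '\x201D'  -- right double quotation mark
  149 -> '\x2022'  -- bullet
  150 -> '\x2013'  -- en dash
  151 -> '\x2014'  -- em dash
  153 -> '\x2122'  -- trade mark sign
  _   -> c
```

Applying `map fixC1` to a decoded string would turn a numeric reference such as `&#147;` (once decoded to code point 147) into a proper left double quotation mark.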
2011-04-29 | Revert "Parsing: Use new type aliases, PandocParser, GeneralParser." | John MacFarlane | 1 | -123/+118
This reverts commit ec5410bc4e9d228b7dc0123061d80f9addf825bf.
2011-04-29 | Parsing: Use new type aliases, PandocParser, GeneralParser. | John MacFarlane | 1 | -118/+123
This should make it easier to change the types later.
2011-03-18 | Changed uri parser so it doesn't include trailing punctuation. | John MacFarlane | 1 | -3/+19
So, in RST, 'http://google.com.' should be parsed as a link to 'http://google.com' followed by a period. The parser is smart enough to recognize balanced parentheses, as often occur in wikipedia links: 'http://foo.bar/baz_(bam)'. Also added ()s to RST specialChars, so '(http://google.com)' will be parsed as a link in parens. Added test cases. Resolves Issue #291.
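The behaviour described here can be approximated by trimming a candidate URI after the fact. This is a swapped-in technique, not the forward-scanning parser in Parsing.hs, and `splitTrailingPunct` and its punctuation set are assumptions made for illustration.

```haskell
-- Hypothetical helper, not pandoc's uri parser: split a candidate URI
-- into the link target and the trailing punctuation left outside it.
splitTrailingPunct :: String -> (String, String)
splitTrailingPunct s = go (reverse s) ""
  where
    -- Work from the end of the string, moving characters into 'trail'.
    go (c:revRest) trail
      | c `elem` ".,;:!?" = go revRest (c : trail)
      -- A closing paren is dropped only if it has no matching opener.
      | c == ')' && unbalancedClose (reverse (c : revRest)) =
          go revRest (c : trail)
    go rev trail = (reverse rev, trail)

    unbalancedClose str = count ')' str > count '(' str
    count ch = length . filter (== ch)
```

With this sketch, `http://google.com.` splits into the URL and a trailing period, while `http://foo.bar/baz_(bam)` is kept whole because its parentheses balance.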
2011-01-26 | Add support for attributes in inline Code. | John MacFarlane | 1 | -1/+1
Additional related changes:
* URLs in Code in autolinks now use class "url".
* Require highlighting-kate 0.2.8.2, which omits the final <br/> tag, essential for inline code.
2011-01-26 | Bumped version to 1.8; depend on pandoc-types 1.8. | John MacFarlane | 1 | -7/+6
The old TeX, HtmlInline and RawHtml elements have been removed and replaced by generic RawInline and RawBlock elements. All modules updated to use the new raw elements.
2011-01-19 | More small parser rewrites for small performance gains. | John MacFarlane | 1 | -9/+11
2011-01-19 | Parsing: Rewrote spaceChar for significant speedup in readers. | John MacFarlane | 1 | -1/+1
2011-01-14 | Parsing: Fixed bug in grid table parser. | John MacFarlane | 1 | -5/+5
Spaces at end of line were not being stripped properly, resulting in unintended LineBreaks.
2011-01-05 | Fixed macro parsing. | John MacFarlane | 1 | -8/+10
2011-01-04 | Moved 'macro' and 'applyMacros'' from markdown reader to Parsing. | John MacFarlane | 1 | -2/+27
2010-12-30 | New HTML reader using tagsoup as a lexer. | John MacFarlane | 1 | -3/+3
* The new reader is faster and more accurate.
* API changes for Text.Pandoc.Readers.HTML:
  - removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag, anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType, htmlBlockElement, htmlComment
  - added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag
* tagsoup is a new dependency.
* Text.Pandoc.Parsing: Generalized type on readWith.
* Benchmark.hs: Added length calculation to force full evaluation.
* Updated HTML reader tests.
* Updated markdown and textile readers to use the functions from the HTML reader.
* Note: The markdown reader now correctly handles some cases it did not before. For example: <hr/> is reproduced without adding a space. <script> a = '<b>'; </script> is parsed correctly.
2010-12-24 | Use functions from Text.Pandoc.Generic instead of processWith(M). | John MacFarlane | 1 | -1/+2
2010-12-17 | Added new prettyprinting module. | John MacFarlane | 1 | -2/+3
* Added Text.Pandoc.Pretty. This is better suited for pandoc than the 'pretty' package. One advantage is that we now get proper wrapping; Emph [Inline] is no longer treated as a big unwrappable unit. Previously we only got breaks for spaces at the "outer level." We can also more easily avoid doubled blank lines. Performance is significantly better as well.
* Removed Text.Pandoc.Blocks. Text.Pandoc.Pretty allows you to define blocks and concatenate them.
* Modified markdown, RST, org readers to use Text.Pandoc.Pretty instead of Text.PrettyPrint.HughesPJ.
* Text.Pandoc.Shared: Added writerColumns to WriterOptions.
* Markdown, RST, Org writers now break text at writerColumns.
* Added --columns command-line option, which sets stColumns and writerColumns.
* Table parsing: If the size of the header > stColumns, use the header size as 100% for purposes of calculating relative widths of columns.
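The table-width rule in the last item can be sketched as a small calculation. `relativeWidths` and its argument names are hypothetical, and the sketch assumes the header size is the sum of the per-column character widths; the actual table parser may compute that size differently.

```haskell
-- Hypothetical sketch of the relative-width rule: widths are fractions
-- of the configured column count, unless the header is wider than that,
-- in which case the header size is taken as 100%.
relativeWidths :: Int     -- ^ configured number of columns (e.g. 80)
               -> [Int]   -- ^ character width of each table column
               -> [Double]
relativeWidths columns colWidths
  | total <= 0 = map (const 0) colWidths  -- avoid dividing by zero
  | otherwise  = [fromIntegral w / fromIntegral total | w <- colWidths]
  where
    total = max columns (sum colWidths)
```

With 80 configured columns, two 20-character columns each get 0.25 of the width, while two 60-character columns (a 120-character header) each get 0.5.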
2010-12-10 | Removed HTML sanitization. | John MacFarlane | 1 | -2/+0
This is better done on the resulting HTML; use the xss-sanitize library for this. xss-sanitize is based on pandoc's sanitization, but improves it.
- Removed stateSanitize from ParserState.
- Removed --sanitize-html option.
2010-12-07 | Smart punctuation: recognize entities. | John MacFarlane | 1 | -8/+22
Now &ldquo;Hi&rdquo; gets parsed as a Quoted DoubleQuote inline.
2010-12-07 | Smart punctuation: don't allow ellipses containing spaces. | John MacFarlane | 1 | -1/+1
Previously we allowed '. . .', ' . . . ', etc. This caused too many complications, and removed author's flexibility in combining ellipses with spaces and periods.