Age | Commit message | Author | Files | Lines |
|
Better to keep reader and writer options separate.
|
|
This is the beginning of a larger transition that will make
Options, not ParserState, the parameter of the read functions.
(Options will also be used in writers, in place of WriterOptions.)
Next step is to remove strict, replacing it with granular
tests for different extensions.
|
|
This goes character by character, not backtracking.
|
|
* All tables now require at least one body row.
* Renamed from 'extra' to 'pipe' tables.
* Moved functions from Parsing to Readers.Markdown.
* Cleaned up code; revised to parse in one pass rather than
parsing a raw string, splitting it, and parsing the components.
* Allow pipe tables without pipes on the ends (as PHP Markdown Extra
does).
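
A minimal sketch of the "pipes on the ends are optional" rule, assuming a row is a single line and ignoring escaped pipes. The names splitOnPipe and pipeRowCells are hypothetical and are not the functions in Readers.Markdown:

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- | Trim leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- | Split a line on every '|' (no escape handling in this sketch).
splitOnPipe :: String -> [String]
splitOnPipe s = case break (== '|') s of
  (cell, [])       -> [cell]
  (cell, _ : rest) -> cell : splitOnPipe rest

-- | Cells of one pipe-table row. A leading or trailing pipe produces an
-- empty outer field, which is dropped, so both row styles give the same cells.
pipeRowCells :: String -> [String]
pipeRowCells line = map trim (dropLast (dropFirst (splitOnPipe (trim line))))
  where
    dropFirst ("" : xs) = xs
    dropFirst xs        = xs
    dropLast xs
      | not (null xs), null (last xs) = init xs
      | otherwise                     = xs
```

Under these assumptions, pipeRowCells "| a | b |" and pipeRowCells "a | b" both give the cells "a" and "b".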
|
|
Markdown Extra tables [part of the MultiMarkdown syntax for tables]
|
|
No other module directly imports Parsec. This will make it easier
to change the parsing backend in the future, if we want to.
|
|
Now you can use def (which is re-exported by Text.Pandoc) instead of
defaultParserState or defaultWriterOptions. For now, these
are still defined too, so existing code need not change.
Closes #546.
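
A rough illustration of the def pattern. The Default class below is a local stand-in for the one in the data-default package, so that the sketch needs only base. ReaderOpts and its fields are hypothetical and are not pandoc's ParserState or WriterOptions:

```haskell
-- Local stand-in for the Default class from the data-default package.
class Default a where
  def :: a

-- Hypothetical options record, for illustration only.
data ReaderOpts = ReaderOpts
  { optSmart   :: Bool
  , optTabStop :: Int
  } deriving (Show, Eq)

instance Default ReaderOpts where
  def = ReaderOpts { optSmart = False, optTabStop = 4 }

-- Callers start from def and override only the fields they care about.
-- The record update fixes the type, so no annotation on def is needed.
smartOpts :: ReaderOpts
smartOpts = def { optSmart = True }
```

The benefit of this style is that one name, def, serves every options type, and code written against it keeps working when new fields are added.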
|
|
Closes #554.
|
|
Only tables whose lines begin with a "|" are supported.
There are 2 warnings about unused variables when compiling.
|
|
This avoids exponential lookahead in pathological cases, like
a**a*a**a*a**a*a**a*a**a*a**a*a**a*a**.
Added stateMaxNestingLevel to ParserState.
We set this to 6, so you can still have Emph inside Emph, just not
indefinitely.
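
The idea of a nesting budget can be sketched with a toy grammar. This is an assumption-laden illustration, not the markdown reader: it uses distinct delimiters '[' and ']' instead of '*', and the names inline, manyOf and parseInlines are hypothetical. In this toy, an unclosed opener makes the parser read the rest of the input once as a failed nested attempt and once as plain text, so the work can double per nesting level; the budget caps how often that can happen.

```haskell
data Inline = Str Char | Emph [Inline]
  deriving (Show, Eq)

-- Same value as in the commit message; the name is local to this sketch.
maxNestingLevel :: Int
maxNestingLevel = 6

-- | Parse one inline. The Int is the nesting budget that remains.
inline :: Int -> String -> Maybe (Inline, String)
inline _ [] = Nothing
inline _ (']' : _) = Nothing            -- a closer ends the current group
inline budget ('[' : rest)
  | budget > 0
  , (xs, ']' : rest') <- manyOf (inline (budget - 1)) rest
  = Just (Emph xs, rest')
inline _ (c : rest) = Just (Str c, rest) -- also: '[' with no budget or no closer

-- | Apply a parser zero or more times.
manyOf :: (String -> Maybe (a, String)) -> String -> ([a], String)
manyOf p s = case p s of
  Nothing      -> ([], s)
  Just (x, s') -> let (xs, s'') = manyOf p s' in (x : xs, s'')

-- | Parse a whole string; stray closers are kept as literal characters.
parseInlines :: String -> [Inline]
parseInlines s = case manyOf (inline maxNestingLevel) s of
  (xs, [])       -> xs
  (xs, c : rest) -> xs ++ Str c : parseInlines rest
```

With this budget, groups nest at most six deep; a seventh opener is read as a literal character rather than starting another nested attempt.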
|
|
Moved characterReference parser to Text.Pandoc.Parsing.
decodeCharacterReferences is now replaced by fromEntities
in Text.Pandoc.XML.
|
|
* The new reader is more robust, accurate, and extensible.
It is still quite incomplete, but it should be easier
now to add features.
* Text.Pandoc.Parsing: Added withRaw combinator.
* Markdown reader: do escapedChar before raw latex inline.
Otherwise we capture commands like \{.
* Fixed latex citation tests for new citeproc.
* Handle \include{} commands in latex.
This is done in pandoc.hs, not the (pure) latex reader.
But the reader exports the needed function, handleIncludes.
* Moved err and warn from pandoc.hs to Shared.
* Fixed tests - raw tex should sometimes have trailing space.
* Updated lhs-test for highlighting-kate changes.
|
|
Closes #348. Closes #108.
|
|
* `---` is always em-dash, `--` is always en-dash.
* pandoc no longer tries to guess when `-` should be en-dash.
* A new option, `--old-dashes`, is provided for legacy documents.
Rationale: The rules for en-dash are too complex and
language-dependent for a guesser to work reliably. This
change gives users greater control. The alternative of
using unicode isn't very good, since unicode em- and en-
dashes are barely distinguishable in a monospace font.
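
The fixed rule is simple enough to state as a plain string rewrite. This is only a sketch on raw strings, assuming no code spans or escapes need to be skipped; smartDashes is a hypothetical name, and the real readers do this inside their inline parsers:

```haskell
-- | Replace "---" with an em-dash (U+2014) and "--" with an en-dash (U+2013).
-- A single '-' is left alone: nothing is guessed.
smartDashes :: String -> String
smartDashes ('-' : '-' : '-' : rest) = '\8212' : smartDashes rest
smartDashes ('-' : '-' : rest)       = '\8211' : smartDashes rest
smartDashes (c : rest)               = c : smartDashes rest
smartDashes []                       = []
```

The three-hyphen case has to be tried first; otherwise "---" would be read as an en-dash followed by a hyphen.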
|
|
* Added stateLastStrPos to ParserState. This lets us keep track
of whether we're parsing the position immediately after a 'str'.
If we encounter a ' in such a location, it must be an apostrophe,
and can't be a single quote start.
* Set this in the markdown, textile, html, and rst str parsers.
* Closes #360.
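
A simplified sketch of the rule above. It assumes a Bool flag ("the previous character ended a str") in place of the source position that stateLastStrPos records, and it works on raw strings; the name apostrophes is hypothetical. Quote matching itself is left out:

```haskell
import Data.Char (isAlphaNum)

-- | Turn a straight quote that directly follows a word character into an
-- apostrophe (U+2019). Any other straight quote is left untouched, so a
-- separate step could still treat it as the start of a quotation.
apostrophes :: String -> String
apostrophes = go False
  where
    go _ [] = []
    go afterStr ('\'' : rest)
      | afterStr = '\8217' : go False rest
    go _ (c : rest) = c : go (isAlphaNum c) rest
```

For example, the quote in "don't" is rewritten, while a quote at the start of a word is not.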
|
|
The extra parameter is a character parser. This is needed for
proper handling of escapes, etc.
|
|
Also added test case.
|
|
These aren't valid in HTML, but many HTML files produced by
Windows tools contain them. We substitute correct unicode
characters.
|
|
This reverts commit ec5410bc4e9d228b7dc0123061d80f9addf825bf.
|
|
This should make it easier to change the types later.
|
|
So, in RST, 'http://google.com.' should be parsed as a link
to 'http://google.com' followed by a period.
The parser is smart enough to recognize balanced parentheses,
as often occur in wikipedia links: 'http://foo.bar/baz_(bam)'.
Also added ()s to RST specialChars, so '(http://google.com)'
will be parsed as a link in parens.
Added test cases.
Resolves Issue #291.
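
The trailing-punctuation rule can be sketched as a function on an already-isolated candidate string. This is an assumption for illustration: pandoc's uri parser is built from parser combinators and may decide differently in corner cases, and splitTrailing is a hypothetical name.

```haskell
-- | Split a candidate URL into (url, trailing punctuation).
-- Trailing ".,;:!?" is never part of the URL. A trailing ')' is kept
-- only while the parentheses inside the URL are balanced.
splitTrailing :: String -> (String, String)
splitTrailing s = go s
  where
    go url
      | null url                         = (url, drop (length url) s)
      | last url `elem` ".,;:!?"         = go (init url)
      | last url == ')' && unbalanced url = go (init url)
      | otherwise                        = (url, drop (length url) s)
    unbalanced u = count ')' u > count '(' u
    count c = length . filter (== c)
```

Under these assumptions, 'http://google.com.' splits into the URL and a period, 'http://foo.bar/baz_(bam)' is kept whole, and 'http://google.com)' gives up its unmatched closing paren.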
|
|
Additional related changes:
* URLs in Code in autolinks now use class "url".
* Require highlighting-kate 0.2.8.2, which omits the final <br/> tag,
essential for inline code.
|
|
The old TeX, HtmlInline and RawHtml elements have been removed
and replaced by generic RawInline and RawBlock elements.
All modules updated to use the new raw elements.
|
|
Spaces at end of line were not being stripped properly,
resulting in unintended LineBreaks.
|
|
* The new reader is faster and more accurate.
* API changes for Text.Pandoc.Readers.HTML:
- removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag,
anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType,
htmlBlockElement, htmlComment
- added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag
* tagsoup is a new dependency.
* Text.Pandoc.Parsing: Generalized type on readWith.
* Benchmark.hs: Added length calculation to force full evaluation.
* Updated HTML reader tests.
* Updated markdown and textile readers to use the functions from
the HTML reader.
* Note: The markdown reader now correctly handles some cases it did not
  before. For example:
      <hr/>
  is reproduced without adding a space.
      <script>
      a = '<b>';
      </script>
  is parsed correctly.
|
|
* Added Text.Pandoc.Pretty.
This is better suited for pandoc than the 'pretty' package.
One advantage is that we now get proper wrapping; Emph [Inline]
is no longer treated as a big unwrappable unit. Previously
we only got breaks for spaces at the "outer level." We can also
more easily avoid doubled blank lines. Performance is
significantly better as well.
* Removed Text.Pandoc.Blocks.
Text.Pandoc.Pretty allows you to define blocks and concatenate
them.
* Modified markdown, RST, org writers to use Text.Pandoc.Pretty
instead of Text.PrettyPrint.HughesPJ.
* Text.Pandoc.Shared: Added writerColumns to WriterOptions.
* Markdown, RST, Org writers now break text at writerColumns.
* Added --columns command-line option, which sets stColumns
and writerColumns.
* Table parsing: If the size of the header > stColumns,
use the header size as 100% for purposes of calculating
relative widths of columns.
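
The table-width rule in the last item can be written out as arithmetic. This sketch assumes the "size of the header" is the sum of the column widths in characters and that the columns setting is positive; relativeWidths is a hypothetical name:

```haskell
-- | Relative column widths as fractions of the line.
-- 'columns' plays the role of stColumns; if the header is wider than
-- that, the header size is used as 100% instead.
relativeWidths :: Int -> [Int] -> [Double]
relativeWidths columns colWidths =
  map (\w -> fromIntegral w / fromIntegral total) colWidths
  where
    headerSize = sum colWidths
    total      = max columns headerSize
```

So with 80 columns, two 20-character columns each get 0.25, while two 60-character columns (a 120-character header) each get 0.5 rather than 0.75.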
|
|
This is better done on the resulting HTML; use the xss-sanitize library
for this. xss-sanitize is based on pandoc's sanitization, but improves
it.
- Removed stateSanitize from ParserState.
- Removed --sanitize-html option.
|
|
Now “Hi” gets parsed as a Quoted DoubleQuote inline.
|
|
Previously we allowed '. . .', ' . . . ', etc. This caused
too many complications and reduced authors' flexibility in
combining ellipses with spaces and periods.
|
|
+ Parameterized smartPunctuation on an inline parser.
+ Handle smartPunctuation in Textile reader.
|
|
This broke when we added the Key type. We had assumed that
the custom case-insensitive Ord instance would ensure case-insensitive
matching, but that is not how Data.Map works.
* Added a test case for case-insensitivity in markdown-reader-more.
* Removed old refsMatch from Text.Pandoc.Parsing module.
* Hid the 'Key' constructor.
* Dropped the custom Ord and Eq instances, deriving instead.
* Added fromKey and toKey to convert between Keys and Inline lists.
* toKey ensures that keys are case-insensitive, since this is the
only way the API provides to construct a Key.
Resolves Issue #272.
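
A sketch of the smart-constructor approach described above, simplified so that a key wraps a String rather than an Inline list. The map contents are made up for illustration. In a real module the Key constructor would be left out of the export list, so that toKey is the only way to build one:

```haskell
import Data.Char (toLower)
import qualified Data.Map as M

-- Derived Eq and Ord are enough, because every Key is already normalized.
newtype Key = Key String
  deriving (Eq, Ord, Show)

-- | The only intended way to build a Key: normalize case first.
toKey :: String -> Key
toKey = Key . map toLower

-- | Recover the (normalized) label.
fromKey :: Key -> String
fromKey (Key s) = s

-- Hypothetical reference table: label -> target.
refs :: M.Map Key String
refs = M.fromList [(toKey "Pandoc", "/url")]

lookupRef :: String -> Maybe String
lookupRef label = M.lookup (toKey label) refs
```

Because normalization happens at construction, lookups with "pandoc", "Pandoc" or "PANDOC" all hit the same entry without any custom instances.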
|
|
By Cabal policy, the API should not change depending on flags.
|
|
Example:
\newcommand{\plus}[2]{#1 + #2}
$\plus{3}{4}$
yields:
3+4
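
The substitution step in this example can be sketched as follows, assuming the macro body and its arguments have already been extracted as strings. applyMacro is a hypothetical name and handles only #1 to #9 with no nesting:

```haskell
import Data.Char (digitToInt, isDigit)

-- | Substitute #1..#9 in a macro body with the corresponding argument.
-- A '#' that is not followed by a valid argument number is left as is.
applyMacro :: String -> [String] -> String
applyMacro [] _ = []
applyMacro ('#' : d : rest) args
  | isDigit d
  , let i = digitToInt d
  , i >= 1
  , i <= length args
  = args !! (i - 1) ++ applyMacro rest args
applyMacro (c : rest) args = c : applyMacro rest args
```

Applied to the body "#1 + #2" with arguments "3" and "4", this gives the TeX source "3 + 4"; spacing in math mode does not affect the rendered result.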
|
|
+ Added stateHasChapters to ParserState.
+ If a \chapter command is encountered, this is set to True
and subsequent \section commands (etc.) will be bumped up
one level.
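
The effect of the flag can be sketched as a pass over a list of sectioning commands. The Sectioning type and the levels function are hypothetical, and the base levels (\section = 1, and so on) are assumptions made for this illustration:

```haskell
data Sectioning = Chapter | Section | Subsection | Subsubsection
  deriving (Show, Eq)

-- | Header levels for a sequence of sectioning commands. The Bool threaded
-- through 'go' plays the role of stateHasChapters: it becomes True at the
-- first \chapter, and later commands are bumped one level.
levels :: [Sectioning] -> [Int]
levels = go False
  where
    go _ [] = []
    go _ (Chapter : rest) = 1 : go True rest
    go hasChapters (cmd : rest) =
      (base cmd + fromEnum hasChapters) : go hasChapters rest

    base Chapter       = 1
    base Section       = 1
    base Subsection    = 2
    base Subsubsection = 3
```

Note that, as in the description above, only commands after the first \chapter are affected; a \section that comes before it keeps its original level.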
|
|
|