pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2017-08-28	RST reader: handle blank lines correctly in line blocks (#3881)	Alexander	1	-1/+1
	Previously pandoc would sometimes combine two line blocks separated by blank lines, and ignore trailing blank lines within a line block. The test was checked for consistency with http://rst.ninjs.org/
2017-08-19	Markdown reader: use CommonMark rules for list item nesting.	John MacFarlane	1	-8/+28
	Closes #3511.
	Previously pandoc used the four-space rule: continuation paragraphs, sublists, and other block-level content had to be indented 4 spaces. Now the indentation required is determined by the first line of the list item: to be included in the list item, blocks must be indented to the level of the first non-space content after the list marker. Exception: if there are 5 or more spaces after the list marker, then the content is interpreted as an indented code block, and continuation paragraphs must be indented two spaces beyond the end of the list marker. See the CommonMark spec for more details and examples.
	Documents that adhere to the four-space rule should, in most cases, be parsed the same way by the new rules. Here are some examples of texts that will be parsed differently:
	    - a
	      - b
	will be parsed as a list item with a sublist; under the four-space rule, it would be a list with two items.
	    - a
	
	          code
	Here we have an indented code block under the list item, even though it is only indented six spaces from the margin, because it is four spaces past the point where a continuation paragraph could begin. With the four-space rule, this would be a regular paragraph rather than a code block.
	    - a
	
	            code
	Here the code block will start with two spaces, whereas under the four-space rule, it would start with `code`. With the four-space rule, indented code under a list item always must be indented eight spaces from the margin, while the new rules require only that it be indented four spaces from the beginning of the first non-space text after the list marker (here, `a`).
	This change was motivated by a slew of bug reports from people who expected lists to work differently (#3125, #2367, #2575, #2210, #1990, #1137, #744, #172, #137, #128) and by the growing prevalence of CommonMark (now used by GitHub, for example). Users who want to use the old rules can select the `four_space_rule` extension.
	* Added `four_space_rule` extension.
	* Added `Ext_four_space_rule` to `Extensions`.
	* `Parsing` now exports `gobbleAtMostSpaces`, and the type of `gobbleSpaces` has been changed so that a `ReaderOptions` parameter is not needed.
2017-08-08	Parsing: added gobbleSpaces.	John MacFarlane	1	-0/+12
	This is a utility function to use in list parsing.
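	A minimal sketch of what such space-gobbling helpers can look like, written against plain Parsec. These are simplified, hypothetical stand-ins: they handle spaces only (no tab handling) and use `Text.Parsec.String.Parser` rather than pandoc's own parser and state types, so the signatures are assumptions, not pandoc's API.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical, simplified stand-in: consume exactly n spaces.
-- Fails without consuming input if fewer than n spaces are present.
gobbleSpaces :: Int -> Parser ()
gobbleSpaces n = try (count n (char ' ')) >> return ()

-- Hypothetical, simplified stand-in: consume at most n spaces
-- and return how many were actually consumed.
gobbleAtMostSpaces :: Int -> Parser Int
gobbleAtMostSpaces n
  | n <= 0    = return 0
  | otherwise = option 0 (char ' ' >> fmap (+ 1) (gobbleAtMostSpaces (n - 1)))
```

	For example, `parse (gobbleAtMostSpaces 4) "" "  x"` yields `Right 2`, while `gobbleSpaces 3` fails on the same input and leaves it unconsumed.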
2017-07-14	Fix ghc 8.2.1 compiler warnings.	John MacFarlane	1	-23/+26

2017-07-14	Revert "Fixed some ghc 8.2 compiler warnings."	John MacFarlane	1	-14/+14
	This reverts commit e22dc98a70d030cc6b4056d14ddd6462c7790f97.
2017-07-14	Fixed some ghc 8.2 compiler warnings.	John MacFarlane	1	-14/+14
	(Unnecessary type constraints.)
2017-07-07	Parsing: added takeP, takeWhileP for efficient parsing of [Char].	John MacFarlane	1	-2/+33

2017-07-07	Rewrote LaTeX reader with proper tokenization.	John MacFarlane	1	-45/+22
	This rewrite is primarily motivated by the need to get macros working properly. A side benefit is that the reader is significantly faster (27s -> 19s in one benchmark, and there is a lot of room for further optimization).
	We now tokenize the input text, then parse the token stream.
	Macros modify the token stream, so they should now be effective in any context, including math. Thus, we no longer need the clunky macro processing capacities of texmath.
	A custom state LaTeXState is used instead of ParserState. This, plus the tokenization, will require some rewriting of the exported functions rawLaTeXInline, inlineCommand, rawLaTeXBlock.
	* Added Text.Pandoc.Readers.LaTeX.Types (new exported module). Exports Macro, Tok, TokType, Line, Column. [API change]
	* Text.Pandoc.Parsing: adjusted type of `insertIncludedFile` so it can be used with token parser.
	* Removed old texmath macro stuff from Parsing. Use Macro from Text.Pandoc.Readers.LaTeX.Types instead.
	* Removed texmath macro material from Markdown reader.
	* Changed types for Text.Pandoc.Readers.LaTeX's rawLaTeXInline and rawLaTeXBlock. (Both now return a String, and they are polymorphic in state.)
	* Added orgMacros field to OrgState. [API change]
	* Removed readerApplyMacros from ReaderOptions. Now we just check the `latex_macros` reader extension.
	* Allow `\newcommand\foo{blah}` without braces.
	Fixes #1390. Fixes #2118. Fixes #3236. Fixes #3779. Fixes #934. Fixes #982.
2017-06-19	Tracing: give less misleading line information with parseWithString.	John MacFarlane	1	-1/+2
	Previously positions would be reported past the end of the chunk. We now reset the source position within the chunk and report positions "in chunk."
2017-05-28	Parsing: `many1Till`: Check for the end condition before parsing	Herwig Stuetz	1	-2/+3
	By not checking for the end condition before the first parse, the parser was applied too often, consuming too much of the input. This fixes the behaviour of `testStringWith (many1Till (oneOf "ab") (string "aa")) "aaa"`, which before incorrectly returned `Right "a"`. With this change, it instead correctly fails with `Left (PandocParsecError ...)` because it is not able to parse at least one occurrence of `oneOf "ab"` that is not `"aa"`. Note that this only affects `many1Till p end` where `p` matches on a prefix of `end`.
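	The idea of checking the end condition before the first parse can be sketched in plain Parsec as follows. The name `many1Till'`, the `Show end` constraint, and the use of `Text.Parsec.String.Parser` are assumptions made for illustration; this is not necessarily how pandoc's `many1Till` is written.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Sketch: parse one or more p, stopping when end succeeds.
-- The end condition is checked before the first p is attempted,
-- so input that begins with end is rejected instead of being
-- partially consumed by p.
many1Till' :: Show end => Parser a -> Parser end -> Parser [a]
many1Till' p end = do
  notFollowedBy end              -- fail early if we are already at the end
  first <- p
  rest  <- manyTill p (try end)  -- try keeps a partial match of end from consuming input
  return (first : rest)
```

	With this definition, `parse (many1Till' (oneOf "ab") (string "aa")) "" "aaa"` fails, while the input `"abaa"` yields `Right "ab"`.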
2017-05-25	Markdown reader: warn for notes defined but not used.	John MacFarlane	1	-2/+5
	Closes #1718. Parsing.ParserState: Make stateNotes' a Map, add stateNoteRefs.
2017-05-24	Parsing: Provide parseFromString'.	John MacFarlane	1	-1/+17
	This is a version of parseFromString specialized to ParserState, which resets stateLastStrPos at the end. This is almost always what we want.
	This fixes a bug where `_hi_` wasn't treated as emphasis in the following, because pandoc got confused about the position of the last word:
	    - [o] _hi_
	Closes #3690.
2017-05-23	Shared: Provide custom isURI that rejects unknown schemes [isURI]	Albert Krewinkel	1	-26/+1
	We also export the set of known `schemes`. The new function replaces the function of the same name from `Network.URI`, as the latter did not check whether a scheme is well-known. E.g. MediaWiki wikis frequently feature pages with names like `User:John`. These links were interpreted as URIs, thus turning internal links into global links. This is prevented by also checking whether the scheme of a URI is frequently used (i.e. is IANA registered or an otherwise well-known scheme).
	Fixes: #2713
	Update set of well-known URIs from IANA list
	All official IANA schemes (as of 2017-05-22) are included in the set of known schemes. The four non-official schemes doi, isbn, javascript, and pmid are kept.
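	A rough sketch of the scheme check, using only `Data.Set` and simple string splitting. The set below is a tiny illustrative subset, and `hasKnownScheme` is a hypothetical helper: it tests only the scheme and does not validate the rest of the URI the way a full `isURI` would.

```haskell
import Data.Char (toLower)
import qualified Data.Set as Set

-- Tiny illustrative subset of well-known schemes (assumption: the
-- real set is much larger and follows the IANA registry).
knownSchemes :: Set.Set String
knownSchemes = Set.fromList
  ["http", "https", "ftp", "mailto", "doi", "isbn", "javascript", "pmid"]

-- Hypothetical helper: True only if the text before the first colon
-- is a known scheme (compared case-insensitively).
hasKnownScheme :: String -> Bool
hasKnownScheme s =
  case break (== ':') s of
    (scheme, ':' : _) -> map toLower scheme `Set.member` knownSchemes
    _                 -> False
```

	Here `hasKnownScheme "User:John"` is `False`, so a wiki page name of that form would not be mistaken for a URI, while `hasKnownScheme "https://pandoc.org"` is `True`.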
2017-05-22	Move indentWith to Text.Pandoc.Parsing (#3687)	Alexander Krotov	1	-0/+12

2017-05-17	Merge pull request #3677 from labdsf/anylinenewline	John MacFarlane	1	-0/+5
	Move anyLineNewline to Parsing.hs
2017-05-17	Move anyLineNewline to Parsing.hs	Alexander Krotov	1	-0/+5

2017-05-14	Parsing: add `insertIncludedFilesF` which returns F blocks	Albert Krewinkel	1	-7/+24
	The `insertIncludeFiles` function was generalized and renamed to `insertIncludedFiles'`; the specialized versions are based on that.
2017-05-14	Parsing: introduce `HasIncludeFiles` type class	Albert Krewinkel	1	-9/+22
	The `insertIncludeFile` function is generalized to work with all parser states which are instances of that class.
2017-05-14	Parsing: replace partial with total function	Albert Krewinkel	1	-1/+1
	Calling `tail` on an empty list raises an exception, while calling the otherwise equivalent `drop 1` will return the empty list again.
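	The difference is easy to see in isolation. `dropFirst` is a hypothetical name used only for this illustration.

```haskell
-- Total replacement for tail: returns [] for the empty list
-- instead of raising an exception.
dropFirst :: [a] -> [a]
dropFirst = drop 1
```

	`dropFirst [1, 2, 3]` gives `[2,3]` and `dropFirst ""` gives `""`, whereas `tail ""` throws at runtime.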
2017-05-13	Update dates in copyright notices	Albert Krewinkel	1	-2/+2
	This follows the suggestions given by the FSF for GPL licensed software. <https://www.gnu.org/prep/maintain/html_node/Copyright-Notices.html>
2017-05-11	Combine grid table parsers	Albert Krewinkel	1	-18/+51
	The grid table parsers for markdown and rst were combined into one single parser, slightly changing parsing behavior of both parsers:
	- The markdown parser now compactifies block content cell-wise: pure text blocks in cells are now treated as paragraphs only if the cell contains multiple paragraphs, and as plain blocks otherwise. Before, this was true only for single-column tables.
	- The rst parser now accepts newlines and multiple blocks in header cells.
	Closes: #3638
2017-05-02	Generalize tableWith, gridTableWith	Albert Krewinkel	1	-23/+26
	The parsing functions `tableWith` and `gridTableWith` are generalized to work with more parsers. The parser state only has to be an instance of the `HasOptions` class instead of requiring a concrete type. Block parsers are required to return blocks wrapped into a monad, as this makes it possible to use parsers returning results wrapped in `Future`s.
2017-04-30	Provide shared F monad functions for Markdown and Org readers	Albert Krewinkel	1	-10/+25
	The `F` monads used for delayed evaluation of certain values in the Markdown and Org readers are based on a shared data type capturing the common pattern of both `F` types.
2017-04-15	Avoid parsing "Notes:**" as a bare URI.	John MacFarlane	1	-0/+2
	This avoids parsing bare URIs that start with a scheme + colon + `*`, `_`, or `]`. Closes #3570.
2017-03-13	Better handling of \part in LaTeX.	John MacFarlane	1	-2/+0
	Closes #1905. Removed stateChapters from ParserState. Now we parse chapters as level 0 headers, and parts as level -1 headers. After parsing, we check for the lowest header level, and if it's less than 1 we bump everything up so that 1 is the lowest header level. So `\part` will always produce a header; no command-line options are needed.
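	The level adjustment described above can be sketched on a bare list of header levels. `bumpLevels` is a hypothetical helper for illustration; the actual reader works on the parsed document rather than on a list of integers.

```haskell
-- Shift all header levels up so that the lowest level becomes 1,
-- but only when the lowest level found is below 1.
bumpLevels :: [Int] -> [Int]
bumpLevels [] = []
bumpLevels levels
  | lowest < 1 = map (+ (1 - lowest)) levels
  | otherwise  = levels
  where
    lowest = minimum levels
```

	A document with a part (level -1), a chapter (level 0), and a section (level 1) maps to levels `[1,2,3]`; a document whose lowest level is already 1 or higher is left unchanged.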
2017-03-12	Issue warning for duplicate header identifiers.	John MacFarlane	1	-2/+8
	As noted in the previous commit, an autogenerated identifier may still coincide with an explicit identifier that is given for a header later in the document, or with an identifier on a div, span, link, or image. This commit adds a warning in this case, so users can supply an explicit identifier.
	* Added `DuplicateIdentifier` to LogMessage.
	* Modified HTML, Org, MediaWiki readers so their custom state type is an instance of HasLogMessages. This is necessary for `registerHeader` to issue warnings.
	See #1745.
2017-03-12	Improved behavior of `auto_identifiers` when there are explicit ids.	John MacFarlane	1	-1/+2
	Previously only autogenerated ids were added to the list of header identifiers in state, so explicit ids weren't taken into account when generating unique identifiers. Duplicated identifiers could result. This simple fix ensures that explicitly given identifiers are also taken into account. Fixes #1745. Note some limitations, however. An autogenerated identifier may still coincide with an explicit identifier that is given for a header later in the document, or with an identifier on a div, span, link, or image. Fixing this would be much more difficult, because we need to run `registerHeader` before we have the complete parse tree (so we can't get a complete list of identifiers from the document by walking the tree). However, it might be worth issuing warnings for duplicate header identifiers; I think we can do that. It is not common for headers to have the same text, and the issue can always be worked around by adding explicit identifiers, if the user is aware of it.
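	A small sketch of the fix: both explicit and autogenerated identifiers are recorded in the same set, so a later autogenerated identifier avoids an earlier explicit one. The names `uniqueIdent` and `registerIdent` and the numeric-suffix scheme are assumptions for illustration, not a copy of pandoc's `registerHeader`.

```haskell
import Data.Maybe (fromMaybe)
import qualified Data.Set as Set

-- Return base if it is unused, otherwise the first unused "base-n".
uniqueIdent :: String -> Set.Set String -> String
uniqueIdent base used
  | base `Set.notMember` used = base
  | otherwise                 = go (1 :: Int)
  where
    go n
      | candidate `Set.notMember` used = candidate
      | otherwise                      = go (n + 1)
      where
        candidate = base ++ "-" ++ show n

-- Hypothetical registration step: an explicit identifier is kept as
-- given, an automatic one is made unique; either way it is recorded.
registerIdent :: Maybe String -> String -> Set.Set String -> (String, Set.Set String)
registerIdent explicit auto used = (ident, Set.insert ident used)
  where
    ident = fromMaybe (uniqueIdent auto used) explicit
```

	Registering an explicit `intro` first and then an autogenerated `intro` yields `intro-1` for the second header. As the commit message notes, this does not help when the explicit identifier appears later in the document.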
2017-03-10	Use pMacroDefinition in macro (for more direct parsing).	John MacFarlane	1	-13/+8
	This is newly exported in texmath 0.9.3. Note that this means that `macro` will now parse one macro at a time, rather than parsing a whole group together.
2017-03-03	RST reader: support RST-style citations.	John MacFarlane	1	-0/+2
	The citations appear at the end of the document as a definition list in a special div with id `citations`. Citations link to the definitions. Added stateCitations to ParserState. Closes #853.
2017-02-20	Revert "Refined constraint for HasQuoteContext instance."	John MacFarlane	1	-1/+1
	This reverts commit 3c427fc17d53a564305aadde015dd2f048d9ff71.
2017-02-20	Refined constraint for HasQuoteContext instance.	John MacFarlane	1	-1/+1
	in hopes that this will help the ghc 7.8.4 build...
2017-02-20	Removed redundant constraint.	John MacFarlane	1	-2/+1

2017-02-17	Parsing: Added HasLogMessages, logMessage, reportLogMessages.	John MacFarlane	1	-0/+25
	We need to do logging by updating parser state, or we'll get inappropriate and repeated log messages when there is parser backtracking. See #3447.
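	The effect of keeping log messages in parser state can be illustrated with plain Parsec, whose user state is restored when an alternative is abandoned. The `logMessage` below is a simplified, hypothetical stand-in that stores strings in a list; pandoc's own function and state types differ.

```haskell
import Text.Parsec

-- Parser whose user state is a list of log messages (newest first).
type P = Parsec String [String]

-- Simplified stand-in: record a message in the parser state.
logMessage :: String -> P ()
logMessage msg = modifyState (msg :)

branchA :: P String
branchA = logMessage "trying branch A" >> string "ab"

branchB :: P String
branchB = logMessage "trying branch B" >> string "ac"

-- Run both branches with backtracking and return the result
-- together with the messages that survived.
parseWithLog :: String -> Either ParseError (String, [String])
parseWithLog = runParser p [] "input"
  where
    p = do
      result <- try branchA <|> branchB
      logs   <- getState
      return (result, reverse logs)
```

	On the input `"ac"`, branch A fails and is backtracked, so only `"trying branch B"` remains in the log. Had the message been emitted as an immediate side effect, the message from the failed branch would have been reported as well.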
2017-02-11	Use new warnings throughout the code base.	John MacFarlane	1	-2/+8

2017-02-07	Refactored some files formerly in LaTeX reader.	John MacFarlane	1	-0/+23
	* Export readFileFromDirs from Class.
	* Export insertIncludedFile from Parsing.
	Simplified code in LaTeX/RST readers.
2017-02-07	Moved readFileFromDirs to Text.Pandoc.Class.	John MacFarlane	1	-3/+3
	This can be used in several different modules, not just LaTeX reader.
2017-01-27	Shared: rename compactify', compactify'DL -> compactify, compactifyDL.	John MacFarlane	1	-1/+1

2017-01-27	Removed Shared.compactify.	John MacFarlane	1	-12/+12
	Changed signatures on Parsing.tableWith and Parsing.gridTableWith.
2017-01-25	Removed readerOldDashes and --old-dashes option, added old_dashes extension.	John MacFarlane	1	-1/+1
	API change. CLI option change.
2017-01-25	Removed readerSmart and the --smart option; added Ext_smart extension.	John MacFarlane	1	-5/+1
	Now you will need to do
	    -f markdown+smart
	instead of
	    -f markdown --smart
	This change opens the way for writers, in addition to readers, to be sensitive to +smart, but this change hasn't yet been made.
	API change. Command-line option change. Updated manual.
2017-01-25	Make Extensions a custom type instead of a Set Extension.	John MacFarlane	1	-4/+4
	The type is implemented in terms of an underlying bitset which should be more efficient. API change: from Text.Pandoc.Extensions export Extensions, emptyExtensions, extensionsFromList, enableExtension, disableExtension, extensionEnabled.
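	A bitset-backed extension set along these lines might look like the sketch below. The function names follow the export list above, but the three-constructor `Extension` type (in particular `Ext_old_dashes`) and the choice of `Integer` as the underlying representation are assumptions made for illustration.

```haskell
import Data.Bits (clearBit, setBit, testBit)

-- Small illustrative subset of extensions (assumed constructor names).
data Extension = Ext_smart | Ext_old_dashes | Ext_four_space_rule
  deriving (Show, Eq, Enum, Bounded)

-- Each extension occupies the bit at position fromEnum.
newtype Extensions = Extensions Integer
  deriving (Show, Eq)

emptyExtensions :: Extensions
emptyExtensions = Extensions 0

enableExtension :: Extension -> Extensions -> Extensions
enableExtension x (Extensions bits) = Extensions (setBit bits (fromEnum x))

disableExtension :: Extension -> Extensions -> Extensions
disableExtension x (Extensions bits) = Extensions (clearBit bits (fromEnum x))

extensionEnabled :: Extension -> Extensions -> Bool
extensionEnabled x (Extensions bits) = testBit bits (fromEnum x)

extensionsFromList :: [Extension] -> Extensions
extensionsFromList = foldr enableExtension emptyExtensions
```

	Membership tests and updates become single bit operations, and two sets built from the same extensions in a different order compare equal.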
2017-01-25	LaTeX reader: Proper include file processing.	John MacFarlane	1	-0/+2
	* Removed handleIncludes from LaTeX reader [API change].
	* Now the ordinary LaTeX reader handles includes in a way that is appropriate to the monad it is run in.
2017-01-25	Parsing: Removed obsolete warnings stuff.	John MacFarlane	1	-21/+3
	Removed stateWarnings, addWarning, and readWithWarnings.
2017-01-25	Remove OverlappingInstances pragma.	Jesse Rosenthal	1	-1/+0
	It doesn't help to solve the problem in 7.8.
2017-01-25	Try adding OverlappingInstances pragma to parsing.	Jesse Rosenthal	1	-0/+1
	It's having trouble figuring out HasQuoteContext.
2017-01-25	Unify Errors.	Jesse Rosenthal	1	-1/+1

2017-01-25	Add IncoherentInstances pragma for HasQuotedContext.	Jesse Rosenthal	1	-1/+3
	We can remove this if we can figure out a better way to do this.
2016-10-23	Tighten up parsing of raw email addresses.	John MacFarlane	1	-4/+13
	Technically `**@user` is a valid email address, but if we allow things like this, we get bad results in markdown flavors that autolink raw email addresses. (See #2940.) So we exclude a few valid email addresses in order to avoid these more common bad cases. Closes #2940.
2016-10-13	Allow empty lines when parsing line blocks	Albert Krewinkel	1	-2/+5
	Line blocks are allowed to contain empty lines and should be parsed as a single block in that case. Previously an empty (line block) line would have terminated parsing of the line block element.
2016-09-02	Remove TagSoup compat	Jesse Rosenthal	1	-3/+3
	We already lower-bound tagsoup at 0.13.7, which means we were always running the compatibility layer (it was conditional on min value 0.13). Better to just use `lookupEntity` from the library directly, and convert a string to a char if need be.