aboutsummaryrefslogtreecommitdiff
path: root/src/Text/Pandoc/Readers
AgeCommit message (Collapse)AuthorFilesLines
2007-09-05Simplified parsing of reference keys and notes in markdown and RSTfiddlosopher2-33/+54
readers: + The Reference data structure from Text.Pandoc.Shared is no longer needed, since + referenceKey and noteBlock parses return strings (as many blank lines as are occuried by the key or note) and update state themselves. + getPosition and setPosition are now used to ensure that error messages will give the correct line number. + This yields cleaner (and slightly faster) code, with more accurate parsing error messages. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1012 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-09-02LaTeX command and environment names can't contain numbers.fiddlosopher1-4/+4
LaTeX reader updated accordingly. git-svn-id: https://pandoc.googlecode.com/svn/trunk@987 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-09-01Skip notes parsing if running in strict mode. (This yields a nicefiddlosopher1-14/+16
speed improvement in strict mode.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@983 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-09-01Simplify autolink parsing code, using Network.URI to test forfiddlosopher1-25/+24
URIs. Added dependency on network library to debian/control and pandoc.cabal. git-svn-id: https://pandoc.googlecode.com/svn/trunk@982 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-09-01More perspicuous definition of nonindentSpaces.fiddlosopher1-1/+4
git-svn-id: https://pandoc.googlecode.com/svn/trunk@981 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-09-01Removed unneeded 'try' in 'rawLine'.fiddlosopher1-1/+1
git-svn-id: https://pandoc.googlecode.com/svn/trunk@979 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-09-01Combined linebreak and whitespace into a new whitespacefiddlosopher1-7/+6
parser, to avoid unnecessary reparsing of space characters. git-svn-id: https://pandoc.googlecode.com/svn/trunk@978 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-09-01Removed unnecessary 'try' in 'codeBlock'.fiddlosopher1-1/+1
git-svn-id: https://pandoc.googlecode.com/svn/trunk@977 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-09-01Use lookAhead in parsers for setext headers andfiddlosopher1-1/+5
definition lists to see if the next line begins appropriately; if not, don't waste any more time parsing... git-svn-id: https://pandoc.googlecode.com/svn/trunk@976 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-31Don't require blank lines after code block. (It's sufficientfiddlosopher1-1/+1
to end code block with a nonindented line.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@975 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-31Changed definition of 'emph': italics with '_' must not be followedfiddlosopher1-1/+1
by an alphanumeric character. This is to help prevent interpretation of e.g. [LC_TYPE]: my_type as '[LC<em>TYPE]:my</em>type'. git-svn-id: https://pandoc.googlecode.com/svn/trunk@974 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-30Fixed bug in LaTeX reader, which wrongly assumed that thefiddlosopher1-1/+1
roman numeral after "enum" in "setcounter" would consist entirely of "i"s. enumiv is legitimate. git-svn-id: https://pandoc.googlecode.com/svn/trunk@961 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-29Cleaned up LaTeX reader.fiddlosopher1-24/+24
Rearranged order of parsers in inline for slight speed improvement. Added ` to special characters and 'unescapedChar'. git-svn-id: https://pandoc.googlecode.com/svn/trunk@960 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-29Removed unneeded try's in RST reader; also minor code cleanup.fiddlosopher1-23/+17
git-svn-id: https://pandoc.googlecode.com/svn/trunk@959 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-29Efficiency improvements to RST reader (more than doubledfiddlosopher1-12/+9
speed): + removed tabchar + rearranged parsers in inline git-svn-id: https://pandoc.googlecode.com/svn/trunk@958 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-29Purely stylistic change.fiddlosopher1-1/+1
git-svn-id: https://pandoc.googlecode.com/svn/trunk@957 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-29Removed unneeded 'try' in 'ellipses'.fiddlosopher1-2/+1
git-svn-id: https://pandoc.googlecode.com/svn/trunk@956 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-29+ Fixed bug introduced into referenceTitle by previous changes.fiddlosopher1-5/+6
Now it works as before. + Improved Markdown.pl-compatibility in referenceLink: the two parts of a reference-style link may be separated by one space, but not more... [a] [link], [not] [a link]. git-svn-id: https://pandoc.googlecode.com/svn/trunk@955 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-29Fixed markdown inline code parsing so it better accords withfiddlosopher1-4/+6
Markdown.pl: the marker for the end of the code section is a clump of the same number of `'s with which the section began, followed by a non-` character. So, for example, ` h ``` i ` -> <code>h ``` i</code>. git-svn-id: https://pandoc.googlecode.com/svn/trunk@954 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-29Small change to referenceTitle: should end with line-end, not ')'.fiddlosopher1-2/+1
git-svn-id: https://pandoc.googlecode.com/svn/trunk@953 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-29Split 'title' into 'linkTitle' and 'referenceTitle', since thefiddlosopher1-14/+19
rules are slightly different. git-svn-id: https://pandoc.googlecode.com/svn/trunk@952 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-29Removed unneeded 'try' from noteMarker.fiddlosopher1-4/+1
git-svn-id: https://pandoc.googlecode.com/svn/trunk@950 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-29Minor reformatting.fiddlosopher1-3/+2
git-svn-id: https://pandoc.googlecode.com/svn/trunk@949 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-29Rewrote 'para' for greater efficiency.fiddlosopher1-6/+4
git-svn-id: https://pandoc.googlecode.com/svn/trunk@948 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-28Rewrote link parsers for greater efficiency.fiddlosopher1-7/+4
git-svn-id: https://pandoc.googlecode.com/svn/trunk@945 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-28Removed redundant 'referenceLink' in definition of inlinefiddlosopher1-1/+0
(it's already in 'link'). git-svn-id: https://pandoc.googlecode.com/svn/trunk@940 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-28Refactored escapeChar so it doesn't need 'try'.fiddlosopher1-4/+4
git-svn-id: https://pandoc.googlecode.com/svn/trunk@939 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-28Removed unneeded 'try' in multilineRow.fiddlosopher1-1/+1
git-svn-id: https://pandoc.googlecode.com/svn/trunk@938 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-28Removed unneeded 'try' in dashedLine.fiddlosopher1-1/+1
git-svn-id: https://pandoc.googlecode.com/svn/trunk@937 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-28Removed unneeded try in rawHtmlBlocks (Markdown parser).fiddlosopher1-2/+2
git-svn-id: https://pandoc.googlecode.com/svn/trunk@936 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-28Refactored hrule for performance in Markdown reader.fiddlosopher1-5/+5
git-svn-id: https://pandoc.googlecode.com/svn/trunk@935 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-28Minor reformatting.fiddlosopher1-2/+3
git-svn-id: https://pandoc.googlecode.com/svn/trunk@934 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-28Refactored setext header parsing in Markdown reader for greaterfiddlosopher1-5/+3
speed. git-svn-id: https://pandoc.googlecode.com/svn/trunk@933 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-28More rearranging in definition of inline.fiddlosopher1-8/+8
git-svn-id: https://pandoc.googlecode.com/svn/trunk@932 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-28More intelligent rearranging of 'inline' for speed boostsfiddlosopher1-3/+3
in Text.Pandoc.Readers.Markdown. git-svn-id: https://pandoc.googlecode.com/svn/trunk@931 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-28Changed definition of 'enclosed' in Text.Pandoc.Shared so thatfiddlosopher3-5/+5
'try' is not automatically applied to the 'end' parser. Added 'try' in calls to 'enclosed' where needed. Slight speed increase. git-svn-id: https://pandoc.googlecode.com/svn/trunk@926 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-28Performance improvements:fiddlosopher1-13/+11
+ Rearranged parsers in definition of 'inline' so that the most frequently used would (by and large) be tried first. + Removed some unneeded 'try's. + Removed tabchar parser, as whitespace handles tabs anyway. + All in all, these changes, together with the last two commits, cut almost in half the time it takes pandoc to parse a large test file. git-svn-id: https://pandoc.googlecode.com/svn/trunk@924 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-26Don't countfiddlosopher1-0/+1
p. 27 at the beginning of a line as an ordered list start, since it's most likely a page number. git-svn-id: https://pandoc.googlecode.com/svn/trunk@900 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-23Added new rule for enhanced markdown ordered lists: if the list markerfiddlosopher1-6/+9
is a capital letter followed by a period (including a single-letter capital roman numeral), then it must be followed by at least two spaces. The point of this is to avoid accidentally treating people's initials as list markers: a paragraph may begin: B. Russell was an English philosopher. and this shouldn't be treated as a list. Modified Markdown reader and README documentation. Added a test case. git-svn-id: https://pandoc.googlecode.com/svn/trunk@880 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-22Added a needed 'try' to listItem in Markdown reader.fiddlosopher1-1/+1
git-svn-id: https://pandoc.googlecode.com/svn/trunk@878 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-20Code cleanup in Markdown reader.fiddlosopher1-27/+18
git-svn-id: https://pandoc.googlecode.com/svn/trunk@877 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-20Changes to Markdown reader for better conformity to thefiddlosopher1-6/+12
Markdown test suite under --strict: + Removed check for a following setext header in endline. A full test is too inefficient (doubles benchmark time), and the substitute we had before is not 100% accurate. + Don't use Code elements for autolinks if --strict specified. git-svn-id: https://pandoc.googlecode.com/svn/trunk@876 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-19Refactor RST and Markdown readers using parseFromString.fiddlosopher2-28/+7
git-svn-id: https://pandoc.googlecode.com/svn/trunk@864 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-19Added a necessary "try" in definition of "para"fiddlosopher1-1/+2
(HTML reader). git-svn-id: https://pandoc.googlecode.com/svn/trunk@863 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-18Bug fixes in readers:fiddlosopher2-7/+20
+ LaTeX reader: skip anything after \end{document} + HTML reader: fixed bug skipping material after </html> -- previously, stuff at the end was skipped even if no </html> was present, which meant only part of the file would be parsed and no error issued + HTML reader: added new constant eitherBlockOrInline with elements that may count either as block-level or inline + Modified isInline and isBlock to take this into account + modified rawHtmlBlock to accept any tag (even an inline tag); this is innocuous, because rawHtmlBlock is tried only if a regular inline element can't be parsed. git-svn-id: https://pandoc.googlecode.com/svn/trunk@862 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-18+ Fixed bug in markdown ordered list parsing. The problem wasfiddlosopher2-4/+6
that anyOrderedListStart did not check for a space following the ordered list marker. So, 'A.B. 2007' would be parsed as a list item, then fail because of the lack of space after 'A.' (required by orderedListStart). Resolves Issue #22. + Fixed a similar problem in RST reader. + Added regression test. git-svn-id: https://pandoc.googlecode.com/svn/trunk@861 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-15Cosmetic changes.fiddlosopher1-4/+3
git-svn-id: https://pandoc.googlecode.com/svn/trunk@851 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-15LaTeX reader: parse \texttt{} as code, as long as there'sfiddlosopher1-2/+9
nothing fancy inside. git-svn-id: https://pandoc.googlecode.com/svn/trunk@846 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-15Allow htmlComments as rawHtmlInline in HTML reader.fiddlosopher1-2/+3
git-svn-id: https://pandoc.googlecode.com/svn/trunk@844 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-15Major code cleanup in all modules. (Removed unneeded imports,fiddlosopher4-1070/+811
reformatted, etc.) More major changes are documented below: + Removed Text.Pandoc.ParserCombinators and moved all its definitions to Text.Pandoc.Shared. + In Text.Pandoc.Shared: - Removed unneeded 'try' in blanklines. - Removed endsWith function and rewrote functions to use isSuffixOf instead. - Added >>~ combinator. - Rewrote stripTrailingNewlines, removeLeadingSpaces. + Moved Text.Pandoc.Entities -> Text.Pandoc.CharacterReferences. - Removed unneeded functions charToEntity, charToNumericalEntity. - Renamed functions using proper terminology (character references, not entities). decodeEntities -> decodeCharacterReferences, characterEntity -> characterReference. - Moved escapeStringToXML to Docbook writer, which is the only thing that uses it. - Removed old entity parser in HTML and Markdown readers; replaced with new charRef parser in Text.Pandoc.Shared. + Fixed accent bug in Text.Pandoc.Readers.LaTeX: \^{} now correctly parses as a '^' character. + Text.Pandoc.ASCIIMathML is no longer an exported module. git-svn-id: https://pandoc.googlecode.com/svn/trunk@835 788f1e2b-df1e-0410-8736-df70ead52e1b