pandoc - Conversion between markup formats

Age	Commit message	Author	Files	Lines
2007-01-27	Cleaned up handling of embedded quotes in link titles.	fiddlosopher	2	-9/+4
	Now these are stored as a '"' character, not as '&quot;'. The function escapeLinkTitle in the Markdown writer is unnecessary and was removed. Tests modified accordingly. git-svn-id: https://pandoc.googlecode.com/svn/trunk@517 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-27	More changes in entity handling: Instead of using entities for characters	fiddlosopher	4	-52/+51
	above 128 in HTML and Docbook output, we now just use unicode. After all, we're declaring UTF-8 content in the header. This makes the HTML and docbook files produced by pandoc much more readable and editable. Changes to Entities.hs:
	+ Removed specialCharToEntity
	+ Added escapeSGMLChar (which just escapes the basic four, <>&")
	+ Modified encodeEntities and stringToSGML to use escapeSGMLChar
	+ Removed encodeEntitiesNumerical
	+ Rewrote encodeEntities for better performance
	+ Rewrote stringToSGML for better performance
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@516 788f1e2b-df1e-0410-8736-df70ead52e1b
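A minimal Haskell sketch of the escaping r516 describes. It assumes `escapeSGMLChar :: Char -> String` and a simplified one-argument `encodeEntities`; both signatures are illustrative, not pandoc's actual code. Only the four SGML-special characters become entities, and every other character, including non-ASCII ones, passes through as Unicode.

```haskell
-- Hypothetical sketch: replace only the four SGML-special characters
-- with entities; everything else (including non-ASCII) is left as-is.
escapeSGMLChar :: Char -> String
escapeSGMLChar c = case c of
  '<' -> "&lt;"
  '>' -> "&gt;"
  '&' -> "&amp;"
  '"' -> "&quot;"
  _   -> [c]

-- Escape a whole string by concatenating the per-character results.
-- (Simplified signature; assumed for illustration.)
encodeEntities :: String -> String
encodeEntities = concatMap escapeSGMLChar
```

Since UTF-8 is declared in the output header, passing non-ASCII characters through unchanged is enough for them to display correctly.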
2007-01-27	Changes in entity handling:	fiddlosopher	7	-141/+133
	+ Entities are parsed (and unicode characters returned) in both Markdown and HTML readers.
	+ Parsers characterEntity, namedEntity, decimalEntity, hexEntity added to Entities.hs; these parse a string and return a unicode character.
	+ Changed 'entity' parser in HTML reader to use the 'characterEntity' parser from Entities.hs.
	+ Added new 'entity' parser to Markdown reader, and added '&' as a special character. Adjusted test suite accordingly since now we get 'Str "AT",Str "&",Str "T"' instead of 'Str "AT&T"'.
	+ stringToSGML moved to Entities.hs. escapeSGML removed as redundant, given encodeEntities.
	+ stringToSGML, encodeEntities, and specialCharToEntity are given a boolean parameter that causes only numerical entities to be used. This is used in the docbook writer. The HTML writer uses named entities where possible, but not all docbook-consumers know about the named entities without special instructions, so it seems safer to use numerical entities there.
	+ decodeEntities is rewritten in a way that avoids Text.Regex, using the new parsers.
	+ charToEntity and charToNumericalEntity added to Entities.hs.
	+ Moved specialCharToEntity from Shared.hs to Entities.hs.
	+ Removed unneeded 'decodeEntities' from 'str' parser in HTML and Markdown readers.
	+ Removed sgmlHexEntity, sgmlDecimalEntity, sgmlNamedEntity, and sgmlCharacterEntity from Shared.hs.
	+ Modified Docbook writer so that it doesn't rely on Text.Regex for detecting "mailto" links.
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@515 788f1e2b-df1e-0410-8736-df70ead52e1b
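The numeric entity forms handled by `decimalEntity` and `hexEntity` can be illustrated with a small helper, `decodeNumericEntity`, which is a hypothetical name. It uses `Numeric.readDec`/`readHex` rather than Parsec and ignores named entities, so it is a sketch of the idea, not the parsers in Entities.hs.

```haskell
import Data.Char (chr, isDigit, isHexDigit)
import Numeric (readDec, readHex)

-- Hypothetical helper: decode the body of a numeric character entity,
-- i.e. the text between '&' and ';', such as "#65" or "#x41".
-- Named entities such as "amp" are not handled and yield Nothing.
decodeNumericEntity :: String -> Maybe Char
decodeNumericEntity ('#' : x : ds)
  | x `elem` "xX", not (null ds), all isHexDigit ds = toChar (readHex ds)
decodeNumericEntity ('#' : ds)
  | not (null ds), all isDigit ds = toChar (readDec ds)
decodeNumericEntity _ = Nothing

-- Accept only a complete parse that lies within the Unicode code-point range.
toChar :: [(Integer, String)] -> Maybe Char
toChar [(n, "")] | n <= 0x10FFFF = Just (chr (fromInteger n))
toChar _ = Nothing
```

For example, `decodeNumericEntity "#x41"` and `decodeNumericEntity "#65"` both give `Just 'A'`.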
2007-01-24	Rewrote functions in Text/Pandoc/Shared so as not to use Text.Regex,	fiddlosopher	1	-28/+51
	which does not support unicode:
	- escapePreservingRegex removed
	- stringToSGML rewritten using Parsec parser
	- new parsers for SGML character entities
	- escapeSGML rewritten using specialCharToEntity
	- new function specialCharToEntity
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@514 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24	Changed Markdown autoLink parsing to conform better to	fiddlosopher	1	-6/+6
	Markdown.pl's behavior. <google.com> is not treated as a link, but <http://google.com>, <ftp://google.com>, and <mailto:google@google.com> are. git-svn-id: https://pandoc.googlecode.com/svn/trunk@513 788f1e2b-df1e-0410-8736-df70ead52e1b
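The autolink rule from r513 amounts to a prefix test on the text between the angle brackets. `isAutoLinkTarget` is a hypothetical name, and the scheme list holds only the three examples given in the commit message; the actual parser may accept others.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical predicate: is the text between '<' and '>' an autolink?
-- Only the schemes named in the commit message are listed here.
isAutoLinkTarget :: String -> Bool
isAutoLinkTarget s = any (`isPrefixOf` s) ["http://", "ftp://", "mailto:"]
```

Under this rule `<google.com>` fails the test because it has no scheme prefix.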
2007-01-24	Fixed bug in 'extractTagType' in HTML reader: previous	fiddlosopher	1	-1/+4
	version was not skipping / in close tags. git-svn-id: https://pandoc.googlecode.com/svn/trunk@512 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24	Refactored markdown reader so that Text.Regex is not used.	fiddlosopher	1	-14/+19
	Replaced email regex test with a custom email autolink parser (autoLinkEmail). Also replaced 'selfClosingTag' with a custom function 'isSelfClosingTag'. git-svn-id: https://pandoc.googlecode.com/svn/trunk@511 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24	Fixed a bug in extractTagType in HTML Reader: the previous	fiddlosopher	1	-6/+2
	version extracted the attributes, too, which is not wanted. git-svn-id: https://pandoc.googlecode.com/svn/trunk@510 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24	Fixed bug in HTML attribute parser: now a space is	fiddlosopher	1	-2/+2
	required before an attribute. Previously, <a.b> would be parsed as an HTML tag with an attribute! git-svn-id: https://pandoc.googlecode.com/svn/trunk@509 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24	Modified Markdown writer to use autolinks when possible.	fiddlosopher	1	-6/+10
	So, instead of [site.com](site.com) we get <site.com>. Changed test suite accordingly. git-svn-id: https://pandoc.googlecode.com/svn/trunk@508 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24	Rewrote 'extractTagType' in HTML reader so that it doesn't use	fiddlosopher	1	-5/+7
	regexes. git-svn-id: https://pandoc.googlecode.com/svn/trunk@507 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24	More smart quote bug fixes:	fiddlosopher	3	-4/+20
	+ LaTeX writer now handles consecutive quotes properly: for example, ``\,`hello'\,'' + LaTeX reader now parses '\,' as empty Str + normalizeSpaces function in Shared now removes empty Str elements + Modified tests accordingly git-svn-id: https://pandoc.googlecode.com/svn/trunk@506 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24	Fixed bug in smart quoting: recognize ' in contractions like	fiddlosopher	1	-3/+7
	"don't" as not beginning single quoted contexts. git-svn-id: https://pandoc.googlecode.com/svn/trunk@505 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-22	Removed 'gsub' entirely and replaced its uses with 'substitute'.	fiddlosopher	6	-13/+5
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@501 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-22	+ Added a 'substitute' function to Shared.hs. This is a generic	fiddlosopher	2	-8/+18
	list function that can be used to substitute one substring for another in a string, like 'gsub' except without regular expressions. + Use 'substitute' instead of 'gsub' in the LaTeX writer. This avoids what appears to be a bug in Text.Regex, whereby "\\^" matches "\350". There seems to be a slight speed improvement as well. (Note: If this works, it would be good to replace other uses of gsub that don't employ regexes with 'substitute'.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@500 788f1e2b-df1e-0410-8736-df70ead52e1b
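A sketch of a regex-free `substitute` in the spirit of r500. The argument order (target, replacement, input) and the choice to return the input unchanged for an empty target are assumptions of this sketch, not necessarily what Shared.hs does.

```haskell
import Data.List (isPrefixOf)

-- Sketch: replace every non-overlapping occurrence of 'target' in the
-- input with 'replacement', scanning left to right. Argument order assumed.
substitute :: Eq a => [a] -> [a] -> [a] -> [a]
substitute target replacement str
  | null target = str  -- an empty target would never advance; leave input alone
  | otherwise   = go str
  where
    go [] = []
    go s@(c:cs)
      | target `isPrefixOf` s = replacement ++ go (drop (length target) s)
      | otherwise             = c : go cs
```

Because the pattern is compared literally as a list prefix, a character such as `^` has no special meaning, which is how a function like this sidesteps the Text.Regex problem described above.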
2007-01-18	Small bug fix to last change, and count "'S" as well as "'s" as	fiddlosopher	1	-1/+1
	possessive when followed by non-alphanumeric. git-svn-id: https://pandoc.googlecode.com/svn/trunk@499 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-18	More tweaks to smart quote parsing: a ' is not a single quote	fiddlosopher	1	-0/+1
	start if followed by 's' and then a non-alphanumeric. (Yes, this is English-centric, I'm afraid. But it does help, and I can't think of a language in which 's' by itself is a word.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@498 788f1e2b-df1e-0410-8736-df70ead52e1b
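The possessive check from r498 (extended to 'S in r499) can be sketched as a predicate on the text that follows the apostrophe. `possessiveFollows` is a hypothetical name, and treating end of input as non-alphanumeric is an assumption; when the predicate holds, the reader would not open a single-quoted span.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical predicate: 'rest' is the input immediately after an apostrophe.
-- True means the apostrophe looks possessive ('s or 'S followed by a
-- non-alphanumeric character), so it should not start a single quote.
-- Treating end of input as non-alphanumeric is an assumption.
possessiveFollows :: String -> Bool
possessiveFollows (s : rest)
  | s == 's' || s == 'S' = case rest of
      []      -> True
      (c : _) -> not (isAlphaNum c)
possessiveFollows _ = False
```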
2007-01-16	Minor tweaks to smart quoting code.	fiddlosopher	1	-4/+3
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@497 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-16	Fixed bug in smart quote recognition: ' before ) or certain	fiddlosopher	1	-3/+4
	other punctuation must not be an open quote. git-svn-id: https://pandoc.googlecode.com/svn/trunk@496 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-16	Fixed haddock documentation errors.	fiddlosopher	2	-30/+30
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@495 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-15	Added support for tables in markdown reader and in LaTeX,	fiddlosopher	12	-12/+307
	DocBook, and HTML writers. The syntax is documented in README. Tests have been added to the test suite. git-svn-id: https://pandoc.googlecode.com/svn/trunk@493 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-09	Need to export TMPDIR in tempdir.sh.	fiddlosopher	1	-0/+1
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@482 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-09	On Cygwin, set TMPDIR to . before using mktemp. Otherwise	fiddlosopher	1	-0/+7
	one gets an error creating the output file in the /tmp directory. I haven't tracked this one down, but this should serve as a workaround. git-svn-id: https://pandoc.googlecode.com/svn/trunk@481 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-09	Cleaned up markdown2pdf.in. Note that bibtex does not return	fiddlosopher	1	-4/+6
	an error condition when it gives warnings, so instead we grep for warnings or error messages to see if we need to print the log. git-svn-id: https://pandoc.googlecode.com/svn/trunk@476 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-09	Minor changes to markdown2pdf: removed an unnecessary '|| exit $?',	fiddlosopher	1	-2/+2
	and made sure error output goes to stderr. git-svn-id: https://pandoc.googlecode.com/svn/trunk@475 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-09	Don't use named entities in docbook writer. Instead, use	fiddlosopher	1	-4/+4
	numerical entities, for portability across stylesheets. git-svn-id: https://pandoc.googlecode.com/svn/trunk@473 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-09	Changes to markdown2pdf.in:	fiddlosopher	1	-19/+36
	+ Exit if pandoc fails (second time through) -- no need to store the log for this. + Run pdflatex up to three times, if needed to resolve references. Also run bibtex as needed. + Minor reformatting. git-svn-id: https://pandoc.googlecode.com/svn/trunk@469 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-09	Minor cleanups in markdown2pdf.in.	fiddlosopher	1	-19/+18
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@468 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-09	Moved up processing of --dump-args so that output file won't	fiddlosopher	1	-7/+7
	be created first! git-svn-id: https://pandoc.googlecode.com/svn/trunk@465 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-08	+ Changed 'escapedChar' in Markdown reader so that only the	fiddlosopher	1	-1/+8
	characters Markdown escapes are escaped in strict mode. When not in strict mode, Pandoc allows all non-alphanumeric characters to be escaped. + Added documentation of backslash escapes to README. git-svn-id: https://pandoc.googlecode.com/svn/trunk@461 788f1e2b-df1e-0410-8736-df70ead52e1b
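The two escaping modes from r461 can be sketched as a predicate. `canEscape` is a hypothetical name, and the strict-mode set is the list of backslash-escapable characters from the Markdown syntax documentation, which may differ slightly from what the reader actually accepted.

```haskell
import Data.Char (isAlphaNum)

-- Characters the Markdown syntax documentation lists as backslash-escapable.
markdownEscapable :: String
markdownEscapable = "\\`*_{}[]()#+-.!"

-- Hypothetical predicate: may a backslash escape this character?
-- Strict mode: only the documented Markdown characters.
-- Otherwise: any non-alphanumeric character.
canEscape :: Bool -> Char -> Bool
canEscape strict c
  | strict    = c `elem` markdownEscapable
  | otherwise = not (isAlphaNum c)
```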
2007-01-08	+ Export TEXINPUTS variable.	roktas	1	-0/+1
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@460 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-08	Various fixes in markdown2pdf.	roktas	1	-17/+20
	+ Add a trailing ':' to TEXINPUTS as per the instruction in the TeX FAQ (http://www.tex.ac.uk/cgi-bin/texfaq2html?label=graphicspath). Without it, pdflatex silently fails, for example, with the following command: 'TEXINPUTS=/tmp markdown2pdf'
	+ Put the origdir at the front for the correct directory search order.
	+ pdflatex didn't create a log file on one occasion (the above command), which made the sed commands fail. Test for the existence of the log before filtering it.
	+ A few non-essential changes.
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@459 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-08	Removed unneeded "export" statements.	fiddlosopher	1	-7/+4
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@458 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-08	Modified shell scripts to use new Pandoc --dump-args and	fiddlosopher	3	-71/+70
	--ignore-args features. This allows a simpler, cleaner design. Make use of TEXINPUTS environment variable to ensure that pdflatex will find images and other sources in the working directory from which markdown2pdf is called. git-svn-id: https://pandoc.googlecode.com/svn/trunk@456 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-08	Have pandoc return exit code 2 whenever a usage message is	fiddlosopher	1	-3/+3
	produced, even if it's because a bad option was specified. git-svn-id: https://pandoc.googlecode.com/svn/trunk@455 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-08	Changes to Pandoc's options to facilitate wrapper scripts:	fiddlosopher	1	-19/+30
	+ removed -d/--debug option
	+ added --dump-args option, which prints the name of the output file (or '-' for STDOUT) and all the command-line arguments (excluding Pandoc options and their arguments), one per line, then exits. Note that special wrapper options will be treated as arguments if they follow '--' at the end of the command line. Thus,
	      pandoc --dump-args -o foo.html foo.txt -- -e latin1
	  will print the following to STDOUT:
	      foo.html
	      foo.txt
	      -e
	      latin1
	+ added --ignore-args option, which causes Pandoc to ignore all (non-option) arguments, including any special options that occur after '--' at the end of the command line.
	+ '-' now means STDIN as the name of an input file, STDOUT as the name of an output file. So,
	      pandoc -o - -
	  will take input from STDIN and print output to STDOUT. Note that if multiple '-o' options are specified on the same line, the last one takes precedence. So, in a script,
	      pandoc "$@" -o -
	  will guarantee output to STDOUT, even if the '-o' option was used.
	+ documented these changes in man pages, README, and changelog.
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@454 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-07	Simplify regex.	roktas	1	-1/+1
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@452 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-07	+ Revert previous commit which is wrong and insufficient on some parts.	roktas	1	-14/+9
	+ Improve sed filter to extract the following error contexts:
	  1. From a line starting with ! to the next blank line.
	  2. From a line beginning "LaTeX Warning:" to the next blank line.
	  3. From a line beginning "Error:" to the next blank line, or EOF.
	+ Improve the error message headers (perhaps needs proofreading). Prepend the wrapper name to the error headers for easy spotting.
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@451 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-07	+ Fix a nasty bug in markdown2pdf. It used to send the log file to	roktas	1	-3/+10
	/dev/null! + Another problem is the sed filter, which returns nothing with pdfeTeX '3.141592-1.21a-2.2 (Web2C 7.5.4)' here. As a first cut towards fixing this, use a somewhat heuristic approach: try to build a short log by matching against a magic error stamp, and dump the whole log if that attempt fails. Note that there is still room to improve this code. git-svn-id: https://pandoc.googlecode.com/svn/trunk@450 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-07	Added [breaklinks=true] to hyperref package in LaTeX header.	fiddlosopher	1	-1/+1
	This produces nicer-looking output by default. git-svn-id: https://pandoc.googlecode.com/svn/trunk@449 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-07	Small improvements to indentSpaces. (Allow combinations	fiddlosopher	1	-1/+2
	of spaces and tabs.) git-svn-id: https://pandoc.googlecode.com/svn/trunk@446 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-07	Modified HTML output for Image elements, to conform to	fiddlosopher	1	-3/+3
	Markdown.pl: + title attribute comes after alt attribute + title is included even if null git-svn-id: https://pandoc.googlecode.com/svn/trunk@445 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-07	Fixed performance problems with '--strict' option:	fiddlosopher	2	-17/+18
	+ Replaced skipEndline with "option ' ' newline" where possible.
	+ Replaced "notFollowedBy' header" in definition of endline with a faster but equally accurate test for a following header.
	+ Removed check at the beginning of 'reference' for a noteStart: This is not needed, because note comes before referenceKey in the definition of block.
	+ Replaced check for a following anyHtmlBlockTag in autoLink with a check for anyHtmlTag or anyHtmlEndTag.
	+ Other small code cleanups.
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@444 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-06	Fixed bug in Markdown reader's handling of underscores and other	fiddlosopher	1	-8/+14
	inline formatting markers inside reference labels: for example, in '[A_B]: /url/a_b', the material between underscores was being parsed as emphasized inlines. git-svn-id: https://pandoc.googlecode.com/svn/trunk@442 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-06	Added support for hexadecimal entities: e.g. &#xA0AB; (ꂫ)	fiddlosopher	1	-6/+6
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@441 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-06	Allow '-S' option to be specified together with '--strict', if desired.	fiddlosopher	1	-2/+1
	Thus 'pandoc -S --strict -r markdown -w html' can replace the Markdown.pl/Smartypants combination. git-svn-id: https://pandoc.googlecode.com/svn/trunk@438 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-06	Fixed serious performance problems with new Markdown reader:	fiddlosopher	2	-13/+44
	Instead of using lookahead to determine whether a single quote is an apostrophe, we now use state. Inside single quotes, a ' character won't be recognized as the beginning of a single quote. 'stateQuoteContext' has been added to keep track of this. git-svn-id: https://pandoc.googlecode.com/svn/trunk@437 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-06	Merged changes from 'quotes' branch since r431. Smart typography	fiddlosopher	12	-308/+383
	is now handled in the Markdown and LaTeX readers, rather than in the writers. The HTML writer has been rewritten to use the prettyprinting library. git-svn-id: https://pandoc.googlecode.com/svn/trunk@436 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-05	Set up executable permissions on some files.	roktas	1	-0/+0
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@423 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-05	Remove executable permission of hsmarkdown.in.	roktas	1	-0/+0
	git-svn-id: https://pandoc.googlecode.com/svn/trunk@422 788f1e2b-df1e-0410-8736-df70ead52e1b