-rw-r--r-- | changelog | 57
-rw-r--r-- | pandoc.cabal | 16
-rw-r--r-- | relann1.7 | 265
-rw-r--r-- | src/Text/Pandoc.hs | 3
-rw-r--r-- | src/Text/Pandoc/CharacterReferences.hs | 277
-rw-r--r-- | src/Text/Pandoc/Readers/HTML.hs | 54
-rw-r--r-- | src/Text/Pandoc/Writers/LaTeX.hs | 4
-rw-r--r-- | tests/writer.latex | 10
8 files changed, 114 insertions, 572 deletions
@@ -2,20 +2,6 @@ pandoc (1.7) [new features] - * New `textile` reader and writer. Thanks to Paul Rivier for contributing - the `textile` reader, an almost complete implementation of the textile - syntax used by the ruby [RedCloth library](http://redcloth.org/textile). - Resolves Issue #51. - - * New `org` writer, for Emacs Org-mode, contributed by Puneeth Chaganti. - - * New `json` reader and writer, for reading and writing a JSON - representation of the native Pandoc AST. These are much faster - than the `native` reader and writer, and should be used for - serializing Pandoc to text. To convert between the JSON representation - and native Pandoc, use `encodeJSON` and `decodeJSON` from - `Text.JSON.Generic`. - * Support for citations using Andrea Rossato's `citeproc-hs` 0.3. You can now write, for example, @@ -37,6 +23,20 @@ pandoc (1.7) syntax, and in the LaTeX reader, using natbib or biblatex syntax. (Thanks to Nathan Gass for the natbib and biblatex support.) + * New `textile` reader and writer. Thanks to Paul Rivier for contributing + the `textile` reader, an almost complete implementation of the textile + syntax used by the ruby [RedCloth library](http://redcloth.org/textile). + Resolves Issue #51. + + * New `org` writer, for Emacs Org-mode, contributed by Puneeth Chaganti. + + * New `json` reader and writer, for reading and writing a JSON + representation of the native Pandoc AST. These are much faster + than the `native` reader and writer, and should be used for + serializing Pandoc to text. To convert between the JSON representation + and native Pandoc, use `encodeJSON` and `decodeJSON` from + `Text.JSON.Generic`. + * A new `--mathjax` option has been added for displaying math in HTML using MathJax. Resolves issue #259. @@ -68,11 +68,15 @@ pandoc (1.7) * Made `--smart` work in HTML, RST, and Textile readers, as well as markdown. + * Added `--html5` option for HTML5 output. + * Added support for listings package in LaTeX reader (Puneeth Chaganti). 
* Added support for simple tables in the LaTeX reader. + * Added support for simple tables in the HTML reader. + * Significant performance improvements in many readers and writers. [API and program changes] @@ -109,6 +113,9 @@ pandoc (1.7) resulting HTML using `xss-sanitize`, which is based on pandoc's sanitization, but improved. + * Added support for `lang` in `html` tag in the HTML template, + so you can do `pandoc -s -V lang=es`, for example. + * Added `Text.Pandoc.Pretty`. This is better suited for pandoc than the `pretty` package. Changed all writers that used `Text.PrettyPrint.HughesPJ` to use `Text.Pandoc.Pretty` instead. @@ -118,7 +125,7 @@ pandoc (1.7) * `Text.Pandoc.Shared`: - + Added `writerColumns` to `WriterOptions`. + + Added `writerColumns` and `writerHtml5` to `WriterOptions`. + Added `normalize`. + Removed unneeded prettyprinting functions: `wrapped`, `wrapIfNeeded`, `wrappedTeX`, `wrapTeXIfNeeded`, `hang'`, @@ -186,10 +193,16 @@ pandoc (1.7) [Under-the-hood improvements] * Completely rewrote HTML reader using tagsoup as a lexer. The - new reader is faster and more accurate. + new reader is faster and more accurate. Unlike the + old reader, it does not get bogged down on some input + (Issues #277, 255). And it handles namespaces in tags + (Issue #274). * Replaced `escapeStringAsXML` with a faster version. + * Simplified Text.Pandoc.CharacterReferences by using + entity lookup functions from TagSoup. + * Remove duplications in documentation by generating the pandoc man page from README, using `MakeManPage.hs`. @@ -218,6 +231,10 @@ pandoc (1.7) Now they are parsed as `Quoted` inlines, if `--smart` is specified. Resolves Issue #270. + * Text.Pandoc.Parsing: Fixed bug in grid table parser. + Spaces at end of line were not being stripped properly, + resulting in unintended LineBreaks. + * Markdown reader: + Allow HTML comments as inline elements in markdown. @@ -239,6 +256,11 @@ pandoc (1.7) + Allow spaces between '\begin' or '\end' and '{'. 
+ Support \L and \l. + * LaTeX writer: + + + Escape strings in \href{..}. + + In nonsimple tables, put cells in \parbox. + * OpenDocument writer: don't print raw TeX. * Markdown writer: Fixed bug in `Image`. URI was getting unescaped twice! @@ -658,7 +680,8 @@ pandoc (1.5) + Removed stLink, link template variable. Reason: we now always include hyperref in the template. - * Latex template: + * LaTeX template: + + Only show \author if there are some. + Always include hyperref package. It is used not just for links but for toc, section heading bookmarks, footnotes, etc. Also added diff --git a/pandoc.cabal b/pandoc.cabal index 63a6f7fa9..b3394bc91 100644 --- a/pandoc.cabal +++ b/pandoc.cabal @@ -1,6 +1,6 @@ Name: pandoc Version: 1.7 -Cabal-Version: >= 1.2 +Cabal-Version: >= 1.6 Build-Type: Custom License: GPL License-File: COPYING @@ -83,10 +83,22 @@ Extra-Source-Files: tests/insert, tests/lalune.jpg, tests/movie.jpg, + tests/biblio.bib, + tests/chicago-author-date.csl, + tests/ieee.csl, + tests/mhra.csl, tests/latex-reader.latex, tests/latex-reader.native, + tests/biblatex-citations.latex, + tests/natbib-citations.latex, + tests/textile-reader.textile, + tests/textile-reader.native, tests/markdown-reader-more.txt, tests/markdown-reader-more.native, + tests/markdown-citations.txt, + tests/markdown-citations.chicago-author-date.txt, + tests/markdown-citations.mhra.txt, + tests/markdown-citations.ieee.txt, tests/textile-reader.textile, tests/rst-reader.native, tests/rst-reader.rst, @@ -106,6 +118,7 @@ Extra-Source-Files: tests/tables.textile, tests/tables.native, tests/tables.opendocument, + tests/tables.org, tests/tables.texinfo, tests/tables.rst, tests/tables.rtf, @@ -124,6 +137,7 @@ Extra-Source-Files: tests/writer.textile, tests/writer.native, tests/writer.opendocument, + tests/writer.org, tests/writer.rst, tests/writer.rtf, tests/writer.texinfo, diff --git a/relann1.7 b/relann1.7 deleted file mode 100644 index 024c87ed8..000000000 --- a/relann1.7 +++ /dev/null @@ 
-1,265 +0,0 @@ -I'm pleased to announce the release of pandoc 1.7. - -As usual, a source tarball and Windows installer are available -at <http://code.google.com/p/pandoc/downloads/list>. You can -also use 'cabal install' to get the latest version from HackageDB: - - cabal update - cabal install pandoc - -Thanks to everyone who contributed by filing bug reports or contributing -patches, and especially to Andrea Rossato, Nathan Gass, Paul Rivier, and -Puneeth Chaganti for their major contributions to this version. - -New features ------------- - - * New `textile` reader and writer. Thanks to Paul Rivier for contributing - the `textile` reader, an almost complete implementation of the textile - syntax used by the ruby [RedCloth library](http://redcloth.org/textile). - Resolves Issue #51. - - * New `org` writer, for Emacs Org-mode, contributed by Puneeth Chaganti. - - * New `json` reader and writer, for reading and writing a JSON - representation of the native Pandoc AST. These are much faster - than the `native` reader and writer, and should be used for - serializing Pandoc to text. To convert between the JSON representation - and native Pandoc, use `encodeJSON` and `decodeJSON` from - `Text.JSON.Generic`. - - * Support for citations using Andrea Rossato's `citeproc-hs` 0.3. - You can now write, for example, - - Water is wet [see @doe99, pp. 33-35; also @smith04, ch. 1]. - - and, when you process your document using `pandoc`, specifying - a citation style using `--csl` and a bibliography using `--bibliography`, - the citation will be replaced by an appropriately formatted - citation, and a list of works cited will be added to the end - of the document. - - This means that you can switch effortlessly between different citation - and bibliography styles, including footnote, numerical, and author-date - formats. The bibliography can be in any of the following formats: MODS, - BibTeX, BibLaTeX, RIS, EndNote, EndNote XML, ISI, MEDLINE, Copac, or JSON. 
- See the README for further details. - - Citations are supported in the markdown reader, using a special - syntax, and in the LaTeX reader, using natbib or biblatex syntax. - (Thanks to Nathan Gass for the natbib and biblatex support.) - - * A new `--mathjax` option has been added for displaying - math in HTML using MathJax. Resolves issue #259. - - * You can now define LaTeX macros in markdown documents, and pandoc - will apply them to TeX math. For example, - - \newcommand{\plus}[2]{#1 + #2} - $\plus{3}{4}$ - - yields `3+4`. Since the macros are applied in the reader, they - will work in every output format, not just LaTeX. - - * LaTeX macros can also be used in LaTeX documents (both in math - and in non-math contexts). - - * Footnotes are now supported in the RST reader. (Note, however, - that pandoc ignores the numeral or symbol used in the note; - footnotes are put in an auto-numbered ordered list.) - Resolves issue #258. - - * `markdown2pdf` now supports `--data-dir`. - - * Improved prettyprinting in most formats. Lines will be wrapped - more evenly and duplicate blank lines avoided. - - * New `--columns` command-line option sets the column width for - line wrapping and relative width calculations for tables. - - * Made `--smart` work in HTML, RST, and Textile readers, as well - as markdown. - - * Added support for listings package in LaTeX reader - (Puneeth Chaganti). - - * Added support for simple tables in the LaTeX reader. - - * Significant performance improvements in many readers and writers. - -API and program changes ------------------------ - - * Moved `Text.Pandoc.Definition` from the `pandoc` package to a new - auxiliary package, `pandoc-types`. This will make it possible for other - programs to supply output in Pandoc format, without depending on the whole - pandoc package. - - * Moved generic functions to `Text.Pandoc.Generic`. Deprecated - `processWith`, replacing it with two functions, `bottomUp` and `topDown`. 
- Removed previously deprecated functions `processPandoc` and `queryPandoc`. - - * Added `Text.Pandoc.Builder`, for building `Pandoc` structures. - - * `Text.Pandoc` now exports association lists `readers` and `writers`. - - * Removed deprecated `-C/--custom-header` option. - Use `--template` instead. - - * `--biblio-file` has been replaced by `--bibliography`. - `--biblio-format` has been removed; pandoc now guesses the format - from the file extension (see README). - - * pandoc will treat an argument as a URI only if it has an - `http(s)` scheme. Previously pandoc would treat some - Windows pathnames beginning with `C:/` as URIs. - - * pandoc now adds a newline to the end of its output in fragment - mode (= not `--standalone`). - - * The `--sanitize-html` option and the `stateSanitize` field in - `ParserState` have been removed. Sanitization is better done in the - resulting HTML using `xss-sanitize`, which is based on pandoc's - sanitization, but improved. - - * Added `Text.Pandoc.Pretty`. This is better suited for pandoc than the - `pretty` package. Changed all writers that used - `Text.PrettyPrint.HughesPJ` to use `Text.Pandoc.Pretty` instead. - - * Removed `Text.Pandoc.Blocks`. `Text.Pandoc.Pretty` allows you to define - blocks and concatenate them, so a separate module is no longer needed. - - * `Text.Pandoc.Shared`: - - + Added `writerColumns` to `WriterOptions`. - + Added `normalize`. - + Removed unneeded prettyprinting functions: - `wrapped`, `wrapIfNeeded`, `wrappedTeX`, `wrapTeXIfNeeded`, `hang'`, - `BlockWrapper`, `wrappedBlocksToDoc`. - + Made `splitBy` take a test instead of an element. - + Added `findDataFile`, refactored `readDataFile`. - + Added `stringify`. Rewrote `inlineListToIdentifier` using `stringify`. - + Fixed `inlineListToIdentifier` to treat '\160' as ' '. 
- - * `Text.Pandoc.Readers.HTML`: - - + Removed `rawHtmlBlock`, `anyHtmlBlockTag`, `anyHtmlInlineTag`, - `anyHtmlTag`, `anyHtmlEndTag`, `htmlEndTag`, `extractTagType`, - `htmlBlockElement`, `htmlComment` - + Added `htmlTag`, `htmlInBalanced`, `isInlineTag`, `isBlockTag`, - `isTextTag` - - * Moved `smartPunctuation` from `Text.Pandoc.Readers.Markdown` - to `Text.Pandoc.Readers.Parsing`, and parameterized it with - an inline parser. - - * Ellipses are no longer allowed to contain spaces. - Previously we allowed '. . .', ' . . . ', etc. This caused - too many complications, and removed author's flexibility in - combining ellipses with spaces and periods. - - * Allow linebreaks in URLs (treat as spaces). Also, a string of - consecutive spaces or tabs is now parsed as a single space. If you have - multiple spaces in your URL, use `%20%20`. - - * `Text.Pandoc.Parsing`: - - + Removed `refsMatch`. - + Hid `Key` constructor. - + Removed custom `Ord` and `Eq` instances for `Key`. - + Added `toKey` and `fromKey` to convert between `Key` and `[Inline]`. - + Generalized type on `readWith`. - - * Small change in calculation of relative widths of table columns. - If the size of the header > the specified column width, use - the header size as 100% for purposes of calculating - relative widths of columns. - - * Markdown writer now uses some pandoc-specific features when `--strict` - is not specified: \ newline is used for a hard linebreak instead of - two spaces then a newline. And delimited code blocks are used when - there are attributes. - - * HTML writer: improved gladTeX output by setting ENV appropriately - for display or inline math (Jonathan Daugherty). - - * LaTeX writer: Use `\paragraph`, `\subparagraph` for level 4,5 headers. - - * LaTeX reader: - - + `\label{foo}` and `\ref{foo}` now become `{foo}` instead of `(foo)`. - + `\index{}` commands are skipped. - - * Added `fontsize` variable to default LaTeX template. 
- This makes it easy to set the font size using `markdown2pdf`: - `markdown2pdf -V fontsize=12pt input.txt`. - - * The `COLUMNS` environment variable no longer has any effect. - -Under-the-hood improvements ---------------------------- - - * Completely rewrote HTML reader using tagsoup as a lexer. The - new reader is faster and more accurate. - - * Replaced `escapeStringAsXML` with a faster version. - - * Remove duplications in documentation by generating the - pandoc man page from README, using `MakeManPage.hs`. - - * Improvements to testing framework: Removed old `tests/RunTests.hs`. - `cabal test` now runs `test-pandoc`, which is built from - `src/test-pandoc.hs` when the `tests` Cabal flag is set. - This allows the testing framework to have its own dependencies. - - * Added `Interact.hs` to make it easier to use ghci while developing. - `Interact.hs` loads `ghci` from the `src` directory, specifying - all the options needed to load pandoc modules (including - specific package dependencies, which it gets by parsing - dist/setup-config). - - * Added `Benchmark.hs`, testing all readers + writers using criterion. - - * Added `stats.sh`, to make it easier to collect and archive - benchmark and lines-of-code stats. - -Bug fixes ---------- - - * Filenames are encoded as UTF8. Resolves Issue #252. - - * Handle curly quotes better in `--smart` mode. Previously, curly quotes - were just parsed literally, leading to problems in some output formats. - Now they are parsed as `Quoted` inlines, if `--smart` is specified. - Resolves Issue #270. - - * Markdown reader: - - + Allow HTML comments as inline elements in markdown. - So, `aaa <!-- comment --> bbb` can be a single paragraph. - + Fixed superscripts with links: `^[link](/foo)^` gets - recognized as a superscripted link, not an inline note followed by - garbage. - + Fixed regression, making markdown reference keys case-insensitive again. - Resolves Issue #272. 
- + Properly handle abbreviations (like `Mr.`) at the end of a line. - + Better handling of intraword underscores, avoiding exponential - slowdowns in some cases. Resolves Issue #182. - - * LaTeX reader: - - + Improved parsing of preamble. - Previously you'd get unexpected behavior on a document that - contained `\begin{document}` in, say, a verbatim block. - + Allow spaces between '\begin' or '\end' and '{'. - + Support \L and \l. - - * OpenDocument writer: don't print raw TeX. - - * Markdown writer: Fixed bug in `Image`. URI was getting unescaped twice! - - * LaTeX and ConTeXt: Escape `[` and `]` as `{[}` and `{]}`. - This avoids unwanted interpretation as an optional argument. - - * `:` now allowed in HTML tags. Resolves Issue #274. - diff --git a/src/Text/Pandoc.hs b/src/Text/Pandoc.hs index 3532c1d4b..dd1b3892d 100644 --- a/src/Text/Pandoc.hs +++ b/src/Text/Pandoc.hs @@ -149,8 +149,9 @@ readers = [("native" , \_ -> read) ,("markdown+lhs" , \st -> readMarkdown st{ stateLiterateHaskell = True}) ,("rst" , readRST) + ,("rst+lhs" , \st -> + readRST st{ stateLiterateHaskell = True}) ,("textile" , readTextile) -- TODO : textile+lhs - ,("rst+lhs" , readRST) ,("html" , readHtml) ,("latex" , readLaTeX) ,("latex+lhs" , \st -> diff --git a/src/Text/Pandoc/CharacterReferences.hs b/src/Text/Pandoc/CharacterReferences.hs index 8ac55fc61..8157d94d3 100644 --- a/src/Text/Pandoc/CharacterReferences.hs +++ b/src/Text/Pandoc/CharacterReferences.hs @@ -31,9 +31,9 @@ module Text.Pandoc.CharacterReferences ( characterReference, decodeCharacterReferences, ) where -import Data.Char ( chr ) import Text.ParserCombinators.Parsec -import qualified Data.Map as Map +import Text.HTML.TagSoup.Entity ( lookupNamedEntity, lookupNumericEntity ) +import Data.Maybe ( fromMaybe ) -- | Parse character entity. characterReference :: GenParser Char st Char @@ -47,18 +47,21 @@ numRef :: GenParser Char st Char numRef = do char '#' num <- hexNum <|> decNum - return $ chr $ num + return $ fromMaybe '?' 
$ lookupNumericEntity num -hexNum :: GenParser Char st Int -hexNum = oneOf "Xx" >> many1 hexDigit >>= return . read . (\xs -> '0':'x':xs) +hexNum :: GenParser Char st [Char] +hexNum = do + x <- oneOf "Xx" + num <- many1 hexDigit + return (x:num) -decNum :: GenParser Char st Int -decNum = many1 digit >>= return . read +decNum :: GenParser Char st [Char] +decNum = many1 digit entity :: GenParser Char st Char entity = do body <- many1 alphaNum - return $ Map.findWithDefault '?' body entityTable + return $ fromMaybe '?' $ lookupNamedEntity body -- | Convert entities in a string to characters. decodeCharacterReferences :: String -> String @@ -67,261 +70,3 @@ decodeCharacterReferences str = Left err -> error $ "\nError: " ++ show err Right result -> result -entityTable :: Map.Map String Char -entityTable = Map.fromList entityTableList - -entityTableList :: [(String, Char)] -entityTableList = [ - ("quot", chr 34), - ("amp", chr 38), - ("lt", chr 60), - ("gt", chr 62), - ("nbsp", chr 160), - ("iexcl", chr 161), - ("cent", chr 162), - ("pound", chr 163), - ("curren", chr 164), - ("yen", chr 165), - ("brvbar", chr 166), - ("sect", chr 167), - ("uml", chr 168), - ("copy", chr 169), - ("ordf", chr 170), - ("laquo", chr 171), - ("not", chr 172), - ("shy", chr 173), - ("reg", chr 174), - ("macr", chr 175), - ("deg", chr 176), - ("plusmn", chr 177), - ("sup2", chr 178), - ("sup3", chr 179), - ("acute", chr 180), - ("micro", chr 181), - ("para", chr 182), - ("middot", chr 183), - ("cedil", chr 184), - ("sup1", chr 185), - ("ordm", chr 186), - ("raquo", chr 187), - ("frac14", chr 188), - ("frac12", chr 189), - ("frac34", chr 190), - ("iquest", chr 191), - ("Agrave", chr 192), - ("Aacute", chr 193), - ("Acirc", chr 194), - ("Atilde", chr 195), - ("Auml", chr 196), - ("Aring", chr 197), - ("AElig", chr 198), - ("Ccedil", chr 199), - ("Egrave", chr 200), - ("Eacute", chr 201), - ("Ecirc", chr 202), - ("Euml", chr 203), - ("Igrave", chr 204), - ("Iacute", chr 205), - ("Icirc", chr 
206), - ("Iuml", chr 207), - ("ETH", chr 208), - ("Ntilde", chr 209), - ("Ograve", chr 210), - ("Oacute", chr 211), - ("Ocirc", chr 212), - ("Otilde", chr 213), - ("Ouml", chr 214), - ("times", chr 215), - ("Oslash", chr 216), - ("Ugrave", chr 217), - ("Uacute", chr 218), - ("Ucirc", chr 219), - ("Uuml", chr 220), - ("Yacute", chr 221), - ("THORN", chr 222), - ("szlig", chr 223), - ("agrave", chr 224), - ("aacute", chr 225), - ("acirc", chr 226), - ("atilde", chr 227), - ("auml", chr 228), - ("aring", chr 229), - ("aelig", chr 230), - ("ccedil", chr 231), - ("egrave", chr 232), - ("eacute", chr 233), - ("ecirc", chr 234), - ("euml", chr 235), - ("igrave", chr 236), - ("iacute", chr 237), - ("icirc", chr 238), - ("iuml", chr 239), - ("eth", chr 240), - ("ntilde", chr 241), - ("ograve", chr 242), - ("oacute", chr 243), - ("ocirc", chr 244), - ("otilde", chr 245), - ("ouml", chr 246), - ("divide", chr 247), - ("oslash", chr 248), - ("ugrave", chr 249), - ("uacute", chr 250), - ("ucirc", chr 251), - ("uuml", chr 252), - ("yacute", chr 253), - ("thorn", chr 254), - ("yuml", chr 255), - ("OElig", chr 338), - ("oelig", chr 339), - ("Scaron", chr 352), - ("scaron", chr 353), - ("Yuml", chr 376), - ("fnof", chr 402), - ("circ", chr 710), - ("tilde", chr 732), - ("Alpha", chr 913), - ("Beta", chr 914), - ("Gamma", chr 915), - ("Delta", chr 916), - ("Epsilon", chr 917), - ("Zeta", chr 918), - ("Eta", chr 919), - ("Theta", chr 920), - ("Iota", chr 921), - ("Kappa", chr 922), - ("Lambda", chr 923), - ("Mu", chr 924), - ("Nu", chr 925), - ("Xi", chr 926), - ("Omicron", chr 927), - ("Pi", chr 928), - ("Rho", chr 929), - ("Sigma", chr 931), - ("Tau", chr 932), - ("Upsilon", chr 933), - ("Phi", chr 934), - ("Chi", chr 935), - ("Psi", chr 936), - ("Omega", chr 937), - ("alpha", chr 945), - ("beta", chr 946), - ("gamma", chr 947), - ("delta", chr 948), - ("epsilon", chr 949), - ("zeta", chr 950), - ("eta", chr 951), - ("theta", chr 952), - ("iota", chr 953), - ("kappa", chr 954), - 
("lambda", chr 955), - ("mu", chr 956), - ("nu", chr 957), - ("xi", chr 958), - ("omicron", chr 959), - ("pi", chr 960), - ("rho", chr 961), - ("sigmaf", chr 962), - ("sigma", chr 963), - ("tau", chr 964), - ("upsilon", chr 965), - ("phi", chr 966), - ("chi", chr 967), - ("psi", chr 968), - ("omega", chr 969), - ("thetasym", chr 977), - ("upsih", chr 978), - ("piv", chr 982), - ("ensp", chr 8194), - ("emsp", chr 8195), - ("thinsp", chr 8201), - ("zwnj", chr 8204), - ("zwj", chr 8205), - ("lrm", chr 8206), - ("rlm", chr 8207), - ("ndash", chr 8211), - ("mdash", chr 8212), - ("lsquo", chr 8216), - ("rsquo", chr 8217), - ("sbquo", chr 8218), - ("ldquo", chr 8220), - ("rdquo", chr 8221), - ("bdquo", chr 8222), - ("dagger", chr 8224), - ("Dagger", chr 8225), - ("bull", chr 8226), - ("hellip", chr 8230), - ("permil", chr 8240), - ("prime", chr 8242), - ("Prime", chr 8243), - ("lsaquo", chr 8249), - ("rsaquo", chr 8250), - ("oline", chr 8254), - ("frasl", chr 8260), - ("euro", chr 8364), - ("image", chr 8465), - ("weierp", chr 8472), - ("real", chr 8476), - ("trade", chr 8482), - ("alefsym", chr 8501), - ("larr", chr 8592), - ("uarr", chr 8593), - ("rarr", chr 8594), - ("darr", chr 8595), - ("harr", chr 8596), - ("crarr", chr 8629), - ("lArr", chr 8656), - ("uArr", chr 8657), - ("rArr", chr 8658), - ("dArr", chr 8659), - ("hArr", chr 8660), - ("forall", chr 8704), - ("part", chr 8706), - ("exist", chr 8707), - ("empty", chr 8709), - ("nabla", chr 8711), - ("isin", chr 8712), - ("notin", chr 8713), - ("ni", chr 8715), - ("prod", chr 8719), - ("sum", chr 8721), - ("minus", chr 8722), - ("lowast", chr 8727), - ("radic", chr 8730), - ("prop", chr 8733), - ("infin", chr 8734), - ("ang", chr 8736), - ("and", chr 8743), - ("or", chr 8744), - ("cap", chr 8745), - ("cup", chr 8746), - ("int", chr 8747), - ("there4", chr 8756), - ("sim", chr 8764), - ("cong", chr 8773), - ("asymp", chr 8776), - ("ne", chr 8800), - ("equiv", chr 8801), - ("le", chr 8804), - ("ge", chr 8805), - 
("sub", chr 8834), - ("sup", chr 8835), - ("nsub", chr 8836), - ("sube", chr 8838), - ("supe", chr 8839), - ("oplus", chr 8853), - ("otimes", chr 8855), - ("perp", chr 8869), - ("sdot", chr 8901), - ("lceil", chr 8968), - ("rceil", chr 8969), - ("lfloor", chr 8970), - ("rfloor", chr 8971), - ("lang", chr 9001), - ("rang", chr 9002), - ("loz", chr 9674), - ("spades", chr 9824), - ("clubs", chr 9827), - ("hearts", chr 9829), - ("diams", chr 9830) - ] diff --git a/src/Text/Pandoc/Readers/HTML.hs b/src/Text/Pandoc/Readers/HTML.hs index ae8f0438e..0cbdf72b0 100644 --- a/src/Text/Pandoc/Readers/HTML.hs +++ b/src/Text/Pandoc/Readers/HTML.hs @@ -78,14 +78,14 @@ parseBody :: TagParser [Block] parseBody = liftM concat $ manyTill block eof block :: TagParser [Block] -block = optional pLocation >> - choice [ - pPara +block = choice + [ pPara , pHeader , pBlockQuote , pCodeBlock , pList , pHrule + , pSimpleTable , pPlain , pRawHtmlBlock ] @@ -195,6 +195,27 @@ pHrule = do pSelfClosing (=="hr") (const True) return [HorizontalRule] +pSimpleTable :: TagParser [Block] +pSimpleTable = try $ do + TagOpen _ _ <- pSatisfy (~== TagOpen "table" []) + skipMany pBlank + head' <- option [] $ pInTags "th" pTd + rows <- many1 $ try $ + skipMany pBlank >> pInTags "tr" pTd + skipMany pBlank + TagClose _ <- pSatisfy (~== TagClose "table") + let cols = maximum $ map length rows + let aligns = replicate cols AlignLeft + let widths = replicate cols 0 + return [Table [] aligns widths head' rows] + +pTd :: TagParser [TableCell] +pTd = try $ do + skipMany pBlank + res <- pInTags "td" pPlain + skipMany pBlank + return [res] + pBlockQuote :: TagParser [Block] pBlockQuote = do contents <- pInTags "blockquote" block @@ -235,9 +256,8 @@ pCodeBlock = try $ do return [CodeBlock attribs result] inline :: TagParser [Inline] -inline = choice [ - pLocation - , pTagText +inline = choice + [ pTagText , pEmph , pStrong , pSuperscript @@ -250,17 +270,19 @@ inline = choice [ , pRawHtmlInline ] -pLocation :: TagParser 
[a] +pLocation :: TagParser () pLocation = do - (TagPosition r c) <- pSatisfy isTagPosition + (TagPosition r c) <- pSat isTagPosition setPosition $ newPos "input" r c - return [] -pSatisfy :: (Tag String -> Bool) -> TagParser (Tag String) -pSatisfy f = do +pSat :: (Tag String -> Bool) -> TagParser (Tag String) +pSat f = do pos <- getPosition token show (const pos) (\x -> if f x then Just x else Nothing) +pSatisfy :: (Tag String -> Bool) -> TagParser (Tag String) +pSatisfy f = try $ optional pLocation >> pSat f + pAnyTag :: TagParser (Tag String) pAnyTag = pSatisfy (const True) @@ -268,7 +290,7 @@ pSelfClosing :: (String -> Bool) -> ([Attribute String] -> Bool) -> TagParser (Tag String) pSelfClosing f g = do open <- pSatisfy (tagOpen f g) - optional $ try $ pLocation >> pSatisfy (tagClose f) + optional $ pSatisfy (tagClose f) return open pEmph :: TagParser [Inline] @@ -342,7 +364,6 @@ pInTags tagtype parser = try $ do pCloses :: String -> TagParser () pCloses tagtype = try $ do - optional pLocation t <- lookAhead $ pSatisfy $ \tag -> isTagClose tag || isTagOpen tag case t of (TagClose t') | t' == tagtype -> pAnyTag >> return () @@ -360,6 +381,11 @@ pTagText = try $ do Left _ -> fail $ "Could not parse `" ++ str ++ "'" Right result -> return result +pBlank :: TagParser () +pBlank = try $ do + (TagText str) <- pSatisfy isTagText + guard $ all isSpace str + pTagContents :: GenParser Char ParserState Inline pTagContents = pStr <|> pSpace <|> smartPunctuation pTagContents <|> pSymbol @@ -433,10 +459,8 @@ _ `closes` "html" = False "a" `closes` "a" = True "li" `closes` "li" = True "th" `closes` t | t `elem` ["th","td"] = True -"td" `closes` t | t `elem` ["th","td"] = True "tr" `closes` t | t `elem` ["th","td","tr"] = True "dt" `closes` t | t `elem` ["dt","dd"] = True -"dd" `closes` t | t `elem` ["dt","dd"] = True "hr" `closes` "p" = True "p" `closes` "p" = True "meta" `closes` "meta" = True diff --git a/src/Text/Pandoc/Writers/LaTeX.hs b/src/Text/Pandoc/Writers/LaTeX.hs 
index fbf443a03..836e0f974 100644 --- a/src/Text/Pandoc/Writers/LaTeX.hs +++ b/src/Text/Pandoc/Writers/LaTeX.hs @@ -370,8 +370,8 @@ inlineToLaTeX (Link txt (src, _)) = do modify $ \s -> s{ stUrl = True } return $ text $ "\\url{" ++ x ++ "}" _ -> do contents <- inlineListToLaTeX $ deVerb txt - return $ text ("\\href{" ++ src ++ "}{") <> contents <> - char '}' + return $ text ("\\href{" ++ stringToLaTeX src ++ "}{") <> + contents <> char '}' inlineToLaTeX (Image _ (source, _)) = do modify $ \s -> s{ stGraphics = True } return $ "\\includegraphics" <> braces (text source) diff --git a/tests/writer.latex b/tests/writer.latex index 374815f63..eb4012749 100644 --- a/tests/writer.latex +++ b/tests/writer.latex @@ -581,7 +581,7 @@ spaces: a\^{}b c\^{}d, a\ensuremath{\sim}b c\ensuremath{\sim}d. `He said, ``I want to go.''\,' Were you alive in the 70's? Here is some quoted `\verb!code!' and a -``\href{http://example.com/?foo=1&bar=2}{quoted link}''. +``\href{http://example.com/?foo=1\&bar=2}{quoted link}''. Some dashes: one---two --- three---four --- five. @@ -711,7 +711,7 @@ Just a \href{/url/}{URL}. \href{/url/}{URL and title} -\href{/url/with_underscore}{with\_underscore} +\href{/url/with\_underscore}{with\_underscore} \href{mailto:nobody@nowhere.net}{Email link} @@ -746,15 +746,15 @@ Foo \href{/url/}{biz}. \subsection{With ampersands} -Here's a \href{http://example.com/?foo=1&bar=2}{link with an ampersand in the +Here's a \href{http://example.com/?foo=1\&bar=2}{link with an ampersand in the URL}. Here's a link with an amersand in the link text: \href{http://att.com/}{AT\&T}. -Here's an \href{/script?foo=1&bar=2}{inline link}. +Here's an \href{/script?foo=1\&bar=2}{inline link}. -Here's an \href{/script?foo=1&bar=2}{inline link in pointy braces}. +Here's an \href{/script?foo=1\&bar=2}{inline link in pointy braces}. \subsection{Autolinks} |
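
The LaTeX writer hunk above passes the link target through `stringToLaTeX` before placing it inside `\href{..}`, which is why the expected output in `tests/writer.latex` changes from `&` to `\&` and from `_` to `\_`. A minimal, self-contained sketch of that idea follows. The names `escapeLaTeX` and `href` are hypothetical, and the set of escaped characters is an assumption chosen for illustration; it is not the list that pandoc's `stringToLaTeX` actually uses.

```haskell
module Main where

-- Hypothetical, simplified stand-in for pandoc's stringToLaTeX.
-- Assumption: only these five characters are escaped here; the real
-- function handles more cases.
escapeLaTeX :: String -> String
escapeLaTeX = concatMap escapeChar
  where
    escapeChar '&' = "\\&"
    escapeChar '_' = "\\_"
    escapeChar '%' = "\\%"
    escapeChar '#' = "\\#"
    escapeChar '$' = "\\$"
    escapeChar c   = [c]

-- Build an \href command. The target is escaped; the label is assumed
-- to be LaTeX text that has already been rendered and escaped.
href :: String -> String -> String
href src label = "\\href{" ++ escapeLaTeX src ++ "}{" ++ label ++ "}"

main :: IO ()
main = do
  -- prints \href{http://example.com/?foo=1\&bar=2}{quoted link}
  putStrLn (href "http://example.com/?foo=1&bar=2" "quoted link")
  -- prints \href{/url/with\_underscore}{with\_underscore}
  putStrLn (href "/url/with_underscore" "with\\_underscore")
```

The two calls in `main` mirror two of the changed lines in `tests/writer.latex`, so the sketch can be checked against the expected output shown in that hunk.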