-rw-r--r-- changelog 57
-rw-r--r-- pandoc.cabal 16
-rw-r--r-- relann1.7 265
-rw-r--r-- src/Text/Pandoc.hs 3
-rw-r--r-- src/Text/Pandoc/CharacterReferences.hs 277
-rw-r--r-- src/Text/Pandoc/Readers/HTML.hs 54
-rw-r--r-- src/Text/Pandoc/Writers/LaTeX.hs 4
-rw-r--r-- tests/writer.latex 10
8 files changed, 114 insertions, 572 deletions
diff --git a/changelog b/changelog
index 6bb057bbc..88f086127 100644
--- a/changelog
+++ b/changelog
@@ -2,20 +2,6 @@ pandoc (1.7)
[new features]
- * New `textile` reader and writer. Thanks to Paul Rivier for contributing
- the `textile` reader, an almost complete implementation of the textile
- syntax used by the ruby [RedCloth library](http://redcloth.org/textile).
- Resolves Issue #51.
-
- * New `org` writer, for Emacs Org-mode, contributed by Puneeth Chaganti.
-
- * New `json` reader and writer, for reading and writing a JSON
- representation of the native Pandoc AST. These are much faster
- than the `native` reader and writer, and should be used for
- serializing Pandoc to text. To convert between the JSON representation
- and native Pandoc, use `encodeJSON` and `decodeJSON` from
- `Text.JSON.Generic`.
-
* Support for citations using Andrea Rossato's `citeproc-hs` 0.3.
You can now write, for example,
@@ -37,6 +23,20 @@ pandoc (1.7)
syntax, and in the LaTeX reader, using natbib or biblatex syntax.
(Thanks to Nathan Gass for the natbib and biblatex support.)
+ * New `textile` reader and writer. Thanks to Paul Rivier for contributing
+ the `textile` reader, an almost complete implementation of the textile
+ syntax used by the ruby [RedCloth library](http://redcloth.org/textile).
+ Resolves Issue #51.
+
+ * New `org` writer, for Emacs Org-mode, contributed by Puneeth Chaganti.
+
+ * New `json` reader and writer, for reading and writing a JSON
+ representation of the native Pandoc AST. These are much faster
+ than the `native` reader and writer, and should be used for
+ serializing Pandoc to text. To convert between the JSON representation
+ and native Pandoc, use `encodeJSON` and `decodeJSON` from
+ `Text.JSON.Generic`.
+
* A new `--mathjax` option has been added for displaying
math in HTML using MathJax. Resolves issue #259.
@@ -68,11 +68,15 @@ pandoc (1.7)
* Made `--smart` work in HTML, RST, and Textile readers, as well
as markdown.
+ * Added `--html5` option for HTML5 output.
+
* Added support for listings package in LaTeX reader
(Puneeth Chaganti).
* Added support for simple tables in the LaTeX reader.
+ * Added support for simple tables in the HTML reader.
+
* Significant performance improvements in many readers and writers.
[API and program changes]
@@ -109,6 +113,9 @@ pandoc (1.7)
resulting HTML using `xss-sanitize`, which is based on pandoc's
sanitization, but improved.
+ * Added support for `lang` in `html` tag in the HTML template,
+ so you can do `pandoc -s -V lang=es`, for example.
+
* Added `Text.Pandoc.Pretty`. This is better suited for pandoc than the
`pretty` package. Changed all writers that used
`Text.PrettyPrint.HughesPJ` to use `Text.Pandoc.Pretty` instead.
@@ -118,7 +125,7 @@ pandoc (1.7)
* `Text.Pandoc.Shared`:
- + Added `writerColumns` to `WriterOptions`.
+ + Added `writerColumns` and `writerHtml5` to `WriterOptions`.
+ Added `normalize`.
+ Removed unneeded prettyprinting functions:
`wrapped`, `wrapIfNeeded`, `wrappedTeX`, `wrapTeXIfNeeded`, `hang'`,
@@ -186,10 +193,16 @@ pandoc (1.7)
[Under-the-hood improvements]
* Completely rewrote HTML reader using tagsoup as a lexer. The
- new reader is faster and more accurate.
+ new reader is faster and more accurate. Unlike the
+ old reader, it does not get bogged down on some input
+ (Issues #277, 255). And it handles namespaces in tags
+ (Issue #274).
* Replaced `escapeStringAsXML` with a faster version.
+ * Simplified Text.Pandoc.CharacterReferences by using
+ entity lookup functions from TagSoup.
+
* Remove duplications in documentation by generating the
pandoc man page from README, using `MakeManPage.hs`.
@@ -218,6 +231,10 @@ pandoc (1.7)
Now they are parsed as `Quoted` inlines, if `--smart` is specified.
Resolves Issue #270.
+ * Text.Pandoc.Parsing: Fixed bug in grid table parser.
+ Spaces at end of line were not being stripped properly,
+ resulting in unintended LineBreaks.
+
* Markdown reader:
+ Allow HTML comments as inline elements in markdown.
@@ -239,6 +256,11 @@ pandoc (1.7)
+ Allow spaces between '\begin' or '\end' and '{'.
+ Support \L and \l.
+ * LaTeX writer:
+
+ + Escape strings in \href{..}.
+ + In nonsimple tables, put cells in \parbox.
+
* OpenDocument writer: don't print raw TeX.
* Markdown writer: Fixed bug in `Image`. URI was getting unescaped twice!
@@ -658,7 +680,8 @@ pandoc (1.5)
+ Removed stLink, link template variable. Reason: we now always
include hyperref in the template.
- * Latex template:
+ * LaTeX template:
+
+ Only show \author if there are some.
+ Always include hyperref package. It is used not just for links but
for toc, section heading bookmarks, footnotes, etc. Also added
diff --git a/pandoc.cabal b/pandoc.cabal
index 63a6f7fa9..b3394bc91 100644
--- a/pandoc.cabal
+++ b/pandoc.cabal
@@ -1,6 +1,6 @@
Name: pandoc
Version: 1.7
-Cabal-Version: >= 1.2
+Cabal-Version: >= 1.6
Build-Type: Custom
License: GPL
License-File: COPYING
@@ -83,10 +83,22 @@ Extra-Source-Files:
tests/insert,
tests/lalune.jpg,
tests/movie.jpg,
+ tests/biblio.bib,
+ tests/chicago-author-date.csl,
+ tests/ieee.csl,
+ tests/mhra.csl,
tests/latex-reader.latex,
tests/latex-reader.native,
+ tests/biblatex-citations.latex,
+ tests/natbib-citations.latex,
+ tests/textile-reader.textile,
+ tests/textile-reader.native,
tests/markdown-reader-more.txt,
tests/markdown-reader-more.native,
+ tests/markdown-citations.txt,
+ tests/markdown-citations.chicago-author-date.txt,
+ tests/markdown-citations.mhra.txt,
+ tests/markdown-citations.ieee.txt,
tests/textile-reader.textile,
tests/rst-reader.native,
tests/rst-reader.rst,
@@ -106,6 +118,7 @@ Extra-Source-Files:
tests/tables.textile,
tests/tables.native,
tests/tables.opendocument,
+ tests/tables.org,
tests/tables.texinfo,
tests/tables.rst,
tests/tables.rtf,
@@ -124,6 +137,7 @@ Extra-Source-Files:
tests/writer.textile,
tests/writer.native,
tests/writer.opendocument,
+ tests/writer.org,
tests/writer.rst,
tests/writer.rtf,
tests/writer.texinfo,
diff --git a/relann1.7 b/relann1.7
deleted file mode 100644
index 024c87ed8..000000000
--- a/relann1.7
+++ /dev/null
@@ -1,265 +0,0 @@
-I'm pleased to announce the release of pandoc 1.7.
-
-As usual, a source tarball and Windows installer are available
-at <http://code.google.com/p/pandoc/downloads/list>. You can
-also use 'cabal install' to get the latest version from HackageDB:
-
- cabal update
- cabal install pandoc
-
-Thanks to everyone who contributed by filing bug reports or contributing
-patches, and especially to Andrea Rossato, Nathan Gass, Paul Rivier, and
-Puneeth Chaganti for their major contributions to this version.
-
-New features
-------------
-
- * New `textile` reader and writer. Thanks to Paul Rivier for contributing
- the `textile` reader, an almost complete implementation of the textile
- syntax used by the ruby [RedCloth library](http://redcloth.org/textile).
- Resolves Issue #51.
-
- * New `org` writer, for Emacs Org-mode, contributed by Puneeth Chaganti.
-
- * New `json` reader and writer, for reading and writing a JSON
- representation of the native Pandoc AST. These are much faster
- than the `native` reader and writer, and should be used for
- serializing Pandoc to text. To convert between the JSON representation
- and native Pandoc, use `encodeJSON` and `decodeJSON` from
- `Text.JSON.Generic`.
-
- * Support for citations using Andrea Rossato's `citeproc-hs` 0.3.
- You can now write, for example,
-
- Water is wet [see @doe99, pp. 33-35; also @smith04, ch. 1].
-
- and, when you process your document using `pandoc`, specifying
- a citation style using `--csl` and a bibliography using `--bibliography`,
- the citation will be replaced by an appropriately formatted
- citation, and a list of works cited will be added to the end
- of the document.
-
- This means that you can switch effortlessly between different citation
- and bibliography styles, including footnote, numerical, and author-date
- formats. The bibliography can be in any of the following formats: MODS,
- BibTeX, BibLaTeX, RIS, EndNote, EndNote XML, ISI, MEDLINE, Copac, or JSON.
- See the README for further details.
-
- Citations are supported in the markdown reader, using a special
- syntax, and in the LaTeX reader, using natbib or biblatex syntax.
- (Thanks to Nathan Gass for the natbib and biblatex support.)
-
- * A new `--mathjax` option has been added for displaying
- math in HTML using MathJax. Resolves issue #259.
-
- * You can now define LaTeX macros in markdown documents, and pandoc
- will apply them to TeX math. For example,
-
- \newcommand{\plus}[2]{#1 + #2}
- $\plus{3}{4}$
-
- yields `3+4`. Since the macros are applied in the reader, they
- will work in every output format, not just LaTeX.
-
- * LaTeX macros can also be used in LaTeX documents (both in math
- and in non-math contexts).
-
- * Footnotes are now supported in the RST reader. (Note, however,
- that pandoc ignores the numeral or symbol used in the note;
- footnotes are put in an auto-numbered ordered list.)
- Resolves issue #258.
-
- * `markdown2pdf` now supports `--data-dir`.
-
- * Improved prettyprinting in most formats. Lines will be wrapped
- more evenly and duplicate blank lines avoided.
-
- * New `--columns` command-line option sets the column width for
- line wrapping and relative width calculations for tables.
-
- * Made `--smart` work in HTML, RST, and Textile readers, as well
- as markdown.
-
- * Added support for listings package in LaTeX reader
- (Puneeth Chaganti).
-
- * Added support for simple tables in the LaTeX reader.
-
- * Significant performance improvements in many readers and writers.
-
-API and program changes
------------------------
-
- * Moved `Text.Pandoc.Definition` from the `pandoc` package to a new
- auxiliary package, `pandoc-types`. This will make it possible for other
- programs to supply output in Pandoc format, without depending on the whole
- pandoc package.
-
- * Moved generic functions to `Text.Pandoc.Generic`. Deprecated
- `processWith`, replacing it with two functions, `bottomUp` and `topDown`.
- Removed previously deprecated functions `processPandoc` and `queryPandoc`.
-
- * Added `Text.Pandoc.Builder`, for building `Pandoc` structures.
-
- * `Text.Pandoc` now exports association lists `readers` and `writers`.
-
- * Removed deprecated `-C/--custom-header` option.
- Use `--template` instead.
-
- * `--biblio-file` has been replaced by `--bibliography`.
- `--biblio-format` has been removed; pandoc now guesses the format
- from the file extension (see README).
-
- * pandoc will treat an argument as a URI only if it has an
- `http(s)` scheme. Previously pandoc would treat some
- Windows pathnames beginning with `C:/` as URIs.
-
- * pandoc now adds a newline to the end of its output in fragment
- mode (= not `--standalone`).
-
- * The `--sanitize-html` option and the `stateSanitize` field in
- `ParserState` have been removed. Sanitization is better done in the
- resulting HTML using `xss-sanitize`, which is based on pandoc's
- sanitization, but improved.
-
- * Added `Text.Pandoc.Pretty`. This is better suited for pandoc than the
- `pretty` package. Changed all writers that used
- `Text.PrettyPrint.HughesPJ` to use `Text.Pandoc.Pretty` instead.
-
- * Removed `Text.Pandoc.Blocks`. `Text.Pandoc.Pretty` allows you to define
- blocks and concatenate them, so a separate module is no longer needed.
-
- * `Text.Pandoc.Shared`:
-
- + Added `writerColumns` to `WriterOptions`.
- + Added `normalize`.
- + Removed unneeded prettyprinting functions:
- `wrapped`, `wrapIfNeeded`, `wrappedTeX`, `wrapTeXIfNeeded`, `hang'`,
- `BlockWrapper`, `wrappedBlocksToDoc`.
- + Made `splitBy` take a test instead of an element.
- + Added `findDataFile`, refactored `readDataFile`.
- + Added `stringify`. Rewrote `inlineListToIdentifier` using `stringify`.
- + Fixed `inlineListToIdentifier` to treat '\160' as ' '.
-
- * `Text.Pandoc.Readers.HTML`:
-
- + Removed `rawHtmlBlock`, `anyHtmlBlockTag`, `anyHtmlInlineTag`,
- `anyHtmlTag`, `anyHtmlEndTag`, `htmlEndTag`, `extractTagType`,
- `htmlBlockElement`, `htmlComment`
- + Added `htmlTag`, `htmlInBalanced`, `isInlineTag`, `isBlockTag`,
- `isTextTag`
-
- * Moved `smartPunctuation` from `Text.Pandoc.Readers.Markdown`
- to `Text.Pandoc.Readers.Parsing`, and parameterized it with
- an inline parser.
-
- * Ellipses are no longer allowed to contain spaces.
- Previously we allowed '. . .', ' . . . ', etc. This caused
- too many complications, and removed author's flexibility in
- combining ellipses with spaces and periods.
-
- * Allow linebreaks in URLs (treat as spaces). Also, a string of
- consecutive spaces or tabs is now parsed as a single space. If you have
- multiple spaces in your URL, use `%20%20`.
-
- * `Text.Pandoc.Parsing`:
-
- + Removed `refsMatch`.
- + Hid `Key` constructor.
- + Removed custom `Ord` and `Eq` instances for `Key`.
- + Added `toKey` and `fromKey` to convert between `Key` and `[Inline]`.
- + Generalized type on `readWith`.
-
- * Small change in calculation of relative widths of table columns.
- If the size of the header > the specified column width, use
- the header size as 100% for purposes of calculating
- relative widths of columns.
-
- * Markdown writer now uses some pandoc-specific features when `--strict`
- is not specified: \ newline is used for a hard linebreak instead of
- two spaces then a newline. And delimited code blocks are used when
- there are attributes.
-
- * HTML writer: improved gladTeX output by setting ENV appropriately
- for display or inline math (Jonathan Daugherty).
-
- * LaTeX writer: Use `\paragraph`, `\subparagraph` for level 4,5 headers.
-
- * LaTeX reader:
-
- + `\label{foo}` and `\ref{foo}` now become `{foo}` instead of `(foo)`.
- + `\index{}` commands are skipped.
-
- * Added `fontsize` variable to default LaTeX template.
- This makes it easy to set the font size using `markdown2pdf`:
- `markdown2pdf -V fontsize=12pt input.txt`.
-
- * The `COLUMNS` environment variable no longer has any effect.
-
-Under-the-hood improvements
----------------------------
-
- * Completely rewrote HTML reader using tagsoup as a lexer. The
- new reader is faster and more accurate.
-
- * Replaced `escapeStringAsXML` with a faster version.
-
- * Remove duplications in documentation by generating the
- pandoc man page from README, using `MakeManPage.hs`.
-
- * Improvements to testing framework: Removed old `tests/RunTests.hs`.
- `cabal test` now runs `test-pandoc`, which is built from
- `src/test-pandoc.hs` when the `tests` Cabal flag is set.
- This allows the testing framework to have its own dependencies.
-
- * Added `Interact.hs` to make it easier to use ghci while developing.
- `Interact.hs` loads `ghci` from the `src` directory, specifying
- all the options needed to load pandoc modules (including
- specific package dependencies, which it gets by parsing
- dist/setup-config).
-
- * Added `Benchmark.hs`, testing all readers + writers using criterion.
-
- * Added `stats.sh`, to make it easier to collect and archive
- benchmark and lines-of-code stats.
-
-Bug fixes
----------
-
- * Filenames are encoded as UTF8. Resolves Issue #252.
-
- * Handle curly quotes better in `--smart` mode. Previously, curly quotes
- were just parsed literally, leading to problems in some output formats.
- Now they are parsed as `Quoted` inlines, if `--smart` is specified.
- Resolves Issue #270.
-
- * Markdown reader:
-
- + Allow HTML comments as inline elements in markdown.
- So, `aaa <!-- comment --> bbb` can be a single paragraph.
- + Fixed superscripts with links: `^[link](/foo)^` gets
- recognized as a superscripted link, not an inline note followed by
- garbage.
- + Fixed regression, making markdown reference keys case-insensitive again.
- Resolves Issue #272.
- + Properly handle abbreviations (like `Mr.`) at the end of a line.
- + Better handling of intraword underscores, avoiding exponential
- slowdowns in some cases. Resolves Issue #182.
-
- * LaTeX reader:
-
- + Improved parsing of preamble.
- Previously you'd get unexpected behavior on a document that
- contained `\begin{document}` in, say, a verbatim block.
- + Allow spaces between '\begin' or '\end' and '{'.
- + Support \L and \l.
-
- * OpenDocument writer: don't print raw TeX.
-
- * Markdown writer: Fixed bug in `Image`. URI was getting unescaped twice!
-
- * LaTeX and ConTeXt: Escape `[` and `]` as `{[}` and `{]}`.
- This avoids unwanted interpretation as an optional argument.
-
- * `:` now allowed in HTML tags. Resolves Issue #274.
-
diff --git a/src/Text/Pandoc.hs b/src/Text/Pandoc.hs
index 3532c1d4b..dd1b3892d 100644
--- a/src/Text/Pandoc.hs
+++ b/src/Text/Pandoc.hs
@@ -149,8 +149,9 @@ readers = [("native" , \_ -> read)
,("markdown+lhs" , \st ->
readMarkdown st{ stateLiterateHaskell = True})
,("rst" , readRST)
+ ,("rst+lhs" , \st ->
+ readRST st{ stateLiterateHaskell = True})
,("textile" , readTextile) -- TODO : textile+lhs
- ,("rst+lhs" , readRST)
,("html" , readHtml)
,("latex" , readLaTeX)
,("latex+lhs" , \st ->
diff --git a/src/Text/Pandoc/CharacterReferences.hs b/src/Text/Pandoc/CharacterReferences.hs
index 8ac55fc61..8157d94d3 100644
--- a/src/Text/Pandoc/CharacterReferences.hs
+++ b/src/Text/Pandoc/CharacterReferences.hs
@@ -31,9 +31,9 @@ module Text.Pandoc.CharacterReferences (
characterReference,
decodeCharacterReferences,
) where
-import Data.Char ( chr )
import Text.ParserCombinators.Parsec
-import qualified Data.Map as Map
+import Text.HTML.TagSoup.Entity ( lookupNamedEntity, lookupNumericEntity )
+import Data.Maybe ( fromMaybe )
-- | Parse character entity.
characterReference :: GenParser Char st Char
@@ -47,18 +47,21 @@ numRef :: GenParser Char st Char
numRef = do
char '#'
num <- hexNum <|> decNum
- return $ chr $ num
+ return $ fromMaybe '?' $ lookupNumericEntity num
-hexNum :: GenParser Char st Int
-hexNum = oneOf "Xx" >> many1 hexDigit >>= return . read . (\xs -> '0':'x':xs)
+hexNum :: GenParser Char st [Char]
+hexNum = do
+ x <- oneOf "Xx"
+ num <- many1 hexDigit
+ return (x:num)
-decNum :: GenParser Char st Int
-decNum = many1 digit >>= return . read
+decNum :: GenParser Char st [Char]
+decNum = many1 digit
entity :: GenParser Char st Char
entity = do
body <- many1 alphaNum
- return $ Map.findWithDefault '?' body entityTable
+ return $ fromMaybe '?' $ lookupNamedEntity body
-- | Convert entities in a string to characters.
decodeCharacterReferences :: String -> String
@@ -67,261 +70,3 @@ decodeCharacterReferences str =
Left err -> error $ "\nError: " ++ show err
Right result -> result
-entityTable :: Map.Map String Char
-entityTable = Map.fromList entityTableList
-
-entityTableList :: [(String, Char)]
-entityTableList = [
- ("quot", chr 34),
- ("amp", chr 38),
- ("lt", chr 60),
- ("gt", chr 62),
- ("nbsp", chr 160),
- ("iexcl", chr 161),
- ("cent", chr 162),
- ("pound", chr 163),
- ("curren", chr 164),
- ("yen", chr 165),
- ("brvbar", chr 166),
- ("sect", chr 167),
- ("uml", chr 168),
- ("copy", chr 169),
- ("ordf", chr 170),
- ("laquo", chr 171),
- ("not", chr 172),
- ("shy", chr 173),
- ("reg", chr 174),
- ("macr", chr 175),
- ("deg", chr 176),
- ("plusmn", chr 177),
- ("sup2", chr 178),
- ("sup3", chr 179),
- ("acute", chr 180),
- ("micro", chr 181),
- ("para", chr 182),
- ("middot", chr 183),
- ("cedil", chr 184),
- ("sup1", chr 185),
- ("ordm", chr 186),
- ("raquo", chr 187),
- ("frac14", chr 188),
- ("frac12", chr 189),
- ("frac34", chr 190),
- ("iquest", chr 191),
- ("Agrave", chr 192),
- ("Aacute", chr 193),
- ("Acirc", chr 194),
- ("Atilde", chr 195),
- ("Auml", chr 196),
- ("Aring", chr 197),
- ("AElig", chr 198),
- ("Ccedil", chr 199),
- ("Egrave", chr 200),
- ("Eacute", chr 201),
- ("Ecirc", chr 202),
- ("Euml", chr 203),
- ("Igrave", chr 204),
- ("Iacute", chr 205),
- ("Icirc", chr 206),
- ("Iuml", chr 207),
- ("ETH", chr 208),
- ("Ntilde", chr 209),
- ("Ograve", chr 210),
- ("Oacute", chr 211),
- ("Ocirc", chr 212),
- ("Otilde", chr 213),
- ("Ouml", chr 214),
- ("times", chr 215),
- ("Oslash", chr 216),
- ("Ugrave", chr 217),
- ("Uacute", chr 218),
- ("Ucirc", chr 219),
- ("Uuml", chr 220),
- ("Yacute", chr 221),
- ("THORN", chr 222),
- ("szlig", chr 223),
- ("agrave", chr 224),
- ("aacute", chr 225),
- ("acirc", chr 226),
- ("atilde", chr 227),
- ("auml", chr 228),
- ("aring", chr 229),
- ("aelig", chr 230),
- ("ccedil", chr 231),
- ("egrave", chr 232),
- ("eacute", chr 233),
- ("ecirc", chr 234),
- ("euml", chr 235),
- ("igrave", chr 236),
- ("iacute", chr 237),
- ("icirc", chr 238),
- ("iuml", chr 239),
- ("eth", chr 240),
- ("ntilde", chr 241),
- ("ograve", chr 242),
- ("oacute", chr 243),
- ("ocirc", chr 244),
- ("otilde", chr 245),
- ("ouml", chr 246),
- ("divide", chr 247),
- ("oslash", chr 248),
- ("ugrave", chr 249),
- ("uacute", chr 250),
- ("ucirc", chr 251),
- ("uuml", chr 252),
- ("yacute", chr 253),
- ("thorn", chr 254),
- ("yuml", chr 255),
- ("OElig", chr 338),
- ("oelig", chr 339),
- ("Scaron", chr 352),
- ("scaron", chr 353),
- ("Yuml", chr 376),
- ("fnof", chr 402),
- ("circ", chr 710),
- ("tilde", chr 732),
- ("Alpha", chr 913),
- ("Beta", chr 914),
- ("Gamma", chr 915),
- ("Delta", chr 916),
- ("Epsilon", chr 917),
- ("Zeta", chr 918),
- ("Eta", chr 919),
- ("Theta", chr 920),
- ("Iota", chr 921),
- ("Kappa", chr 922),
- ("Lambda", chr 923),
- ("Mu", chr 924),
- ("Nu", chr 925),
- ("Xi", chr 926),
- ("Omicron", chr 927),
- ("Pi", chr 928),
- ("Rho", chr 929),
- ("Sigma", chr 931),
- ("Tau", chr 932),
- ("Upsilon", chr 933),
- ("Phi", chr 934),
- ("Chi", chr 935),
- ("Psi", chr 936),
- ("Omega", chr 937),
- ("alpha", chr 945),
- ("beta", chr 946),
- ("gamma", chr 947),
- ("delta", chr 948),
- ("epsilon", chr 949),
- ("zeta", chr 950),
- ("eta", chr 951),
- ("theta", chr 952),
- ("iota", chr 953),
- ("kappa", chr 954),
- ("lambda", chr 955),
- ("mu", chr 956),
- ("nu", chr 957),
- ("xi", chr 958),
- ("omicron", chr 959),
- ("pi", chr 960),
- ("rho", chr 961),
- ("sigmaf", chr 962),
- ("sigma", chr 963),
- ("tau", chr 964),
- ("upsilon", chr 965),
- ("phi", chr 966),
- ("chi", chr 967),
- ("psi", chr 968),
- ("omega", chr 969),
- ("thetasym", chr 977),
- ("upsih", chr 978),
- ("piv", chr 982),
- ("ensp", chr 8194),
- ("emsp", chr 8195),
- ("thinsp", chr 8201),
- ("zwnj", chr 8204),
- ("zwj", chr 8205),
- ("lrm", chr 8206),
- ("rlm", chr 8207),
- ("ndash", chr 8211),
- ("mdash", chr 8212),
- ("lsquo", chr 8216),
- ("rsquo", chr 8217),
- ("sbquo", chr 8218),
- ("ldquo", chr 8220),
- ("rdquo", chr 8221),
- ("bdquo", chr 8222),
- ("dagger", chr 8224),
- ("Dagger", chr 8225),
- ("bull", chr 8226),
- ("hellip", chr 8230),
- ("permil", chr 8240),
- ("prime", chr 8242),
- ("Prime", chr 8243),
- ("lsaquo", chr 8249),
- ("rsaquo", chr 8250),
- ("oline", chr 8254),
- ("frasl", chr 8260),
- ("euro", chr 8364),
- ("image", chr 8465),
- ("weierp", chr 8472),
- ("real", chr 8476),
- ("trade", chr 8482),
- ("alefsym", chr 8501),
- ("larr", chr 8592),
- ("uarr", chr 8593),
- ("rarr", chr 8594),
- ("darr", chr 8595),
- ("harr", chr 8596),
- ("crarr", chr 8629),
- ("lArr", chr 8656),
- ("uArr", chr 8657),
- ("rArr", chr 8658),
- ("dArr", chr 8659),
- ("hArr", chr 8660),
- ("forall", chr 8704),
- ("part", chr 8706),
- ("exist", chr 8707),
- ("empty", chr 8709),
- ("nabla", chr 8711),
- ("isin", chr 8712),
- ("notin", chr 8713),
- ("ni", chr 8715),
- ("prod", chr 8719),
- ("sum", chr 8721),
- ("minus", chr 8722),
- ("lowast", chr 8727),
- ("radic", chr 8730),
- ("prop", chr 8733),
- ("infin", chr 8734),
- ("ang", chr 8736),
- ("and", chr 8743),
- ("or", chr 8744),
- ("cap", chr 8745),
- ("cup", chr 8746),
- ("int", chr 8747),
- ("there4", chr 8756),
- ("sim", chr 8764),
- ("cong", chr 8773),
- ("asymp", chr 8776),
- ("ne", chr 8800),
- ("equiv", chr 8801),
- ("le", chr 8804),
- ("ge", chr 8805),
- ("sub", chr 8834),
- ("sup", chr 8835),
- ("nsub", chr 8836),
- ("sube", chr 8838),
- ("supe", chr 8839),
- ("oplus", chr 8853),
- ("otimes", chr 8855),
- ("perp", chr 8869),
- ("sdot", chr 8901),
- ("lceil", chr 8968),
- ("rceil", chr 8969),
- ("lfloor", chr 8970),
- ("rfloor", chr 8971),
- ("lang", chr 9001),
- ("rang", chr 9002),
- ("loz", chr 9674),
- ("spades", chr 9824),
- ("clubs", chr 9827),
- ("hearts", chr 9829),
- ("diams", chr 9830)
- ]
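Note on the CharacterReferences change above: `numRef` used to call `chr` directly on the parsed number, and `chr` raises an error for values outside the Unicode range. The new code goes through a `Maybe`-returning lookup and falls back to `'?'`. The following is a minimal sketch of that pattern using only `base`. `decodeNumRef` and `safeChr` are hypothetical names for illustration; this is not TagSoup's implementation of `lookupNumericEntity`, and the range check is an assumption about what a safe lookup should do.

```haskell
import Data.Char (chr, isDigit, isHexDigit)
import Data.Maybe (fromMaybe)
import Numeric (readHex)

-- | Hypothetical stand-in for a numeric-entity lookup.
-- Accepts the same shapes the patched parser produces:
-- "65" (decimal) or "x41" / "X41" (hex, with the leading x kept).
decodeNumRef :: String -> Maybe Char
decodeNumRef (x:ds)
  | x `elem` "xX", not (null ds), all isHexDigit ds =
      safeChr (fst (head (readHex ds)))
decodeNumRef ds
  | not (null ds), all isDigit ds = safeChr (read ds)
decodeNumRef _ = Nothing

-- | Convert a code point to a Char, refusing out-of-range values
-- instead of letting 'chr' raise an error.
safeChr :: Integer -> Maybe Char
safeChr n
  | n >= 0 && n <= 0x10FFFF = Just (chr (fromInteger n))
  | otherwise               = Nothing

main :: IO ()
main = do
  print (decodeNumRef "x41")                       -- Just 'A'
  print (decodeNumRef "65")                        -- Just 'A'
  print (fromMaybe '?' (decodeNumRef "99999999"))  -- '?'
```

Keeping the `x` prefix in the string, as the new `hexNum` does, lets a single lookup function distinguish hex from decimal input without a separate parser result type.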
diff --git a/src/Text/Pandoc/Readers/HTML.hs b/src/Text/Pandoc/Readers/HTML.hs
index ae8f0438e..0cbdf72b0 100644
--- a/src/Text/Pandoc/Readers/HTML.hs
+++ b/src/Text/Pandoc/Readers/HTML.hs
@@ -78,14 +78,14 @@ parseBody :: TagParser [Block]
parseBody = liftM concat $ manyTill block eof
block :: TagParser [Block]
-block = optional pLocation >>
- choice [
- pPara
+block = choice
+ [ pPara
, pHeader
, pBlockQuote
, pCodeBlock
, pList
, pHrule
+ , pSimpleTable
, pPlain
, pRawHtmlBlock
]
@@ -195,6 +195,27 @@ pHrule = do
pSelfClosing (=="hr") (const True)
return [HorizontalRule]
+pSimpleTable :: TagParser [Block]
+pSimpleTable = try $ do
+ TagOpen _ _ <- pSatisfy (~== TagOpen "table" [])
+ skipMany pBlank
+ head' <- option [] $ pInTags "th" pTd
+ rows <- many1 $ try $
+ skipMany pBlank >> pInTags "tr" pTd
+ skipMany pBlank
+ TagClose _ <- pSatisfy (~== TagClose "table")
+ let cols = maximum $ map length rows
+ let aligns = replicate cols AlignLeft
+ let widths = replicate cols 0
+ return [Table [] aligns widths head' rows]
+
+pTd :: TagParser [TableCell]
+pTd = try $ do
+ skipMany pBlank
+ res <- pInTags "td" pPlain
+ skipMany pBlank
+ return [res]
+
pBlockQuote :: TagParser [Block]
pBlockQuote = do
contents <- pInTags "blockquote" block
@@ -235,9 +256,8 @@ pCodeBlock = try $ do
return [CodeBlock attribs result]
inline :: TagParser [Inline]
-inline = choice [
- pLocation
- , pTagText
+inline = choice
+ [ pTagText
, pEmph
, pStrong
, pSuperscript
@@ -250,17 +270,19 @@ inline = choice [
, pRawHtmlInline
]
-pLocation :: TagParser [a]
+pLocation :: TagParser ()
pLocation = do
- (TagPosition r c) <- pSatisfy isTagPosition
+ (TagPosition r c) <- pSat isTagPosition
setPosition $ newPos "input" r c
- return []
-pSatisfy :: (Tag String -> Bool) -> TagParser (Tag String)
-pSatisfy f = do
+pSat :: (Tag String -> Bool) -> TagParser (Tag String)
+pSat f = do
pos <- getPosition
token show (const pos) (\x -> if f x then Just x else Nothing)
+pSatisfy :: (Tag String -> Bool) -> TagParser (Tag String)
+pSatisfy f = try $ optional pLocation >> pSat f
+
pAnyTag :: TagParser (Tag String)
pAnyTag = pSatisfy (const True)
@@ -268,7 +290,7 @@ pSelfClosing :: (String -> Bool) -> ([Attribute String] -> Bool)
-> TagParser (Tag String)
pSelfClosing f g = do
open <- pSatisfy (tagOpen f g)
- optional $ try $ pLocation >> pSatisfy (tagClose f)
+ optional $ pSatisfy (tagClose f)
return open
pEmph :: TagParser [Inline]
@@ -342,7 +364,6 @@ pInTags tagtype parser = try $ do
pCloses :: String -> TagParser ()
pCloses tagtype = try $ do
- optional pLocation
t <- lookAhead $ pSatisfy $ \tag -> isTagClose tag || isTagOpen tag
case t of
(TagClose t') | t' == tagtype -> pAnyTag >> return ()
@@ -360,6 +381,11 @@ pTagText = try $ do
Left _ -> fail $ "Could not parse `" ++ str ++ "'"
Right result -> return result
+pBlank :: TagParser ()
+pBlank = try $ do
+ (TagText str) <- pSatisfy isTagText
+ guard $ all isSpace str
+
pTagContents :: GenParser Char ParserState Inline
pTagContents = pStr <|> pSpace <|> smartPunctuation pTagContents <|> pSymbol
@@ -433,10 +459,8 @@ _ `closes` "html" = False
"a" `closes` "a" = True
"li" `closes` "li" = True
"th" `closes` t | t `elem` ["th","td"] = True
-"td" `closes` t | t `elem` ["th","td"] = True
"tr" `closes` t | t `elem` ["th","td","tr"] = True
"dt" `closes` t | t `elem` ["dt","dd"] = True
-"dd" `closes` t | t `elem` ["dt","dd"] = True
"hr" `closes` "p" = True
"p" `closes` "p" = True
"meta" `closes` "meta" = True
diff --git a/src/Text/Pandoc/Writers/LaTeX.hs b/src/Text/Pandoc/Writers/LaTeX.hs
index fbf443a03..836e0f974 100644
--- a/src/Text/Pandoc/Writers/LaTeX.hs
+++ b/src/Text/Pandoc/Writers/LaTeX.hs
@@ -370,8 +370,8 @@ inlineToLaTeX (Link txt (src, _)) =
do modify $ \s -> s{ stUrl = True }
return $ text $ "\\url{" ++ x ++ "}"
_ -> do contents <- inlineListToLaTeX $ deVerb txt
- return $ text ("\\href{" ++ src ++ "}{") <> contents <>
- char '}'
+ return $ text ("\\href{" ++ stringToLaTeX src ++ "}{") <>
+ contents <> char '}'
inlineToLaTeX (Image _ (source, _)) = do
modify $ \s -> s{ stGraphics = True }
return $ "\\includegraphics" <> braces (text source)
diff --git a/tests/writer.latex b/tests/writer.latex
index 374815f63..eb4012749 100644
--- a/tests/writer.latex
+++ b/tests/writer.latex
@@ -581,7 +581,7 @@ spaces: a\^{}b c\^{}d, a\ensuremath{\sim}b c\ensuremath{\sim}d.
`He said, ``I want to go.''\,' Were you alive in the 70's?
Here is some quoted `\verb!code!' and a
-``\href{http://example.com/?foo=1&bar=2}{quoted link}''.
+``\href{http://example.com/?foo=1\&bar=2}{quoted link}''.
Some dashes: one---two --- three---four --- five.
@@ -711,7 +711,7 @@ Just a \href{/url/}{URL}.
\href{/url/}{URL and title}
-\href{/url/with_underscore}{with\_underscore}
+\href{/url/with\_underscore}{with\_underscore}
\href{mailto:nobody@nowhere.net}{Email link}
@@ -746,15 +746,15 @@ Foo \href{/url/}{biz}.
\subsection{With ampersands}
-Here's a \href{http://example.com/?foo=1&bar=2}{link with an ampersand in the
+Here's a \href{http://example.com/?foo=1\&bar=2}{link with an ampersand in the
URL}.
Here's a link with an amersand in the link text:
\href{http://att.com/}{AT\&T}.
-Here's an \href{/script?foo=1&bar=2}{inline link}.
+Here's an \href{/script?foo=1\&bar=2}{inline link}.
-Here's an \href{/script?foo=1&bar=2}{inline link in pointy braces}.
+Here's an \href{/script?foo=1\&bar=2}{inline link in pointy braces}.
\subsection{Autolinks}
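Note on the LaTeX writer change above: the link target is now escaped before it is placed inside `\href{..}`, which is why the expected output in `tests/writer.latex` changes `&` to `\&` and `_` to `\_`. A minimal sketch of that idea follows, using only `base`. `escapeHref` and `href` are hypothetical helpers, not pandoc's `stringToLaTeX`; the set of escaped characters beyond `&` and `_` is an assumption.

```haskell
-- | Hypothetical escaper for a URL placed inside \href{..}.
-- Only '&' and '_' are confirmed by the test changes in this patch;
-- '%' and '#' are included here as an assumption.
escapeHref :: String -> String
escapeHref = concatMap esc
  where
    esc c
      | c `elem` "&_%#" = ['\\', c]  -- prefix the character with a backslash
      | otherwise       = [c]

-- | Build an \href command from a raw URL and already-escaped link text.
href :: String -> String -> String
href src label = "\\href{" ++ escapeHref src ++ "}{" ++ label ++ "}"

main :: IO ()
main = do
  putStrLn (href "http://example.com/?foo=1&bar=2" "quoted link")
  putStrLn (href "/url/with_underscore" "with\\_underscore")
```

Under these assumptions, the two lines printed match the updated expectations in the test file: `\href{http://example.com/?foo=1\&bar=2}{quoted link}` and `\href{/url/with\_underscore}{with\_underscore}`.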