diff options
32 files changed, 1454 insertions, 277 deletions
@@ -6,8 +6,11 @@ all: cabal-dev configure --enable-tests --enable-benchmarks && cabal-dev build +prof: + cabal-dev configure --enable-tests --enable-library-profiling --enable-executable-profiling && cabal-dev build + prep: pandoc-types citeproc-hs - cabal-dev install-deps --enable-tests --enable-benchmarks + cabal-dev install-deps --enable-library-profiling --enable-tests --enable-benchmarks quick: cabal-dev configure --enable-tests --disable-optimization && cabal-dev build @@ -27,4 +30,4 @@ citeproc-hs: pandoc-types cabal-dev add-source citeproc-hs install: - cabal-dev install + cabal-dev install --enable-tests --enable-benchmarks @@ -13,13 +13,14 @@ Description Pandoc is a [Haskell] library for converting from one markup format to another, and a command-line tool that uses this library. It can read [markdown] and (subsets of) [Textile], [reStructuredText], [HTML], -[LaTeX], and [DocBook XML]; and it can write plain text, [markdown], -[reStructuredText], [XHTML], [HTML 5], [LaTeX] (including [beamer] -slide shows), [ConTeXt], [RTF], [DocBook XML], [OpenDocument XML], -[ODT], [Word docx], [GNU Texinfo], [MediaWiki markup], [EPUB], [FictionBook2], -[Textile], [groff man] pages, [Emacs Org-Mode], [AsciiDoc], and [Slidy], -[Slideous], [DZSlides], or [S5] HTML slide shows. It can also produce -[PDF] output on systems where LaTeX is installed. +[LaTeX], [MediaWiki markup], and [DocBook XML]; and it can write plain +text, [markdown], [reStructuredText], [XHTML], [HTML 5], [LaTeX] +(including [beamer] slide shows), [ConTeXt], [RTF], [DocBook XML], +[OpenDocument XML], [ODT], [Word docx], [GNU Texinfo], [MediaWiki +markup], [EPUB], [FictionBook2], [Textile], [groff man] pages, [Emacs +Org-Mode], [AsciiDoc], and [Slidy], [Slideous], [DZSlides], or [S5] HTML +slide shows. It can also produce [PDF] output on systems where LaTeX is +installed. Pandoc's enhanced version of markdown includes syntax for footnotes, tables, flexible ordered lists, definition lists, fenced code blocks, @@ -107,7 +108,7 @@ to PDF: Production of a PDF requires that a LaTeX engine be installed (see `--latex-engine`, below), and assumes that the following LaTeX packages are available: `amssymb`, `amsmath`, `ifxetex`, `ifluatex`, `listings` (if the -`--listings` option is used), `fancyvrb`, `enumerate`, `ctable`, `url`, +`--listings` option is used), `fancyvrb`, `enumerate`, `longtable`, `url`, `graphicx`, `hyperref`, `ulem`, `babel` (if the `lang` variable is set), `fontspec` (if `xelatex` or `lualatex` is used as the LaTeX engine), `xltxtra` and `xunicode` (if `xelatex` is used). @@ -140,18 +141,18 @@ General options `json` (JSON version of native AST), `markdown` (pandoc's extended markdown), `markdown_strict` (original unextended markdown), `textile` (Textile), `rst` (reStructuredText), `html` (HTML), - `docbook` (DocBook XML), or `latex` (LaTeX). If `+lhs` is - appended to `markdown`, `rst`, `latex`, the input will be - treated as literate Haskell source: see [Literate Haskell - support](#literate-haskell-support), below. Markdown syntax - extensions can be individually enabled or disabled by appending - `+EXTENSION` or `-EXTENSION` to the format name. So, for example, - `markdown_strict+footnotes+definition_lists` is strict markdown - with footnotes and definition lists enabled, and - `markdown-pipe_tables+hard_line_breaks` is pandoc's markdown - without pipe tables and with hard line breaks. See - [Pandoc's markdown](#pandocs-markdown), below, for a list of - extensions and their names. + `docbook` (DocBook XML), `mediawiki` (MediaWiki markup), + or `latex` (LaTeX). If `+lhs` is appended to `markdown`, `rst`, + `latex`, the input will be treated as literate Haskell source: + see [Literate Haskell support](#literate-haskell-support), below. + Markdown syntax extensions can be individually enabled or disabled + by appending `+EXTENSION` or `-EXTENSION` to the format name. + So, for example, `markdown_strict+footnotes+definition_lists` + is strict markdown with footnotes and definition lists enabled, + and `markdown-pipe_tables+hard_line_breaks` is pandoc's markdown + without pipe tables and with hard line breaks. See [Pandoc's + markdown](#pandocs-markdown), below, for a list of extensions and + their names. `-t` *FORMAT*, `-w` *FORMAT*, `--to=`*FORMAT*, `--write=`*FORMAT* : Specify output format. *FORMAT* can be `native` (native Haskell), @@ -415,8 +416,8 @@ Options affecting specific writers `--id-prefix`=*STRING* : Specify a prefix to be added to all automatically generated identifiers - in HTML output. This is useful for preventing duplicate identifiers - when generating fragments to be included in other pages. + in HTML and DocBook output. This is useful for preventing duplicate + identifiers when generating fragments to be included in other pages. `-T` *STRING*, `--title-prefix=`*STRING* : Specify *STRING* as a prefix at the beginning of the title @@ -1813,11 +1814,11 @@ HTML, Slidy, DZSlides, S5, EPUB styled differently from the surrounding text if needed. 2. If the `--latexmathml` option is used, TeX math will be displayed - between $ or $$ characters and put in `<span>` tags with class `LaTeX`. + between `$` or `$$` characters and put in `<span>` tags with class `LaTeX`. The [LaTeXMathML] script will be used to render it as formulas. (This trick does not work in all browsers, but it works in Firefox. In browsers that do not support LaTeXMathML, TeX math will appear - verbatim between $ characters.) + verbatim between `$` characters.) 3. If the `--jsmath` option is used, TeX math will be put inside `<span>` tags (for inline math) or `<div>` tags (for display math) diff --git a/pandoc.cabal b/pandoc.cabal index e999f1b80..0b234c52f 100644 --- a/pandoc.cabal +++ b/pandoc.cabal @@ -16,10 +16,11 @@ Synopsis: Conversion between markup formats Description: Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can read markdown and (subsets of) HTML, - reStructuredText, LaTeX, DocBook, and Textile, and it can write - markdown, reStructuredText, HTML, LaTeX, ConTeXt, Docbook, - OpenDocument, ODT, Word docx, RTF, MediaWiki, Textile, - groff man pages, plain text, Emacs Org-Mode, AsciiDoc, EPUB, + reStructuredText, LaTeX, DocBook, MediaWiki markup, + and Textile, and it can write markdown, reStructuredText, + HTML, LaTeX, ConTeXt, Docbook, OpenDocument, ODT, + Word docx, RTF, MediaWiki, Textile, groff man pages, + plain text, Emacs Org-Mode, AsciiDoc, EPUB, FictionBook2, and S5, Slidy and Slideous HTML slide shows. . Pandoc extends standard markdown syntax with footnotes, @@ -120,6 +121,8 @@ Extra-Source-Files: tests/markdown-citations.mhra.txt, tests/markdown-citations.ieee.txt, tests/textile-reader.textile, + tests/mediawiki-reader.wiki, + tests/mediawiki-reader.native, tests/rst-reader.native, tests/rst-reader.rst, tests/s5.basic.html, @@ -262,6 +265,7 @@ Library Text.Pandoc.Readers.HTML, Text.Pandoc.Readers.LaTeX, Text.Pandoc.Readers.Markdown, + Text.Pandoc.Readers.MediaWiki, Text.Pandoc.Readers.RST, Text.Pandoc.Readers.DocBook, Text.Pandoc.Readers.TeXMath, diff --git a/src/Text/Pandoc.hs b/src/Text/Pandoc.hs index 33706816e..1e6b1d010 100644 --- a/src/Text/Pandoc.hs +++ b/src/Text/Pandoc.hs @@ -66,6 +66,7 @@ module Text.Pandoc , writers -- * Readers: converting /to/ Pandoc format , readMarkdown + , readMediaWiki , readRST , readLaTeX , readHtml @@ -110,6 +111,7 @@ module Text.Pandoc import Text.Pandoc.Definition import Text.Pandoc.Generic import Text.Pandoc.Readers.Markdown +import Text.Pandoc.Readers.MediaWiki import Text.Pandoc.Readers.RST import Text.Pandoc.Readers.DocBook import Text.Pandoc.Readers.LaTeX @@ -179,6 +181,7 @@ readers = [("native" , \_ -> readNative) ,("markdown_strict" , readMarkdown) ,("markdown" , readMarkdown) ,("rst" , readRST) + ,("mediawiki" , readMediaWiki) ,("docbook" , readDocBook) ,("textile" , readTextile) -- TODO : textile+lhs ,("html" , readHtml) diff --git a/src/Text/Pandoc/Parsing.hs b/src/Text/Pandoc/Parsing.hs index 50691f409..bee96be82 100644 --- a/src/Text/Pandoc/Parsing.hs +++ b/src/Text/Pandoc/Parsing.hs @@ -82,6 +82,7 @@ module Text.Pandoc.Parsing ( (>>~), ellipses, apostrophe, dash, + nested, macro, applyMacros', Parser, @@ -379,10 +380,11 @@ uri = try $ do char ')' return $ '(' : res ++ ")" str <- liftM concat $ many1 $ inParens <|> count 1 (innerPunct <|> uriChar) + str' <- option str $ char '/' >> return (str ++ "/") -- now see if they amount to an absolute URI - case parseURI (escapeURI str) of + case parseURI (escapeURI str') of Just uri' -> if uriScheme uri' `elem` protocols - then return (str, show uri') + then return (str', show uri') else fail "not a URI" Nothing -> fail "not a URI" @@ -811,8 +813,8 @@ quoted :: Parser [Char] ParserState Inline quoted inlineParser = doubleQuoted inlineParser <|> singleQuoted inlineParser withQuoteContext :: QuoteContext - -> Parser [Char] ParserState a - -> Parser [Char] ParserState a + -> Parser [tok] ParserState a + -> Parser [tok] ParserState a withQuoteContext context parser = do oldState <- getState let oldQuoteContext = stateQuoteContext oldState @@ -924,6 +926,18 @@ emDashOld = do try (charOrRef "\8212\151") <|> (try $ string "--" >> optional (char '-') >> return '-') return (Str "\8212") +-- This is used to prevent exponential blowups for things like: +-- a**a*a**a*a**a*a**a*a**a*a**a*a** +nested :: Parser s ParserState a + -> Parser s ParserState a +nested p = do + nestlevel <- stateMaxNestingLevel `fmap` getState + guard $ nestlevel > 0 + updateState $ \st -> st{ stateMaxNestingLevel = stateMaxNestingLevel st - 1 } + res <- p + updateState $ \st -> st{ stateMaxNestingLevel = nestlevel } + return res + -- -- Macros -- diff --git a/src/Text/Pandoc/Readers/HTML.hs b/src/Text/Pandoc/Readers/HTML.hs index e5c310ffc..424d9bdec 100644 --- a/src/Text/Pandoc/Readers/HTML.hs +++ b/src/Text/Pandoc/Readers/HTML.hs @@ -271,6 +271,7 @@ pCodeBlock = try $ do inline :: TagParser [Inline] inline = choice [ pTagText + , pQ , pEmph , pStrong , pSuperscript @@ -306,6 +307,17 @@ pSelfClosing f g = do optional $ pSatisfy (tagClose f) return open +pQ :: TagParser [Inline] +pQ = do + quoteContext <- stateQuoteContext `fmap` getState + let quoteType = case quoteContext of + InDoubleQuote -> SingleQuote + _ -> DoubleQuote + let innerQuoteContext = if quoteType == SingleQuote + then InSingleQuote + else InDoubleQuote + withQuoteContext innerQuoteContext $ pInlinesInTags "q" (Quoted quoteType) + pEmph :: TagParser [Inline] pEmph = pInlinesInTags "em" Emph <|> pInlinesInTags "i" Emph @@ -585,9 +597,9 @@ htmlInBalanced f = try $ do return $ tag ++ concat contents ++ endtag -- | Matches a tag meeting a certain condition. -htmlTag :: (Tag String -> Bool) -> Parser [Char] ParserState (Tag String, String) +htmlTag :: (Tag String -> Bool) -> Parser [Char] st (Tag String, String) htmlTag f = try $ do - lookAhead (char '<') + lookAhead $ char '<' >> (oneOf "/!?" <|> letter) (next : _) <- getInput >>= return . canonicalizeTags . parseTags guard $ f next -- advance the parser diff --git a/src/Text/Pandoc/Readers/LaTeX.hs b/src/Text/Pandoc/Readers/LaTeX.hs index 4a5a14d6a..86ae400de 100644 --- a/src/Text/Pandoc/Readers/LaTeX.hs +++ b/src/Text/Pandoc/Readers/LaTeX.hs @@ -166,10 +166,8 @@ double_quote = (doubleQuoted . mconcat) <$> (try $ string "``" *> manyTill inline (try $ string "''")) single_quote :: LP Inlines -single_quote = char '`' *> - ( try ((singleQuoted . mconcat) <$> - manyTill inline (try $ char '\'' >> notFollowedBy letter)) - <|> lit "`") +single_quote = (singleQuoted . mconcat) <$> + (try $ char '`' *> manyTill inline (try $ char '\'' >> notFollowedBy letter)) inline :: LP Inlines inline = (mempty <$ comment) @@ -181,6 +179,9 @@ inline = (mempty <$ comment) ((char '-') *> option (str "–") (str "—" <$ char '-'))) <|> double_quote <|> single_quote + <|> (str "“" <$ try (string "``")) -- nb. {``} won't be caught by double_quote + <|> (str "”" <$ try (string "''")) + <|> (str "‘" <$ char '`') -- nb. {`} won't be caught by single_quote <|> (str "’" <$ char '\'') <|> (str "\160" <$ char '~') <|> (mathDisplay $ string "$$" *> mathChars <* string "$$") @@ -188,10 +189,9 @@ inline = (mempty <$ comment) <|> (superscript <$> (char '^' *> tok)) <|> (subscript <$> (char '_' *> tok)) <|> (guardEnabled Ext_literate_haskell *> char '|' *> doLHSverb) - <|> (str <$> count 1 tildeEscape) - <|> (str <$> string "]") - <|> (str <$> string "#") -- TODO print warning? - <|> (str <$> string "&") -- TODO print warning? + <|> (str . (:[]) <$> tildeEscape) + <|> (str . (:[]) <$> oneOf "[]") + <|> (str . (:[]) <$> oneOf "#&") -- TODO print warning? -- <|> (str <$> count 1 (satisfy (\c -> c /= '\\' && c /='\n' && c /='}' && c /='{'))) -- eat random leftover characters inlines :: LP Inlines @@ -203,8 +203,8 @@ block = (mempty <$ comment) <|> environment <|> mempty <$ macro -- TODO improve macros, make them work everywhere <|> blockCommand - <|> grouped block <|> paragraph + <|> grouped block <|> (mempty <$ char '&') -- loose & in table environment @@ -214,6 +214,7 @@ blocks = mconcat <$> many block blockCommand :: LP Blocks blockCommand = try $ do name <- anyControlSeq + guard $ name /= "begin" && name /= "end" star <- option "" (string "*" <* optional sp) let name' = name ++ star case M.lookup name' blockCommands of @@ -265,8 +266,6 @@ blockCommands = M.fromList $ , ("closing", skipopts *> closing) -- , ("rule", skipopts *> tok *> tok *> pure horizontalRule) - , ("begin", mzero) -- these are here so they won't be interpreted as inline - , ("end", mzero) , ("item", skipopts *> loose_item) , ("documentclass", skipopts *> braced *> preamble) , ("centerline", (para . trimInlines) <$> (skipopts *> tok)) @@ -321,6 +320,7 @@ section lvl = do inlineCommand :: LP Inlines inlineCommand = try $ do name <- anyControlSeq + guard $ name /= "begin" && name /= "end" guard $ not $ isBlockCommand name parseRaw <- getOption readerParseRaw star <- option "" (string "*") @@ -352,6 +352,7 @@ inlineCommands = M.fromList $ , ("textsubscript", subscript <$> tok) , ("textbackslash", lit "\\") , ("backslash", lit "\\") + , ("slash", lit "/") , ("textbf", strong <$> tok) , ("ldots", lit "…") , ("dots", lit "…") @@ -644,11 +645,7 @@ inlineText :: LP Inlines inlineText = str <$> many1 inlineChar inlineChar :: LP Char -inlineChar = satisfy $ \c -> - not (c == '\\' || c == '$' || c == '%' || c == '^' || c == '_' || - c == '&' || c == '~' || c == '#' || c == '{' || c == '}' || - c == '^' || c == '\'' || c == '`' || c == '-' || c == ']' || - c == ' ' || c == '\t' || c == '\n' ) +inlineChar = noneOf "\\$%^_&~#{}^'`-[] \t\n" environment :: LP Blocks environment = do @@ -745,6 +742,9 @@ environments = M.fromList , ("lstlisting", codeBlock <$> (verbEnv "lstlisting")) , ("minted", liftA2 (\l c -> codeBlockWith ("",[l],[]) c) (grouped (many1 $ satisfy (/= '}'))) (verbEnv "minted")) + , ("obeylines", parseFromString + (para . trimInlines . mconcat <$> many inline) =<< + intercalate "\\\\\n" . lines <$> verbEnv "obeylines") , ("displaymath", mathEnv Nothing "displaymath") , ("equation", mathEnv Nothing "equation") , ("equation*", mathEnv Nothing "equation*") @@ -801,7 +801,9 @@ descItem = do return (ils, [bs]) env :: String -> LP a -> LP a -env name p = p <* (controlSeq "end" *> braced >>= guard . (== name)) +env name p = p <* + (try (controlSeq "end" *> braced >>= guard . (== name)) + <?> ("\\end{" ++ name ++ "}")) listenv :: String -> LP a -> LP a listenv name p = try $ do diff --git a/src/Text/Pandoc/Readers/Markdown.hs b/src/Text/Pandoc/Readers/Markdown.hs index 2407e137c..1c2cc12f1 100644 --- a/src/Text/Pandoc/Readers/Markdown.hs +++ b/src/Text/Pandoc/Readers/Markdown.hs @@ -1,6 +1,4 @@ {-# LANGUAGE RelaxedPolyRec #-} -- needed for inlinesBetween on GHC < 7 -{-# LANGUAGE FlexibleInstances, TypeSynonymInstances, - GeneralizedNewtypeDeriving #-} {- Copyright (C) 2006-2010 John MacFarlane <jgm@berkeley.edu> @@ -1294,18 +1292,6 @@ inlinesBetween start end = where inner = innerSpace <|> (notFollowedBy' (() <$ whitespace) >> inline) innerSpace = try $ whitespace >>~ notFollowedBy' end --- This is used to prevent exponential blowups for things like: --- a**a*a**a*a**a*a**a*a**a*a**a*a** -nested :: Parser [Char] ParserState a - -> Parser [Char] ParserState a -nested p = do - nestlevel <- stateMaxNestingLevel `fmap` getState - guard $ nestlevel > 0 - updateState $ \st -> st{ stateMaxNestingLevel = stateMaxNestingLevel st - 1 } - res <- p - updateState $ \st -> st{ stateMaxNestingLevel = nestlevel } - return res - emph :: Parser [Char] ParserState (F Inlines) emph = fmap B.emph <$> nested (inlinesBetween starStart starEnd <|> inlinesBetween ulStart ulEnd) diff --git a/src/Text/Pandoc/Readers/MediaWiki.hs b/src/Text/Pandoc/Readers/MediaWiki.hs new file mode 100644 index 000000000..7936be38b --- /dev/null +++ b/src/Text/Pandoc/Readers/MediaWiki.hs @@ -0,0 +1,594 @@ +{-# LANGUAGE RelaxedPolyRec #-} -- needed for inlinesBetween on GHC < 7 +{- + Copyright (C) 2012 John MacFarlane <jgm@berkeley.edu> + +This program is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with this program; if not, write to the Free Software +Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA +-} + +{- | + Module : Text.Pandoc.Readers.MediaWiki + Copyright : Copyright (C) 2012 John MacFarlane + License : GNU GPL, version 2 or above + + Maintainer : John MacFarlane <jgm@berkeley.edu> + Stability : alpha + Portability : portable + +Conversion of mediawiki text to 'Pandoc' document. +-} +{- +TODO: +_ correctly handle tables within tables +_ parse templates? +-} +module Text.Pandoc.Readers.MediaWiki ( readMediaWiki ) where + +import Text.Pandoc.Definition +import qualified Text.Pandoc.Builder as B +import Text.Pandoc.Builder (Inlines, Blocks, trimInlines, (<>)) +import Text.Pandoc.Options +import Text.Pandoc.Readers.HTML ( htmlTag, isBlockTag, isCommentTag ) +import Text.Pandoc.XML ( fromEntities ) +import Text.Pandoc.Parsing hiding ( nested ) +import Text.Pandoc.Generic ( bottomUp ) +import Text.Pandoc.Shared ( stripTrailingNewlines, safeRead ) +import Data.Monoid (mconcat, mempty) +import Control.Applicative ((<$>), (<*), (*>), (<$)) +import Control.Monad +import Data.List (intersperse, intercalate, isPrefixOf ) +import Text.HTML.TagSoup +import Data.Sequence (viewl, ViewL(..), (<|)) +import Data.Char (isDigit) + +-- | Read mediawiki from an input string and return a Pandoc document. +readMediaWiki :: ReaderOptions -- ^ Reader options + -> String -- ^ String to parse (assuming @'\n'@ line endings) + -> Pandoc +readMediaWiki opts s = + case runParser parseMediaWiki MWState{ mwOptions = opts + , mwMaxNestingLevel = 4 + , mwNextLinkNumber = 1 + , mwCategoryLinks = [] + } + "source" (s ++ "\n") of + Left err' -> error $ "\nError:\n" ++ show err' + Right result -> result + +data MWState = MWState { mwOptions :: ReaderOptions + , mwMaxNestingLevel :: Int + , mwNextLinkNumber :: Int + , mwCategoryLinks :: [Inlines] + } + +type MWParser = Parser [Char] MWState + +-- +-- auxiliary functions +-- + +-- This is used to prevent exponential blowups for things like: +-- ''a'''a''a'''a''a'''a''a'''a +nested :: MWParser a -> MWParser a +nested p = do + nestlevel <- mwMaxNestingLevel `fmap` getState + guard $ nestlevel > 0 + updateState $ \st -> st{ mwMaxNestingLevel = mwMaxNestingLevel st - 1 } + res <- p + updateState $ \st -> st{ mwMaxNestingLevel = nestlevel } + return res + +specialChars :: [Char] +specialChars = "'[]<=&*{}|\"" + +spaceChars :: [Char] +spaceChars = " \n\t" + +sym :: String -> MWParser () +sym s = () <$ try (string s) + +newBlockTags :: [String] +newBlockTags = ["haskell","syntaxhighlight","source","gallery","references"] + +isBlockTag' :: Tag String -> Bool +isBlockTag' tag@(TagOpen t _) = (isBlockTag tag || t `elem` newBlockTags) && + t `notElem` eitherBlockOrInline +isBlockTag' tag@(TagClose t) = (isBlockTag tag || t `elem` newBlockTags) && + t `notElem` eitherBlockOrInline +isBlockTag' tag = isBlockTag tag + +isInlineTag' :: Tag String -> Bool +isInlineTag' (TagComment _) = True +isInlineTag' t = not (isBlockTag' t) + +eitherBlockOrInline :: [String] +eitherBlockOrInline = ["applet", "button", "del", "iframe", "ins", + "map", "area", "object"] + +htmlComment :: MWParser () +htmlComment = () <$ htmlTag isCommentTag + +inlinesInTags :: String -> MWParser Inlines +inlinesInTags tag = try $ do + (_,raw) <- htmlTag (~== TagOpen tag []) + if '/' `elem` raw -- self-closing tag + then return mempty + else trimInlines . mconcat <$> + manyTill inline (htmlTag (~== TagClose tag)) + +blocksInTags :: String -> MWParser Blocks +blocksInTags tag = try $ do + (_,raw) <- htmlTag (~== TagOpen tag []) + if '/' `elem` raw -- self-closing tag + then return mempty + else mconcat <$> manyTill block (htmlTag (~== TagClose tag)) + +charsInTags :: String -> MWParser [Char] +charsInTags tag = try $ do + (_,raw) <- htmlTag (~== TagOpen tag []) + if '/' `elem` raw -- self-closing tag + then return "" + else innerText . parseTags <$> + manyTill anyChar (htmlTag (~== TagClose tag)) + +-- +-- main parser +-- + +parseMediaWiki :: MWParser Pandoc +parseMediaWiki = do + bs <- mconcat <$> many block + spaces + eof + categoryLinks <- reverse . mwCategoryLinks <$> getState + let categories = if null categoryLinks + then mempty + else B.para $ mconcat $ intersperse B.space categoryLinks + return $ B.doc $ bs <> categories + +-- +-- block parsers +-- + +block :: MWParser Blocks +block = mempty <$ skipMany1 blankline + <|> table + <|> header + <|> hrule + <|> orderedList + <|> bulletList + <|> definitionList + <|> mempty <$ try (spaces *> htmlComment) + <|> preformatted + <|> blockTag + <|> (B.rawBlock "mediawiki" <$> template) + <|> para + +para :: MWParser Blocks +para = B.para . trimInlines . mconcat <$> many1 inline + +table :: MWParser Blocks +table = do + tableStart + styles <- manyTill anyChar newline + let tableWidth = case lookup "width" $ parseAttrs styles of + Just w -> maybe 1.0 id $ parseWidth w + Nothing -> 1.0 + caption <- option mempty tableCaption + optional rowsep + hasheader <- option False $ True <$ (lookAhead (char '!')) + (cellspecs',hdr) <- unzip <$> tableRow + let widths = map ((tableWidth *) . snd) cellspecs' + let restwidth = tableWidth - sum widths + let zerocols = length $ filter (==0.0) widths + let defaultwidth = if zerocols == 0 || zerocols == length widths + then 0.0 + else restwidth / fromIntegral zerocols + let widths' = map (\w -> if w == 0 then defaultwidth else w) widths + let cellspecs = zip (map fst cellspecs') widths' + rows' <- many $ try $ rowsep *> (map snd <$> tableRow) + tableEnd + let cols = length hdr + let (headers,rows) = if hasheader + then (hdr, rows') + else (replicate cols mempty, hdr:rows') + return $ B.table caption cellspecs headers rows + +parseAttrs :: String -> [(String,String)] +parseAttrs s = case parse (many parseAttr) "attributes" s of + Right r -> r + Left _ -> [] + +parseAttr :: Parser String () (String, String) +parseAttr = try $ do + skipMany spaceChar + k <- many1 letter + char '=' + char '"' + v <- many1Till anyChar (char '"') + return (k,v) + +tableStart :: MWParser () +tableStart = try $ guardColumnOne *> sym "{|" + +tableEnd :: MWParser () +tableEnd = try $ guardColumnOne *> sym "|}" <* blanklines + +rowsep :: MWParser () +rowsep = try $ guardColumnOne *> sym "|-" <* blanklines + +cellsep :: MWParser () +cellsep = try $ + (guardColumnOne <* + ( (char '|' <* notFollowedBy (oneOf "-}+")) + <|> (char '!') + ) + ) + <|> (() <$ try (string "||")) + <|> (() <$ try (string "!!")) + +tableCaption :: MWParser Inlines +tableCaption = try $ do + guardColumnOne + sym "|+" + skipMany spaceChar + res <- manyTill anyChar newline >>= parseFromString (many inline) + return $ trimInlines $ mconcat res + +tableRow :: MWParser [((Alignment, Double), Blocks)] +tableRow = try $ many tableCell + +tableCell :: MWParser ((Alignment, Double), Blocks) +tableCell = try $ do + cellsep + skipMany spaceChar + attrs <- option [] $ try $ parseAttrs <$> + manyTill (satisfy (/='\n')) (char '|' <* notFollowedBy (char '|')) + skipMany spaceChar + ls <- concat <$> many (notFollowedBy (cellsep <|> rowsep <|> tableEnd) *> + ((snd <$> withRaw table) <|> count 1 anyChar)) + bs <- parseFromString (mconcat <$> many block) ls + let align = case lookup "align" attrs of + Just "left" -> AlignLeft + Just "right" -> AlignRight + Just "center" -> AlignCenter + _ -> AlignDefault + let width = case lookup "width" attrs of + Just xs -> maybe 0.0 id $ parseWidth xs + Nothing -> 0.0 + return ((align, width), bs) + +parseWidth :: String -> Maybe Double +parseWidth s = + case reverse s of + ('%':ds) | all isDigit ds -> safeRead ('0':'.':reverse ds) + _ -> Nothing + +template :: MWParser String +template = try $ do + string "{{" + notFollowedBy (char '{') + let chunk = template <|> variable <|> many1 (noneOf "{}") <|> count 1 anyChar + contents <- manyTill chunk (try $ string "}}") + return $ "{{" ++ concat contents ++ "}}" + +blockTag :: MWParser Blocks +blockTag = do + (tag, _) <- lookAhead $ htmlTag isBlockTag' + case tag of + TagOpen "blockquote" _ -> B.blockQuote <$> blocksInTags "blockquote" + TagOpen "pre" _ -> B.codeBlock . trimCode <$> charsInTags "pre" + TagOpen "syntaxhighlight" attrs -> syntaxhighlight "syntaxhighlight" attrs + TagOpen "source" attrs -> syntaxhighlight "source" attrs + TagOpen "haskell" _ -> B.codeBlockWith ("",["haskell"],[]) . trimCode <$> + charsInTags "haskell" + TagOpen "gallery" _ -> blocksInTags "gallery" + TagOpen "p" _ -> mempty <$ htmlTag (~== tag) + TagClose "p" -> mempty <$ htmlTag (~== tag) + _ -> B.rawBlock "html" . snd <$> htmlTag (~== tag) + +trimCode :: String -> String +trimCode ('\n':xs) = stripTrailingNewlines xs +trimCode xs = stripTrailingNewlines xs + +syntaxhighlight :: String -> [Attribute String] -> MWParser Blocks +syntaxhighlight tag attrs = try $ do + let mblang = lookup "lang" attrs + let mbstart = lookup "start" attrs + let mbline = lookup "line" attrs + let classes = maybe [] (:[]) mblang ++ maybe [] (const ["numberLines"]) mbline + let kvs = maybe [] (\x -> [("startFrom",x)]) mbstart + contents <- charsInTags tag + return $ B.codeBlockWith ("",classes,kvs) $ trimCode contents + +hrule :: MWParser Blocks +hrule = B.horizontalRule <$ try (string "----" *> many (char '-') *> newline) + +guardColumnOne :: MWParser () +guardColumnOne = getPosition >>= \pos -> guard (sourceColumn pos == 1) + +preformatted :: MWParser Blocks +preformatted = try $ do + guardColumnOne + char ' ' + let endline' = B.linebreak <$ (try $ newline <* char ' ') + let whitespace' = B.str <$> many1 ('\160' <$ spaceChar) + let spToNbsp ' ' = '\160' + spToNbsp x = x + let nowiki' = mconcat . intersperse B.linebreak . map B.str . + lines . fromEntities . map spToNbsp <$> try + (htmlTag (~== TagOpen "nowiki" []) *> + manyTill anyChar (htmlTag (~== TagClose "nowiki"))) + let inline' = whitespace' <|> endline' <|> nowiki' <|> inline + let strToCode (Str s) = Code ("",[],[]) s + strToCode x = x + B.para . bottomUp strToCode . mconcat <$> many1 inline' + +header :: MWParser Blocks +header = try $ do + guardColumnOne + eqs <- many1 (char '=') + let lev = length eqs + guard $ lev <= 6 + contents <- trimInlines . mconcat <$> manyTill inline (count lev $ char '=') + return $ B.header lev contents + +bulletList :: MWParser Blocks +bulletList = B.bulletList <$> + ( many1 (listItem '*') + <|> (htmlTag (~== TagOpen "ul" []) *> spaces *> many (listItem '*' <|> li) <* + optional (htmlTag (~== TagClose "ul"))) ) + +orderedList :: MWParser Blocks +orderedList = + (B.orderedList <$> many1 (listItem '#')) + <|> (B.orderedList <$> (htmlTag (~== TagOpen "ul" []) *> spaces *> + many (listItem '#' <|> li) <* + optional (htmlTag (~== TagClose "ul")))) + <|> do (tag,_) <- htmlTag (~== TagOpen "ol" []) + spaces + items <- many (listItem '#' <|> li) + optional (htmlTag (~== TagClose "ol")) + let start = maybe 1 id $ safeRead $ fromAttrib "start" tag + return $ B.orderedListWith (start, DefaultStyle, DefaultDelim) items + +definitionList :: MWParser Blocks +definitionList = B.definitionList <$> many1 defListItem + +defListItem :: MWParser (Inlines, [Blocks]) +defListItem = try $ do + terms <- mconcat . intersperse B.linebreak <$> many defListTerm + -- we allow dd with no dt, or dt with no dd + defs <- if B.isNull terms + then many1 $ listItem ':' + else many $ listItem ':' + return (terms, defs) + +defListTerm :: MWParser Inlines +defListTerm = char ';' >> skipMany spaceChar >> manyTill anyChar newline >>= + parseFromString (trimInlines . mconcat <$> many inline) + +listStart :: Char -> MWParser () +listStart c = char c *> notFollowedBy listStartChar + +listStartChar :: MWParser Char +listStartChar = oneOf "*#;:" + +anyListStart :: MWParser Char +anyListStart = char '*' + <|> char '#' + <|> char ':' + <|> char ';' + +li :: MWParser Blocks +li = lookAhead (htmlTag (~== TagOpen "li" [])) *> + (firstParaToPlain <$> blocksInTags "li") <* spaces + +listItem :: Char -> MWParser Blocks +listItem c = try $ do + extras <- many (try $ char c <* lookAhead listStartChar) + if null extras + then listItem' c + else do + skipMany spaceChar + first <- concat <$> manyTill listChunk newline + rest <- many + (try $ string extras *> (concat <$> manyTill listChunk newline)) + contents <- parseFromString (many1 $ listItem' c) + (unlines (first : rest)) + case c of + '*' -> return $ B.bulletList contents + '#' -> return $ B.orderedList contents + ':' -> return $ B.definitionList [(mempty, contents)] + _ -> mzero + +-- The point of this is to handle stuff like +-- * {{cite book +-- | blah +-- | blah +-- }} +-- * next list item +-- which seems to be valid mediawiki. +listChunk :: MWParser String +listChunk = template <|> count 1 anyChar + +listItem' :: Char -> MWParser Blocks +listItem' c = try $ do + listStart c + skipMany spaceChar + first <- concat <$> manyTill listChunk newline + rest <- many (try $ char c *> lookAhead listStartChar *> + (concat <$> manyTill listChunk newline)) + parseFromString (firstParaToPlain . mconcat <$> many1 block) + $ unlines $ first : rest + +firstParaToPlain :: Blocks -> Blocks +firstParaToPlain contents = + case viewl (B.unMany contents) of + (Para xs) :< ys -> B.Many $ (Plain xs) <| ys + _ -> contents + +-- +-- inline parsers +-- + +inline :: MWParser Inlines +inline = whitespace + <|> url + <|> str + <|> doubleQuotes + <|> strong + <|> emph + <|> image + <|> internalLink + <|> externalLink + <|> inlineTag + <|> B.singleton <$> charRef + <|> inlineHtml + <|> (B.rawInline "mediawiki" <$> variable) + <|> (B.rawInline "mediawiki" <$> template) + <|> special + +str :: MWParser Inlines +str = B.str <$> many1 (noneOf $ specialChars ++ spaceChars) + +variable :: MWParser String +variable = try $ do + string "{{{" + contents <- manyTill anyChar (try $ string "}}}") + return $ "{{{" ++ contents ++ "}}}" + +inlineTag :: MWParser Inlines +inlineTag = do + (tag, _) <- lookAhead $ htmlTag isInlineTag' + case tag of + TagOpen "ref" _ -> B.note . B.plain <$> inlinesInTags "ref" + TagOpen "nowiki" _ -> try $ do + (_,raw) <- htmlTag (~== tag) + if '/' `elem` raw + then return mempty + else B.text . fromEntities <$> + manyTill anyChar (htmlTag (~== TagClose "nowiki")) + TagOpen "br" _ -> B.linebreak <$ (htmlTag (~== TagOpen "br" []) -- will get /> too + *> optional blankline) + TagOpen "strike" _ -> B.strikeout <$> inlinesInTags "strike" + TagOpen "del" _ -> B.strikeout <$> inlinesInTags "del" + TagOpen "sub" _ -> B.subscript <$> inlinesInTags "sub" + TagOpen "sup" _ -> B.superscript <$> inlinesInTags "sup" + TagOpen "math" _ -> B.math <$> charsInTags "math" + TagOpen "code" _ -> B.code <$> charsInTags "code" + TagOpen "tt" _ -> B.code <$> charsInTags "tt" + TagOpen "hask" _ -> B.codeWith ("",["haskell"],[]) <$> charsInTags "hask" + _ -> B.rawInline "html" . snd <$> htmlTag (~== tag) + +special :: MWParser Inlines +special = B.str <$> count 1 (notFollowedBy' (htmlTag isBlockTag') *> + oneOf specialChars) + +inlineHtml :: MWParser Inlines +inlineHtml = B.rawInline "html" . snd <$> htmlTag isInlineTag' + +whitespace :: MWParser Inlines +whitespace = B.space <$ (skipMany1 spaceChar <|> endline <|> htmlComment) + +endline :: MWParser () +endline = () <$ try (newline <* + notFollowedBy blankline <* + notFollowedBy' hrule <* + notFollowedBy tableStart <* + notFollowedBy' header <* + notFollowedBy anyListStart) + +image :: MWParser Inlines +image = try $ do + sym "[[" + sym "File:" + fname <- many1 (noneOf "|]") + _ <- many (try $ char '|' *> imageOption) + caption <- (B.str fname <$ sym "]]") + <|> try (char '|' *> (mconcat <$> manyTill inline (sym "]]"))) + return $ B.image fname "image" caption + +imageOption :: MWParser String +imageOption = + try (oneOfStrings [ "border", "thumbnail", "frameless" + , "thumb", "upright", "left", "right" + , "center", "none", "baseline", "sub" + , "super", "top", "text-top", "middle" + , "bottom", "text-bottom" ]) + <|> try (string "frame") + <|> try (many1 (oneOf "x0123456789") <* string "px") + <|> try (oneOfStrings ["link=","alt=","page=","class="] <* many (noneOf "|]")) + +internalLink :: MWParser Inlines +internalLink = try $ do + sym "[[" + let addUnderscores x = let (pref,suff) = break (=='#') x + in pref ++ intercalate "_" (words suff) + pagename <- unwords . words <$> many (noneOf "|]") + label <- option (B.text pagename) $ char '|' *> + ( (mconcat <$> many1 (notFollowedBy (char ']') *> inline)) + -- the "pipe trick" + -- [[Help:Contents|] -> "Contents" + <|> (return $ B.text $ drop 1 $ dropWhile (/=':') pagename) ) + sym "]]" + linktrail <- B.text <$> many letter + let link = B.link (addUnderscores pagename) "wikilink" (label <> linktrail) + if "Category:" `isPrefixOf` pagename + then do + updateState $ \st -> st{ mwCategoryLinks = link : mwCategoryLinks st } + return mempty + else return link + +externalLink :: MWParser Inlines +externalLink = try $ do + char '[' + (_, src) <- uri + lab <- try (trimInlines . mconcat <$> + (skipMany1 spaceChar *> manyTill inline (char ']'))) + <|> do char ']' + num <- mwNextLinkNumber <$> getState + updateState $ \st -> st{ mwNextLinkNumber = num + 1 } + return $ B.str $ show num + return $ B.link src "" lab + +url :: MWParser Inlines +url = do + (orig, src) <- uri + return $ B.link src "" (B.str orig) + +-- | Parses a list of inlines between start and end delimiters. +inlinesBetween :: (Show b) => MWParser a -> MWParser b -> MWParser Inlines +inlinesBetween start end = + (trimInlines . mconcat) <$> try (start >> many1Till inner end) + where inner = innerSpace <|> (notFollowedBy' (() <$ whitespace) >> inline) + innerSpace = try $ whitespace >>~ notFollowedBy' end + +emph :: MWParser Inlines +emph = B.emph <$> nested (inlinesBetween start end) + where start = sym "''" >> lookAhead nonspaceChar + end = try $ notFollowedBy' (() <$ strong) >> sym "''" + +strong :: MWParser Inlines +strong = B.strong <$> nested (inlinesBetween start end) + where start = sym "'''" >> lookAhead nonspaceChar + end = try $ sym "'''" + +doubleQuotes :: MWParser Inlines +doubleQuotes = B.doubleQuoted . trimInlines . mconcat <$> try + ((getState >>= guard . readerSmart . mwOptions) *> + openDoubleQuote *> manyTill inline closeDoubleQuote ) + where openDoubleQuote = char '"' <* lookAhead alphaNum + closeDoubleQuote = char '"' <* notFollowedBy alphaNum + diff --git a/src/Text/Pandoc/Readers/RST.hs b/src/Text/Pandoc/Readers/RST.hs index a26b1623d..74653efcf 100644 --- a/src/Text/Pandoc/Readers/RST.hs +++ b/src/Text/Pandoc/Readers/RST.hs @@ -757,7 +757,7 @@ simpleTableHeader headless = try $ do rawContent <- if headless then return "" else simpleTableSep '=' >> anyLine - dashes <- simpleDashedLines '=' + dashes <- simpleDashedLines '=' <|> simpleDashedLines '-' newline let lines' = map snd dashes let indices = scanl (+) 0 lines' diff --git a/src/Text/Pandoc/Readers/Textile.hs b/src/Text/Pandoc/Readers/Textile.hs index 89f281ae8..dc95d9a56 100644 --- a/src/Text/Pandoc/Readers/Textile.hs +++ b/src/Text/Pandoc/Readers/Textile.hs @@ -47,7 +47,6 @@ Left to be implemented: TODO : refactor common patterns across readers : - autolink - - smartPunctuation - more ... -} @@ -62,6 +61,7 @@ import Text.Pandoc.Parsing import Text.Pandoc.Readers.HTML ( htmlTag, isInlineTag, isBlockTag ) import Text.Pandoc.Readers.LaTeX ( rawLaTeXInline, rawLaTeXBlock ) import Text.HTML.TagSoup.Match +import Data.List ( intercalate ) import Data.Char ( digitToInt, isUpper ) import Control.Monad ( guard, liftM ) import Control.Applicative ((<$>), (*>), (<*)) @@ -412,7 +412,7 @@ note = try $ do -- | Special chars markupChars :: [Char] -markupChars = "\\[]*#_@~-+^|%=" +markupChars = "\\*#_@~-+^|%=[]" -- | Break strings on following chars. Space tab and newline break for -- inlines breaking. Open paren breaks for mark. Quote, dash and dot @@ -427,13 +427,15 @@ wordBoundaries = markupChars ++ stringBreakers -- | Parse a hyphened sequence of words hyphenedWords :: Parser [Char] ParserState String -hyphenedWords = try $ do +hyphenedWords = intercalate "-" <$> sepBy1 wordChunk (char '-') + +wordChunk :: Parser [Char] ParserState String +wordChunk = try $ do hd <- noneOf wordBoundaries tl <- many ( (noneOf wordBoundaries) <|> - try (oneOf markupChars <* lookAhead (noneOf wordBoundaries) ) ) - let wd = hd:tl - option wd $ try $ - (\r -> concat [wd, "-", r]) <$> (char '-' *> hyphenedWords) + try (notFollowedBy' note *> oneOf markupChars + <* lookAhead (noneOf wordBoundaries) ) ) + return $ hd:tl -- | Any string str :: Parser [Char] ParserState Inline diff --git a/src/Text/Pandoc/Writers/Docbook.hs b/src/Text/Pandoc/Writers/Docbook.hs index e696fc63e..a38f57074 100644 --- a/src/Text/Pandoc/Writers/Docbook.hs +++ b/src/Text/Pandoc/Writers/Docbook.hs @@ -103,7 +103,7 @@ elementToDocbook opts lvl (Sec _ _num id' title elements) = n | n == 0 -> "chapter" | n >= 1 && n <= 5 -> "sect" ++ show n | otherwise -> "simplesect" - in inTags True tag [("id",id')] $ + in inTags True tag [("id", writerIdentifierPrefix opts ++ id')] $ inTagsSimple "title" (inlinesToDocbook opts title) $$ vcat (map (elementToDocbook opts (lvl + 1)) elements') diff --git a/src/Text/Pandoc/Writers/Docx.hs b/src/Text/Pandoc/Writers/Docx.hs index 05c9555c6..84bf95dfb 100644 --- a/src/Text/Pandoc/Writers/Docx.hs +++ b/src/Text/Pandoc/Writers/Docx.hs @@ -60,7 +60,7 @@ data WriterState = WriterState{ , stExternalLinks :: M.Map String String , stImages :: M.Map FilePath (String, B.ByteString) , stListLevel :: Int - , stListMarker :: ListMarker + , stListNumId :: Int , stNumStyles :: M.Map ListMarker Int , stLists :: [ListMarker] } @@ -79,7 +79,7 @@ defaultWriterState = WriterState{ , stExternalLinks = M.empty , stImages = M.empty , stListLevel = -1 - , stListMarker = NoMarker + , stListNumId = 1 , stNumStyles = M.fromList [(NoMarker, 0)] , stLists = [NoMarker] } @@ -285,6 +285,9 @@ mkLvl marker lvl = patternFor TwoParens s = "(" ++ s ++ ")" patternFor _ s = s ++ "." +getNumId :: WS Int +getNumId = length `fmap` gets stLists + -- | Convert Pandoc document to string in OpenXML format. writeOpenXML :: WriterOptions -> Pandoc -> WS Element writeOpenXML opts (Pandoc (Meta tit auths dat) blocks) = do @@ -402,11 +405,13 @@ blockToOpenXML opts (Table caption aligns widths headers rows) = do blockToOpenXML opts (BulletList lst) = do let marker = BulletMarker addList marker - asList $ concat `fmap` mapM (listItemToOpenXML opts marker) lst + numid <- getNumId + asList $ concat `fmap` mapM (listItemToOpenXML opts numid) lst blockToOpenXML opts (OrderedList (start, numstyle, numdelim) lst) = do let marker = NumberMarker numstyle numdelim start addList marker - asList $ concat `fmap` mapM (listItemToOpenXML opts marker) lst + numid <- getNumId + asList $ concat `fmap` mapM (listItemToOpenXML opts numid) lst blockToOpenXML opts (DefinitionList items) = concat `fmap` mapM (definitionListItemToOpenXML opts) items @@ -418,9 +423,6 @@ definitionListItemToOpenXML opts (term,defs) = do $ concat `fmap` mapM (blocksToOpenXML opts) defs return $ term' ++ defs' -getNumId :: WS Int -getNumId = length `fmap` gets stLists - addList :: ListMarker -> WS () addList marker = do lists <- gets stLists @@ -431,11 +433,11 @@ addList marker = do Nothing -> modify $ \st -> st{ stNumStyles = M.insert marker (M.size numStyles + 1) numStyles } -listItemToOpenXML :: WriterOptions -> ListMarker -> [Block] -> WS [Element] +listItemToOpenXML :: WriterOptions -> Int -> [Block] -> WS [Element] listItemToOpenXML _ _ [] = return [] -listItemToOpenXML opts marker (first:rest) = do - first' <- withMarker marker $ blockToOpenXML opts first - rest' <- withMarker NoMarker $ blocksToOpenXML opts rest +listItemToOpenXML opts numid (first:rest) = do + first' <- withNumId numid $ blockToOpenXML opts first + rest' <- withNumId 1 $ blocksToOpenXML opts rest return $ first' ++ rest' alignmentToString :: Alignment -> [Char] @@ -449,12 +451,12 @@ alignmentToString alignment = case alignment of inlinesToOpenXML :: WriterOptions -> [Inline] -> WS [Element] inlinesToOpenXML opts lst = concat `fmap` mapM (inlineToOpenXML opts) lst -withMarker :: ListMarker -> WS a -> WS a -withMarker m p = do - origMarker <- gets stListMarker - modify $ \st -> st{ stListMarker = m } +withNumId :: Int -> WS a -> WS a +withNumId numid p = do + origNumId <- gets stListNumId + modify $ \st -> st{ stListNumId = numid } result <- p - modify $ \st -> st{ stListMarker = origMarker } + modify $ \st -> st{ stListNumId = origNumId } return result asList :: WS a -> WS a @@ -489,10 +491,7 @@ getParaProps :: WS [Element] getParaProps = do props <- gets stParaProperties listLevel <- gets stListLevel - listMarker <- gets stListMarker - numid <- case listMarker of - NoMarker -> return 1 - _ -> getNumId + numid <- gets stListNumId let listPr = if listLevel >= 0 then [ mknode "w:numPr" [] [ mknode "w:numId" [("w:val",show numid)] () diff --git a/src/Text/Pandoc/Writers/EPUB.hs b/src/Text/Pandoc/Writers/EPUB.hs index b6527c6c8..6f8931caa 100644 --- a/src/Text/Pandoc/Writers/EPUB.hs +++ b/src/Text/Pandoc/Writers/EPUB.hs @@ -32,6 +32,7 @@ import Data.IORef import Data.Maybe ( fromMaybe, isNothing ) import Data.List ( findIndices, isPrefixOf ) import System.Environment ( getEnv ) +import Text.Printf (printf) import System.FilePath ( (</>), (<.>), takeBaseName, takeExtension, takeFileName ) import qualified Data.ByteString.Lazy as B import Data.ByteString.Lazy.UTF8 ( fromString ) @@ -122,8 +123,9 @@ writeEPUB opts doc@(Pandoc meta _) = do let chapters = map titleize chunks let chapToHtml = writeHtmlString opts'{ writerTemplate = pageTemplate } let chapterToEntry :: Int -> Pandoc -> Entry - chapterToEntry num chap = mkEntry ("ch" ++ show num ++ ".xhtml") $ - fromString $ chapToHtml chap + chapterToEntry num chap = mkEntry + (showChapter num) $ + fromString $ chapToHtml chap let chapterEntries = zipWith chapterToEntry [1..] chapters -- contents.opf @@ -334,11 +336,15 @@ data IdentState = IdentState{ identTable :: [(String,String)] } deriving (Read, Show) +-- Returns filename for chapter number. +showChapter :: Int -> String +showChapter = printf "ch%03d.xhtml" + -- Go through a block list and construct a table -- correlating the automatically constructed references -- that would be used in a normal pandoc document with -- new URLs to be used in the EPUB. For example, what --- was "header-1" might turn into "ch6.xhtml#header". +-- was "header-1" might turn into "ch006.xhtml#header". correlateRefs :: [Block] -> [(String,String)] correlateRefs bs = identTable $ execState (mapM_ go bs) IdentState{ chapterNumber = 0 @@ -358,8 +364,9 @@ correlateRefs bs = identTable $ execState (mapM_ go bs) modify $ \s -> s{ runningIdents = runningid : runningIdents st , chapterIdents = maybe (chapterIdents st) (: chapterIdents st) chapid - , identTable = (runningid, "ch" ++ show (chapterNumber st) ++ - ".xhtml" ++ maybe "" ('#':) chapid) : identTable st + , identTable = (runningid, + showChapter (chapterNumber st) ++ + maybe "" ('#':) chapid) : identTable st } go _ = return () diff --git a/src/Text/Pandoc/Writers/HTML.hs b/src/Text/Pandoc/Writers/HTML.hs index c6c4a8fd7..ebb705a61 100644 --- a/src/Text/Pandoc/Writers/HTML.hs +++ b/src/Text/Pandoc/Writers/HTML.hs @@ -482,10 +482,13 @@ blockToHtml opts (OrderedList (startnum, numstyle, _) lst) = do return $ foldl (!) (ordList opts contents) attribs blockToHtml opts (DefinitionList lst) = do contents <- mapM (\(term, defs) -> - do term' <- liftM (H.dt) $ inlineListToHtml opts term + do term' <- if null term + then return mempty + else liftM (H.dt) $ inlineListToHtml opts term defs' <- mapM ((liftM (\x -> H.dd $ (x >> nl opts))) . blockListToHtml opts) defs - return $ mconcat $ nl opts : term' : nl opts : defs') lst + return $ mconcat $ nl opts : term' : nl opts : + intersperse (nl opts) defs') lst let lst' = H.dl $ mconcat contents >> nl opts let lst'' = if writerIncremental opts then lst' ! A.class_ "incremental" diff --git a/src/Text/Pandoc/Writers/LaTeX.hs b/src/Text/Pandoc/Writers/LaTeX.hs index abbbd4d01..2b5c7e84b 100644 --- a/src/Text/Pandoc/Writers/LaTeX.hs +++ b/src/Text/Pandoc/Writers/LaTeX.hs @@ -49,8 +49,7 @@ import Text.Pandoc.Highlighting (highlight, styleToLaTeX, data WriterState = WriterState { stInNote :: Bool -- true if we're in a note , stInTable :: Bool -- true if we're in a table - , stTableNotes :: [(Char, Doc)] -- List of markers, notes - -- in current table + , stTableNotes :: [Doc] -- List of notes in current table , stOLLevel :: Int -- level of ordered list nesting , stOptions :: WriterOptions -- writer options, so they don't have to be parameter , stVerbInNote :: Bool -- true if document has verbatim text in note @@ -190,7 +189,7 @@ stringToLaTeX isUrl (x:xs) = do '$' -> "\\$" ++ rest '%' -> "\\%" ++ rest '&' -> "\\&" ++ rest - '_' -> "\\_" ++ rest + '_' | not isUrl -> "\\_" ++ rest '#' -> "\\#" ++ rest '-' -> case xs of -- prevent adjacent hyphens from forming ligatures ('-':_) -> "-{}" ++ rest @@ -382,27 +381,27 @@ blockToLaTeX (Table caption aligns widths heads rows) = do modify $ \s -> s{ stInTable = True, stTableNotes = [] } headers <- if all null heads then return empty - else liftM ($$ "\\ML") - $ (tableRowToLaTeX True aligns widths) heads + else ($$ "\\hline\\noalign{\\medskip}") `fmap` + (tableRowToLaTeX True aligns widths) heads captionText <- inlineListToLaTeX caption let capt = if isEmpty captionText then empty - else text "caption = {" <> captionText <> "}," <> space + else text "\\noalign{\\medskip}" + $$ text "\\caption" <> braces captionText rows' <- mapM (tableRowToLaTeX False aligns widths) rows - let rows'' = intersperse ("\\\\\\noalign{\\medskip}") rows' tableNotes <- liftM (reverse . stTableNotes) get - let toNote (marker, x) = "\\tnote" <> brackets (char marker) <> - braces (nest 2 x) + let toNote x = "\\footnotetext" <> braces (nest 2 x) let notes = vcat $ map toNote tableNotes let colDescriptors = text $ concat $ map toColDescriptor aligns - let tableBody = - ("\\ctable" <> brackets (capt <> text "pos = H, center, botcap")) - <> braces colDescriptors - $$ braces ("% notes" <> cr <> notes <> cr) - $$ braces (text "% rows" $$ "\\FL" $$ - vcat (headers : rows'') $$ "\\LL" <> cr) modify $ \s -> s{ stTable = True, stInTable = False, stTableNotes = [] } - return $ tableBody + return $ "\\begin{longtable}[c]" <> braces colDescriptors + $$ "\\hline\\noalign{\\medskip}" + $$ headers + $$ vcat rows' + $$ "\\hline" + $$ capt + $$ notes + $$ "\\end{longtable}" toColDescriptor :: Alignment -> String toColDescriptor align = @@ -433,7 +432,7 @@ tableRowToLaTeX header aligns widths cols = do braces (text (printf "%.2f\\columnwidth" w)) <> braces (halign a <> cr <> c <> cr) let cells = zipWith3 toCell widths aligns renderedCells - return $ hcat $ intersperse (" & ") cells + return $ hsep (intersperse "&" cells) $$ "\\\\\\noalign{\\medskip}" listItemToLaTeX :: [Block] -> State WriterState Doc listItemToLaTeX lst = blockListToLaTeX lst >>= return . (text "\\item" $$) . @@ -572,7 +571,7 @@ inlineToLaTeX (LineBreak) = return "\\\\" inlineToLaTeX Space = return space inlineToLaTeX (Link txt ('#':ident, _)) = do contents <- inlineListToLaTeX txt - ident' <- stringToLaTeX False ident + ident' <- stringToLaTeX True ident return $ text "\\hyperref" <> brackets (text ident') <> braces contents inlineToLaTeX (Link txt (src, _)) = case txt of @@ -600,9 +599,8 @@ inlineToLaTeX (Note contents) = do if inTable then do curnotes <- liftM stTableNotes get - let marker = cycle ['a'..'z'] !! length curnotes - modify $ \s -> s{ stTableNotes = (marker, contents') : curnotes } - return $ "\\tmark" <> brackets (char marker) <> space + modify $ \s -> s{ stTableNotes = contents' : curnotes } + return $ "\\footnotemark" <> space else return $ "\\footnote" <> braces (nest 2 contents' <> optnl) -- note: a \n before } needed when note ends with a Verbatim environment diff --git a/src/Text/Pandoc/Writers/Markdown.hs b/src/Text/Pandoc/Writers/Markdown.hs index d88419feb..1a0731710 100644 --- a/src/Text/Pandoc/Writers/Markdown.hs +++ b/src/Text/Pandoc/Writers/Markdown.hs @@ -54,9 +54,12 @@ data WriterState = WriterState { stNotes :: Notes -- | Convert Pandoc to Markdown. writeMarkdown :: WriterOptions -> Pandoc -> String writeMarkdown opts document = - evalState (pandocToMarkdown opts document) WriterState{ stNotes = [] - , stRefs = [] - , stPlain = False } + evalState (pandocToMarkdown opts{ + writerWrapText = writerWrapText opts && + not (isEnabled Ext_hard_line_breaks opts) } + document) WriterState{ stNotes = [] + , stRefs = [] + , stPlain = False } -- | Convert Pandoc to plain text (like markdown, but without links, -- pictures, or inline formatting). @@ -588,8 +591,9 @@ inlineToMarkdown opts (RawInline f str) return $ text str inlineToMarkdown _ (RawInline _ _) = return empty inlineToMarkdown opts (LineBreak) + | isEnabled Ext_hard_line_breaks opts = return cr | isEnabled Ext_escaped_line_breaks opts = return $ "\\" <> cr - | otherwise = return $ " " <> cr + | otherwise = return $ " " <> cr inlineToMarkdown _ Space = return space inlineToMarkdown opts (Cite (c:cs) lst) | writerCiteMethod opts == Citeproc = inlineListToMarkdown opts lst diff --git a/src/pandoc.hs b/src/pandoc.hs index 63a0df51a..cb561e817 100644 --- a/src/pandoc.hs +++ b/src/pandoc.hs @@ -653,7 +653,7 @@ options = (\arg opt -> do let url' = case arg of Just u -> u - Nothing -> "https://d3eoax9i5htok0.cloudfront.net/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" + Nothing -> "http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" return opt { optHTMLMathMethod = MathJax url'}) "URL") "" -- "Use MathJax for HTML math" @@ -721,6 +721,7 @@ defaultReaderName fallback (x:xs) = ".rst" -> "rst" ".lhs" -> "markdown+lhs" ".db" -> "docbook" + ".wiki" -> "mediawiki" ".textile" -> "textile" ".native" -> "native" ".json" -> "json" diff --git a/templates b/templates -Subproject 150f3ce9c67a2b4c948fcdc5793344bf8b53072 +Subproject 22628ea9aa4e65ac11d489f22be5c765b510760 diff --git a/tests/Tests/Old.hs b/tests/Tests/Old.hs index 8899fef6f..5360126c2 100644 --- a/tests/Tests/Old.hs +++ b/tests/Tests/Old.hs @@ -119,9 +119,14 @@ tests = [ testGroup "markdown" , fb2WriterTest "math" [] "fb2.math.markdown" "fb2.math.fb2" , fb2WriterTest "testsuite" [] "testsuite.native" "writer.fb2" ] + , testGroup "mediawiki" + [ testGroup "writer" $ writerTests "mediawiki" + , test "reader" ["-r", "mediawiki", "-w", "native", "-s"] + "mediawiki-reader.wiki" "mediawiki-reader.native" + ] , testGroup "other writers" $ map (\f -> testGroup f $ writerTests f) [ "opendocument" , "context" , "texinfo" - , "man" , "plain" , "mediawiki", "rtf", "org", "asciidoc" + , "man" , "plain" , "rtf", "org", "asciidoc" ] ] diff --git a/tests/latex-reader.latex b/tests/latex-reader.latex index cd424baec..dd7854eb1 100644 --- a/tests/latex-reader.latex +++ b/tests/latex-reader.latex @@ -155,6 +155,11 @@ And: These should not be escaped: \$ \\ \> \[ \{ \end{verbatim} + +\begin{obeylines} +this has \emph{two +lines} +\end{obeylines} \begin{center}\rule{3in}{0.4pt}\end{center} \section{Lists} diff --git a/tests/latex-reader.native b/tests/latex-reader.native index 2873529ae..b7cf39bc3 100644 --- a/tests/latex-reader.native +++ b/tests/latex-reader.native @@ -57,6 +57,7 @@ Pandoc (Meta {docTitle = [Str "Pandoc",Space,Str "Test",Space,Str "Suite"], docA ,CodeBlock ("",[],[]) "---- (should be four hyphens)\n\nsub status {\n print \"working\";\n}\n\nthis code block is indented by one tab" ,Para [Str "And:"] ,CodeBlock ("",[],[]) " this code block is indented by two tabs\n\nThese should not be escaped: \\$ \\\\ \\> \\[ \\{" +,Para [Str "this",Space,Str "has",Space,Emph [Str "two",LineBreak,Str "lines"]] ,HorizontalRule ,Header 1 [Str "Lists"] ,Header 2 [Str "Unordered"] @@ -301,7 +302,7 @@ Pandoc (Meta {docTitle = [Str "Pandoc",Space,Str "Test",Space,Str "Suite"], docA ,Para [Str "4",Space,Str "<",Space,Str "5."] ,Para [Str "6",Space,Str ">",Space,Str "5."] ,Para [Str "Backslash:",Space,Str "\\"] -,Para [Str "Backtick:",Space,Str "`"] +,Para [Str "Backtick:",Space,Str "\8216"] ,Para [Str "Asterisk:",Space,Str "*"] ,Para [Str "Underscore:",Space,Str "_"] ,Para [Str "Left",Space,Str "brace:",Space,Str "{"] diff --git a/tests/mediawiki-reader.native b/tests/mediawiki-reader.native new file mode 100644 index 000000000..122b52545 --- /dev/null +++ b/tests/mediawiki-reader.native @@ -0,0 +1,241 @@ +Pandoc (Meta {docTitle = [], docAuthors = [], docDate = []}) +[Header 1 [Str "header"] +,Header 2 [Str "header",Space,Str "level",Space,Str "two"] +,Header 3 [Str "header",Space,Str "level",Space,Str "3"] +,Header 4 [Str "header",Space,Emph [Str "level"],Space,Str "four"] +,Header 5 [Str "header",Space,Str "level",Space,Str "5"] +,Header 6 [Str "header",Space,Str "level",Space,Str "6"] +,Para [Str "=======",Space,Str "not",Space,Str "a",Space,Str "header",Space,Str "========"] +,Para [Code ("",[],[]) "==\160not\160a\160header\160=="] +,Header 2 [Str "emph",Space,Str "and",Space,Str "strong"] +,Para [Emph [Str "emph"],Space,Strong [Str "strong"]] +,Para [Strong [Emph [Str "strong",Space,Str "and",Space,Str "emph"]]] +,Para [Strong [Emph [Str "emph",Space,Str "inside"],Space,Str "strong"]] +,Para [Strong [Str "strong",Space,Str "with",Space,Emph [Str "emph"]]] +,Para [Emph [Strong [Str "strong",Space,Str "inside"],Space,Str "emph"]] +,Header 2 [Str "horizontal",Space,Str "rule"] +,Para [Str "top"] +,HorizontalRule +,Para [Str "bottom"] +,HorizontalRule +,Header 2 [Str "nowiki"] +,Para [Str "''not",Space,Str "emph''"] +,Header 2 [Str "strikeout"] +,Para [Strikeout [Str "This",Space,Str "is",Space,Emph [Str "struck",Space,Str "out"]]] +,Header 2 [Str "entities"] +,Para [Str "hi",Space,Str "&",Space,Str "low"] +,Para [Str "hi",Space,Str "&",Space,Str "low"] +,Para [Str "G\246del"] +,Para [Str "\777\2730"] +,Header 2 [Str "comments"] +,Para [Str "inline",Space,Str "comment"] +,Para [Str "between",Space,Str "blocks"] +,Header 2 [Str "linebreaks"] +,Para [Str "hi",LineBreak,Str "there"] +,Para [Str "hi",LineBreak,Str "there"] +,Header 2 [Str ":",Space,Str "indents"] +,Para [Str "hi"] +,DefinitionList + [([], + [[Plain [Str "there"]]])] +,Para [Str "bud"] +,Para [Str "hi"] +,DefinitionList + [([], + [[DefinitionList + [([], + [[Plain [Str "there"]]])]]])] +,Para [Str "bud"] +,Header 2 [Str "p",Space,Str "tags"] +,Para [Str "hi",Space,Str "there"] +,Para [Str "bud"] +,Para [Str "another"] +,Header 2 [Str "raw",Space,Str "html"] +,Para [Str "hi",Space,RawInline "html" "<span style=\"color:red\">",Emph [Str "there"],RawInline "html" "</span>",Str "."] +,Para [RawInline "html" "<ins>",Str "inserted",RawInline "html" "</ins>"] +,RawBlock "html" "<div class=\"special\">" +,Para [Str "hi",Space,Emph [Str "there"]] +,RawBlock "html" "</div>" +,Header 2 [Str "sup,",Space,Str "sub,",Space,Str "del"] +,Para [Str "H",Subscript [Str "2"],Str "O",Space,Str "base",Superscript [Emph [Str "exponent"]],Space,Strikeout [Str "hello"]] +,Header 2 [Str "inline",Space,Str "code"] +,Para [Code ("",[],[]) "*\8594*",Space,Code ("",[],[]) "typed",Space,Code ("",["haskell"],[]) ">>="] +,Header 2 [Str "code",Space,Str "blocks"] +,CodeBlock ("",[],[]) "case xs of\n (_:_) -> reverse xs\n [] -> ['*']" +,CodeBlock ("",["haskell"],[]) "case xs of\n (_:_) -> reverse xs\n [] -> ['*']" +,CodeBlock ("",["ruby","numberLines"],[("startFrom","100")]) "widgets.each do |w|\n print w.price\nend" +,Header 2 [Str "block",Space,Str "quotes"] +,Para [Str "Regular",Space,Str "paragraph"] +,BlockQuote + [Para [Str "This",Space,Str "is",Space,Str "a",Space,Str "block",Space,Str "quote."] + ,Para [Str "With",Space,Str "two",Space,Str "paragraphs."]] +,Para [Str "Nother",Space,Str "paragraph."] +,Header 2 [Str "external",Space,Str "links"] +,Para [Link [Emph [Str "Google"],Space,Str "search",Space,Str "engine"] ("http://google.com","")] +,Para [Link [Str "http://johnmacfarlane.net/pandoc/"] ("http://johnmacfarlane.net/pandoc/","")] +,Para [Link [Str "1"] ("http://google.com",""),Space,Link [Str "2"] ("http://yahoo.com","")] +,Para [Link [Str "email",Space,Str "me"] ("mailto:info@example.org","")] +,Header 2 [Str "internal",Space,Str "links"] +,Para [Link [Str "Help"] ("Help","wikilink")] +,Para [Link [Str "the",Space,Str "help",Space,Str "page"] ("Help","wikilink")] +,Para [Link [Str "Helpers"] ("Help","wikilink")] +,Para [Link [Str "Help"] ("Help","wikilink"),Str "ers"] +,Para [Link [Str "Contents"] ("Help:Contents","wikilink")] +,Para [Link [Str "#My",Space,Str "anchor"] ("#My_anchor","wikilink")] +,Para [Link [Str "and",Space,Str "text"] ("Page#with_anchor","wikilink")] +,Header 2 [Str "images"] +,Para [Image [Str "caption"] ("example.jpg","image")] +,Para [Image [Str "the",Space,Emph [Str "caption"],Space,Str "with",Space,Link [Str "external",Space,Str "link"] ("http://google.com","")] ("example.jpg","image")] +,Para [Image [Str "caption"] ("example.jpg","image")] +,Para [Image [Str "example.jpg"] ("example.jpg","image")] +,Header 2 [Str "lists"] +,BulletList + [[Plain [Str "Start",Space,Str "each",Space,Str "line"]] + ,[Plain [Str "with",Space,Str "an",Space,Str "asterisk",Space,Str "(*)."] + ,BulletList + [[Plain [Str "More",Space,Str "asterisks",Space,Str "gives",Space,Str "deeper"] + ,BulletList + [[Plain [Str "and",Space,Str "deeper",Space,Str "levels."]]]]]] + ,[Plain [Str "Line",Space,Str "breaks",LineBreak,Str "don't",Space,Str "break",Space,Str "levels."] + ,BulletList + [[BulletList + [[Plain [Str "But",Space,Str "jumping",Space,Str "levels",Space,Str "creates",Space,Str "empty",Space,Str "space."]]]]]]] +,Para [Str "Any",Space,Str "other",Space,Str "start",Space,Str "ends",Space,Str "the",Space,Str "list."] +,OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "Start",Space,Str "each",Space,Str "line"]] + ,[Plain [Str "with",Space,Str "a",Space,Str "number",Space,Str "sign",Space,Str "(#)."] + ,OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "More",Space,Str "number",Space,Str "signs",Space,Str "gives",Space,Str "deeper"] + ,OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "and",Space,Str "deeper"]] + ,[Plain [Str "levels."]]]]]] + ,[Plain [Str "Line",Space,Str "breaks",LineBreak,Str "don't",Space,Str "break",Space,Str "levels."] + ,OrderedList (1,DefaultStyle,DefaultDelim) + [[OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "But",Space,Str "jumping",Space,Str "levels",Space,Str "creates",Space,Str "empty",Space,Str "space."]]]]]] + ,[Plain [Str "Blank",Space,Str "lines"]]] +,OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "end",Space,Str "the",Space,Str "list",Space,Str "and",Space,Str "start",Space,Str "another."]]] +,Para [Str "Any",Space,Str "other",Space,Str "start",Space,Str "also",Space,Str "ends",Space,Str "the",Space,Str "list."] +,DefinitionList + [([Str "item",Space,Str "1"], + [[Plain [Str "definition",Space,Str "1"]]]) + ,([Str "item",Space,Str "2"], + [[Plain [Str "definition",Space,Str "2-1"]] + ,[Plain [Str "definition",Space,Str "2-2"]]])] +,OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "one"]] + ,[Plain [Str "two"] + ,BulletList + [[Plain [Str "two",Space,Str "point",Space,Str "one"]] + ,[Plain [Str "two",Space,Str "point",Space,Str "two"]]]] + ,[Plain [Str "three"] + ,DefinitionList + [([Str "three",Space,Str "item",Space,Str "one"], + [[Plain [Str "three",Space,Str "def",Space,Str "one"]]])]] + ,[Plain [Str "four"] + ,DefinitionList + [([], + [[Plain [Str "four",Space,Str "def",Space,Str "one"]] + ,[Plain [Str "this",Space,Str "looks",Space,Str "like",Space,Str "a",Space,Str "continuation"]] + ,[Plain [Str "and",Space,Str "is",Space,Str "often",Space,Str "used"]] + ,[Plain [Str "instead",LineBreak,Str "of",Space,Str "<br/>"]]])]] + ,[Plain [RawInline "mediawiki" "{{{template\n|author=John\n|title=My Book\n}}}"] + ,OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "five",Space,Str "sub",Space,Str "1"] + ,OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "five",Space,Str "sub",Space,Str "1",Space,Str "sub",Space,Str "1"]]]] + ,[Plain [Str "five",Space,Str "sub",Space,Str "2"]]]]] +,OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "list",Space,Str "item",Space,Emph [Str "emph"]] + ,OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "list",Space,Str "item",Space,Str "B1"]] + ,[Plain [Str "list",Space,Str "item",Space,Str "B2"]]] + ,Para [Str "continuing",Space,Str "list",Space,Str "item",Space,Str "A1"]] + ,[Plain [Str "list",Space,Str "item",Space,Str "A2"]]] +,OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "abc"]] + ,[Plain [Str "def"]] + ,[Plain [Str "ghi"]]] +,OrderedList (9,DefaultStyle,DefaultDelim) + [[Plain [Str "Amsterdam"]] + ,[Plain [Str "Rotterdam"]] + ,[Plain [Str "The",Space,Str "Hague"]]] +,Header 2 [Str "math"] +,Para [Str "Here",Space,Str "is",Space,Str "some",Space,Math InlineMath "x=\\frac{y^\\pi}{z}",Str "."] +,Header 2 [Str "preformatted",Space,Str "blocks"] +,Para [Code ("",[],[]) "Start\160each\160line\160with\160a\160space.",LineBreak,Code ("",[],[]) "Text\160is\160",Strong [Code ("",[],[]) "preformatted"],Code ("",[],[]) "\160and",LineBreak,Emph [Code ("",[],[]) "markups"],Code ("",[],[]) "\160",Strong [Emph [Code ("",[],[]) "can"]],Code ("",[],[]) "\160be\160done."] +,Para [Code ("",[],[]) "\160hell\160\160\160\160\160\160yeah"] +,Para [Code ("",[],[]) "Start\160with\160a\160space\160in\160the\160first\160column,",LineBreak,Code ("",[],[]) "(before\160the\160<nowiki>).",LineBreak,Code ("",[],[]) "",LineBreak,Code ("",[],[]) "Then\160your\160block\160format\160will\160be",LineBreak,Code ("",[],[]) "\160\160\160\160maintained.",LineBreak,Code ("",[],[]) "",LineBreak,Code ("",[],[]) "This\160is\160good\160for\160copying\160in\160code\160blocks:",LineBreak,Code ("",[],[]) "",LineBreak,Code ("",[],[]) "def\160function():",LineBreak,Code ("",[],[]) "\160\160\160\160\"\"\"documentation\160string\"\"\"",LineBreak,Code ("",[],[]) "",LineBreak,Code ("",[],[]) "\160\160\160\160if\160True:",LineBreak,Code ("",[],[]) "\160\160\160\160\160\160\160\160print\160True",LineBreak,Code ("",[],[]) "\160\160\160\160else:",LineBreak,Code ("",[],[]) "\160\160\160\160\160\160\160\160print\160False"] +,Para [Str "Not"] +,RawBlock "html" "<hr/>" +,Para [Str "preformatted"] +,Header 2 [Str "templates"] +,RawBlock "mediawiki" "{{Welcome}}" +,RawBlock "mediawiki" "{{Foo:Bar}}" +,RawBlock "mediawiki" "{{Thankyou|all your effort|Me}}" +,Para [Str "Written",Space,RawInline "mediawiki" "{{{date}}}",Space,Str "by",Space,RawInline "mediawiki" "{{{name}}}",Str "."] +,Header 2 [Str "tables"] +,Table [] [AlignDefault,AlignDefault] [0.0,0.0] + [[] + ,[]] + [[[Para [Str "Orange"]] + ,[Para [Str "Apple"]]] + ,[[Para [Str "Bread"]] + ,[Para [Str "Pie"]]] + ,[[Para [Str "Butter"]] + ,[Para [Str "Ice",Space,Str "cream"]]]] +,Table [Str "Food",Space,Str "complements"] [AlignDefault,AlignDefault] [0.0,0.0] + [[Para [Str "Orange"]] + ,[Para [Str "Apple"]]] + [[[Para [Str "Bread"]] + ,[Para [Str "Pie"]]] + ,[[Para [Str "Butter"]] + ,[Para [Str "Ice",Space,Str "cream"]]]] +,Table [Str "Food",Space,Str "complements"] [AlignDefault,AlignDefault] [0.0,0.0] + [[Para [Str "Orange"]] + ,[Para [Str "Apple"]]] + [[[Para [Str "Bread"] + ,Para [Str "and",Space,Str "cheese"]] + ,[Para [Str "Pie"] + ,OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "apple"]] + ,[Plain [Str "carrot"]]]]]] +,Table [] [AlignDefault,AlignDefault,AlignDefault] [0.0,0.0,0.0] + [[] + ,[] + ,[]] + [[[Para [Str "Orange"]] + ,[Para [Str "Apple"]] + ,[Para [Str "more"]]] + ,[[Para [Str "Bread"]] + ,[Para [Str "Pie"]] + ,[Para [Str "more"]]] + ,[[Para [Str "Butter"]] + ,[Para [Str "Ice",Space,Str "cream"]] + ,[Para [Str "and",Space,Str "more"]]]] +,Table [] [AlignLeft,AlignRight,AlignCenter] [0.25,0.125,0.125] + [[Para [Str "Left"]] + ,[Para [Str "Right"]] + ,[Para [Str "Center"]]] + [[[Para [Str "left"]] + ,[Para [Str "15.00"]] + ,[Para [Str "centered"]]] + ,[[Para [Str "more"]] + ,[Para [Str "2.0"]] + ,[Para [Str "more"]]]] +,Table [] [AlignDefault,AlignDefault] [0.0,0.0] + [[] + ,[]] + [[[Para [Str "Orange"]] + ,[Para [Str "Apple"]]] + ,[[Para [Str "Bread"]] + ,[Table [] [AlignDefault,AlignDefault] [0.0,0.0] + [[Para [Str "fruit"]] + ,[Para [Str "topping"]]] + [[[Para [Str "apple"]] + ,[Para [Str "ice",Space,Str "cream"]]]]]] + ,[[Para [Str "Butter"]] + ,[Para [Str "Ice",Space,Str "cream"]]]] +,Header 2 [Str "notes"] +,Para [Str "My",Space,Str "note!",Note [Plain [Str "This."]]]] diff --git a/tests/mediawiki-reader.wiki b/tests/mediawiki-reader.wiki new file mode 100644 index 000000000..a9769594d --- /dev/null +++ b/tests/mediawiki-reader.wiki @@ -0,0 +1,369 @@ += header = + +== header level two == + +===header level 3=== + +====header ''level'' four==== + +===== header level 5 ===== + +====== header level 6 ====== + +======= not a header ======== + + == not a header == + +== emph and strong == + +''emph'' '''strong''' + +'''''strong and emph''''' + +'''''emph inside'' strong''' + +'''strong with ''emph''''' + +'''''strong inside''' emph'' + +== horizontal rule == + +top +---- +bottom + +---- + +== nowiki == + +<nowiki>''not emph''</nowiki> + +== strikeout == + +<strike> This is ''struck out''</strike> + +== entities == + +hi & low + +hi & low + +Gödel + +̉પ + +== comments == + +inline<!-- secret --> comment + +<!-- secret --> + +between blocks + + <!-- secret --> + +== linebreaks == + +hi<br/>there + +hi<br> +there + +== : indents == + +hi +: there +bud + +hi +:: there +bud + +== p tags == + +hi there +<p> +bud +<p> +another +</p> + +== raw html == + +hi <span style="color:red">''there''</span>. + +<ins>inserted</ins> + +<div class="special"> +hi ''there'' +</div> + +== sup, sub, del == + +H<sub>2</sub>O base<sup>''exponent''</sup> +<del>hello</del> + +== inline code == + +<code>*→*</code> <tt>typed</tt> <hask>>>=</hask> + +== code blocks == + +<pre> +case xs of + (_:_) -> reverse xs + [] -> ['*'] +</pre> + +<haskell> +case xs of + (_:_) -> reverse xs + [] -> ['*'] +</haskell> + +<syntaxhighlight lang="ruby" line start=100> +widgets.each do |w| + print w.price +end +</syntaxhighlight> + +== block quotes == + +Regular paragraph +<blockquote> +This is a block quote. + +With two paragraphs. +</blockquote> +Nother paragraph. + +== external links == + +[http://google.com ''Google'' search engine] + +http://johnmacfarlane.net/pandoc/ + +[http://google.com] [http://yahoo.com] + +[mailto:info@example.org email me] + +== internal links == + +[[Help]] + +[[Help|the help page]] + +[[Help]]ers + +[[Help]]<nowiki/>ers + +[[Help:Contents|]] + +[[#My anchor]] + +[[Page#with anchor|and text]] + +== images == + +[[File:example.jpg|caption]] + +[[File:example.jpg|border|the ''caption'' with [http://google.com external link]]] + +[[File:example.jpg|frameless|border|30x40px|caption]] + +[[File:example.jpg]] + +== lists == + +* Start each line +* with an asterisk (*). +** More asterisks gives deeper +*** and deeper levels. +* Line breaks<br/>don't break levels. +*** But jumping levels creates empty space. +Any other start ends the list. + +# Start each line +# with a number sign (#). +## More number signs gives deeper +### and deeper +### levels. +# Line breaks<br/>don't break levels. +### But jumping levels creates empty space. +# Blank lines + +# end the list and start another. +Any other start also +ends the list. + +;item 1 +: definition 1 +;item 2 +: definition 2-1 +: definition 2-2 + +# one +# two +#* two point one +#* two point two +# three +#; three item one +#: three def one +# four +#: four def one +#: this looks like a continuation +#: and is often used +#: instead<br/>of <nowiki><br/></nowiki> +# {{{template +|author=John +|title=My Book +}}} +## five sub 1 +### five sub 1 sub 1 +## five sub 2 + +<ol> + <li>list item ''emph'' + <ol> + <li>list item B1</li> + <li>list item B2</li> + </ol>continuing list item A1 + </li> + <li>list item A2</li> +</ol> + +<ul> +#abc +#def +#ghi +</ul> + +<ol start="9"> +<li>Amsterdam</li> +<li>Rotterdam</li> +<li>The Hague</li> +</ol> + +== math == + +Here is some <math>x=\frac{y^\pi}{z}</math>. + +== preformatted blocks == + + Start each line with a space. + Text is '''preformatted''' and + ''markups'' '''''can''''' be done. + + hell yeah + + <nowiki>Start with a space in the first column, +(before the <nowiki>). + +Then your block format will be + maintained. + +This is good for copying in code blocks: + +def function(): + """documentation string""" + + if True: + print True + else: + print False</nowiki> + +Not<hr/> preformatted + +== templates == + +{{Welcome}} + +{{Foo:Bar}} + +{{Thankyou|all your effort|Me}} + +Written {{{date}}} by {{{name}}}. + +== tables == + +{| +|- +|Orange +|Apple +|- +|Bread +|Pie +|- +|Butter +|Ice cream +|} + +{| +|+Food complements +!Orange +!Apple +|- +|Bread +|Pie +|- +!Butter +|Ice cream +|} + +{| +|+Food complements +!Orange +!Apple +|- +|Bread + +and cheese +|Pie + +# apple +# carrot + +|} + +{| +| Orange || Apple || more +|- +| Bread || Pie || more +|- +| Butter || Ice cream || and more +|} + +{|width="50%" +! align="left" width="50%"| Left +! align="right"|Right +! align="center"|Center +|- +| left || 15.00 || centered +|- +| more || 2.0 || more +|} + +{| +|- +|Orange +|Apple +|- +|Bread +| +{| +!fruit +!topping +|- +|apple +|ice cream +|} +|- +|Butter +|Ice cream +|} + + +== notes == + +My note!<ref>This.</ref> + diff --git a/tests/tables.latex b/tests/tables.latex index 56b469a54..2a8013413 100644 --- a/tests/tables.latex +++ b/tests/tables.latex @@ -1,64 +1,59 @@ Simple table with caption: -\ctable[caption = {Demonstration of simple table syntax.}, -pos = H, center, botcap]{rlcl} -{% notes -} -{% rows -\FL +\begin{longtable}[c]{rlcl} +\hline\noalign{\medskip} Right & Left & Center & Default -\ML +\\\noalign{\medskip} +\hline\noalign{\medskip} 12 & 12 & 12 & 12 \\\noalign{\medskip} 123 & 123 & 123 & 123 \\\noalign{\medskip} 1 & 1 & 1 & 1 -\LL -} +\\\noalign{\medskip} +\hline +\noalign{\medskip} +\caption{Demonstration of simple table syntax.} +\end{longtable} Simple table without caption: -\ctable[pos = H, center, botcap]{rlcl} -{% notes -} -{% rows -\FL +\begin{longtable}[c]{rlcl} +\hline\noalign{\medskip} Right & Left & Center & Default -\ML +\\\noalign{\medskip} +\hline\noalign{\medskip} 12 & 12 & 12 & 12 \\\noalign{\medskip} 123 & 123 & 123 & 123 \\\noalign{\medskip} 1 & 1 & 1 & 1 -\LL -} +\\\noalign{\medskip} +\hline +\end{longtable} Simple table indented two spaces: -\ctable[caption = {Demonstration of simple table syntax.}, -pos = H, center, botcap]{rlcl} -{% notes -} -{% rows -\FL +\begin{longtable}[c]{rlcl} +\hline\noalign{\medskip} Right & Left & Center & Default -\ML +\\\noalign{\medskip} +\hline\noalign{\medskip} 12 & 12 & 12 & 12 \\\noalign{\medskip} 123 & 123 & 123 & 123 \\\noalign{\medskip} 1 & 1 & 1 & 1 -\LL -} +\\\noalign{\medskip} +\hline +\noalign{\medskip} +\caption{Demonstration of simple table syntax.} +\end{longtable} Multiline table with caption: -\ctable[caption = {Here's the caption. It may span multiple lines.}, -pos = H, center, botcap]{clrl} -{% notes -} -{% rows -\FL +\begin{longtable}[c]{clrl} +\hline\noalign{\medskip} \parbox[b]{0.15\columnwidth}{\centering Centered Header } & \parbox[b]{0.14\columnwidth}{\raggedright @@ -68,7 +63,8 @@ Right Aligned } & \parbox[b]{0.34\columnwidth}{\raggedright Default aligned } -\ML +\\\noalign{\medskip} +\hline\noalign{\medskip} \parbox[t]{0.15\columnwidth}{\centering First } & \parbox[t]{0.14\columnwidth}{\raggedright @@ -88,16 +84,16 @@ row } & \parbox[t]{0.34\columnwidth}{\raggedright Here's another one. Note the blank line between rows. } -\LL -} +\\\noalign{\medskip} +\hline +\noalign{\medskip} +\caption{Here's the caption. It may span multiple lines.} +\end{longtable} Multiline table without caption: -\ctable[pos = H, center, botcap]{clrl} -{% notes -} -{% rows -\FL +\begin{longtable}[c]{clrl} +\hline\noalign{\medskip} \parbox[b]{0.15\columnwidth}{\centering Centered Header } & \parbox[b]{0.14\columnwidth}{\raggedright @@ -107,7 +103,8 @@ Right Aligned } & \parbox[b]{0.34\columnwidth}{\raggedright Default aligned } -\ML +\\\noalign{\medskip} +\hline\noalign{\medskip} \parbox[t]{0.15\columnwidth}{\centering First } & \parbox[t]{0.14\columnwidth}{\raggedright @@ -127,31 +124,27 @@ row } & \parbox[t]{0.34\columnwidth}{\raggedright Here's another one. Note the blank line between rows. } -\LL -} +\\\noalign{\medskip} +\hline +\end{longtable} Table without column headers: -\ctable[pos = H, center, botcap]{rlcr} -{% notes -} -{% rows -\FL +\begin{longtable}[c]{rlcr} +\hline\noalign{\medskip} 12 & 12 & 12 & 12 \\\noalign{\medskip} 123 & 123 & 123 & 123 \\\noalign{\medskip} 1 & 1 & 1 & 1 -\LL -} +\\\noalign{\medskip} +\hline +\end{longtable} Multiline table without column headers: -\ctable[pos = H, center, botcap]{clrl} -{% notes -} -{% rows -\FL +\begin{longtable}[c]{clrl} +\hline\noalign{\medskip} \parbox[t]{0.15\columnwidth}{\centering First } & \parbox[t]{0.14\columnwidth}{\raggedright @@ -171,5 +164,6 @@ row } & \parbox[t]{0.34\columnwidth}{\raggedright Here's another one. Note the blank line between rows. } -\LL -} +\\\noalign{\medskip} +\hline +\end{longtable} diff --git a/tests/textile-reader.native b/tests/textile-reader.native index 7532a8fda..84dc50516 100644 --- a/tests/textile-reader.native +++ b/tests/textile-reader.native @@ -155,4 +155,4 @@ Pandoc (Meta {docTitle = [], docAuthors = [], docDate = []}) ,Para [Str "\174",Space,Str "Hi",Str "\174"] ,Para [Str "Hi",Str "\169",Str "2008",Space,Str "\169",Space,Str "2008"] ,Header 1 [Str "Footnotes"] -,Para [Str "A",Space,Str "note",Str ".",Note [Para [Str "The",Space,Str "note",LineBreak,Str "is",Space,Str "here",Str "!",Space,Str "Another",Space,Str "note",Note [Para [Str "Other",Space,Str "note."]],Str "."]]]] +,Para [Str "A",Space,Str "note",Str ".",Note [Para [Str "The",Space,Str "note",LineBreak,Str "is",Space,Str "here",Str "!"]],Space,Str "Another",Space,Str "note",Note [Para [Str "Other",Space,Str "note",Str "."]],Str "."]] diff --git a/tests/writer.context b/tests/writer.context index bb070ce67..ff82aa457 100644 --- a/tests/writer.context +++ b/tests/writer.context @@ -6,7 +6,7 @@ % Enable hyperlinks \setupinteraction[state=start, color=middleblue] -\setuppapersize [letter][letter] +\setuppapersize [letter][letter] \setuplayout [width=middle, backspace=1.5in, cutspace=1.5in, height=middle, topspace=0.75in, bottomspace=0.75in] diff --git a/tests/writer.html b/tests/writer.html index c9ef6f1f1..67bb57ead 100644 --- a/tests/writer.html +++ b/tests/writer.html @@ -281,29 +281,34 @@ These should not be escaped: \$ \\ \> \[ \{</code></pre> <dl> <dt>apple</dt> <dd>red fruit -</dd><dd>computer +</dd> +<dd>computer </dd> <dt>orange</dt> <dd>orange fruit -</dd><dd>bank +</dd> +<dd>bank </dd> </dl> <p>Multiple definitions, loose:</p> <dl> <dt>apple</dt> <dd><p>red fruit</p> -</dd><dd><p>computer</p> +</dd> +<dd><p>computer</p> </dd> <dt>orange</dt> <dd><p>orange fruit</p> -</dd><dd><p>bank</p> +</dd> +<dd><p>bank</p> </dd> </dl> <p>Blank line after term, indented marker, alternate markers:</p> <dl> <dt>apple</dt> <dd><p>red fruit</p> -</dd><dd><p>computer</p> +</dd> +<dd><p>computer</p> </dd> <dt>orange</dt> <dd><p>orange fruit</p> diff --git a/tests/writer.latex b/tests/writer.latex index 3efc08277..bab23cd64 100644 --- a/tests/writer.latex +++ b/tests/writer.latex @@ -786,7 +786,7 @@ Just a \href{/url/}{URL}. \href{/url/}{URL and title} -\href{/url/with\_underscore}{with\_underscore} +\href{/url/with_underscore}{with\_underscore} \href{mailto:nobody@nowhere.net}{Email link} diff --git a/windows/pandoc-setup.iss b/windows/pandoc-setup.iss index 712cf4f0d..47a480b1f 100644 --- a/windows/pandoc-setup.iss +++ b/windows/pandoc-setup.iss @@ -71,8 +71,6 @@ Source: "..\slidy\graphics\*"; DestDir: "{app}\slidy\graphics"; Flags: ignorever Source: "..\slidy\scripts\*"; DestDir: "{app}\slidy\scripts"; Flags: ignoreversion
Source: "..\slidy\styles\*"; DestDir: "{app}\slidy\styles"; Flags: ignoreversion
Source: "..\dzslides\*"; DestDir: "{app}\dzslides"; Flags: ignoreversion
-Source: "pcre-license.txt"; DestDir: "{app}"; Flags: ignoreversion
-Source: "pcre3.dll"; DestDir: "{sys}"; Flags: onlyifdoesntexist sharedfile
; NOTE: Don't use "Flags: ignoreversion" on any shared system files
[Icons]
diff --git a/windows/pcre-license.txt b/windows/pcre-license.txt deleted file mode 100644 index 73f8cde3d..000000000 --- a/windows/pcre-license.txt +++ /dev/null @@ -1,68 +0,0 @@ -PCRE LICENCE ------------- - -PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - -Release 8 of PCRE is distributed under the terms of the "BSD" licence, as -specified below. The documentation for PCRE, supplied in the "doc" -directory, is distributed under the same terms as the software itself. - -The basic library functions are written in C and are freestanding. Also -included in the distribution is a set of C++ wrapper functions. - - -THE BASIC LIBRARY FUNCTIONS ---------------------------- - -Written by: Philip Hazel -Email local part: ph10 -Email domain: cam.ac.uk - -University of Cambridge Computing Service, -Cambridge, England. - -Copyright (c) 1997-2009 University of Cambridge -All rights reserved. - - -THE C++ WRAPPER FUNCTIONS -------------------------- - -Contributed by: Google Inc. - -Copyright (c) 2007-2008, Google Inc. -All rights reserved. - - -THE "BSD" LICENCE ------------------ - -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the name of Google - Inc. nor the names of their contributors may be used to endorse or - promote products derived from this software without specific prior - written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. - -End diff --git a/windows/pcre3.dll b/windows/pcre3.dll Binary files differdeleted file mode 100644 index b5fd2a637..000000000 --- a/windows/pcre3.dll +++ /dev/null |
