diff options
30 files changed, 1376 insertions, 211 deletions
diff --git a/.travis.yml b/.travis.yml index 1b9b9a095..dbccb151c 100644 --- a/.travis.yml +++ b/.travis.yml @@ -43,6 +43,10 @@ matrix: compiler: ": #GHC 8.0.2" addons: {apt: {packages: [cabal-install-1.24,ghc-8.0.2,happy-1.19.5], sources: [hvr-ghc]}} + - env: BUILD=cabal GHCVER=8.2.1 CABALVER=1.24 + compiler: ": #GHC 8.2.1" + addons: {apt: {packages: [cabal-install-1.24,ghc-8.2.1,happy-1.19.5], sources: [hvr-ghc]}} + # Build with the newest GHC and cabal-install. This is an accepted failure, # see below. # - env: BUILD=cabal GHCVER=head CABALVER=head diff --git a/MANUAL.txt b/MANUAL.txt index 951558ca0..c8e042544 100644 --- a/MANUAL.txt +++ b/MANUAL.txt @@ -15,8 +15,8 @@ another, and a command-line tool that uses this library. It can read [Markdown], [CommonMark], [PHP Markdown Extra], [GitHub-Flavored Markdown], [MultiMarkdown], and (subsets of) [Textile], [reStructuredText], [HTML], [LaTeX], [MediaWiki markup], [TWiki markup], -[Haddock markup], [OPML], [Emacs Org mode], [DocBook], [Muse], [txt2tags], -[Vimwiki], [EPUB], [ODT], and [Word docx]; and it can +[TikiWiki markup], [Haddock markup], [OPML], [Emacs Org mode], [DocBook], +[Muse], [txt2tags], [Vimwiki], [EPUB], [ODT], and [Word docx]; and it can write plain text, [Markdown], [CommonMark], [PHP Markdown Extra], [GitHub-Flavored Markdown], [MultiMarkdown], [reStructuredText], [XHTML], [HTML5], [LaTeX] \(including @@ -85,6 +85,7 @@ Markdown can be expected to be lossy. [DokuWiki markup]: https://www.dokuwiki.org/dokuwiki [ZimWiki markup]: http://zim-wiki.org/manual/Help/Wiki_Syntax.html [TWiki markup]: http://twiki.org/cgi-bin/view/TWiki/TextFormattingRules +[TikiWiki markup]: https://doc.tiki.org/Wiki-Syntax-Text#The_Markup_Language_Wiki-Syntax [Haddock markup]: https://www.haskell.org/haddock/doc/html/ch03s08.html [groff man]: http://man7.org/linux/man-pages/man7/groff_man.7.html [groff ms]: http://man7.org/linux/man-pages/man7/groff_ms.7.html @@ -268,10 +269,10 @@ General options (reStructuredText), `html` (HTML), `docbook` (DocBook), `t2t` (txt2tags), `docx` (docx), `odt` (ODT), `epub` (EPUB), `opml` (OPML), `org` (Emacs Org mode), `mediawiki` (MediaWiki markup), `twiki` (TWiki - markup), `haddock` (Haddock markup), or `latex` (LaTeX). If - `+lhs` is appended to `markdown`, `rst`, `latex`, or `html`, the - input will be treated as literate Haskell source: see [Literate - Haskell support], below. Markdown + markup), `tikiwiki` (TikiWiki markup), `haddock` (Haddock markup), or + `latex` (LaTeX). If `+lhs` is appended to `markdown`, `rst`, `latex`, or + `html`, the input will be treated as literate Haskell source: see + [Literate Haskell support], below. Markdown syntax extensions can be individually enabled or disabled by appending `+EXTENSION` or `-EXTENSION` to the format name. So, for example, `markdown_strict+footnotes+definition_lists` is strict @@ -19,8 +19,8 @@ another, and a command-line tool that uses this library. It can read [Markdown], [CommonMark], [PHP Markdown Extra], [GitHub-Flavored Markdown], [MultiMarkdown], and (subsets of) [Textile], [reStructuredText], [HTML], [LaTeX], [MediaWiki markup], [TWiki markup], -[Haddock markup], [OPML], [Emacs Org mode], [DocBook], [Muse], [txt2tags], -[Vimwiki], [EPUB], [ODT], and [Word docx]; and it can +[TikiWiki markup], [Haddock markup], [OPML], [Emacs Org mode], [DocBook], +[Muse], [txt2tags], [Vimwiki], [EPUB], [ODT], and [Word docx]; and it can write plain text, [Markdown], [CommonMark], [PHP Markdown Extra], [GitHub-Flavored Markdown], [MultiMarkdown], [reStructuredText], [XHTML], [HTML5], [LaTeX] \(including diff --git a/linux/control.in b/linux/control.in index 79e0b1b6f..9c9f90be8 100644 --- a/linux/control.in +++ b/linux/control.in @@ -13,13 +13,13 @@ Description: general markup converter format to another, and a command-line tool that uses this library. It can read several dialects of Markdown and (subsets of) HTML, reStructuredText, LaTeX, DocBook, - MediaWiki markup, TWiki markup, Haddock markup, OPML, - Emacs Org-Mode, txt2tags, Muse, Vimwiki, Word Docx, ODT, - and Textile, and it can write Markdown, reStructuredText, - XHTML, HTML 5, LaTeX, ConTeXt, DocBook, JATS, OPML, TEI, - OpenDocument, ODT, Word docx, RTF, MediaWiki, DokuWiki, - ZimWiki, Textile, groff man, groff ms, plain text, - Emacs Org-Mode, AsciiDoc, Haddock markup, EPUB (v2 and v3), - FictionBook2, InDesign ICML, Muse, and several kinds of - HTML/javascript slide shows (S5, Slidy, Slideous, - DZSlides, reveal.js). + MediaWiki markup, TWiki markup, TikiWiki markup, Haddock + markup, OPML, Emacs Org-Mode, txt2tags, Muse, Vimwiki, + Word Docx, ODT, and Textile, and it can write Markdown, + reStructuredText, XHTML, HTML 5, LaTeX, ConTeXt, DocBook, + JATS, OPML, TEI, OpenDocument, ODT, Word docx, RTF, + MediaWiki, DokuWiki, ZimWiki, Textile, groff man, groff + ms, plain text, Emacs Org-Mode, AsciiDoc, Haddock markup, + EPUB (v2 and v3), FictionBook2, InDesign ICML, Muse, and + several kinds of HTML/javascript slide shows (S5, Slidy, + Slideous, DZSlides, reveal.js). diff --git a/man/pandoc.1 b/man/pandoc.1 index 15dbcbe07..aafd16070 100644 --- a/man/pandoc.1 +++ b/man/pandoc.1 @@ -11,15 +11,16 @@ Pandoc is a Haskell library for converting from one markup format to another, and a command\-line tool that uses this library. It can read Markdown, CommonMark, PHP Markdown Extra, GitHub\-Flavored Markdown, MultiMarkdown, and (subsets of) Textile, reStructuredText, -HTML, LaTeX, MediaWiki markup, TWiki markup, Haddock markup, OPML, Emacs -Org mode, DocBook, txt2tags, EPUB, ODT and Word docx; and it can write -plain text, Markdown, CommonMark, PHP Markdown Extra, GitHub\-Flavored -Markdown, MultiMarkdown, reStructuredText, XHTML, HTML5, LaTeX -(including \f[C]beamer\f[] slide shows), ConTeXt, RTF, OPML, DocBook, -OpenDocument, ODT, Word docx, GNU Texinfo, MediaWiki markup, DokuWiki -markup, ZimWiki markup, Haddock markup, EPUB (v2 or v3), FictionBook2, -Textile, groff man, groff ms, Emacs Org mode, AsciiDoc, InDesign ICML, -TEI Simple, and Slidy, Slideous, DZSlides, reveal.js or S5 HTML slide +HTML, LaTeX, MediaWiki markup, TWiki markup, TikiWiki markup, +Haddock markup, OPML, Emacs Org mode, DocBook, txt2tags, EPUB, ODT +and Word docx; and it can write plain text, Markdown, CommonMark, +PHP Markdown Extra, GitHub\-Flavored Markdown, MultiMarkdown, +reStructuredText, XHTML, HTML5, LaTeX (including \f[C]beamer\f[] +slide shows), ConTeXt, RTF, OPML, DocBook, OpenDocument, ODT, Word +docx, GNU Texinfo, MediaWiki markup, DokuWiki markup, ZimWiki +markup, Haddock markup, EPUB (v2 or v3), FictionBook2, Textile, +groff man, groff ms, Emacs Org mode, AsciiDoc, InDesign ICML, TEI +Simple, and Slidy, Slideous, DZSlides, reveal.js or S5 HTML slide shows. It can also produce PDF output on systems where LaTeX, ConTeXt, \f[C]pdfroff\f[], or \f[C]wkhtmltopdf\f[] is installed. @@ -231,13 +232,13 @@ Markdown), \f[C]textile\f[] (Textile), \f[C]rst\f[] (reStructuredText), (txt2tags), \f[C]docx\f[] (docx), \f[C]odt\f[] (ODT), \f[C]epub\f[] (EPUB), \f[C]opml\f[] (OPML), \f[C]org\f[] (Emacs Org mode), \f[C]mediawiki\f[] (MediaWiki markup), \f[C]twiki\f[] (TWiki markup), -\f[C]haddock\f[] (Haddock markup), or \f[C]latex\f[] (LaTeX). -If \f[C]+lhs\f[] is appended to \f[C]markdown\f[], \f[C]rst\f[], -\f[C]latex\f[], or \f[C]html\f[], the input will be treated as literate -Haskell source: see Literate Haskell support, below. -Markdown syntax extensions can be individually enabled or disabled by -appending \f[C]+EXTENSION\f[] or \f[C]\-EXTENSION\f[] to the format -name. +\f[C]tikiwiki\f[] (TikiWiki markup), \f[C]haddock\f[] (Haddock markup), +or \f[C]latex\f[] (LaTeX). If \f[C]+lhs\f[] is appended to +\f[C]markdown\f[], \f[C]rst\f[], \f[C]latex\f[], or \f[C]html\f[], +the input will be treated as literate Haskell source: see Literate +Haskell support, below. Markdown syntax extensions can be +individually enabled or disabled by appending \f[C]+EXTENSION\f[] or +\f[C]\-EXTENSION\f[] to the format name. So, for example, \f[C]markdown_strict+footnotes+definition_lists\f[] is strict Markdown with footnotes and definition lists enabled, and \f[C]markdown\-pipe_tables+hard_line_breaks\f[] is pandoc's Markdown diff --git a/pandoc.cabal b/pandoc.cabal index dd92690ce..9de072755 100644 --- a/pandoc.cabal +++ b/pandoc.cabal @@ -17,16 +17,16 @@ Description: Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can read several dialects of Markdown and (subsets of) HTML, reStructuredText, LaTeX, DocBook, - MediaWiki markup, TWiki markup, Haddock markup, OPML, - Emacs Org-Mode, txt2tags, Muse, Vimwiki, Word Docx, ODT, - and Textile, and it can write Markdown, reStructuredText, - XHTML, HTML 5, LaTeX, ConTeXt, DocBook, JATS, OPML, TEI, - OpenDocument, ODT, Word docx, RTF, MediaWiki, DokuWiki, - ZimWiki, Textile, groff man, groff ms, plain text, + MediaWiki markup, TWiki markup, TikiWiki markup, Haddock + markup, OPML, Emacs Org-Mode, txt2tags, Muse, Vimwiki, Word + Docx, ODT, and Textile, and it can write Markdown, + reStructuredText, XHTML, HTML 5, LaTeX, ConTeXt, DocBook, + JATS, OPML, TEI, OpenDocument, ODT, Word docx, RTF, MediaWiki, + DokuWiki, ZimWiki, Textile, groff man, groff ms, plain text, Emacs Org-Mode, AsciiDoc, Haddock markup, EPUB (v2 and v3), FictionBook2, InDesign ICML, Muse, and several kinds of - HTML/javascript slide shows (S5, Slidy, Slideous, - DZSlides, reveal.js). + HTML/javascript slide shows (S5, Slidy, Slideous, DZSlides, + reveal.js). . In contrast to most existing tools for converting Markdown to HTML, pandoc has a modular design: it consists of a set of @@ -252,6 +252,7 @@ Extra-Source-Files: test/epub/*.native test/txt2tags.t2t test/twiki-reader.twiki + test/tikiwiki-reader.tikiwiki test/odt/odt/*.odt test/odt/markdown/*.md test/odt/native/*.native @@ -283,7 +284,7 @@ Flag old-locale Library Build-Depends: base >= 4.7 && < 5, syb >= 0.1 && < 0.8, - containers >= 0.1 && < 0.6, + containers >= 0.4.2.1 && < 0.6, unordered-containers >= 0.2 && < 0.3, parsec >= 3.1 && < 3.2, mtl >= 2.2 && < 2.3, @@ -379,6 +380,7 @@ Library Text.Pandoc.Readers.Native, Text.Pandoc.Readers.Haddock, Text.Pandoc.Readers.TWiki, + Text.Pandoc.Readers.TikiWiki, Text.Pandoc.Readers.Txt2Tags, Text.Pandoc.Readers.Docx, Text.Pandoc.Readers.Odt, @@ -546,7 +548,7 @@ Test-Suite test-pandoc tasty-quickcheck >= 0.8 && < 0.9, tasty-golden >= 2.3 && < 2.4, QuickCheck >= 2.4 && < 2.11, - containers >= 0.1 && < 0.6, + containers >= 0.4.2.1 && < 0.6, executable-path >= 0.0 && < 0.1, zip-archive >= 0.2.3.4 && < 0.4, mtl >= 2.2 && < 2.3 diff --git a/src/Text/Pandoc/App.hs b/src/Text/Pandoc/App.hs index 68bdc1432..0d4a82b70 100644 --- a/src/Text/Pandoc/App.hs +++ b/src/Text/Pandoc/App.hs @@ -218,7 +218,8 @@ convertWithOpts opts = do templ <- case optTemplate opts of _ | not standalone -> return Nothing Nothing -> do - deftemp <- getDefaultTemplate datadir format + deftemp <- runIO $ + getDefaultTemplate datadir format case deftemp of Left e -> E.throwIO e Right t -> return (Just t) @@ -991,10 +992,10 @@ options = , Option "D" ["print-default-template"] (ReqArg (\arg _ -> do - templ <- getDefaultTemplate Nothing arg + templ <- runIO $ getDefaultTemplate Nothing arg case templ of Right t -> UTF8.hPutStr stdout t - Left e -> E.throwIO $ PandocAppError (show e) + Left e -> E.throwIO e exitSuccess) "FORMAT") "" -- "Print default template for FORMAT" diff --git a/src/Text/Pandoc/Class.hs b/src/Text/Pandoc/Class.hs index 120ba8fee..46e300953 100644 --- a/src/Text/Pandoc/Class.hs +++ b/src/Text/Pandoc/Class.hs @@ -5,7 +5,8 @@ {-# LANGUAGE FlexibleContexts #-} {- -Copyright (C) 2016 Jesse Rosenthal <jrosenthal@jhu.edu> +Copyright (C) 2016-17 Jesse Rosenthal <jrosenthal@jhu.edu> +and John MacFarlane. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -24,7 +25,7 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA {- | Module : Text.Pandoc.Class - Copyright : Copyright (C) 2016 Jesse Rosenthal + Copyright : Copyright (C) 2016-17 Jesse Rosenthal, John MacFarlane License : GNU GPL, version 2 or above Maintainer : Jesse Rosenthal <jrosenthal@jhu.edu> @@ -60,6 +61,7 @@ module Text.Pandoc.Class ( PandocMonad(..) , PandocPure(..) , FileTree(..) , FileInfo(..) + , addToFileTree , runIO , runIOorExplode , runPure @@ -101,7 +103,8 @@ import qualified Data.ByteString as B import qualified Data.ByteString.Lazy as BL import qualified System.Environment as IO (lookupEnv) import System.FilePath.Glob (match, compile) -import System.Directory (createDirectoryIfMissing) +import System.Directory (createDirectoryIfMissing, getDirectoryContents, + doesDirectoryExist) import System.FilePath ((</>), (<.>), takeDirectory, takeExtension, dropExtension, isRelative, normalise) import qualified System.FilePath.Glob as IO (glob) @@ -120,36 +123,64 @@ import qualified Data.Map as M import Text.Pandoc.Error import qualified Debug.Trace +-- | The PandocMonad typeclass contains all the potentially +-- IO-related functions used in pandoc's readers and writers. +-- Instances of this typeclass may implement these functions +-- in IO (as in 'PandocIO') or using an internal state that +-- represents a file system, time, and so on (as in 'PandocPure'). class (Functor m, Applicative m, Monad m, MonadError PandocError m) => PandocMonad m where + -- | Lookup an environment variable. lookupEnv :: String -> m (Maybe String) + -- | Get the current (UTC) time. getCurrentTime :: m UTCTime + -- | Get the locale's time zone. getCurrentTimeZone :: m TimeZone + -- | Return a new generator for random numbers. newStdGen :: m StdGen + -- | Return a new unique integer. newUniqueHash :: m Int + -- | Retrieve contents and mime type from a URL, raising + -- an error on failure. openURL :: String -> m (B.ByteString, Maybe MimeType) + -- | Read the lazy ByteString contents from a file path, + -- raising an error on failure. readFileLazy :: FilePath -> m BL.ByteString + -- | Read the strict ByteString contents from a file path, + -- raising an error on failure. readFileStrict :: FilePath -> m B.ByteString + -- | Read file from specified user data directory or, + -- if not found there, from Cabal data directory. readDataFile :: Maybe FilePath -> FilePath -> m B.ByteString + -- | Return a list of paths that match a glob, relative to + -- the working directory. See 'System.FilePath.Glob' for + -- the glob syntax. glob :: String -> m [FilePath] + -- | Return the modification time of a file. getModificationTime :: FilePath -> m UTCTime + -- | Get the value of the 'CommonState' used by all instances + -- of 'PandocMonad'. getCommonState :: m CommonState + -- | Set the value of the 'CommonState' used by all instances + -- of 'PandocMonad'. + -- | Get the value of a specific field of 'CommonState'. putCommonState :: CommonState -> m () - + -- | Get the value of a specific field of 'CommonState'. getsCommonState :: (CommonState -> a) -> m a getsCommonState f = f <$> getCommonState - + -- | Modify the 'CommonState'. modifyCommonState :: (CommonState -> CommonState) -> m () modifyCommonState f = getCommonState >>= putCommonState . f - + -- Output a log message. + logOutput :: LogMessage -> m () + -- Output a debug message to sterr, using 'Debug.Trace.trace'. + -- Note: this writes to stderr even in pure instances. trace :: String -> m () trace msg = do tracing <- getsCommonState stTrace when tracing $ Debug.Trace.trace ("[trace] " ++ msg) (return ()) - logOutput :: LogMessage -> m () - --- Functions defined for all PandocMonad instances +-- * Functions defined for all PandocMonad instances setVerbosity :: PandocMonad m => Verbosity -> m () setVerbosity verbosity = @@ -192,10 +223,10 @@ setResourcePath ps = modifyCommonState $ \st -> st{stResourcePath = ps} getResourcePath :: PandocMonad m => m [FilePath] getResourcePath = getsCommonState stResourcePath -getPOSIXTime :: (PandocMonad m) => m POSIXTime +getPOSIXTime :: PandocMonad m => m POSIXTime getPOSIXTime = utcTimeToPOSIXSeconds <$> getCurrentTime -getZonedTime :: (PandocMonad m) => m ZonedTime +getZonedTime :: PandocMonad m => m ZonedTime getZonedTime = do t <- getCurrentTime tz <- getCurrentTimeZone @@ -437,7 +468,6 @@ data PureState = PureState { stStdGen :: StdGen , stFiles :: FileTree , stUserDataDir :: FileTree , stCabalDataDir :: FileTree - , stFontFiles :: [FilePath] } instance Default PureState where @@ -452,7 +482,6 @@ instance Default PureState where , stFiles = mempty , stUserDataDir = mempty , stCabalDataDir = mempty - , stFontFiles = [] } @@ -479,6 +508,24 @@ newtype FileTree = FileTree {unFileTree :: M.Map FilePath FileInfo} getFileInfo :: FilePath -> FileTree -> Maybe FileInfo getFileInfo fp tree = M.lookup fp $ unFileTree tree +-- | Add the specified file to the FileTree. If file +-- is a directory, add its contents recursively. +addToFileTree :: FileTree -> FilePath -> IO FileTree +addToFileTree (FileTree treemap) fp = do + isdir <- doesDirectoryExist fp + if isdir + then do -- recursively add contents of directories + let isSpecial ".." = True + isSpecial "." = True + isSpecial _ = False + fs <- (map (fp </>) . filter (not . isSpecial)) <$> getDirectoryContents fp + foldM addToFileTree (FileTree treemap) fs + else do + contents <- B.readFile fp + mtime <- IO.getModificationTime fp + return $ FileTree $ + M.insert fp FileInfo{ infoFileMTime = mtime + , infoFileContents = contents } treemap newtype PandocPure a = PandocPure { unPandocPure :: ExceptT PandocError @@ -542,8 +589,8 @@ instance PandocMonad PandocPure where Nothing -> readDataFile Nothing fname glob s = do - fontFiles <- getsPureState stFontFiles - return (filter (match (compile s)) fontFiles) + FileTree ftmap <- getsPureState stFiles + return $ filter (match (compile s)) $ M.keys ftmap getModificationTime fp = do fps <- getsPureState stFiles diff --git a/src/Text/Pandoc/MediaBag.hs b/src/Text/Pandoc/MediaBag.hs index d8d6da345..f89c60c9e 100644 --- a/src/Text/Pandoc/MediaBag.hs +++ b/src/Text/Pandoc/MediaBag.hs @@ -79,5 +79,5 @@ lookupMedia fp (MediaBag mediamap) = M.lookup (splitDirectories fp) mediamap -- their corresponding mime types and the lengths in bytes of the contents. mediaDirectory :: MediaBag -> [(String, MimeType, Int)] mediaDirectory (MediaBag mediamap) = - M.foldWithKey (\fp (mime,contents) -> + M.foldrWithKey (\fp (mime,contents) -> (((Posix.joinPath fp), mime, fromIntegral $ BL.length contents):)) [] mediamap diff --git a/src/Text/Pandoc/Parsing.hs b/src/Text/Pandoc/Parsing.hs index 549042d14..0c97d4060 100644 --- a/src/Text/Pandoc/Parsing.hs +++ b/src/Text/Pandoc/Parsing.hs @@ -247,8 +247,9 @@ instance Monoid a => Monoid (Future s a) where mconcat = liftM mconcat . sequence -- | Parse characters while a predicate is true. -takeWhileP :: Stream [Char] m Char - => (Char -> Bool) -> ParserT [Char] st m [Char] +takeWhileP :: Monad m + => (Char -> Bool) + -> ParserT [Char] st m [Char] takeWhileP f = do -- faster than 'many (satisfy f)' inp <- getInput @@ -262,7 +263,7 @@ takeWhileP f = do -- Parse n characters of input (or the rest of the input if -- there aren't n characters). -takeP :: Stream [Char] m Char => Int -> ParserT [Char] st m [Char] +takeP :: Monad m => Int -> ParserT [Char] st m [Char] takeP n = do guard (n > 0) -- faster than 'count n anyChar' @@ -276,7 +277,7 @@ takeP n = do return xs -- | Parse any line of text -anyLine :: Stream [Char] m Char => ParserT [Char] st m [Char] +anyLine :: Monad m => ParserT [Char] st m [Char] anyLine = do -- This is much faster than: -- manyTill anyChar newline @@ -292,13 +293,13 @@ anyLine = do _ -> mzero -- | Parse any line, include the final newline in the output -anyLineNewline :: Stream [Char] m Char => ParserT [Char] st m [Char] +anyLineNewline :: Monad m => ParserT [Char] st m [Char] anyLineNewline = (++ "\n") <$> anyLine -- | Parse indent by specified number of spaces (or equiv. tabs) -indentWith :: Stream [Char] m Char +indentWith :: Stream s m Char => HasReaderOptions st - => Int -> ParserT [Char] st m [Char] + => Int -> ParserT s st m [Char] indentWith num = do tabStop <- getOption readerTabStop if (num < tabStop) @@ -394,9 +395,9 @@ stringAnyCase (x:xs) = do -- | Parse contents of 'str' using 'parser' and return result. parseFromString :: Monad m - => ParserT String st m a + => ParserT [Char] st m a -> String - -> ParserT String st m a + -> ParserT [Char] st m a parseFromString parser str = do oldPos <- getPosition setPosition $ initialPos "chunk" @@ -422,9 +423,9 @@ parseFromString' parser str = do return res -- | Parse raw line block up to and including blank lines. -lineClump :: Stream [Char] m Char => ParserT [Char] st m String +lineClump :: Monad m => ParserT [Char] st m String lineClump = blanklines - <|> (many1 (notFollowedBy blankline >> anyLine) >>= return . unlines) + <|> (unlines <$> many1 (notFollowedBy blankline >> anyLine)) -- | Parse a string of characters between an open character -- and a close character, including text between balanced @@ -520,7 +521,7 @@ uriScheme :: Stream s m Char => ParserT s st m String uriScheme = oneOfStringsCI (Set.toList schemes) -- | Parses a URI. Returns pair of original and URI-escaped version. -uri :: Stream [Char] m Char => ParserT [Char] st m (String, String) +uri :: Monad m => ParserT [Char] st m (String, String) uri = try $ do scheme <- uriScheme char ':' @@ -625,7 +626,9 @@ withHorizDisplacement parser = do -- | Applies a parser and returns the raw string that was parsed, -- along with the value produced by the parser. -withRaw :: Stream [Char] m Char => ParsecT [Char] st m a -> ParsecT [Char] st m (a, [Char]) +withRaw :: Monad m + => ParsecT [Char] st m a + -> ParsecT [Char] st m (a, [Char]) withRaw parser = do pos1 <- getPosition inp <- getInput @@ -786,7 +789,7 @@ charRef = do c <- characterReference return $ Str [c] -lineBlockLine :: Stream [Char] m Char => ParserT [Char] st m String +lineBlockLine :: Monad m => ParserT [Char] st m String lineBlockLine = try $ do char '|' char ' ' @@ -796,11 +799,11 @@ lineBlockLine = try $ do continuations <- many (try $ char ' ' >> anyLine) return $ white ++ unwords (line : continuations) -blankLineBlockLine :: Stream [Char] m Char => ParserT [Char] st m Char +blankLineBlockLine :: Stream s m Char => ParserT s st m Char blankLineBlockLine = try (char '|' >> blankline) -- | Parses an RST-style line block and returns a list of strings. -lineBlockLines :: Stream [Char] m Char => ParserT [Char] st m [String] +lineBlockLines :: Monad m => ParserT [Char] st m [String] lineBlockLines = try $ do lines' <- many1 (lineBlockLine <|> ((:[]) <$> blankLineBlockLine)) skipMany1 $ blankline <|> blankLineBlockLine @@ -870,7 +873,7 @@ widthsFromIndices numColumns' indices = -- (which may be grid), then the rows, -- which may be grid, separated by blank lines, and -- ending with a footer (dashed line followed by blank line). -gridTableWith :: (Stream [Char] m Char, HasReaderOptions st, +gridTableWith :: (Monad m, HasReaderOptions st, Functor mf, Applicative mf, Monad mf) => ParserT [Char] st m (mf Blocks) -- ^ Block list parser -> Bool -- ^ Headerless table @@ -879,7 +882,7 @@ gridTableWith blocks headless = tableWith (gridTableHeader headless blocks) (gridTableRow blocks) (gridTableSep '-') gridTableFooter -gridTableWith' :: (Stream [Char] m Char, HasReaderOptions st, +gridTableWith' :: (Monad m, HasReaderOptions st, Functor mf, Applicative mf, Monad mf) => ParserT [Char] st m (mf Blocks) -- ^ Block list parser -> Bool -- ^ Headerless table @@ -919,7 +922,7 @@ gridTableSep :: Stream s m Char => Char -> ParserT s st m Char gridTableSep ch = try $ gridDashedLines ch >> return '\n' -- | Parse header for a grid table. -gridTableHeader :: (Stream [Char] m Char, Functor mf, Applicative mf, Monad mf) +gridTableHeader :: (Monad m, Functor mf, Applicative mf, Monad mf) => Bool -- ^ Headerless table -> ParserT [Char] st m (mf Blocks) -> ParserT [Char] st m (mf [Blocks], [Alignment], [Int]) @@ -952,7 +955,7 @@ gridTableRawLine indices = do return (gridTableSplitLine indices line) -- | Parse row of grid table. -gridTableRow :: (Stream [Char] m Char, Functor mf, Applicative mf, Monad mf) +gridTableRow :: (Monad m, Functor mf, Applicative mf, Monad mf) => ParserT [Char] st m (mf Blocks) -> [Int] -> ParserT [Char] st m (mf [Blocks]) @@ -981,8 +984,8 @@ gridTableFooter = blanklines --- -- | Removes the ParsecT layer from the monad transformer stack -readWithM :: (Monad m) - => ParserT [Char] st m a -- ^ parser +readWithM :: Monad m + => ParserT [Char] st m a -- ^ parser -> st -- ^ initial state -> String -- ^ input -> m (Either PandocError a) @@ -998,7 +1001,7 @@ readWith :: Parser [Char] st a readWith p t inp = runIdentity $ readWithM p t inp -- | Parse a string with @parser@ (for testing). -testStringWith :: (Show a) +testStringWith :: Show a => ParserT [Char] ParserState Identity a -> [Char] -> IO () diff --git a/src/Text/Pandoc/Readers.hs b/src/Text/Pandoc/Readers.hs index 0374d27d5..78a2038a4 100644 --- a/src/Text/Pandoc/Readers.hs +++ b/src/Text/Pandoc/Readers.hs @@ -58,6 +58,7 @@ module Text.Pandoc.Readers , readNative , readJSON , readTWiki + , readTikiWiki , readTxt2Tags , readEPUB , readMuse @@ -92,6 +93,7 @@ import Text.Pandoc.Readers.Org import Text.Pandoc.Readers.RST import Text.Pandoc.Readers.Textile import Text.Pandoc.Readers.TWiki +import Text.Pandoc.Readers.TikiWiki import Text.Pandoc.Readers.Txt2Tags import Text.Pandoc.Shared (mapLeft) import Text.Parsec.Error @@ -126,6 +128,7 @@ readers = [ ("native" , TextReader readNative) ,("latex" , TextReader readLaTeX) ,("haddock" , TextReader readHaddock) ,("twiki" , TextReader readTWiki) + ,("tikiwiki" , TextReader readTikiWiki) ,("docx" , ByteStringReader readDocx) ,("odt" , ByteStringReader readOdt) ,("t2t" , TextReader readTxt2Tags) diff --git a/src/Text/Pandoc/Readers/HTML.hs b/src/Text/Pandoc/Readers/HTML.hs index 734973e33..3a0d6eb14 100644 --- a/src/Text/Pandoc/Readers/HTML.hs +++ b/src/Text/Pandoc/Readers/HTML.hs @@ -188,6 +188,7 @@ block = do , pBody , pDiv , pPlain + , pFigure , pRawHtmlBlock ] trace (take 60 $ show $ B.toList res) @@ -553,6 +554,25 @@ pPara = do contents <- trimInlines <$> pInTags "p" inline return $ B.para contents +pFigure :: PandocMonad m => TagParser m Blocks +pFigure = do + TagOpen _ _ <- pSatisfy (matchTagOpen "figure" []) + skipMany pBlank + let pImg = pOptInTag "p" pImage <* skipMany pBlank + pCapt = option mempty $ pInTags "figcaption" inline <* skipMany pBlank + pImgCapt = do + img <- pImg + cap <- pCapt + return (img, cap) + pCaptImg = do + cap <- pCapt + img <- pImg + return (img, cap) + (imgMany, caption) <- pImgCapt <|> pCaptImg + TagClose _ <- pSatisfy (matchTagClose "figure") + let (Image attr _ (url, tit)):_ = B.toList imgMany + return $ B.para $ B.imageWith attr url ("fig:" ++ tit) caption + pCodeBlock :: PandocMonad m => TagParser m Blocks pCodeBlock = try $ do TagOpen _ attr' <- pSatisfy (matchTagOpen "pre" []) diff --git a/src/Text/Pandoc/Readers/LaTeX.hs b/src/Text/Pandoc/Readers/LaTeX.hs index a9bafb03b..5877bbbe1 100644 --- a/src/Text/Pandoc/Readers/LaTeX.hs +++ b/src/Text/Pandoc/Readers/LaTeX.hs @@ -37,13 +37,13 @@ module Text.Pandoc.Readers.LaTeX ( readLaTeX, applyMacros, rawLaTeXInline, rawLaTeXBlock, - macro, inlineCommand ) where import Control.Applicative (many, optional, (<|>)) import Control.Monad import Control.Monad.Except (throwError) +import Control.Monad.Trans (lift) import Data.Char (chr, isAlphaNum, isLetter, ord, isDigit) import Data.Default import Data.Text (Text) @@ -199,77 +199,45 @@ withVerbatimMode parser = do updateState $ \st -> st{ sVerbatimMode = False } return result -rawLaTeXBlock :: (PandocMonad m, HasMacros s, HasReaderOptions s) - => ParserT String s m String -rawLaTeXBlock = do - lookAhead (try (char '\\' >> letter)) +rawLaTeXParser :: (PandocMonad m, HasMacros s, HasReaderOptions s) + => LP m a -> ParserT String s m String +rawLaTeXParser parser = do inp <- getInput let toks = tokenize $ T.pack inp - let rawblock = do - (_, raw) <- try $ - withRaw (environment <|> macroDef <|> blockCommand) - return raw pstate <- getState let lstate = def{ sOptions = extractReaderOptions pstate } - res <- runParserT rawblock lstate "source" toks + res <- lift $ runParserT ((,) <$> try (snd <$> withRaw parser) <*> getState) + lstate "source" toks case res of Left _ -> mzero - Right raw -> takeP (T.length (untokenize raw)) - -macro :: (PandocMonad m, HasMacros s, HasReaderOptions s) - => ParserT String s m Blocks -macro = do - guardEnabled Ext_latex_macros - lookAhead (char '\\' *> oneOfStrings ["new", "renew", "provide"] *> - oneOfStrings ["command", "environment"]) - inp <- getInput - let toks = tokenize $ T.pack inp - let rawblock = do - (_, raw) <- withRaw $ try macroDef - st <- getState - return (raw, st) - pstate <- getState - let lstate = def{ sOptions = extractReaderOptions pstate - , sMacros = extractMacros pstate } - res <- runParserT rawblock lstate "source" toks - case res of - Left _ -> mzero Right (raw, st) -> do - updateState (updateMacros (const $ sMacros st)) - mempty <$ takeP (T.length (untokenize raw)) + updateState (updateMacros ((sMacros st) <>)) + takeP (T.length (untokenize raw)) applyMacros :: (PandocMonad m, HasMacros s, HasReaderOptions s) => String -> ParserT String s m String -applyMacros s = do - (guardEnabled Ext_latex_macros >> - do let retokenize = doMacros 0 *> (toksToString <$> getInput) +applyMacros s = (guardDisabled Ext_latex_macros >> return s) <|> + do let retokenize = doMacros 0 *> + (toksToString <$> many (satisfyTok (const True))) pstate <- getState let lstate = def{ sOptions = extractReaderOptions pstate , sMacros = extractMacros pstate } res <- runParserT retokenize lstate "math" (tokenize (T.pack s)) case res of Left e -> fail (show e) - Right s' -> return s') <|> return s + Right s' -> return s' -rawLaTeXInline :: (PandocMonad m, HasMacros s, HasReaderOptions s) +rawLaTeXBlock :: (PandocMonad m, HasMacros s, HasReaderOptions s) => ParserT String s m String +rawLaTeXBlock = do + lookAhead (try (char '\\' >> letter)) + rawLaTeXParser (environment <|> macroDef <|> blockCommand) + +rawLaTeXInline :: (PandocMonad m, HasMacros s, HasReaderOptions s) + => ParserT String s m String rawLaTeXInline = do lookAhead (try (char '\\' >> letter) <|> char '$') - inp <- getInput - let toks = tokenize $ T.pack inp - let rawinline = do - (_, raw) <- try $ withRaw (inlineEnvironment <|> inlineCommand') - st <- getState - return (raw, st) - pstate <- getState - let lstate = def{ sOptions = extractReaderOptions pstate - , sMacros = extractMacros pstate } - res <- runParserT rawinline lstate "source" toks - case res of - Left _ -> mzero - Right (raw, s) -> do - updateState $ updateMacros (const $ sMacros s) - takeP (T.length (untokenize raw)) + rawLaTeXParser (inlineEnvironment <|> inlineCommand') inlineCommand :: PandocMonad m => ParserT String ParserState m Inlines inlineCommand = do @@ -607,6 +575,16 @@ mkImage options src = do return $ imageWith attr (addExtension src defaultExt) "" alt _ -> return $ imageWith attr src "" alt +doxspace :: PandocMonad m => LP m Inlines +doxspace = do + (space <$ lookAhead (satisfyTok startsWithLetter)) <|> return mempty + where startsWithLetter (Tok _ Word t) = + case T.uncons t of + Just (c, _) | isLetter c -> True + _ -> False + startsWithLetter _ = False + + -- converts e.g. \SI{1}[\$]{} to "$ 1" or \SI{1}{\euro} to "1 €" dosiunitx :: PandocMonad m => LP m Inlines dosiunitx = do @@ -1339,13 +1317,28 @@ inlineCommands = M.fromList $ -- fontawesome , ("faCheck", lit "\10003") , ("faClose", lit "\10007") + -- xspace + , ("xspace", doxspace) + -- etoolbox + , ("ifstrequal", ifstrequal) ] +ifstrequal :: PandocMonad m => LP m Inlines +ifstrequal = do + str1 <- tok + str2 <- tok + ifequal <- braced + ifnotequal <- braced + if str1 == str2 + then getInput >>= setInput . (ifequal ++) + else getInput >>= setInput . (ifnotequal ++) + return mempty + coloredInline :: PandocMonad m => String -> LP m Inlines coloredInline stylename = do - skipopts - color <- braced - spanWith ("",[],[("style",stylename ++ ": " ++ toksToString color)]) <$> tok + skipopts + color <- braced + spanWith ("",[],[("style",stylename ++ ": " ++ toksToString color)]) <$> tok ttfamily :: PandocMonad m => LP m Inlines ttfamily = (code . stringify . toList) <$> tok @@ -1359,14 +1352,20 @@ rawInlineOr name' fallback = do getRawCommand :: PandocMonad m => Text -> LP m String getRawCommand txt = do - (_, rawargs) <- withRaw - ((if txt == "\\write" - then () <$ satisfyTok isWordTok -- digits - else return ()) *> - skipangles *> - skipopts *> - option "" (try (optional sp *> dimenarg)) *> - many braced) + (_, rawargs) <- withRaw $ + case txt of + "\\write" -> do + void $ satisfyTok isWordTok -- digits + void braced + "\\titleformat" -> do + void braced + skipopts + void $ count 4 braced + _ -> do + skipangles + skipopts + option "" (try (optional sp *> dimenarg)) + void $ many braced return $ T.unpack (txt <> untokenize rawargs) isBlockCommand :: Text -> Bool @@ -1394,6 +1393,7 @@ treatAsBlock = Set.fromList , "newpage" , "clearpage" , "pagebreak" + , "titleformat" ] isInlineCommand :: Text -> Bool @@ -1453,22 +1453,14 @@ begin_ :: PandocMonad m => Text -> LP m () begin_ t = (try $ do controlSeq "begin" spaces - symbol '{' - spaces - Tok _ Word txt <- satisfyTok isWordTok - spaces - symbol '}' + txt <- untokenize <$> braced guard (t == txt)) <?> ("\\begin{" ++ T.unpack t ++ "}") end_ :: PandocMonad m => Text -> LP m () end_ t = (try $ do controlSeq "end" spaces - symbol '{' - spaces - Tok _ Word txt <- satisfyTok isWordTok - spaces - symbol '}' + txt <- untokenize <$> braced guard $ t == txt) <?> ("\\end{" ++ T.unpack t ++ "}") preamble :: PandocMonad m => LP m Blocks @@ -1523,17 +1515,18 @@ authors = try $ do macroDef :: PandocMonad m => LP m Blocks macroDef = do - guardEnabled Ext_latex_macros mempty <$ ((commandDef <|> environmentDef) <* doMacros 0) where commandDef = do (name, macro') <- newcommand - updateState $ \s -> s{ sMacros = M.insert name macro' (sMacros s) } + guardDisabled Ext_latex_macros <|> + updateState (\s -> s{ sMacros = M.insert name macro' (sMacros s) }) environmentDef = do (name, macro1, macro2) <- newenvironment - updateState $ \s -> s{ sMacros = - M.insert name macro1 (sMacros s) } - updateState $ \s -> s{ sMacros = - M.insert ("end" <> name) macro2 (sMacros s) } + guardDisabled Ext_latex_macros <|> + do updateState $ \s -> s{ sMacros = + M.insert name macro1 (sMacros s) } + updateState $ \s -> s{ sMacros = + M.insert ("end" <> name) macro2 (sMacros s) } -- @\newenvironment{envname}[n-args][default]{begin}{end}@ -- is equivalent to -- @\newcommand{\envname}[n-args][default]{begin}@ @@ -1568,11 +1561,8 @@ newenvironment = do controlSeq "renewenvironment" <|> controlSeq "provideenvironment" optional $ symbol '*' - symbol '{' - spaces - Tok _ Word name <- satisfyTok isWordTok spaces - symbol '}' + name <- untokenize <$> braced spaces numargs <- option 0 $ try bracketedNum spaces @@ -1640,9 +1630,25 @@ blockCommand = try $ do star <- option "" ("*" <$ symbol '*' <* optional sp) let name' = name <> star let names = ordNub [name', name] - let raw = do - guard $ isBlockCommand name || not (isInlineCommand name) + let rawDefiniteBlock = do + guard $ isBlockCommand name rawBlock "latex" <$> getRawCommand (txt <> star) + -- heuristic: if it could be either block or inline, we + -- treat it if block if we have a sequence of block + -- commands followed by a newline. But we stop if we + -- hit a \startXXX, since this might start a raw ConTeXt + -- environment (this is important because this parser is + -- used by the Markdown reader). + let startCommand = try $ do + Tok _ (CtrlSeq n) _ <- anyControlSeq + guard $ "start" `T.isPrefixOf` n + let rawMaybeBlock = try $ do + guard $ not $ isInlineCommand name + curr <- rawBlock "latex" <$> getRawCommand (txt <> star) + rest <- many $ notFollowedBy startCommand *> blockCommand + lookAhead $ blankline <|> startCommand + return $ curr <> mconcat rest + let raw = rawDefiniteBlock <|> rawMaybeBlock lookupListDefault raw names blockCommands closing :: PandocMonad m => LP m Blocks @@ -1879,16 +1885,12 @@ addImageCaption = walkM go go x = return x coloredBlock :: PandocMonad m => String -> LP m Blocks -coloredBlock stylename = do - skipopts +coloredBlock stylename = try $ do + skipopts color <- braced + notFollowedBy (grouped inline) let constructor = divWith ("",[],[("style",stylename ++ ": " ++ toksToString color)]) - inlineContents <|> constructor <$> blockContents - where inlineContents = do - ils <- grouped inline - rest <- inlines - return (para (ils <> rest)) - blockContents = grouped block + constructor <$> grouped block graphicsPath :: PandocMonad m => LP m Blocks graphicsPath = do diff --git a/src/Text/Pandoc/Readers/Markdown.hs b/src/Text/Pandoc/Readers/Markdown.hs index ab6a32b78..d7e59c7fd 100644 --- a/src/Text/Pandoc/Readers/Markdown.hs +++ b/src/Text/Pandoc/Readers/Markdown.hs @@ -61,8 +61,8 @@ import Text.Pandoc.Options import Text.Pandoc.Parsing hiding (tableWith) import Text.Pandoc.Readers.HTML (htmlInBalanced, htmlTag, isBlockTag, isCommentTag, isInlineTag, isTextTag) -import Text.Pandoc.Readers.LaTeX (rawLaTeXBlock, rawLaTeXInline, applyMacros, - macro) +import Text.Pandoc.Readers.LaTeX (rawLaTeXBlock, + rawLaTeXInline, applyMacros) import Text.Pandoc.Shared import qualified Text.Pandoc.UTF8 as UTF8 import Text.Pandoc.XML (fromEntities) @@ -507,7 +507,6 @@ block = do , htmlBlock , table , codeBlockIndented - , latexMacro , rawTeXBlock , lineBlock , blockQuote @@ -1096,13 +1095,6 @@ rawVerbatimBlock = htmlInBalanced isVerbTag isVerbTag (TagOpen "script" _) = True isVerbTag _ = False -latexMacro :: PandocMonad m => MarkdownParser m (F Blocks) -latexMacro = try $ do - guardEnabled Ext_latex_macros - skipNonindentSpaces - res <- macro - return $ return res - rawTeXBlock :: PandocMonad m => MarkdownParser m (F Blocks) rawTeXBlock = do guardEnabled Ext_raw_tex diff --git a/src/Text/Pandoc/Readers/Muse.hs b/src/Text/Pandoc/Readers/Muse.hs index 1ae73c148..9d967a9de 100644 --- a/src/Text/Pandoc/Readers/Muse.hs +++ b/src/Text/Pandoc/Readers/Muse.hs @@ -115,11 +115,10 @@ htmlElement :: PandocMonad m => String -> MuseParser m (Attr, String) htmlElement tag = try $ do (TagOpen _ attr, _) <- htmlTag (~== TagOpen tag []) content <- manyTill anyChar (endtag <|> endofinput) - return (htmlAttrToPandoc attr, trim content) + return (htmlAttrToPandoc attr, content) where endtag = void $ htmlTag (~== TagClose tag) endofinput = lookAhead $ try $ skipMany blankline >> skipSpaces >> eof - trim = dropWhile (=='\n') . reverse . dropWhile (=='\n') . reverse htmlAttrToPandoc :: [Attribute String] -> Attr htmlAttrToPandoc attrs = (ident, classes, keyvals) @@ -132,7 +131,7 @@ parseHtmlContentWithAttrs :: PandocMonad m => String -> MuseParser m a -> MuseParser m (Attr, [a]) parseHtmlContentWithAttrs tag parser = do (attr, content) <- htmlElement tag - parsedContent <- try $ parseContent content + parsedContent <- try $ parseContent (content ++ "\n") return (attr, parsedContent) where parseContent = parseFromString $ nested $ manyTill parser endOfContent diff --git a/src/Text/Pandoc/Readers/TikiWiki.hs b/src/Text/Pandoc/Readers/TikiWiki.hs new file mode 100644 index 000000000..4acbaa30b --- /dev/null +++ b/src/Text/Pandoc/Readers/TikiWiki.hs @@ -0,0 +1,658 @@ +{-# LANGUAGE RelaxedPolyRec, FlexibleInstances, TypeSynonymInstances #-} +{-# LANGUAGE OverloadedStrings #-} +{-# LANGUAGE FlexibleContexts #-} + +{- | + Module : Text.Pandoc.Readers.TikiWiki + Copyright : Copyright (C) 2017 Robin Lee Powell + License : GPLv2 + + Maintainer : Robin Lee Powell <robinleepowell@gmail.com> + Stability : alpha + Portability : portable + +Conversion of TikiWiki text to 'Pandoc' document. +-} + +module Text.Pandoc.Readers.TikiWiki ( readTikiWiki + ) where + +import Control.Monad +import Control.Monad.Except (throwError) +import Text.Pandoc.Definition +import qualified Text.Pandoc.Builder as B +import Text.Pandoc.Options +import Text.Pandoc.Parsing hiding (enclosed, nested) +import Text.Printf (printf) +import Text.Pandoc.XML (fromEntities) +import Text.Pandoc.Class (PandocMonad(..), CommonState(..)) +import Text.Pandoc.Shared (crFilter) +import Text.Pandoc.Logging (Verbosity(..)) +import Data.Maybe (fromMaybe) +import Data.List (intercalate) +import qualified Data.Foldable as F +import Data.Text (Text) +import qualified Data.Text as T + +-- | Read TikiWiki from an input string and return a Pandoc document. +readTikiWiki :: PandocMonad m + => ReaderOptions + -> Text + -> m Pandoc +readTikiWiki opts s = do + res <- readWithM parseTikiWiki def{ stateOptions = opts } + (T.unpack (crFilter s) ++ "\n\n") + case res of + Left e -> throwError e + Right d -> return d + +type TikiWikiParser = ParserT [Char] ParserState + +-- +-- utility functions +-- + +tryMsg :: PandocMonad m => String -> TikiWikiParser m a -> TikiWikiParser m a +tryMsg msg p = try p <?> msg + +skip :: PandocMonad m => TikiWikiParser m a -> TikiWikiParser m () +skip parser = parser >> return () + +nested :: PandocMonad m => TikiWikiParser m a -> TikiWikiParser m a +nested p = do + nestlevel <- stateMaxNestingLevel <$> getState + guard $ nestlevel > 0 + updateState $ \st -> st{ stateMaxNestingLevel = stateMaxNestingLevel st - 1 } + res <- p + updateState $ \st -> st{ stateMaxNestingLevel = nestlevel } + return res + +-- +-- main parser +-- + +parseTikiWiki :: PandocMonad m => TikiWikiParser m Pandoc +parseTikiWiki = do + bs <- mconcat <$> many block + spaces + eof + return $ B.doc bs + +block :: PandocMonad m => TikiWikiParser m B.Blocks +block = do + verbosity <- getsCommonState stVerbosity + pos <- getPosition + res <- mempty <$ skipMany1 blankline + <|> blockElements + <|> para + skipMany blankline + when (verbosity >= INFO) $ do + trace (printf "line %d: %s" (sourceLine pos) (take 60 $ show $ B.toList res)) + return res + +blockElements :: PandocMonad m => TikiWikiParser m B.Blocks +blockElements = choice [ table + , hr + , header + , mixedList + , definitionList + , codeMacro + ] + +-- top +-- ---- +-- bottom +-- +-- ---- +-- +hr :: PandocMonad m => TikiWikiParser m B.Blocks +hr = try $ do + string "----" + many (char '-') + newline + return $ B.horizontalRule + +-- ! header +-- +-- !! header level two +-- +-- !!! header level 3 +-- +header :: PandocMonad m => TikiWikiParser m B.Blocks +header = tryMsg "header" $ do + level <- many1 (char '!') >>= return . length + guard $ level <= 6 + skipSpaces + content <- B.trimInlines . mconcat <$> manyTill inline newline + attr <- registerHeader nullAttr content + return $ B.headerWith attr level $ content + +tableRow :: PandocMonad m => TikiWikiParser m [B.Blocks] +tableRow = try $ do +-- row <- sepBy1 (many1Till inline $ oneOf "\n|") (try $ string "|" <* notFollowedBy (oneOf "|\n")) +-- return $ map (B.plain . mconcat) row + row <- sepBy1 ((many1 $ noneOf "\n|") >>= parseColumn) (try $ string "|" <* notFollowedBy (oneOf "|\n")) + return $ map B.plain row + where + parseColumn x = do + parsed <- parseFromString (many1 inline) x + return $ mconcat parsed + + + +-- Tables: +-- +-- ||foo|| +-- +-- ||row1-column1|row1-column2||row2-column1|row2-column2|| +-- +-- ||row1-column1|row1-column2 +-- row2-column1|row2-column2|| +-- +-- ||row1-column1|row1-column2 +-- row2-column1|row2-column2||row3-column1|row3-column2|| +-- +-- || Orange | Apple | more +-- Bread | Pie | more +-- Butter | Ice cream | and more || +-- +table :: PandocMonad m => TikiWikiParser m B.Blocks +table = try $ do + string "||" + rows <- sepBy1 tableRow (try $ string "\n" <|> (string "||" <* notFollowedBy (string "\n"))) + string "||" + newline + -- return $ B.simpleTable (headers rows) $ trace ("rows: " ++ (show rows)) rows + return $ B.simpleTable (headers rows) $ rows + where + -- The headers are as many empty srings as the number of columns + -- in the first row + headers rows = map (B.plain . B.str) $ take (length $ rows !! 0) $ repeat "" + +para :: PandocMonad m => TikiWikiParser m B.Blocks +para = many1Till inline endOfParaElement >>= return . result . mconcat + where + endOfParaElement = lookAhead $ endOfInput <|> endOfPara <|> newBlockElement + endOfInput = try $ skipMany blankline >> skipSpaces >> eof + endOfPara = try $ blankline >> skipMany1 blankline + newBlockElement = try $ blankline >> skip blockElements + result content = if F.all (==Space) content + then mempty + else B.para $ B.trimInlines content + +-- ;item 1: definition 1 +-- ;item 2: definition 2-1 +-- + definition 2-2 +-- ;item ''3'': definition ''3'' +-- +definitionList :: PandocMonad m => TikiWikiParser m B.Blocks +definitionList = tryMsg "definitionList" $ do + elements <- many1 $ parseDefinitionListItem + return $ B.definitionList elements + where + parseDefinitionListItem :: PandocMonad m => TikiWikiParser m (B.Inlines, [B.Blocks]) + parseDefinitionListItem = do + skipSpaces >> char ';' <* skipSpaces + term <- many1Till inline $ char ':' <* skipSpaces + line <- listItemLine 1 + return $ (mconcat term, [B.plain line]) + +data ListType = None | Numbered | Bullet deriving (Ord, Eq, Show) + +data ListNesting = LN { lntype :: ListType, lnnest :: Int } deriving (Ord, Eq, Show) + +-- The first argument is a stack (most recent == head) of our list +-- nesting status; the list type and the nesting level; if we're in +-- a number list in a bullet list it'd be +-- [LN Numbered 2, LN Bullet 1] +-- +-- Mixed list example: +-- +-- # one +-- # two +-- ** two point one +-- ** two point two +-- # three +-- # four +-- +mixedList :: PandocMonad m => TikiWikiParser m B.Blocks +mixedList = try $ do + items <- try $ many1 listItem + return $ mconcat $ fixListNesting $ spanFoldUpList (LN None 0) items + +-- See the "Handling Lists" section of DESIGN-CODE for why this +-- function exists. It's to post-process the lists and do some +-- mappends. +-- +-- We need to walk the tree two items at a time, so we can see what +-- we're going to join *to* before we get there. +-- +-- Because of that, it seemed easier to do it by hand than to try to +-- figre out a fold or something. +fixListNesting :: [B.Blocks] -> [B.Blocks] +fixListNesting [] = [] +fixListNesting (first:[]) = [recurseOnList first] +-- fixListNesting nestall | trace ("\n\nfixListNesting: " ++ (show nestall)) False = undefined +-- fixListNesting nestall@(first:second:rest) = +fixListNesting (first:second:rest) = + let secondBlock = head $ B.toList second in + case secondBlock of + BulletList _ -> fixListNesting $ [(mappend (recurseOnList first) (recurseOnList second))] ++ rest + OrderedList _ _ -> fixListNesting $ [(mappend (recurseOnList first) (recurseOnList second))] ++ rest + _ -> [recurseOnList first] ++ fixListNesting (second:rest) + +-- This function walks the Block structure for fixListNesting, +-- because it's a bit complicated, what with converting to and from +-- lists and so on. +recurseOnList :: B.Blocks -> B.Blocks +-- recurseOnList item | trace ("rOL: " ++ (show $ length $ B.toList item) ++ ", " ++ (show $ B.toList item)) False = undefined +recurseOnList items + | (length $ B.toList items) == 1 = + let itemBlock = head $ B.toList items in + case itemBlock of + BulletList listItems -> B.bulletList $ fixListNesting $ map B.fromList listItems + OrderedList _ listItems -> B.orderedList $ fixListNesting $ map B.fromList listItems + _ -> items + + -- The otherwise works because we constructed the blocks, and we + -- know for a fact that no mappends have been run on them; each + -- Blocks consists of exactly one Block. + -- + -- Anything that's not like that has already been processed by + -- fixListNesting; don't bother to process it again. + | otherwise = items + + +-- Turn the list if list items into a tree by breaking off the first +-- item, splitting the remainder of the list into items that are in +-- the tree of the first item and those that aren't, wrapping the +-- tree of the first item in its list time, and recursing on both +-- sections. +spanFoldUpList :: ListNesting -> [(ListNesting, B.Blocks)] -> [B.Blocks] +spanFoldUpList _ [] = [] +spanFoldUpList ln (first:[]) = + listWrap ln (fst first) [snd first] +spanFoldUpList ln (first:rest) = + let (span1, span2) = span (splitListNesting (fst first)) rest + newTree1 = listWrap ln (fst first) $ [snd first] ++ spanFoldUpList (fst first) span1 + newTree2 = spanFoldUpList ln span2 + in + newTree1 ++ newTree2 + +-- Decide if the second item should be in the tree of the first +-- item, which is true if the second item is at a deeper nesting +-- level and of the same type. +splitListNesting :: ListNesting -> (ListNesting, B.Blocks) -> Bool +splitListNesting ln1 (ln2, _) = + if (lnnest ln1) < (lnnest ln2) then + True + else + if ln1 == ln2 then + True + else + False + +-- If we've moved to a deeper nesting level, wrap the new level in +-- the appropriate type of list. +listWrap :: ListNesting -> ListNesting -> [B.Blocks] -> [B.Blocks] +listWrap upperLN curLN retTree = + if upperLN == curLN then + retTree + else + case lntype curLN of + None -> [] + Bullet -> [B.bulletList retTree] + Numbered -> [B.orderedList retTree] + +listItem :: PandocMonad m => TikiWikiParser m (ListNesting, B.Blocks) +listItem = choice [ + bulletItem + , numberedItem + ] + + +-- * Start each line +-- * with an asterisk (*). +-- ** More asterisks gives deeper +-- *** and deeper levels. +-- +bulletItem :: PandocMonad m => TikiWikiParser m (ListNesting, B.Blocks) +bulletItem = try $ do + prefix <- many1 $ char '*' + many1 $ char ' ' + content <- listItemLine (length prefix) + return $ (LN Bullet (length prefix), B.plain content) + +-- # Start each line +-- # with a number (1.). +-- ## More number signs gives deeper +-- ### and deeper +-- +numberedItem :: PandocMonad m => TikiWikiParser m (ListNesting, B.Blocks) +numberedItem = try $ do + prefix <- many1 $ char '#' + many1 $ char ' ' + content <- listItemLine (length prefix) + return $ (LN Numbered (length prefix), B.plain content) + +listItemLine :: PandocMonad m => Int -> TikiWikiParser m B.Inlines +listItemLine nest = lineContent >>= parseContent >>= return + where + lineContent = do + content <- anyLine + continuation <- optionMaybe listContinuation + return $ filterSpaces content ++ "\n" ++ (maybe "" id continuation) + filterSpaces = reverse . dropWhile (== ' ') . reverse + listContinuation = string (take nest (repeat '+')) >> lineContent + parseContent x = do + parsed <- parseFromString (many1 inline) x + return $ mconcat parsed + +-- Turn the CODE macro attributes into Pandoc code block attributes. +mungeAttrs :: [(String, String)] -> (String, [String], [(String, String)]) +mungeAttrs rawAttrs = ("", classes, rawAttrs) + where + -- "colors" is TikiWiki CODE macro for "name of language to do + -- highlighting for"; turn the value into a class + color = fromMaybe "" $ lookup "colors" rawAttrs + -- ln = 1 means line numbering. It's also the default. So we + -- emit numberLines as a class unless ln = 0 + lnRaw = fromMaybe "1" $ lookup "ln" rawAttrs + ln = if lnRaw == "0" then + "" + else + "numberLines" + classes = filter (/= "") [color, ln] + +codeMacro :: PandocMonad m => TikiWikiParser m B.Blocks +codeMacro = try $ do + string "{CODE(" + rawAttrs <- macroAttrs + string ")}" + body <- manyTill anyChar (try (string "{CODE}")) + newline + if length rawAttrs > 0 + then + return $ B.codeBlockWith (mungeAttrs rawAttrs) body + else + return $ B.codeBlock body + + +-- +-- inline parsers +-- + +inline :: PandocMonad m => TikiWikiParser m B.Inlines +inline = choice [ whitespace + , noparse + , strong + , emph + , nbsp + , image + , htmlComment + , strikeout + , code + , wikiLink + , notExternalLink + , externalLink + , superTag + , superMacro + , subTag + , subMacro + , escapedChar + , colored + , centered + , underlined + , boxed + , breakChars + , str + , symbol + ] <?> "inline" + +whitespace :: PandocMonad m => TikiWikiParser m B.Inlines +whitespace = (lb <|> regsp) >>= return + where lb = try $ skipMany spaceChar >> linebreak >> return B.space + regsp = try $ skipMany1 spaceChar >> return B.space + +-- UNSUPPORTED, as there doesn't seem to be any facility in calibre +-- for this +nbsp :: PandocMonad m => TikiWikiParser m B.Inlines +nbsp = try $ do + string "~hs~" + return $ B.str " NOT SUPPORTED BEGIN: ~hs~ (non-breaking space) :END " + +-- UNSUPPORTED, as the desired behaviour (that the data be +-- *retained* and stored as a comment) doesn't exist in calibre, and +-- silently throwing data out seemed bad. +htmlComment :: PandocMonad m => TikiWikiParser m B.Inlines +htmlComment = try $ do + string "~hc~" + inner <- many1 $ noneOf "~" + string "~/hc~" + return $ B.str $ " NOT SUPPORTED: ~hc~ (html comment opener) BEGIN: " ++ inner ++ " ~/hc~ :END " + +linebreak :: PandocMonad m => TikiWikiParser m B.Inlines +linebreak = newline >> notFollowedBy newline >> (lastNewline <|> innerNewline) + where lastNewline = eof >> return mempty + innerNewline = return B.space + +between :: (Monoid c, PandocMonad m, Show b) => TikiWikiParser m a -> TikiWikiParser m b -> (TikiWikiParser m b -> TikiWikiParser m c) -> TikiWikiParser m c +between start end p = + mconcat <$> try (start >> notFollowedBy whitespace >> many1Till (p end) end) + +enclosed :: (Monoid b, PandocMonad m, Show a) => TikiWikiParser m a -> (TikiWikiParser m a -> TikiWikiParser m b) -> TikiWikiParser m b +enclosed sep p = between sep (try $ sep <* endMarker) p + where + endMarker = lookAhead $ skip endSpace <|> skip (oneOf ".,!?:)|'_") <|> eof + endSpace = (spaceChar <|> newline) >> return B.space + + +nestedInlines :: (Show a, PandocMonad m) => TikiWikiParser m a -> TikiWikiParser m B.Inlines +nestedInlines end = innerSpace <|> nestedInline + where + innerSpace = try $ whitespace <* (notFollowedBy end) + nestedInline = notFollowedBy whitespace >> nested inline + +-- {img attId="39" imalign="right" link="http://info.tikiwiki.org" alt="Panama Hat"} +-- +-- {img attId="37", thumb="mouseover", styleimage="border", desc="150"} +-- +-- {img src="img/wiki_up/393px-Pears.jpg" thumb="y" imalign="center" stylebox="border" button="y" desc="Pretty pears" max="200" rel="box"} +-- +image :: PandocMonad m => TikiWikiParser m B.Inlines +image = try $ do + string "{img " + rawAttrs <- sepEndBy1 imageAttr spaces + string "}" + let src = fromMaybe "" $ lookup "src" rawAttrs + let title = fromMaybe src $ lookup "desc" rawAttrs + let alt = fromMaybe title $ lookup "alt" rawAttrs + let classes = map fst $ filter (\(_,b) -> b == "" || b == "y") rawAttrs + if length src > 0 + then + return $ B.imageWith ("", classes, rawAttrs) src title (B.str alt) + else + return $ B.str $ " NOT SUPPORTED: image without src attribute BEGIN: {img " ++ (printAttrs rawAttrs) ++ "} :END " + where + printAttrs attrs = intercalate " " $ map (\(a, b) -> a ++ "=\"" ++ b ++ "\"") attrs + +imageAttr :: PandocMonad m => TikiWikiParser m (String, String) +imageAttr = try $ do + key <- many1 (noneOf "=} \t\n") + char '=' + optional $ char '"' + value <- many1 (noneOf "}\"\n") + optional $ char '"' + optional $ char ',' + return (key, value) + + +-- __strong__ +strong :: PandocMonad m => TikiWikiParser m B.Inlines +strong = try $ enclosed (string "__") nestedInlines >>= return . B.strong + +-- ''emph'' +emph :: PandocMonad m => TikiWikiParser m B.Inlines +emph = try $ enclosed (string "''") nestedInlines >>= return . B.emph + +-- ~246~ +escapedChar :: PandocMonad m => TikiWikiParser m B.Inlines +escapedChar = try $ do + string "~" + inner <- many1 $ oneOf "0123456789" + string "~" + return $ B.str $ [(toEnum ((read inner) :: Int)) :: Char] + +-- UNSUPPORTED, as there doesn't seem to be any facility in calibre +-- for this +centered :: PandocMonad m => TikiWikiParser m B.Inlines +centered = try $ do + string "::" + inner <- many1 $ noneOf ":\n" + string "::" + return $ B.str $ " NOT SUPPORTED: :: (centered) BEGIN: ::" ++ inner ++ ":: :END " + +-- UNSUPPORTED, as there doesn't seem to be any facility in calibre +-- for this +colored :: PandocMonad m => TikiWikiParser m B.Inlines +colored = try $ do + string "~~" + inner <- many1 $ noneOf "~\n" + string "~~" + return $ B.str $ " NOT SUPPORTED: ~~ (colored) BEGIN: ~~" ++ inner ++ "~~ :END " + +-- UNSUPPORTED, as there doesn't seem to be any facility in calibre +-- for this +underlined :: PandocMonad m => TikiWikiParser m B.Inlines +underlined = try $ do + string "===" + inner <- many1 $ noneOf "=\n" + string "===" + return $ B.str $ " NOT SUPPORTED: ==== (underlined) BEGIN: ===" ++ inner ++ "=== :END " + +-- UNSUPPORTED, as there doesn't seem to be any facility in calibre +-- for this +boxed :: PandocMonad m => TikiWikiParser m B.Inlines +boxed = try $ do + string "^" + inner <- many1 $ noneOf "^\n" + string "^" + return $ B.str $ " NOT SUPPORTED: ^ (boxed) BEGIN: ^" ++ inner ++ "^ :END " + +-- --text-- +strikeout :: PandocMonad m => TikiWikiParser m B.Inlines +strikeout = try $ enclosed (string "--") nestedInlines >>= return . B.strikeout + +nestedString :: (Show a, PandocMonad m) => TikiWikiParser m a -> TikiWikiParser m String +nestedString end = innerSpace <|> (count 1 nonspaceChar) + where + innerSpace = try $ many1 spaceChar <* notFollowedBy end + +breakChars :: PandocMonad m => TikiWikiParser m B.Inlines +breakChars = try $ string "%%%" >> return B.linebreak + +-- superscript: foo{TAG(tag=>sup)}super{TAG}foo / bar{SUP()}super2{SUP}bar +superTag :: PandocMonad m => TikiWikiParser m B.Inlines +superTag = try $ between (string "{TAG(tag=>sup)}") (string "{TAG}") nestedString >>= return . B.superscript . B.text . fromEntities + +superMacro :: PandocMonad m => TikiWikiParser m B.Inlines +superMacro = try $ do + string "{SUP(" + manyTill anyChar (string ")}") + body <- manyTill anyChar (string "{SUP}") + return $ B.superscript $ B.text body + +-- subscript: baz{TAG(tag=>sub)}sub{TAG}qux / qux{SUB()}sub2{SUB}qux +subTag :: PandocMonad m => TikiWikiParser m B.Inlines +subTag = try $ between (string "{TAG(tag=>sub)}") (string "{TAG}") nestedString >>= return . B.subscript . B.text . fromEntities + +subMacro :: PandocMonad m => TikiWikiParser m B.Inlines +subMacro = try $ do + string "{SUB(" + manyTill anyChar (string ")}") + body <- manyTill anyChar (string "{SUB}") + return $ B.subscript $ B.text body + +-- -+text+- +code :: PandocMonad m => TikiWikiParser m B.Inlines +code = try $ between (string "-+") (string "+-") nestedString >>= return . B.code . fromEntities + +macroAttr :: PandocMonad m => TikiWikiParser m (String, String) +macroAttr = try $ do + key <- many1 (noneOf "=)") + char '=' + optional $ char '"' + value <- many1 (noneOf " )\"") + optional $ char '"' + return (key, value) + +macroAttrs :: PandocMonad m => TikiWikiParser m [(String, String)] +macroAttrs = try $ do + attrs <- sepEndBy macroAttr spaces + return attrs + +-- ~np~ __not bold__ ~/np~ +noparse :: PandocMonad m => TikiWikiParser m B.Inlines +noparse = try $ do + string "~np~" + body <- manyTill anyChar (string "~/np~") + return $ B.str body + +str :: PandocMonad m => TikiWikiParser m B.Inlines +str = (many1 alphaNum <|> count 1 characterReference) >>= return . B.str + +symbol :: PandocMonad m => TikiWikiParser m B.Inlines +symbol = count 1 nonspaceChar >>= return . B.str + +-- [[not a link] +notExternalLink :: PandocMonad m => TikiWikiParser m B.Inlines +notExternalLink = try $ do + start <- string "[[" + body <- many (noneOf "\n[]") + end <- string "]" + return $ B.text (start ++ body ++ end) + +-- [http://www.somesite.org url|Some Site title] +-- ((internal link)) +-- +-- The ((...)) wiki links and [...] external links are handled +-- exactly the same; this abstracts that out +makeLink :: PandocMonad m => String -> String -> String -> TikiWikiParser m B.Inlines +makeLink start middle end = try $ do + st <- getState + guard $ stateAllowLinks st + setState $ st{ stateAllowLinks = False } + (url, title, anchor) <- wikiLinkText start middle end + parsedTitle <- parseFromString (many1 inline) title + setState $ st{ stateAllowLinks = True } + return $ B.link (url++anchor) "" $ mconcat $ parsedTitle + +wikiLinkText :: PandocMonad m => String -> String -> String -> TikiWikiParser m (String, String, String) +wikiLinkText start middle end = do + string start + url <- many1 (noneOf $ middle ++ "\n") + seg1 <- option url linkContent + seg2 <- option "" linkContent + string end + if seg2 /= "" + then + return (url, seg2, seg1) + else + return (url, seg1, "") + where + linkContent = do + (char '|') + mystr <- many (noneOf middle) + return $ mystr + +externalLink :: PandocMonad m => TikiWikiParser m B.Inlines +externalLink = makeLink "[" "]|" "]" + +-- NB: this wiki linking is unlikely to work for anyone besides me +-- (rlpowell); it happens to work for me because my Hakyll code has +-- post-processing that treats pandoc .md titles as valid link +-- targets, so something like +-- [see also this other post](My Other Page) is perfectly valid. +wikiLink :: PandocMonad m => TikiWikiParser m B.Inlines +wikiLink = makeLink "((" ")|" "))" + diff --git a/src/Text/Pandoc/Templates.hs b/src/Text/Pandoc/Templates.hs index 1a26b7168..516cc4b2f 100644 --- a/src/Text/Pandoc/Templates.hs +++ b/src/Text/Pandoc/Templates.hs @@ -38,28 +38,28 @@ module Text.Pandoc.Templates ( module Text.DocTemplates , getDefaultTemplate ) where -import qualified Control.Exception as E (IOException, try) import Control.Monad.Except (throwError) import Data.Aeson (ToJSON (..)) import qualified Data.Text as T import System.FilePath ((<.>), (</>)) import Text.DocTemplates (Template, TemplateTarget, applyTemplate, compileTemplate, renderTemplate, varListToJSON) -import Text.Pandoc.Class (PandocMonad) +import Text.Pandoc.Class (PandocMonad(readDataFile)) import Text.Pandoc.Error -import Text.Pandoc.Shared (readDataFileUTF8) +import qualified Text.Pandoc.UTF8 as UTF8 -- | Get default template for the specified writer. -getDefaultTemplate :: (Maybe FilePath) -- ^ User data directory to search first +getDefaultTemplate :: PandocMonad m + => (Maybe FilePath) -- ^ User data directory to search 1st -> String -- ^ Name of writer - -> IO (Either E.IOException String) + -> m String getDefaultTemplate user writer = do let format = takeWhile (`notElem` ("+-" :: String)) writer -- strip off extensions case format of - "native" -> return $ Right "" - "json" -> return $ Right "" - "docx" -> return $ Right "" - "fb2" -> return $ Right "" + "native" -> return "" + "json" -> return "" + "docx" -> return "" + "fb2" -> return "" "odt" -> getDefaultTemplate user "opendocument" "html" -> getDefaultTemplate user "html5" "docbook" -> getDefaultTemplate user "docbook5" @@ -70,7 +70,7 @@ getDefaultTemplate user writer = do "markdown_mmd" -> getDefaultTemplate user "markdown" "markdown_phpextra" -> getDefaultTemplate user "markdown" _ -> let fname = "templates" </> "default" <.> format - in E.try $ readDataFileUTF8 user fname + in UTF8.toString <$> readDataFile user fname -- | Like 'applyTemplate', but runs in PandocMonad and -- raises an error if compilation fails. diff --git a/src/Text/Pandoc/Writers/Shared.hs b/src/Text/Pandoc/Writers/Shared.hs index 2047285eb..3f612f40a 100644 --- a/src/Text/Pandoc/Writers/Shared.hs +++ b/src/Text/Pandoc/Writers/Shared.hs @@ -84,7 +84,7 @@ metaToJSON' blockWriter inlineWriter (Meta metamap) = do renderedMap <- Traversable.mapM (metaValueToJSON blockWriter inlineWriter) metamap - return $ M.foldWithKey defField (Object H.empty) renderedMap + return $ M.foldrWithKey defField (Object H.empty) renderedMap -- | Add variables to JSON object, replacing any existing values. -- Also include @meta-json@, a field containing a string representation diff --git a/test/Tests/Old.hs b/test/Tests/Old.hs index b807719bc..c4dd4d322 100644 --- a/test/Tests/Old.hs +++ b/test/Tests/Old.hs @@ -137,6 +137,9 @@ tests = [ testGroup "markdown" , testGroup "twiki" [ test "reader" ["-r", "twiki", "-w", "native", "-s"] "twiki-reader.twiki" "twiki-reader.native" ] + , testGroup "tikiwiki" + [ test "reader" ["-r", "tikiwiki", "-w", "native", "-s"] + "tikiwiki-reader.tikiwiki" "tikiwiki-reader.native" ] , testGroup "other writers" $ map (\f -> testGroup f $ writerTests f) [ "opendocument" , "context" , "texinfo", "icml", "tei" , "man" , "plain" , "rtf", "org", "asciidoc", "zimwiki" diff --git a/test/Tests/Readers/Muse.hs b/test/Tests/Readers/Muse.hs index fe0a59992..a069bb972 100644 --- a/test/Tests/Readers/Muse.hs +++ b/test/Tests/Readers/Muse.hs @@ -314,5 +314,33 @@ tests = , para "* Bar" ] ] + , "List inside a tag" =: + T.unlines + [ "<quote>" + , " 1. First" + , "" + , " 2. Second" + , "" + , " 3. Third" + , "</quote>" + ] =?> + blockQuote (orderedListWith (1, Decimal, Period) [ para "First" + , para "Second" + , para "Third" + ]) + -- Amusewiki requires block tags to be on separate lines, + -- but Emacs Muse allows them to be on the same line as contents. + , "List inside an inline tag" =: + T.unlines + [ "<quote> 1. First" + , "" + , " 2. Second" + , "" + , " 3. Third</quote>" + ] =?> + blockQuote (orderedListWith (1, Decimal, Period) [ para "First" + , para "Second" + , para "Third" + ]) ] ] diff --git a/test/Tests/Writers/Docx.hs b/test/Tests/Writers/Docx.hs index 215952893..ec147604c 100644 --- a/test/Tests/Writers/Docx.hs +++ b/test/Tests/Writers/Docx.hs @@ -6,6 +6,7 @@ import Tests.Helpers import Text.Pandoc.Class (runIOorExplode) import Text.Pandoc.Definition import Text.Pandoc.Options +import Text.Pandoc.Walk import Text.Pandoc.Readers.Docx import Text.Pandoc.Readers.Native import Text.Pandoc.Writers.Docx @@ -19,16 +20,22 @@ compareOutput :: Options -> FilePath -> FilePath -> IO (Pandoc, Pandoc) -compareOutput opts nativeFileIn nativeFileOut = do +compareOutput (wopts, ropts) nativeFileIn nativeFileOut = do nf <- UTF8.toText <$> BS.readFile nativeFileIn nf' <- UTF8.toText <$> BS.readFile nativeFileOut - let wopts = fst opts - df <- runIOorExplode $ do - d <- readNative def nf - writeDocx wopts{writerUserDataDir = Just (".." </> "data")} d - df' <- runIOorExplode (readNative def nf') - p <- runIOorExplode $ readDocx (snd opts) df - return (p, df') + runIOorExplode $ do + roundtripped <- readNative def nf >>= + writeDocx wopts{writerUserDataDir = Just (".." </> "data")} >>= + readDocx ropts + orig <- readNative def nf' + return (walk fixImages roundtripped, walk fixImages orig) + +-- make all image filenames "image", since otherwise round-trip +-- tests fail because of different behavior of Data.Unique in +-- different ghc versions... +fixImages :: Inline -> Inline +fixImages (Image attr alt (_,tit)) = Image attr alt ("image",tit) +fixImages x = x testCompareWithOptsIO :: Options -> String -> FilePath -> FilePath -> IO TestTree testCompareWithOptsIO opts name nativeFileIn nativeFileOut = do diff --git a/test/command/3681.md b/test/command/3681.md new file mode 100644 index 000000000..d0805e820 --- /dev/null +++ b/test/command/3681.md @@ -0,0 +1,27 @@ +``` +% pandoc -f latex -t native +\newcommand{\cicd}{CI/CD\xspace} + +Software developers create \cicd pipelines to… Following issue can be resolved by \cicd: +^D +[Para [Str "Software",Space,Str "developers",Space,Str "create",Space,Str "CI/CD",Space,Str "pipelines",Space,Str "to\8230",Space,Str "Following",Space,Str "issue",Space,Str "can",Space,Str "be",Space,Str "resolved",Space,Str "by",Space,Str "CI/CD:"]] +``` + +``` +% pandoc -f latex -t native +\newcommand{\cicd}{CI/CD\xspace} + +\cicd\footnote{\url{https://en.wikipedia.org/wiki/CI/CD}} is awesome. +^D +[Para [Str "CI/CD",Note [Para [Link ("",[],[]) [Str "https://en.wikipedia.org/wiki/CI/CD"] ("https://en.wikipedia.org/wiki/CI/CD","")]],Space,Str "is",Space,Str "awesome."]] +``` + +``` +% pandoc -f latex -t native +\newcommand{\cicd}{CI/CD\xspace} +\newcommand{\pipeline}{pipeline\xspace} + +\cicd\pipeline. +^D +[Para [Str "CI/CD",Space,Str "pipeline."]] +``` diff --git a/test/command/3803.md b/test/command/3803.md new file mode 100644 index 000000000..a2e60359d --- /dev/null +++ b/test/command/3803.md @@ -0,0 +1,10 @@ +``` +% pandoc -f markdown+raw_tex -t latex +\begin{blah*} +*ok* +\end{blah*} +^D +\begin{blah*} +*ok* +\end{blah*} +``` diff --git a/test/command/3804.md b/test/command/3804.md new file mode 100644 index 000000000..c13c2ef42 --- /dev/null +++ b/test/command/3804.md @@ -0,0 +1,6 @@ +``` +% pandoc -t native +\titleformat{\chapter}[display]{\normalfont\large\bfseries}{第\thechapter{}章}{20pt}{\Huge} +^D +[RawBlock (Format "latex") "\\titleformat{\\chapter}[display]{\\normalfont\\large\\bfseries}{\31532\\thechapter{}\31456}{20pt}{\\Huge}"] +``` diff --git a/test/command/html-read-figure.md b/test/command/html-read-figure.md new file mode 100644 index 000000000..9c604c706 --- /dev/null +++ b/test/command/html-read-figure.md @@ -0,0 +1,45 @@ +``` +% pandoc -f html -t native +<figure> + <img src="foo.png" title="voyage"> + <figcaption>bar</figcaption> +</figure> +^D +[Para [Image ("",[],[]) [Str "bar"] ("foo.png","fig:voyage")]] +``` + +``` +% pandoc -f html -t native +<figure> + <figcaption>bar</figcaption> + <img src="foo.png" title="voyage"> +</figure> +^D +[Para [Image ("",[],[]) [Str "bar"] ("foo.png","fig:voyage")]] +``` + +``` +% pandoc -f html -t native +<figure> + <img src="foo.png" title="voyage"> +</figure> +^D +[Para [Image ("",[],[]) [] ("foo.png","fig:voyage")]] +``` + +``` +% pandoc -f html -t native +<figure> + <p><img src="foo.png" title="voyage"></p> + <figcaption>bar</figcaption> +</figure> +^D +[Para [Image ("",[],[]) [Str "bar"] ("foo.png","fig:voyage")]] +``` + +``` +% pandoc -f html -t native +<figure><img src="foo.png" title="voyage" alt="this is ignored"><figcaption>bar <strong>baz</strong></figcaption></figure> +^D +[Para [Image ("",[],[]) [Str "bar",Space,Strong [Str "baz"]] ("foo.png","fig:voyage")]] +``` diff --git a/test/command/ifstrequal.md b/test/command/ifstrequal.md new file mode 100644 index 000000000..4ad04d2e1 --- /dev/null +++ b/test/command/ifstrequal.md @@ -0,0 +1,10 @@ +``` +% pandoc -f latex -t native +\ifstrequal{a}{b}{yes}{\emph{no}} +\newcommand{\h}[1]{\ifstrequal{#1}{a}{\'a}{#1}} +\h{a} +\h{b} +^D +[Para [Emph [Str "no"]] +,Para [Str "\225",SoftBreak,Str "b"]] +``` diff --git a/test/command/macros.md b/test/command/macros.md new file mode 100644 index 000000000..055c86d25 --- /dev/null +++ b/test/command/macros.md @@ -0,0 +1,17 @@ +``` +% pandoc -f markdown+latex_macros -t markdown +\newcommand{\my}{\phi} +$\my+\my$ +^D +\newcommand{\my}{\phi} +$\phi+\phi$ +``` + +``` +% pandoc -f markdown-latex_macros -t markdown +\newcommand{\my}{\phi} +$\my+\my$ +^D +\newcommand{\my}{\phi} +$\my+\my$ +``` diff --git a/test/markdown-reader-more.native b/test/markdown-reader-more.native index 5d63a21de..a24417ffe 100644 --- a/test/markdown-reader-more.native +++ b/test/markdown-reader-more.native @@ -56,6 +56,7 @@ Pandoc (Meta {unMeta = fromList [("author",MetaList [MetaInlines [Str "Author",S ,OrderedList (3,Example,TwoParens) [[Plain [Str "Third",Space,Str "example."]]] ,Header 2 ("macros",[],[]) [Str "Macros"] +,RawBlock (Format "latex") "\\newcommand{\\tuple}[1]{\\langle #1 \\rangle}" ,Para [Math InlineMath "\\langle x,y \\rangle"] ,Header 2 ("case-insensitive-references",[],[]) [Str "Case-insensitive",Space,Str "references"] ,Para [Link ("",[],[]) [Str "Fum"] ("/fum","")] diff --git a/test/tikiwiki-reader.native b/test/tikiwiki-reader.native new file mode 100644 index 000000000..2ab053217 --- /dev/null +++ b/test/tikiwiki-reader.native @@ -0,0 +1,130 @@ +Pandoc (Meta {unMeta = fromList []}) +[Header 1 ("header",[],[]) [Str "header"] +,Header 2 ("header-level-two",[],[]) [Str "header",Space,Str "level",Space,Str "two"] +,Header 3 ("header-level-3",[],[]) [Str "header",Space,Str "level",Space,Str "3"] +,Header 4 ("header-_level_-four",[],[]) [Str "header",Space,Str "_level_",Space,Str "four"] +,Header 5 ("header-level-5",[],[]) [Str "header",Space,Str "level",Space,Str "5"] +,Header 6 ("header-level-6",[],[]) [Str "header",Space,Str "level",Space,Str "6"] +,Para [Str "!!!!!!!",Space,Str "not",Space,Str "a",Space,Str "header"] +,Para [Str "--++",Space,Str "not",Space,Str "a",Space,Str "header"] +,Header 1 ("emph-and-strong",[],[]) [Str "emph",Space,Str "and",Space,Str "strong"] +,Para [Emph [Str "emph"],Space,Strong [Str "strong"]] +,Para [Emph [Strong [Str "strong",Space,Str "and",Space,Str "emph",Space,Str "1"]]] +,Para [Strong [Emph [Str "strong",Space,Str "and",Space,Str "emph",Space,Str "2"]]] +,Para [Strong [Emph [Str "emph",Space,Str "inside"],Space,Str "strong"]] +,Para [Strong [Str "strong",Space,Str "with",Space,Emph [Str "emph"]]] +,Para [Emph [Strong [Str "strong",Space,Str "inside"],Space,Str "emph"]] +,Header 1 ("horizontal-rule",[],[]) [Str "horizontal",Space,Str "rule"] +,Para [Str "top"] +,HorizontalRule +,Para [Str "bottom"] +,HorizontalRule +,Header 1 ("nop",[],[]) [Str "nop"] +,Para [Str "__not emph__"] +,Header 1 ("entities",[],[]) [Str "entities"] +,Para [Str "hi",Space,Str "&",Space,Str "low"] +,Para [Str "hi",Space,Str "&",Space,Str "low"] +,Para [Str "G\246del"] +,Para [Str "\777\2730"] +,Header 1 ("linebreaks",[],[]) [Str "linebreaks"] +,Para [Str "hi",LineBreak,Str "there"] +,Para [Str "hi",LineBreak,Str "there"] +,Header 1 ("inline-code",[],[]) [Str "inline",Space,Str "code"] +,Para [Code ("",[],[]) "*\8594*",Space,Code ("",[],[]) "typed",Space,Code ("",[],[]) ">>="] +,Header 1 ("code-blocks",[],[]) [Str "code",Space,Str "blocks"] +,CodeBlock ("",[],[]) "\ncase xs of\n (_:_) -> reverse xs\n [] -> ['*']\n" +,CodeBlock ("",["haskell"],[("colors","haskell"),("ln","0")]) "\ncase xs of\n (_:_) -> reverse xs\n [] -> ['*']\n" +,Header 1 ("external-links",[],[]) [Str "external",Space,Str "links"] +,Para [Link ("",[],[]) [Emph [Str "Google"],Space,Str "search",Space,Str "engine"] ("http://google.com","")] +,Para [Link ("",[],[]) [Str "http://pandoc.org"] ("http://pandoc.org","")] +,Para [Link ("",[],[]) [Str "http://google.com"] ("http://google.com",""),Space,Link ("",[],[]) [Str "http://yahoo.com"] ("http://yahoo.com","")] +,Para [Link ("",[],[]) [Str "email",Space,Str "me"] ("mailto:info@example.org","")] +,Para [Str "http://google.com"] +,Para [Str "info@example.org"] +,Header 1 ("lists",[],[]) [Str "lists"] +,BulletList + [[Plain [Str "Start",Space,Str "each",Space,Str "line",Space]] + ,[Plain [Str "with",Space,Str "an",Space,Str "asterisk",Space,Str "(*).",Space] + ,BulletList + [[Plain [Str "More",Space,Str "asterisks",Space,Str "gives",Space,Str "deeper",Space] + ,BulletList + [[Plain [Str "and",Space,Str "deeper",Space,Str "levels.",Space]]]]]] + ,[Plain [Str "Line",Space,Str "breaks",LineBreak,Str "don't",Space,Str "break",Space,Str "levels.",Space]] + ,[Plain [Str "Continuations",Space,Str "are",Space,Str "also",Space,Str "possible",Space] + ,BulletList + [[Plain [Str "and",Space,Str "do",Space,Str "not",Space,Str "break",Space,Str "the",Space,Str "list",Space,Str "flow",Space]]]] + ,[Plain [Str "Level",Space,Str "one",Space]]] +,Para [Str "Any",Space,Str "other",Space,Str "start",Space,Str "ends",Space,Str "the",Space,Str "list."] +,OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "Start",Space,Str "each",Space,Str "line",Space]] + ,[Plain [Str "with",Space,Str "a",Space,Str "number",Space,Str "(1.).",Space] + ,OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "More",Space,Str "number",Space,Str "signs",Space,Str "gives",Space,Str "deeper",Space] + ,OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "and",Space,Str "deeper",Space]] + ,[Plain [Str "levels.",Space]]]]]] + ,[Plain [Str "Line",Space,Str "breaks",LineBreak,Str "don't",Space,Str "break",Space,Str "levels.",Space]] + ,[Plain [Str "Blank",Space,Str "lines",Space]]] +,OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "end",Space,Str "the",Space,Str "list",Space,Str "and",Space,Str "start",Space,Str "another.",Space]]] +,Para [Str "Any",Space,Str "other",Space,Str "start",Space,Str "also",Space,Str "ends",Space,Str "the",Space,Str "list."] +,DefinitionList + [([Str "item",Space,Str "1"], + [[Plain [Str "definition",Space,Str "1",Space]]]) + ,([Str "item",Space,Str "2"], + [[Plain [Str "definition",Space,Str "2-1",Space,Str "definition",Space,Str "2-2",Space]]]) + ,([Str "item",Space,Emph [Str "3"]], + [[Plain [Str "definition",Space,Emph [Str "3"],Space]]])] +,OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "one",Space]] + ,[Plain [Str "two",Space] + ,BulletList + [[Plain [Str "two",Space,Str "point",Space,Str "one",Space]] + ,[Plain [Str "two",Space,Str "point",Space,Str "two",Space]]]] + ,[Plain [Str "three",Space]] + ,[Plain [Str "four",Space]] + ,[Plain [Str "five",Space] + ,OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "five",Space,Str "sub",Space,Str "1",Space] + ,OrderedList (1,DefaultStyle,DefaultDelim) + [[Plain [Str "five",Space,Str "sub",Space,Str "1",Space,Str "sub",Space,Str "1",Space]]]] + ,[Plain [Str "five",Space,Str "sub",Space,Str "2",Space]]]]] +,Header 1 ("tables",[],[]) [Str "tables"] +,Table [] [AlignDefault,AlignDefault] [0.0,0.0] + [[Plain [Str ""]] + ,[Plain [Str ""]]] + [[[Plain [Str "Orange"]] + ,[Plain [Str "Apple"]]] + ,[[Plain [Str "Bread"]] + ,[Plain [Str "Pie"]]] + ,[[Plain [Str "Butter"]] + ,[Plain [Str "Ice",Space,Str "cream"]]]] +,Table [] [AlignDefault,AlignDefault] [0.0,0.0] + [[Plain [Str ""]] + ,[Plain [Str ""]]] + [[[Plain [Str "Orange"]] + ,[Plain [Str "Apple"]]] + ,[[Plain [Str "Bread"]] + ,[Plain [Str "Pie"]]] + ,[[Plain [Strong [Str "Butter"]]] + ,[Plain [Str "Ice",Space,Str "cream"]]]] +,Table [] [AlignDefault,AlignDefault] [0.0,0.0] + [[Plain [Str ""]] + ,[Plain [Str ""]]] + [[[Plain [Str "Orange"]] + ,[Plain [Str "Apple"]]] + ,[[Plain [Str "Bread",LineBreak,LineBreak,Str "and",Space,Str "cheese"]] + ,[Plain [Str "Pie",LineBreak,LineBreak,Strong [Str "apple"],Space,Str "and",Space,Emph [Str "carrot"],Space]]]] +,Table [] [AlignDefault,AlignDefault,AlignDefault] [0.0,0.0,0.0] + [[Plain [Str ""]] + ,[Plain [Str ""]] + ,[Plain [Str ""]]] + [[[Plain [Space,Str "Orange",Space]] + ,[Plain [Space,Str "Apple",Space]] + ,[Plain [Space,Str "more"]]] + ,[[Plain [Space,Str "Bread",Space]] + ,[Plain [Space,Str "Pie",Space]] + ,[Plain [Space,Str "more"]]] + ,[[Plain [Space,Str "Butter",Space]] + ,[Plain [Space,Str "Ice",Space,Str "cream",Space]] + ,[Plain [Space,Str "and",Space,Str "more",Space]]]]] diff --git a/test/tikiwiki-reader.tikiwiki b/test/tikiwiki-reader.tikiwiki new file mode 100644 index 000000000..d1971feb1 --- /dev/null +++ b/test/tikiwiki-reader.tikiwiki @@ -0,0 +1,148 @@ +! header + +!! header level two + +!!! header level 3 + +!!!! header _level_ four + +!!!!! header level 5 + +!!!!!! header level 6 + +!!!!!!! not a header + + --++ not a header + +! emph and strong + +''emph'' __strong__ + +''__strong and emph 1__'' + +__''strong and emph 2''__ + +__''emph inside'' strong__ + +__strong with ''emph''__ + +''__strong inside__ emph'' + +! horizontal rule + +top +---- +bottom + +---- + +! nop + +~np~__not emph__~/np~ + +! entities + +hi & low + +hi & low + +Gödel + +̉પ + +! linebreaks + +hi%%%there + +hi%%% +there + +! inline code + +-+*→*+- -+typed+- -+>>=+- + +! code blocks + +{CODE()} +case xs of + (_:_) -> reverse xs + [] -> ['*'] +{CODE} + +{CODE(colors="haskell" ln=0)} +case xs of + (_:_) -> reverse xs + [] -> ['*'] +{CODE} + +! external links + +[http://google.com|''Google'' search engine] + +[http://pandoc.org] + +[http://google.com] [http://yahoo.com] + +[mailto:info@example.org|email me] + +http://google.com + +info@example.org + +! lists + +* Start each line +* with an asterisk (*). +** More asterisks gives deeper +*** and deeper levels. +* Line breaks%%%don't break levels. +* Continuations ++ are also possible +** and do not break the list flow +* Level one +Any other start ends the list. + +# Start each line +# with a number (1.). +## More number signs gives deeper +### and deeper +### levels. +# Line breaks%%%don't break levels. +# Blank lines + +# end the list and start another. +Any other start also +ends the list. + +;item 1: definition 1 +;item 2: definition 2-1 ++ definition 2-2 +;item ''3'': definition ''3'' + +# one +# two +** two point one +** two point two +# three +# four +# five +## five sub 1 +### five sub 1 sub 1 +## five sub 2 + +! tables + +||Orange|Apple +Bread|Pie +Butter|Ice cream|| + +||Orange|Apple +Bread|Pie +__Butter__|Ice cream|| + +||Orange|Apple +Bread%%%%%%and cheese|Pie%%%%%%__apple__ and ''carrot'' || + +|| Orange | Apple | more + Bread | Pie | more + Butter | Ice cream | and more || |