% Using the pandoc API
% John MacFarlane

Pandoc can be used as a Haskell library, to write your own
conversion tools or power a web application.  This document
offers an introduction to using the pandoc API.

Detailed API documentation at the level of individual functions
and types is available at .

# Pandoc's architecture

Pandoc is structured as a set of *readers*, which translate
various input formats into an abstract syntax tree (the
Pandoc AST) representing a structured document, and a set of
*writers*, which render this AST into various output formats.
Pictorially:

```
[input format] ==reader==> [Pandoc AST] ==writer==> [output format]
```

This architecture allows pandoc to perform $M \times N$
conversions with $M$ readers and $N$ writers.

The Pandoc AST is defined in the
[pandoc-types](https://hackage.haskell.org/package/pandoc-types)
package.  You should start by looking at the Haddock
documentation for
[Text.Pandoc.Definition](https://hackage.haskell.org/package/pandoc-types/docs/Text-Pandoc-Definition.html).
As you'll see, a `Pandoc` is composed of some metadata and a list
of `Block`s.  There are various kinds of `Block`, including
`Para` (paragraph), `Header` (section heading), and `BlockQuote`.
Some of the `Block`s (like `BlockQuote`) contain lists of
`Block`s, while others (like `Para`) contain lists of `Inline`s,
and still others (like `CodeBlock`) contain plain text or
nothing.  `Inline`s are the basic elements of paragraphs.  The
distinction between `Block` and `Inline` in the type system makes
it impossible to represent, for example, a link (`Inline`) whose
link text is a block quote (`Block`).  This expressive limitation
is mostly a help rather than a hindrance, since many of the
formats pandoc supports have similar limitations.

The best way to explore the pandoc AST is to use `pandoc -t
native`, which will display the AST corresponding to some
Markdown input:

```
% echo -e "1. *foo*\n2. bar" | pandoc -t native
[OrderedList (1,Decimal,Period)
 [[Plain [Emph [Str "foo"]]]
 ,[Plain [Str "bar"]]]]
```

# A simple example

Here is a simple example of the use of a pandoc reader and
writer to perform a conversion inside ghci:

```
import Text.Pandoc
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

main :: IO ()
main = do
  result <- runIO $ do
    doc <- readMarkdown def (T.pack "[testing](url)")
    writeRST def doc
  rst <- handleError result
  TIO.putStrLn rst
```

Some notes:

1. The first part constructs a conversion pipeline: the input
   string is passed to `readMarkdown`, and the resulting Pandoc
   AST (`doc`) is then rendered by `writeRST`.  The conversion
   pipeline is "run" by `runIO` (more on that below).

2. `result` has the type `Either PandocError Text`.  We could
   pattern-match on this manually, but it's simpler in this
   context to use the `handleError` function from
   Text.Pandoc.Error.  This exits with an appropriate error
   code and message if the value is a `Left`, and returns the
   `Text` if the value is a `Right`.

# The PandocMonad class

Let's look at the types of `readMarkdown` and `writeRST`:

```haskell
readMarkdown :: PandocMonad m
             => ReaderOptions
             -> Text
             -> m Pandoc
writeRST     :: PandocMonad m
             => WriterOptions
             -> Pandoc
             -> m Text
```

The `PandocMonad m =>` part is a typeclass constraint.
It says that `readMarkdown` and `writeRST` define computations
that can be used in any instance of the `PandocMonad`
type class.  `PandocMonad` is defined in the module
Text.Pandoc.Class.

Two instances of `PandocMonad` are provided: `PandocIO`
and `PandocPure`.  The difference is that computations run
in `PandocIO` are allowed to do IO (for example, read a file),
while computations in `PandocPure` are free of any side effects.
`PandocPure` is useful for sandboxed environments, when you want
to prevent users from doing anything malicious.  To run the
conversion in `PandocIO`, use `runIO` (as above).  To run it in
`PandocPure`, use `runPure`.

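
To make the `runPure` case concrete, here is a minimal sketch of the earlier Markdown-to-RST conversion run in `PandocPure` instead of `PandocIO`.  The function name `markdownToRst` is hypothetical, chosen only for this illustration.  Because `runPure` returns an `Either PandocError` value directly, the conversion itself needs no IO; `main` is there only to print the result.

```haskell
import Text.Pandoc
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- | Convert Markdown to RST without performing any IO.
-- (Hypothetical helper name, used only in this sketch.)
markdownToRst :: T.Text -> Either PandocError T.Text
markdownToRst input = runPure $ do
  doc <- readMarkdown def input
  writeRST def doc

main :: IO ()
main =
  case markdownToRst (T.pack "[testing](url)") of
    -- Pattern-match manually instead of using handleError.
    Left err  -> putStrLn ("conversion failed: " ++ show err)
    Right rst -> TIO.putStrLn rst
```

Since the result is an ordinary `Either`, the caller decides how to report a `PandocError`, which can be more convenient than `handleError` in a library or a web handler.
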

As you can see from the Haddocks,
[Text.Pandoc.Class](https://hackage.haskell.org/package/pandoc/docs/Text-Pandoc-Class.html)
exports many auxiliary functions that can be used in any
instance of `PandocMonad`.  For example:

```haskell
-- | Get the verbosity level.
getVerbosity :: PandocMonad m => m Verbosity

-- | Set the verbosity level.
setVerbosity :: PandocMonad m => Verbosity -> m ()

-- | Get the accumulated log messages (in temporal order).
getLog :: PandocMonad m => m [LogMessage]
getLog = reverse <$> getsCommonState stLog

-- | Log a message using 'logOutput'.  Note that
-- 'logOutput' is called only if the verbosity
-- level exceeds the level of the message, but
-- the message is added to the list of log messages
-- that will be retrieved by 'getLog' regardless
-- of its verbosity level.
report :: PandocMonad m => LogMessage -> m ()

-- | Fetch an image or other item from the local filesystem or the net.
-- Returns raw content and maybe mime type.
fetchItem :: PandocMonad m
          => String
          -> m (B.ByteString, Maybe MimeType)

setResourcePath :: PandocMonad m => [FilePath] -> m ()
```

If we wanted more verbose informational messages
during the conversion we defined in the previous
section, we could do this:

```haskell
  result <- runIO $ do
    setVerbosity INFO
    doc <- readMarkdown def (T.pack "[testing](url)")
    writeRST def doc
```

# Options

The first argument of each reader or writer is for
options controlling the behavior of the reader or writer:
`ReaderOptions` for readers and `WriterOptions`
for writers.  These are defined in
[Text.Pandoc.Options](https://hackage.haskell.org/package/pandoc/docs/Text-Pandoc-Options.html).
It is a good idea to study these options to see what can be
adjusted.

`def` (from Data.Default) denotes a default value for
each kind of option.  (You can also use `defaultWriterOptions`
and `defaultReaderOptions`.)
Generally you'll want to use the defaults and modify them only
when needed, for example:

```haskell
    writeRST def{ writerReferenceLinks = True }
```

Some particularly important options to know about:

1.  `writerTemplate`:  By default, this is `Nothing`, which
    means that a document fragment will be produced.  If you
    want a full document, you need to specify `Just template`,
    where `template` is a String containing the template's
    contents (not the path).

2.  `readerExtensions` and `writerExtensions`:  These specify
    the extensions to be used in parsing and rendering.
    Extensions are defined in
    [Text.Pandoc.Extensions](https://hackage.haskell.org/package/pandoc/docs/Text-Pandoc-Extensions.html).

# Builder

Text.Pandoc.Builder distinguishes `Inlines` from `Inline` (and
`Blocks` from `Block`).  Concatenating lists is slow, so the
builder uses the special types `Inlines` and `Blocks`, which
wrap Sequences of `Inline` and `Block` elements.  Both are
instances of `Monoid`, which makes it easy to build up documents
programmatically.

As an example, here’s a JSON data source about CNG fueling
stations in the Chicago area: `cng_fuel_chicago.json`.  Boss
says: write me a letter in Word listing all the stations that
take the Voyager card.

```
[ {
  "state" : "IL",
  "city" : "Chicago",
  "fuel_type_code" : "CNG",
  "zip" : "60607",
  "station_name" : "Clean Energy - Yellow Cab",
  "cards_accepted" : "A D M V Voyager Wright_Exp CleanEnergy",
  "street_address" : "540 W Grenshaw"
}, ...
```

No need to open Word for this job!

fuel.hs

```
{-# LANGUAGE OverloadedStrings #-}
import Text.Pandoc.Builder
import Text.Pandoc
import Data.Monoid ((<>), mempty, mconcat)
import Data.Aeson
import Control.Applicative
import Control.Monad (mzero)
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import Data.List (intersperse)

-- The fields are Text because recent versions of the builder
-- functions (such as 'text') take Text rather than String.
data Station = Station{
    address        :: T.Text
  , name           :: T.Text
  , cardsAccepted  :: [T.Text]
  } deriving Show

instance FromJSON Station where
    parseJSON (Object v) = Station <$>
       v .: "street_address" <*>
       v .: "station_name" <*>
       (T.words <$> (v .:? "cards_accepted" .!= ""))
    parseJSON _          = mzero

createLetter :: [Station] -> Pandoc
createLetter stations = doc $
    para "Dear Boss:" <>
    para "Here are the CNG stations that accept Voyager cards:" <>
    simpleTable [plain "Station", plain "Address", plain "Cards accepted"]
           (map stationToRow stations) <>
    para "Your loyal servant," <>
    plain (image "JohnHancock.png" "" mempty)
  where
    stationToRow station =
      [ plain (text $ name station)
      , plain (text $ address station)
      , plain (mconcat $ intersperse linebreak
                       $ map text $ cardsAccepted station)
      ]

main :: IO ()
main = do
  json <- BL.readFile "cng_fuel_chicago.json"
  let letter = case decode json of
                    Just stations -> createLetter [s | s <- stations,
                                        "Voyager" `elem` cardsAccepted s]
                    Nothing       -> error "Could not decode JSON"
  -- writeDocx is a PandocMonad computation, so run it in PandocIO
  -- and let handleError deal with a possible PandocError.
  docx <- runIO (writeDocx def letter) >>= handleError
  BL.writeFile "letter.docx" docx
  putStrLn "Created letter.docx"
```

# Templates and other data files

readDataFile

# Handling errors and warnings

# Generic transformations

Walk and syb for AST transformations

# Filters

Filters:  see filters.md

applyFilters, applyLuaFilters from Text.Pandoc.App.

# PDF

# Creating a front-end

Text.Pandoc.App
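
The API of Text.Pandoc.App is not sketched here.  As a much smaller illustration of what a front-end does, the following program reads Markdown from standard input and prints an HTML fragment, using only the reader/writer pattern introduced earlier.  It assumes that `writeHtml5String` and `pandocExtensions` are exported by Text.Pandoc, as they are in recent pandoc versions; the helper name `markdownToHtml` is hypothetical.

```haskell
import Text.Pandoc
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- | Convert pandoc-flavoured Markdown to an HTML fragment.
-- (Hypothetical helper name, used only in this sketch.)
markdownToHtml :: T.Text -> IO T.Text
markdownToHtml input = do
  result <- runIO $ do
    -- Enable pandoc's Markdown extensions instead of relying on
    -- the extension set of the default reader options.
    doc <- readMarkdown def{ readerExtensions = pandocExtensions } input
    writeHtml5String def doc
  handleError result

main :: IO ()
main = do
  input <- TIO.getContents
  html  <- markdownToHtml input
  TIO.putStrLn html
```

A real front-end would also need to choose formats, parse command-line options, and apply templates and filters, which is the part of the job this sketch leaves out.
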