From 07bf48d07b58d490c14022a6714f37aa97ca53ab Mon Sep 17 00:00:00 2001 From: John MacFarlane Date: Tue, 24 Oct 2017 22:25:45 -0700 Subject: More progress on using-the-pandoc-api.md. --- doc/using-the-pandoc-api.md | 270 ++++++++++++++++++++++++++++++++++++++------ 1 file changed, 238 insertions(+), 32 deletions(-) (limited to 'doc/using-the-pandoc-api.md') diff --git a/doc/using-the-pandoc-api.md b/doc/using-the-pandoc-api.md index 75bf227f1..c00db5b10 100644 --- a/doc/using-the-pandoc-api.md +++ b/doc/using-the-pandoc-api.md @@ -9,77 +9,283 @@ Detailed API documentation at the level of individual functions and types is available at . -# Pandoc's architecture, and a simple example +# Pandoc's architecture -Pandoc structure, readers, writers. +Pandoc is structured as a set of *readers*, which translate +various input formats into an abstract syntax tree (the +Pandoc AST) representing a structured document, and a set of +*writers*, which render this AST into various input formats. +Pictorially: -example of using a reader. +``` +[input format] ==reader==> [Pandoc AST] ==writer==> [output format] +``` + +This architecture allows pandoc to perform $M \times n$ +conversions with $M$ readers and $N$ writers. + +The Pandoc AST is defined in the +[pandoc-types](https://hackage.haskell.org/package/pandoc-types) +package. You should start by looking at the Haddock +documentation for +[Text.Pandoc.Definition](https://hackage.haskell.org/package/pandoc-types/docs/Text-Pandoc-Definition.html). As you'll see, a `Pandoc` is +composed of some metadata and a list of `Block`s. There are +various kinds of `Block`, including `Para` (paragraph), +`Header` (section heading), and `BlockQuote`. Some of the +`Block`s (like `BlockQuote`) contain lists of `Block`s, +while others (like `Para`) contain lists of `Inline`s, and +still others (like `CodeBlock`) contain plain text or +nothing. `Inline`s are the basic elements of paragraphs. +The distinction between `Block` and `Inline` in the type +system makes it impossible to represent, for example, +a link (`Inline`) whose link text is a block quote (`Block`). +This expressive limitation is mostly a help rather than a +hindrance, since many of the formats pandoc supports have +similar limitations. + +The best way to explore the pandoc AST is to use `pandoc -t +native`, which will display the AST correspoding to some +Markdown input: + +``` +% echo -e "1. *foo*\n2. bar" | pandoc -t native +[OrderedList (1,Decimal,Period) + [[Plain [Emph [Str "foo"]]] + ,[Plain [Str "bar"]]]] +``` + +# A simple example + +Here is a simple example of the use of a pandoc reader and +writer to perform a conversion inside ghci: + +``` +import Text.Pandoc +import qualified Data.Text as T +import qualified Data.Text.IO as TIO + +main :: IO () +main = do + result <- runIO $ do + doc <- readMarkdown def (T.pack "[testing](url)") + writeRST def doc + rst <- handleError result + TIO.putStrLn rst +``` -example of using a writer. +Some notes: -chaining them together. +1. The first part constructs a conversion pipeline: the input + string is passed to `readMarkdown`, and the resulting Pandoc + AST (`doc`) is then rendered by `writeRST`. The conversion + pipeline is "run" by `runIO`---more on that below. + +2. `result` has the type `Either PandocError Text`. We could + pattern-match on this manually, but it's simpler in this + context to use the `handleError` function from + Text.Pandoc.Error. This exits with an appropriate error + code and message if the value is a `Left`, and returns the + `Text` if the value is a `Right`. # The PandocMonad class -Pandoc's functions define computations that can be run in -any instance of the `PandocMonad` typeclass. Two instances -are provided: `PandocIO` and `PandocPure`. The difference is -that computations run in `PandocIO` are allowed to do IO -(for example, read a file), while computations in `PandocPure` -are free of any side effects. `PandocPure` is useful when -you want to prevent users from doing anything malicious. +Let's look at the types of `readMarkdown` and `writeRST`: + +```haskell +readMarkdown :: PandocMonad m => ReaderOptions -> Text -> m Pandoc + +writeRST :: PandocMonad m => WriterOptions -> Pandoc -> m Text +``` -Here's an example of such a computation, from the module -`Text.Pandoc.Class`: +The `PandocMonad m =>` part is a typeclass constraint. +It says that `readMarkdown` and `writeRST` define computations +that can be used in any instance of the `PandocMonad` +type class. `PandocMonad` is defined in the module +Text.Pandoc.Class. + +Two instances of `PandocMonad` are provided: `PandocIO` and +`PandocPure`. The difference is that computations run in +`PandocIO` are allowed to do IO (for example, read a file), +while computations in `PandocPure` are free of any side effects. +`PandocPure` is useful for sandboxed environments, when you want +to prevent users from doing anything malicious. To run the +conversion in `PandocIO`, use `runIO` (as above). To run it in +`PandocPure`, use `runPure`. + +As you can see from the Haddocks, +[Text.Pandoc.Class](https://hackage.haskell.org/package/pandoc/docs/Text-Pandoc-Class.html) +exports many auxiliary functions that can be used in any +instance of `PandocMonad`. For example: ```haskell -- | Get the verbosity level. getVerbosity :: PandocMonad m => m Verbosity -``` -motivations -Class. +-- | Set the verbosity level. +setVerbosity :: PandocMonad m => Verbosity -> m () -# The Pandoc structure +-- Get the accomulated log messages (in temporal order). +getLog :: PandocMonad m => m [LogMessage] +getLog = reverse <$> getsCommonState stLog -blocks/inlines +-- | Log a message using 'logOutput'. Note that +-- 'logOutput' is called only if the verbosity +-- level exceeds the level of the message, but +-- the message is added to the list of log messages +-- that will be retrieved by 'getLog' regardless +-- of its verbosity level. +report :: PandocMonad m => LogMessage -> m () -# Readers and writers +-- | Fetch an image or other item from the local filesystem or the net. +-- Returns raw content and maybe mime type. +fetchItem :: PandocMonad m + => String + -> m (B.ByteString, Maybe MimeType) -getReader, getWriter +setResourcePath :: PandocMonad m => [FilePath] -> m () +``` + +If we wanted more verbose informational messages +during the conversion we defined in the previous +section, we could do this: + +```haskell + result <- runIO $ do + setVerbosity INFO + doc <- readMarkdown def (T.pack "[testing](url)") + writeRST def doc +``` # Options -various reader and writer options you can set. -templates -extensions +The first argument of each reader or writer is for +options controlling the behavior of the reader or writer: +`ReaderOptions` for readers and `WriterOptions` +for writers. These are defined in +[Text.Pandoc.Options](https://hackage.haskell.org/package/pandoc/docs/Text-Pandoc-Options.html). It is a good idea to study these +options to see what can be adjusted. + +`def` (from Data.Default) denotes a default value for +each kind of option. (You can also use `defaultWriterOptions` +and `defaultReaderOptions`.) Generally you'll want to use +the defaults and modify them only when needed, for example: + +```haskell + writeRST def{ writerReferenceLinks = True } +``` + +Some particularly important options to know about: + +1. `writerTemplate`: By default, this is `Nothing`, which + means that a document fragment will be produced. If you + want a full document, you need to specify `Just template`, + where `template` is a String containing the template's + contents (not the path). + +2. `readerExtensions` and `writerExtensions`: These specify + the extensions to be used in parsing and rendering. + Extensions are defined in [Text.Pandoc.Extensions](https://hackage.haskell.org/package/pandoc/docs/Text-Pandoc-Extensions.html). # Builder + Inlines vs Inline, etc. -Monoid -example: report from CSV data +Concatenating lists is slow. So we use special types Inlines and Blocks that wrap Sequences of Inline and Block elements. + +Monoid - makes it easy to build up docs programatically. + +Example. +Here’s a JSON data source about CNG fueling stations in the +Chicago area: cng_fuel_chicago.json. Boss says: write me a +letter in Word listing all the stations that take the Voyager +card. + +``` +[ { + "state" : "IL", + "city" : "Chicago", + "fuel_type_code" : "CNG", + "zip" : "60607", + "station_name" : "Clean Energy - Yellow Cab", + "cards_accepted" : "A D M V Voyager Wright_Exp CleanEnergy", + "street_address" : "540 W Grenshaw" +}, ... +``` + +No need to open Word for this job! fuel.hs + +``` +{-# LANGUAGE OverloadedStrings #-} +import Text.Pandoc.Builder +import Text.Pandoc +import Data.Monoid ((<>), mempty, mconcat) +import Data.Aeson +import Control.Applicative +import Control.Monad (mzero) +import qualified Data.ByteString.Lazy as BL +import qualified Data.Text as T +import Data.List (intersperse) + +data Station = Station{ + address :: String + , name :: String + , cardsAccepted :: [String] + } deriving Show + +instance FromJSON Station where + parseJSON (Object v) = Station <$> + v .: "street_address" <*> + v .: "station_name" <*> + (words <$> (v .:? "cards_accepted" .!= "")) + parseJSON _ = mzero + +createLetter :: [Station] -> Pandoc +createLetter stations = doc $ + para "Dear Boss:" <> + para "Here are the CNG stations that accept Voyager cards:" <> + simpleTable [plain "Station", plain "Address", plain "Cards accepted"] + (map stationToRow stations) <> + para "Your loyal servant," <> + plain (image "JohnHancock.png" "" mempty) + where + stationToRow station = + [ plain (text $ name station) + , plain (text $ address station) + , plain (mconcat $ intersperse linebreak $ map text $ cardsAccepted station) + ] + +main :: IO () +main = do + json <- BL.readFile "cng_fuel_chicago.json" + let letter = case decode json of + Just stations -> createLetter [s | s <- stations, + "Voyager" `elem` cardsAccepted s] + Nothing -> error "Could not decode JSON" + BL.writeFile "letter.docx" =<< writeDocx def letter + putStrLn "Created letter.docx" +``` + + # Templates and other data files +readDataFile + # Handling errors and warnings # Generic transformations Walk and syb for AST transformations -# Filters - -just the basic idea of toJSONFilter -the rest can be left to filters.md +Filters: see filters.md -# Self-contained +but, how do you run filters from a program? +need to export these functions from Text.Pandoc.App! # PDF -# Custom PandocMonad instances # Creating a front-end -- cgit v1.2.3