diff options
author | Igor Pashev <pashev.igor@gmail.com> | 2021-12-29 15:00:59 +0200 |
---|---|---|
committer | Igor Pashev <pashev.igor@gmail.com> | 2021-12-29 15:00:59 +0200 |
commit | b4361712899fd0183fea5513180cb383979616de (patch) | |
tree | 688ab7ee2ab3a8cd32b4e37b506099aec95388f7 /doc | |
parent | 726ad97faebe59e024d68d293e663c02bbe423c8 (diff) | |
parent | d960282b105a6469c760b4308a3b81da723b7256 (diff) | |
download | pandoc-b4361712899fd0183fea5513180cb383979616de.tar.gz |
Merge https://github.com/jgm/pandoc
Diffstat (limited to 'doc')
-rw-r--r-- | doc/custom-readers.md | 684 | ||||
-rw-r--r-- | doc/custom-writers.md | 52 | ||||
-rw-r--r-- | doc/faqs.md | 8 | ||||
-rw-r--r-- | doc/filters.md | 18 | ||||
-rw-r--r-- | doc/getting-started.md | 1 | ||||
-rw-r--r-- | doc/jats.md | 36 | ||||
-rw-r--r-- | doc/lua-filters.md | 743 | ||||
-rw-r--r-- | doc/short-guide-to-pandocs-sources.md | 259 |
8 files changed, 1676 insertions, 125 deletions
diff --git a/doc/custom-readers.md b/doc/custom-readers.md new file mode 100644 index 000000000..37b6d6a3e --- /dev/null +++ b/doc/custom-readers.md @@ -0,0 +1,684 @@ +--- +author: +- John MacFarlane +date: 'November 18, 2021' +title: Creating Custom Pandoc Readers in Lua +--- + +# Introduction + +If you need to parse a format not already handled by pandoc, +you can create a custom reader using the [Lua] language. +Pandoc has a built-in Lua interpreter, so you needn't +install any additional software to do this. + +[Lua]: https://www.lua.org + +A custom reader is a Lua file that defines a function +called `Reader`, which takes two arguments: + +- the raw input to be parsed, as a list of sources +- optionally, a table of reader options, e.g. + `{ columns = 62, standalone = true }`. + +The `Reader` function should return a `Pandoc` AST. +This can be created using functions in the [`pandoc` module], +which is automatically in scope. (Indeed, all of the utility +functions that are available for [Lua filters] are available +in custom readers, too.) + +Each source item corresponds to a file or stream passed to pandoc +containing its text and name. E.g., if a single file `input.txt` +is passed to pandoc, then the list of sources will contain just a +single element `s`, where `s.name == 'input.txt'` and `s.text` +contains the file contents as a string. + +The sources list, as well as each of its elements, can be +converted to a string via the Lua standard library function +`tostring`. + +[Lua filters]: https://pandoc.org/lua-filters.html +[`pandoc` module]: https://pandoc.org/lua-filters.html#module-pandoc + +A minimal example would be + +```lua +function Reader(input) + return pandoc.Pandoc({ pandoc.CodeBlock(tostring(input)) }) +end +``` + +This just returns a document containing a big code block with all +of the input. Or, to create a separate code block for each input +file, one might write + +``` lua +function Reader(input) + return pandoc.Pandoc(input:map( + function (s) return pandoc.CodeBlock(s.text) end)) +end +``` + +In a nontrivial reader, you'll want to parse the input. +You can do this using standard Lua library functions +(for example, the [patterns] library), or with the powerful +and fast [lpeg] parsing library, which is automatically in scope. +You can also use external Lua libraries (for example, +an XML parser). + +A previous pandoc version passed a raw string instead of a list +of sources to the Reader function. Reader functions that rely on +this are obsolete, but still supported: Pandoc analyzes any +script error, detecting when code assumed the old behavior. The +code is rerun with raw string input in this case, thereby +ensuring backwards compatibility. + +[patterns]: http://lua-users.org/wiki/PatternsTutorial +[lpeg]: http://www.inf.puc-rio.br/~roberto/lpeg/ + +# Example: plain text reader + +This is a simple example using [lpeg] to parse the input +into space-separated strings and blankline-separated paragraphs. + +```lua +-- A sample custom reader that just parses text into blankline-separated +-- paragraphs with space-separated words. + +-- For better performance we put these functions in local variables: +local P, S, R, Cf, Cc, Ct, V, Cs, Cg, Cb, B, C, Cmt = + lpeg.P, lpeg.S, lpeg.R, lpeg.Cf, lpeg.Cc, lpeg.Ct, lpeg.V, + lpeg.Cs, lpeg.Cg, lpeg.Cb, lpeg.B, lpeg.C, lpeg.Cmt + +local whitespacechar = S(" \t\r\n") +local wordchar = (1 - whitespacechar) +local spacechar = S(" \t") +local newline = P"\r"^-1 * P"\n" +local blanklines = newline * (spacechar^0 * newline)^1 +local endline = newline - blanklines + +-- Grammar +G = P{ "Pandoc", + Pandoc = Ct(V"Block"^0) / pandoc.Pandoc; + Block = blanklines^0 * V"Para" ; + Para = Ct(V"Inline"^1) / pandoc.Para; + Inline = V"Str" + V"Space" + V"SoftBreak" ; + Str = wordchar^1 / pandoc.Str; + Space = spacechar^1 / pandoc.Space; + SoftBreak = endline / pandoc.SoftBreak; +} + +function Reader(input) + return lpeg.match(G, tostring(input)) +end +``` + +Example of use: + +``` +% pandoc -f plain.lua -t native +*Hello there*, this is plain text with no formatting +except paragraph breaks. + +- Like this one. +^D +[ Para + [ Str "*Hello" + , Space + , Str "there*," + , Space + , Str "this" + , Space + , Str "is" + , Space + , Str "plain" + , Space + , Str "text" + , Space + , Str "with" + , Space + , Str "no" + , Space + , Str "formatting" + , SoftBreak + , Str "except" + , Space + , Str "paragraph" + , Space + , Str "breaks." + ] +, Para + [ Str "-" + , Space + , Str "Like" + , Space + , Str "this" + , Space + , Str "one." + ] +] +``` + +# Example: a RIS bibliography reader + +This is a parser for [RIS bibliography] files. It can be used +to convert them to CSL JSON or YAML, BibTeX, or BibLaTeX. + +[RIS bibliography]: https://en.wikipedia.org/wiki/RIS_(file_format) + +```lua +-- A sample custom reader for RIS bibliography format +-- https://en.wikipedia.org/wiki/RIS_(file_format) +-- The references are converted to inline pandoc/CSL YAML +-- references in the metadata. + +local inspect = require"inspect" + +local types = + { ABST = "article", + ADVS = "motion-picture", + AGGR = "dataset", + ANCIENT = "book", + ART = "graphic", + BILL = "bill", + BLOG = "post-weblog", + BOOK = "book", + CASE = "legal_case", + CHAP = "chapter", + CHART = "graphic", + CLSWK = "book", + COMP = "program", + CONF = "paper-conference", + CPAPER = "paper-conference", + CTLG = "catalog", + DATA = "dataset", + DBASE = "dataset", + DICT = "book", + EBOOK = "book", + ECHAP = "chapter", + EDBOOK = "book", + EJOUR = "article", + WEB = "webpage", + ENCYC = "entry-encyclopedia", + EQUA = "figure", + FIGURE = "figure", + GEN = "entry", + GOVDOC = "report", + GRANT = "report", + HEAR = "report", + ICOMM = "personal_communication", + INPR = "article-journal", + JFULL = "article-journal", + JOUR = "article-journal", + LEGAL = "legal_case", + MANSCPT = "manuscript", + MAP = "map", + MGZN = "article-magazine", + MPCT = "motion-picture", + MULTI = "webpage", + MUSIC = "musical_score", + NEWS = "article-newspaper", + PAMP = "pamphlet", + PAT = "patent", + PCOMM = "personal_communication", + RPRT = "report", + SER = "article", + SLIDE = "graphic", + SOUND = "musical_score", + STAND = "report", + STAT = "legislation", + THES = "thesis", + UNBILL = "bill", + UNPB = "unpublished", + VIDEO = "graphic" + } + +local function clean(refpairs) + local ref = {} + for i = 1, #refpairs do + local k,v = table.unpack(refpairs[i]) + if k == "TY" then + ref["type"] = types[v] + elseif k == "VL" then + ref.volume = v + elseif k == "KW" then + ref.keyword = v + elseif k == "PB" then + ref.publisher = v + elseif k == "CY" or k == "PP" then + ref["publisher-place"] = v + elseif k == "SP" then + if ref.page then + ref.page = v .. ref.page + else + ref.page = v + end + elseif k == "EP" then + if ref.page then + ref.page = ref.page .. "-" .. v + else + ref.page = "-" .. v + end + elseif k == "AU" or k == "A1" or k == "A2" or k == "A3" then + if ref.author then + table.insert(ref.author, v) + else + ref.author = {v} + end + elseif k == "TI" or k == "T1" or k == "CT" or + (k == "BT" and ref.type == "book") then + ref.title = v + elseif k == "ET" then + ref.edition = v + elseif k == "NV" then + ref["number-of-volumes"] = v + elseif k == "AB" then + ref.abstract = v + elseif k == "ED" then + if ref.editor then + table.insert(ref.editor, v) + else + ref.editor = {v} + end + elseif k == "JO" or k == "JF" or k == "T2" or + (k == "BT" and ref.type ~= "book") then + ref["container-title"] = v + elseif k == "PY" or k == "Y1" then + ref.issued = v + elseif k == "IS" then + ref.issue = v + elseif k == "SN" then + ref.ISSN = v + elseif k == "L" then + ref.lang = v + elseif k == "UR" or k == "LK" then + ref.URL = v + end + end + return ref +end + +function Reader(input, reader_options) + local refs = {} + local thisref = {} + local ids = {} + for line in string.gmatch(tostring(input), "[^\n]*") do + key, val = string.match(line, "([A-Z][A-Z0-9]) %- (.*)") + if key == "ER" then + -- clean up fields + local newref = clean(thisref) + -- ensure we have an id and if not, create a sensible one + if not newref.id then + newref.id = "" + for _,x in ipairs(newref.author) do + newref.id = newref.id .. string.match(pandoc.utils.stringify(x), "%a+") + end + if newref.issued then + newref.id = newref.id .. string.match(newref.issued, "%d+") + end + if ids[newref.id] then -- add disambiguator if needed + newref.id = newref.id .. "-" .. #ids + end + end + table.insert(ids, newref.id) + table.insert(refs, newref) + thisref = {} + elseif key then + table.insert(thisref, {key, val}) + end + end + return pandoc.Pandoc({}, pandoc.Meta { references = refs } ) +end +``` + +Example of use: + +``` +% pandoc -f ris.lua -t bibtex +TY - JOUR +AU - Shannon, Claude E. +PY - 1948 +DA - July +TI - A Mathematical Theory of Communication +T2 - Bell System Technical Journal +SP - 379 +EP - 423 +VL - 27 +ER - +TY - JOUR +T1 - On computable numbers, with an application to the Entscheidungsproblem +A1 - Turing, Alan Mathison +JO - Proc. of London Mathematical Society +VL - 47 +IS - 1 +SP - 230 +EP - 265 +Y1 - 1937 +ER - +^D +@article{Shannon1948, + author = {Shannon, Claude E.}, + title = {A {Mathematical} {Theory} of {Communication}}, + journal = {Bell System Technical Journal}, + volume = {27}, + pages = {379-423}, + year = {1948} +} +@article{Turing1937, + author = {Turing, Alan Mathison}, + title = {On Computable Numbers, with an Application to the + {Entscheidungsproblem}}, + journal = {Proc. of London Mathematical Society}, + volume = {47}, + number = {1}, + pages = {230-265}, + year = {1937} +} +``` + +# Example: a wiki Creole reader + +This is a parser for [Creole common wiki markup]. +It uses an [lpeg] grammar. Fun fact: this custom reader is faster than +pandoc's built-in creole reader! This shows that high-performance +readers can be designed in this way. + +[Creole common wiki markup]: http://www.wikicreole.org/wiki/CheatSheet + + +```lua +-- A sample custom reader for Creole 1.0 (common wiki markup) +-- http://www.wikicreole.org/wiki/CheatSheet + +-- For better performance we put these functions in local variables: +local P, S, R, Cf, Cc, Ct, V, Cs, Cg, Cb, B, C, Cmt = + lpeg.P, lpeg.S, lpeg.R, lpeg.Cf, lpeg.Cc, lpeg.Ct, lpeg.V, + lpeg.Cs, lpeg.Cg, lpeg.Cb, lpeg.B, lpeg.C, lpeg.Cmt + +local whitespacechar = S(" \t\r\n") +local specialchar = S("/*~[]\\{}|") +local wordchar = (1 - (whitespacechar + specialchar)) +local spacechar = S(" \t") +local newline = P"\r"^-1 * P"\n" +local blankline = spacechar^0 * newline +local endline = newline * #-blankline +local endequals = spacechar^0 * P"="^0 * spacechar^0 * newline +local cellsep = spacechar^0 * P"|" + +local function trim(s) + return (s:gsub("^%s*(.-)%s*$", "%1")) +end + +local function ListItem(lev, ch) + local start + if ch == nil then + start = S"*#" + else + start = P(ch) + end + local subitem = function(c) + if lev < 6 then + return ListItem(lev + 1, c) + else + return (1 - 1) -- fails + end + end + local parser = spacechar^0 + * start^lev + * #(- start) + * spacechar^0 + * Ct((V"Inline" - (newline * spacechar^0 * S"*#"))^0) + * newline + * (Ct(subitem("*")^1) / pandoc.BulletList + + + Ct(subitem("#")^1) / pandoc.OrderedList + + + Cc(nil)) + / function (ils, sublist) + return { pandoc.Plain(ils), sublist } + end + return parser +end + +-- Grammar +G = P{ "Doc", + Doc = Ct(V"Block"^0) + / pandoc.Pandoc ; + Block = blankline^0 + * ( V"Header" + + V"HorizontalRule" + + V"CodeBlock" + + V"List" + + V"Table" + + V"Para") ; + Para = Ct(V"Inline"^1) + * newline + / pandoc.Para ; + HorizontalRule = spacechar^0 + * P"----" + * spacechar^0 + * newline + / pandoc.HorizontalRule; + Header = (P("=")^1 / string.len) + * spacechar^1 + * Ct((V"Inline" - endequals)^1) + * endequals + / pandoc.Header; + CodeBlock = P"{{{" + * blankline + * C((1 - (newline * P"}}}"))^0) + * newline + * P"}}}" + / pandoc.CodeBlock; + Placeholder = P"<<<" + * C(P(1) - P">>>")^0 + * P">>>" + / function() return pandoc.Div({}) end; + List = V"BulletList" + + V"OrderedList" ; + BulletList = Ct(ListItem(1,'*')^1) + / pandoc.BulletList ; + OrderedList = Ct(ListItem(1,'#')^1) + / pandoc.OrderedList ; + Table = (V"TableHeader" + Cc{}) + * Ct(V"TableRow"^1) + / function(headrow, bodyrows) + local numcolumns = #(bodyrows[1]) + local aligns = {} + local widths = {} + for i = 1,numcolumns do + aligns[i] = pandoc.AlignDefault + widths[i] = 0 + end + return pandoc.utils.from_simple_table( + pandoc.SimpleTable({}, aligns, widths, headrow, bodyrows)) + end ; + TableHeader = Ct(V"HeaderCell"^1) + * cellsep^-1 + * spacechar^0 + * newline ; + TableRow = Ct(V"BodyCell"^1) + * cellsep^-1 + * spacechar^0 + * newline ; + HeaderCell = cellsep + * P"=" + * spacechar^0 + * Ct((V"Inline" - (newline + cellsep))^0) + / function(ils) return { pandoc.Plain(ils) } end ; + BodyCell = cellsep + * spacechar^0 + * Ct((V"Inline" - (newline + cellsep))^0) + / function(ils) return { pandoc.Plain(ils) } end ; + Inline = V"Emph" + + V"Strong" + + V"LineBreak" + + V"Link" + + V"URL" + + V"Image" + + V"Str" + + V"Space" + + V"SoftBreak" + + V"Escaped" + + V"Placeholder" + + V"Code" + + V"Special" ; + Str = wordchar^1 + / pandoc.Str; + Escaped = P"~" + * C(P(1)) + / pandoc.Str ; + Special = specialchar + / pandoc.Str; + Space = spacechar^1 + / pandoc.Space ; + SoftBreak = endline + * # -(V"HorizontalRule" + V"CodeBlock") + / pandoc.SoftBreak ; + LineBreak = P"\\\\" + / pandoc.LineBreak ; + Code = P"{{{" + * C((1 - P"}}}")^0) + * P"}}}" + / trim / pandoc.Code ; + Link = P"[[" + * C((1 - (P"]]" + P"|"))^0) + * (P"|" * Ct((V"Inline" - P"]]")^1))^-1 * P"]]" + / function(url, desc) + local txt = desc or {pandoc.Str(url)} + return pandoc.Link(txt, url) + end ; + Image = P"{{" + * #-P"{" + * C((1 - (S"}"))^0) + * (P"|" * Ct((V"Inline" - P"}}")^1))^-1 + * P"}}" + / function(url, desc) + local txt = desc or "" + return pandoc.Image(txt, url) + end ; + URL = P"http" + * P"s"^-1 + * P":" + * (1 - (whitespacechar + (S",.?!:;\"'" * #whitespacechar)))^1 + / function(url) + return pandoc.Link(pandoc.Str(url), url) + end ; + Emph = P"//" + * Ct((V"Inline" - P"//")^1) + * P"//" + / pandoc.Emph ; + Strong = P"**" + * Ct((V"Inline" -P"**")^1) + * P"**" + / pandoc.Strong ; +} + +function Reader(input, reader_options) + return lpeg.match(G, tostring(input)) +end +``` + +Example of use: + +``` +% pandoc -f creole.lua -t markdown +== Wiki Creole + +You can make things **bold** or //italic// or **//both//** or //**both**//. + +Character formatting extends across line breaks: **bold, +this is still bold. This line deliberately does not end in star-star. + +Not bold. Character formatting does not cross paragraph boundaries. + +You can use [[internal links]] or [[http://www.wikicreole.org|external links]], +give the link a [[internal links|different]] name. +^D +## Wiki Creole + +You can make things **bold** or *italic* or ***both*** or ***both***. + +Character formatting extends across line breaks: \*\*bold, this is still +bold. This line deliberately does not end in star-star. + +Not bold. Character formatting does not cross paragraph boundaries. + +You can use [internal links](internal links) or [external +links](http://www.wikicreole.org), give the link a +[different](internal links) name. +``` + +# Example: parsing JSON from an API + +This custom reader consumes the JSON output of +<https://www.reddit.com/r/haskell.json> and produces +a document containing the current top articles on the +Haskell subreddit. + +It assumes that the `luajson` library is available. (It can be +installed using `luarocks install luajson`---but be sure you are +installing it for Lua 5.3, which is the version packaged with +pandoc.) + + +```lua +-- consumes the output of https://www.reddit.com/r/haskell.json + +local json = require'json' -- luajson must be available + +local function read_inlines(raw) + local doc = pandoc.read(raw, "commonmark") + return pandoc.utils.blocks_to_inlines(doc.blocks) +end + +local function read_blocks(raw) + local doc = pandoc.read(raw, "commonmark") + return doc.blocks +end + +function Reader(input) + + local parsed = json.decode(tostring(input)) + local blocks = {} + + for _,entry in ipairs(parsed.data.children) do + local d = entry.data + table.insert(blocks, pandoc.Header(2, + pandoc.Link(read_inlines(d.title), d.url))) + for _,block in ipairs(read_blocks(d.selftext)) do + table.insert(blocks, block) + end + end + + return pandoc.Pandoc(blocks) + +end +``` + +Similar code can be used to consume JSON output from other APIs. + +Note that the content of the text fields is markdown, so we +convert it using `pandoc.read()`. + + +# Example: syntax-highlighted code files + +This is a reader that puts the content of each input file into a +code block, sets the file's extension as the block's class to +enable code highlighting, and places the filename as a header +above each code block. + +``` lua +function to_code_block (source) + local _, lang = pandoc.path.split_extension(source.name) + return pandoc.Div{ + pandoc.Header(1, source.name == '' and '<stdin>' or source.name), + pandoc.CodeBlock(source.text, {class=lang}), + } +end + +function Reader (input, opts) + return pandoc.Pandoc(input:map(to_code_block)) +end +``` diff --git a/doc/custom-writers.md b/doc/custom-writers.md new file mode 100644 index 000000000..6df603288 --- /dev/null +++ b/doc/custom-writers.md @@ -0,0 +1,52 @@ +--- +author: +- John MacFarlane +date: 'November 21, 2021' +title: Creating Custom Pandoc Writers in Lua +--- + +# Introduction + +If you need to render a format not already handled by pandoc, +or you want to change how pandoc renders a format, +you can create a custom writer using the [Lua] language. +Pandoc has a built-in Lua interpreter, so you needn't +install any additional software to do this. + +[Lua]: https://www.lua.org + +A custom writer is a Lua file that defines functions for +rendering each element of a pandoc AST. + +For example, + +``` lua +function Para(s) + return "<paragraph>" .. s .. "</paragraph>" +end +``` + +The best way to go about creating a custom writer is to modify +the example that comes with pandoc. To get the example, you +can do + +``` +pandoc --print-default-data-file sample.lua > sample.lua +``` + +# A custom HTML writer + +`sample.lua` is a full-features HTML writer, with explanatory +comments. To use it, just use the path to the custom writer as +the writer name: + +``` +pandoc -t sample.lua myfile.md +``` + +`sample.lua` defines all the functions needed by any custom +writer, so you can design your own custom writer by modifying +the functions in `sample.lua` according to your needs. + +``` {.lua include="sample.lua"} +``` diff --git a/doc/faqs.md b/doc/faqs.md index 2f3368837..317153d5f 100644 --- a/doc/faqs.md +++ b/doc/faqs.md @@ -124,5 +124,13 @@ topography as TeX, particularly when it comes to math, but they may be fine for many purposes.) +## Converting to PDF on an M1 Mac, I get a "Cannot allocate memory" error. + +We are not sure why this happens, but we have found that +fully specifying the `pdflatex` path avoids the error. For +example, + + pandoc -o my.pdf --pdf-engine=/Library/TeX/texbin/pdflatex + ::: diff --git a/doc/filters.md b/doc/filters.md index 004a83b7d..c921fc007 100644 --- a/doc/filters.md +++ b/doc/filters.md @@ -460,36 +460,36 @@ filter. Object fields: - `readerAbbreviations` + `abbreviations` : set of known abbreviations (array of strings). - `readerColumns` + `columns` : number of columns in terminal; an integer. - `readerDefaultImageExtension` + default-image-extension` : default extension for images; a string. - `readerExtensions` + `extensions` : integer representation of the syntax extensions bit field. - `readerIndentedCodeClasses` + `indented-code-classes` : default classes for indented code blocks; array of strings. - `readerStandalone` + `standalone` : whether the input was a standalone document with header; either `true` or `false`. - `readerStripComments` + `strip-comments` : HTML comments are stripped instead of parsed as raw HTML; either `true` or `false`. - `readerTabStop` + `tab-stop` : width (i.e. equivalent number of spaces) of tab stops; integer. - `readerTrackChanges` + `track-changes` : track changes setting for docx; one of `"accept-changes"`, `"reject-changes"`, and `"all-changes"`. diff --git a/doc/getting-started.md b/doc/getting-started.md index a8f7315fd..b73923daa 100644 --- a/doc/getting-started.md +++ b/doc/getting-started.md @@ -212,6 +212,7 @@ Now save your file as `test1.md` in the directory Note: If you use plain text a lot, you'll want a better editor than `Notepad` or `TextEdit`. You might want to look at +[Visual Studio Code](https://code.visualstudio.com/) or [Sublime Text](https://www.sublimetext.com/) or (if you're willing to put in some time learning an unfamiliar interface) [Vim](https://www.vim.org) or [Emacs](https://www.gnu.org/software/emacs). diff --git a/doc/jats.md b/doc/jats.md index c099e0a04..9b8351840 100644 --- a/doc/jats.md +++ b/doc/jats.md @@ -45,17 +45,28 @@ Metadata Values element. `affiliation` - : list of affiliation identifiers; marks the organizations - with which an author is affiliated. Each identifier in this - list must also occur as the `id` of an affiliation listed in - the top-level `affiliation` list. + : either full affiliation entries as described in field + `affiliation`, or a list of affiliation identifiers. + + The identifiers link to the organizations with which an + author is affiliated. Each identifier in this list must + also occur as the `id` of an affiliation listed in the + top-level `affiliation` list. + + If the top-level `affiliation` field is set, then this + entry assumed to be a list of identifiers, and a list of + full entries if that field is unset. + + Full entries must be given if the articleauthoring tag + set it used, as affiliation links are not allowed in that + schema. `equal-contrib` : boolean attribute used to mark authors who contributed equally to the work. The - [`equal-contrib`][attr:equal-contrib] attribute is added - to the author's [`<contrib>`] element if this is set to a - truthy value. + [`equal-contrib`][attr:equal-contrib] attribute, set to + `yes`, is added to the author's [`<contrib>`] element if + this is set to a truthy value. `cor-id` : identifier linking to the contributor's correspondence @@ -336,12 +347,21 @@ Metadata Values : Additional notes concerning the whole article. Added to the article's frontmatter via the [`<notes>`][elem:notes] element. +`subtitle` +: Subordinate part of the document title. Added to the + document's front matter as a + [`<subtitle>`][elem:article-title] element. + `tags` : list of keywords. Items are used as contents of the [`<kwd>`][elem:kwd] element; the elements are grouped in a [`<kwd-group>`][elem:kwd-group] with the [`kwd-group-type`][attr:kwd-group-type] value `author`. +`title` +: The article title. Added to the document's front matter via the + [`<article-title>`][elem:article-title] element. + Required Metadata ----------------- @@ -378,6 +398,7 @@ Required metadata values: [elem:abstract]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/abstract.html [elem:article-id]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/article-id.html [elem:article-meta]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/article-meta.html +[elem:article-title]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/article-title.html [elem:copyright-holder]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/copyright-holder.html [elem:copyright-statement]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/copyright-statement.html [elem:copyright-year]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/copyright-year.html @@ -400,6 +421,7 @@ Required metadata values: [elem:string-name]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/string-name.html [elem:subj-group]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/subj-group.html [elem:subject]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/subject.html +[elem:subtitle]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/subtitle.html [elem:surname]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/surname.html [elem:xref]: https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/xref.html diff --git a/doc/lua-filters.md b/doc/lua-filters.md index e2a65fe1e..eba4dcfe9 100644 --- a/doc/lua-filters.md +++ b/doc/lua-filters.md @@ -109,7 +109,8 @@ element filtering function. In other words, filter entries will be called for each corresponding element in the document, getting the respective element as input. -The return of a filter function must be one of the following: +The return value of a filter function must be one of the +following: - nil: this means that the object should remain unchanged. - a pandoc object: this must be of the same type as the input @@ -173,7 +174,26 @@ This functionality has been added in pandoc 2.9.2. [Inlines filter example]: #remove-spaces-before-citations -## Execution Order +## Traversal order + +The traversal order of filters can be selected by setting the key +`traverse` to either `'topdown'` or `'typewise'`; the default is +`'typewise'`. + +Example: + +``` lua +local filter = { + traverse = 'topdown', + -- ... filter functions ... +} +return {filter} +``` + +Support for this was added in pandoc 2.16.3; previous versions +ignore the `traverse` setting. + +### Typewise traversal Element filter functions within a filter set are called in a fixed order, skipping any which are not present: @@ -203,6 +223,31 @@ All functions in set (1) are thus run before those in (2), causing the filter function for *Meta* to be run before the filtering of *Str* elements is started. +### Topdown traversal + +It is sometimes more natural to traverse the document tree +depth-first from the root towards the leaves, and all in a single +run. + +For example, a block list `[Plain [Str "a"], Para [Str +"b"]]`{.haskell} will try the following filter functions, in +order: `Blocks`, `Plain`, `Inlines`, `Str`, `Para`, `Inlines`, +`Str`. + +Topdown traversals can be cut short by returning `false` as a +second value from the filter function. No child-element of +the returned element is processed in that case. + +For example, to exclude the contents of a footnote from being +processed, one might write + +``` lua +traverse = 'topdown' +function Note (n) + return n, false +end +``` + ## Global variables Pandoc passes additional data to Lua filters by setting global @@ -245,6 +290,38 @@ variables. variable is of type [CommonState] and is read-only. +`pandoc` +: The *pandoc* module, described in the next section, is + available through the global `pandoc`. The other modules + described herein are loaded as subfields under their + respective name. + +`lpeg` +: This variable holds the `lpeg` module, a package based on + Parsing Expression Grammars (PEG). It provides excellent + parsing utilities and is documented on the official [LPeg + homepage]. Pandoc uses a built-int version of the library, + unless it has been configured by the package maintainer to + rely on a system-wide installation. + + Note that the result of `require 'lpeg'` is not necessarily + equal to this value; the `require` mechanism prefers the + system's lpeg library over the built-in version. + +`re` +: Contains the LPeg.re module, which is built on top of LPeg + and offers an implementation of a [regex engine]. Pandoc + uses a built-in version of the library, unless it has been + configured by the package maintainer to rely on a system-wide + installation. + + Note that the result of `require 're` is not necessarily + equal to this value; the `require` mechanism prefers the + system's lpeg library over the built-in version. + +[LPeg homepage]: http://www.inf.puc-rio.br/~roberto/lpeg/ +[regex engine]: http://www.inf.puc-rio.br/~roberto/lpeg/re.html + # Pandoc Module The `pandoc` Lua module is loaded into the filter's Lua @@ -491,8 +568,9 @@ will output: This is the filter we use when converting `MANUAL.txt` to man pages. It converts level-1 headers to uppercase (using -`walk_block` to transform inline elements inside headers), -removes footnotes, and replaces links with regular text. +[`walk`](#type-block:walk) to transform inline elements inside +headers), removes footnotes, and replaces links with regular +text. ``` lua -- we use preloaded text to get a UTF-8 aware 'upper' function @@ -500,10 +578,11 @@ local text = require('text') function Header(el) if el.level == 1 then - return pandoc.walk_block(el, { + return el:walk { Str = function(el) return pandoc.Str(text.upper(el.text)) - end }) + end + } end end @@ -579,7 +658,7 @@ wordcount = { function Pandoc(el) -- skip metadata, just count body: - pandoc.walk_block(pandoc.Div(el.blocks), wordcount) + el.blocks:walk(wordcount) print(words .. " words in body") os.exit(0) end @@ -695,12 +774,12 @@ end function RawBlock(el) if starts_with('\\begin{tikzpicture}', el.text) then local filetype = extension_for[FORMAT] or 'svg' - local fname = system.get_working_directory() .. '/' .. - pandoc.sha1(el.text) .. '.' .. filetype + local fbasename = pandoc.sha1(el.text) .. '.' .. filetype + local fname = system.get_working_directory() .. '/' .. fbasename if not file_exists(fname) then tikz2image(el.text, filetype, fname) end - return pandoc.Para({pandoc.Image({}, fname)}) + return pandoc.Para({pandoc.Image({}, fbasename)}) else return el end @@ -752,8 +831,8 @@ Usage: Pandoc document Values of this type can be created with the -[`pandoc.Pandoc`](#pandoc.pandoc) constructor. Object equality is -determined via [`pandoc.utils.equals`]. +[`pandoc.Pandoc`](#pandoc.pandoc) constructor. Pandoc values are +equal in Lua if and only if they are equal in Haskell. `blocks` : document content ([List] of [Blocks]) @@ -761,83 +840,113 @@ determined via [`pandoc.utils.equals`]. `meta` : document meta information ([Meta] object) -## Meta {#type-meta} -Meta information on a document; string-indexed collection of -[MetaValues]. +### walk {#type-pandoc:walk} -Values of this type can be created with the -[`pandoc.Meta`](#pandoc.meta) constructor. Object equality is -determined via [`pandoc.utils.equals`]. +`walk(self, lua_filter)` -## MetaValue {#type-metavalue} +Applies a Lua filter to the Pandoc element. Just as for +full-document filters, the order in which elements are traversed +can be controlled by setting the `traverse` field of the filter; +see the section on [traversal order][Traversal order]. -Document meta information items. +Parameters: -Object equality is determined via [`pandoc.utils.equals`]. +`self` +: the element ([Pandoc](#type-pandoc)) -### MetaBlocks {#type-metablocks} +`lua_filter` +: map of filter functions (table) -A list of blocks usable as meta value ([List] of [Blocks]). +Result: -Fields: +- filtered document ([Pandoc][]) -`tag`, `t` -: the literal `MetaBlocks` (string) +Usage: -### MetaBool {#type-metabool} + -- returns `pandoc.Pandoc{pandoc.Para{pandoc.Str 'Bye'}}` + return pandoc.Pandoc{pandoc.Para('Hi')}:walk { + Str = function (_) return 'Bye' end, + } -Alias for Lua boolean, i.e. the values `true` and `false`. -### MetaInlines {#type-metainlines} +## Meta {#type-meta} -List of inlines used in metadata ([List] of [Inlines]) +Meta information on a document; string-indexed collection of +[MetaValues]. Values of this type can be created with the -[`pandoc.MetaInlines`](#pandoc.metainlines) constructor. +[`pandoc.Meta`](#pandoc.meta) constructor. Meta values are equal +in Lua if and only if they are equal in Haskell. -Fields: - -`tag`, `t` -: the literal `MetaInlines` (string) +## MetaValue {#type-metavalue} -### MetaList {#type-metalist} +Document meta information items. This is not a separate type, but +describes a set of types that can be used in places were a +MetaValue is expected. The types correspond to the following +Haskell type constructors: + +- boolean → MetaBool +- string or number → MetaString +- Inlines → MetaInlines +- Blocks → MetaBlocks +- List/integer indexed table → MetaList +- string-indexed table → MetaMap + +The corresponding constructors +[`pandoc.MetaBool`](#pandoc.metabool), +[`pandoc.MetaString`](#pandoc.metastring), +[`pandoc.MetaInlines`](#pandoc.metainlines), +[`pandoc.MetaBlocks`](#pandoc.metablocks), +[`pandoc.MetaList`](#pandoc.metalist), and +[`pandoc.MetaMap`](#pandoc.metamap) +can be used to ensure that a value is treated in the intended +way. E.g., an empty table is normally treated as a `MetaMap`, but +can be made into an empty `MetaList` by calling +`pandoc.MetaList{}`. However, the same can be accomplished by +using the generic functions like `pandoc.List`, `pandoc.Inlines`, +or `pandoc.Blocks`. -A list of other metadata values ([List] of [MetaValues]). +## Block {#type-block} -Values of this type can be created with the -[`pandoc.MetaList`](#pandoc.metalist) constructor. +Block values are equal in Lua if and only if they are equal in +Haskell. -Fields: +### Common methods -`tag`, `t` -: the literal `MetaList` (string) +#### walk {#type-block:walk} -All methods available for [List]s can be used on this type as -well. +`walk(self, lua_filter)` -### MetaMap {#type-metamap} +Applies a Lua filter to the block element. Just as for +full-document filters, the order in which elements are traversed +can be controlled by setting the `traverse` field of the filter; +see the section on [traversal order][Traversal order]. -A string-indexed map of meta-values. (table). +Note that the filter is applied to the subtree, but not to the +`self` block element. The rationale is that otherwise the element +could be deleted by the filter, or replaced with multiple block +elements, which might lead to possibly unexpected results. -Values of this type can be created with the -[`pandoc.MetaMap`](#pandoc.metamap) constructor. +Parameters: -Fields: +`self` +: the element ([Block](#type-block)) -`tag`, `t` -: the literal `MetaMap` (string) +`lua_filter` +: map of filter functions (table) -*Note*: The fields will be shadowed if the map contains a field -with the same name as those listed. +Result: -### MetaString {#type-metastring} +- filtered block ([Block][]) -Plain Lua string value (string). +Usage: -## Block {#type-block} + -- returns `pandoc.Para{pandoc.Str 'Bye'}` + return pandoc.Para('Hi'):walk { + Str = function (_) return 'Bye' end, + } -Object equality is determined via [`pandoc.utils.equals`]. ### BlockQuote {#type-blockquote} @@ -1133,9 +1242,94 @@ left-aligned, right-aligned, and centered, respectively. The default alignment is `AlignDefault` (often equivalent to centered). +## Blocks {#type-blocks} + +List of [Block] elements, with the same methods as a generic +[List](#type-list). It is usually not necessary to create values +of this type in user scripts, as pandoc can convert other types +into Blocks wherever a value of this type is expected: + +- a list of [Block] (or Block-like) values is used directly; +- a single [Inlines] value is wrapped into a [Plain] element; +- string values are turned into an [Inlines] value by splitting + the string into words (see [Inlines](#type-inlines)), and + then wrapping the result into a Plain singleton. + +### Methods + +Lists of type `Blocks` share all methods available in generic +lists, see the [`pandoc.List` module](#module-pandoc.list). + +Additionally, the following methods are available on Blocks +values: + +#### walk {#type-blocks:walk} + +`walk(self, lua_filter)` + +Applies a Lua filter to the Blocks list. Just as for +full-document filters, the order in which elements are traversed +can be controlled by setting the `traverse` field of the filter; +see the section on [traversal order][Traversal order]. + +Parameters: + +`self` +: the list ([Blocks](#type-blocks)) + +`lua_filter` +: map of filter functions (table) + +Result: + +- filtered list ([Blocks](#type-blocks)) + +Usage: + + -- returns `pandoc.Blocks{pandoc.Para('Salve!')}` + return pandoc.Blocks{pandoc.Plain('Salve!)}:walk { + Plain = function (p) return pandoc.Para(p.content) end, + } + ## Inline {#type-inline} -Object equality is determined via [`pandoc.utils.equals`]. +Inline values are equal in Lua if and only if they are equal in +Haskell. + +### Common methods + +#### walk {#type-inline:walk} + +`walk(self, lua_filter)` + +Applies a Lua filter to the Inline element. Just as for +full-document filters, the order in which elements are traversed +can be controlled by setting the `traverse` field of the filter; +see the section on [traversal order][Traversal order]. + +Note that the filter is applied to the subtree, but not to the +`self` inline element. The rationale is that otherwise the +element could be deleted by the filter, or replaced with multiple +inline elements, which might lead to possibly unexpected results. + +Parameters: + +`self` +: the element ([Inline](#type-inline)) + +`lua_filter` +: map of filter functions (table) + +Result: + +- filtered inline element ([Inline][]) + +Usage: + + -- returns `pandoc.SmallCaps('SPQR)` + return pandoc.SmallCaps('spqr'):walk { + Str = function (s) return string.upper(s.text) end, + } ### Cite {#type-cite} @@ -1206,9 +1400,6 @@ Values of this type can be created with the Fields: -`attr` -: attributes ([Attr]) - `caption` : text used to describe the image ([List] of [Inlines]) @@ -1216,7 +1407,10 @@ Fields: : path to the image file (string) `title` -: brief image description +: brief image description (string) + +`attr` +: attributes ([Attr]) `identifier` : alias for `attr.identifier` (string) @@ -1503,6 +1697,57 @@ Fields: `tag`, `t` : the literal `Underline` (string) +## Inlines {#type-inlines} + +List of [Inline] elements, with the same methods as a generic +[List](#type-list). It is usually not necessary to create values +of this type in user scripts, as pandoc can convert other types +into Blocks wherever a value of this type is expected: + +- lists of [Inline] (or Inline-like) values are used directly; +- single [Inline] values are converted into a list containing + just that element; +- String values are split into words, converting line breaks + into [SoftBreak](#type-softbreak) elements, and other + whitespace characters into [Spaces](#type-space). + +### Methods + +Lists of type `Inlines` share all methods available in generic +lists, see the [`pandoc.List` module](#module-pandoc.list). + +Additionally, the following methods are available on *Inlines* +values: + +#### walk {#type-inlines:walk} + +`walk(self, lua_filter)` + +Applies a Lua filter to the Inlines list. Just as for +full-document filters, the order in which elements are handled +are are Inline → Inlines → Block → Blocks. The filter is applied +to all list items *and* to the list itself. + +Parameters: + +`self` +: the list ([Inlines](#type-inlines)) + +`lua_filter` +: map of filter functions (table) + +Result: + +- filtered list ([Inlines](#type-inlines)) + +Usage: + + -- returns `pandoc.Inlines{pandoc.SmallCaps('SPQR')}` + return pandoc.Inlines{pandoc.Emph('spqr')}:walk { + Str = function (s) return string.upper(s.text) end, + Emph = function (e) return pandoc.SmallCaps(e.content) end, + } + ## Element components @@ -1522,7 +1767,8 @@ This also works when using the `attr` setter: local span = pandoc.Span 'text' span.attr = {id = 'text', class = 'a b', other_attribute = '1'} -Object equality is determined via [`pandoc.utils.equals`]. +Attr values are equal in Lua if and only if they are equal in +Haskell. Fields: @@ -1540,6 +1786,9 @@ Fields: List of key/value pairs. Values can be accessed by using keys as indices to the list table. +Attributes values are equal in Lua if and only if they are equal +in Haskell. + ### Caption {#type-caption} The caption of a table, with an optional short caption. @@ -1568,12 +1817,21 @@ Fields: : cell contents (list of [Blocks]). `col_span` -: number of columns occupied by the cell; the height of the cell - (integer). +: number of columns spanned by the cell; the width of the cell + in columns (integer). `row_span` -: number of rows occupied by the cell; the height of the cell - (integer). +: number of rows spanned by the cell; the height of the cell in + rows (integer). + +`identifier` +: alias for `attr.identifier` (string) + +`classes` +: alias for `attr.classes` ([List] of strings) + +`attributes` +: alias for `attr.attributes` ([Attributes]) ### Citation {#type-citation} @@ -1582,7 +1840,8 @@ Single citation entry Values of this type can be created with the [`pandoc.Citation`](#pandoc.citation) constructor. -Object equality is determined via [`pandoc.utils.equals`]. +Citation values are equal in Lua if and only if they are equal in +Haskell. Fields: @@ -1610,7 +1869,8 @@ Fields: Column alignment and width specification for a single table column. -This is a pair with the following components: +This is a pair, i.e., a plain table, with the following +components: 1. cell alignment ([Alignment]). 2. table column width, as a fraction of the total table width @@ -1623,8 +1883,6 @@ List attributes Values of this type can be created with the [`pandoc.ListAttributes`](#pandoc.listattributes) constructor. -Object equality is determined via [`pandoc.utils.equals`]. - Fields: `start` @@ -1643,10 +1901,13 @@ Fields: A table row. -Tuple fields: +Fields: -1. row attributes -2. row cells (list of [Cells]) +`attr` +: element attributes ([Attr][]) + +`cells` +: list of table cells ([List][] of [Cell][]s) ### TableBody {#type-tablebody} @@ -1656,35 +1917,59 @@ number of row header columns. Fields: `attr` -: table body attributes ([Attr]) +: table body attributes ([Attr][]) `body` -: table body rows (list of [Rows]) +: table body rows ([List][] of [Row][]s) `head` -: intermediate head (list of [Rows]) +: intermediate head ([List][] of [Row][]s) `row_head_columns` : number of columns taken up by the row head of each row of a - [TableBody]. The row body takes up the remaining columns. + [TableBody][]. The row body takes up the remaining columns. ### TableFoot {#type-tablefoot} The foot of a table. -This is a pair with the following components: +Fields: -1. attributes -2. foot rows ([Rows]) +`attr` +: element attributes ([Attr][]) + +`rows` +: list of rows ([List][] of [Row][]s) + +`identifier` +: alias for `attr.identifier` (string) + +`classes` +: alias for `attr.classes` ([List][] of strings) + +`attributes` +: alias for `attr.attributes` ([Attributes][]) ### TableHead {#type-tablehead} The head of a table. -This is a pair with the following components: +Fields: + +`attr` +: element attributes ([Attr][]) + +`rows` +: list of rows ([List][] of [Row][]s) + +`identifier` +: alias for `attr.identifier` (string) + +`classes` +: alias for `attr.classes` ([List][] of strings) -1. attributes -2. head rows ([Rows]) +`attributes` +: alias for `attr.attributes` ([Attributes][]) ## ReaderOptions {#type-readeroptions} @@ -1836,7 +2121,7 @@ Values of this type can be created with the `must_be_at_least(actual, expected [, error_message])` Raise an error message if the actual version is older than the -expected version; does nothing if actual is equal to or newer +expected version; does nothing if `actual` is equal to or newer than the expected version. Parameters: @@ -1867,6 +2152,7 @@ Usage: [Block]: #type-block [Blocks]: #type-block [Caption]: #type-caption +[Cell]: #type-cell [Cells]: #type-cell [Citation]: #type-citation [Citations]: #type-citation @@ -1884,6 +2170,8 @@ Usage: [LogMessage]: #type-logmessage [Pandoc]: #type-pandoc [Para]: #type-para +[Plain]: #type-plain +[Row]: #type-row [Rows]: #type-row [SimpleTable]: #type-simpletable [Table]: #type-table @@ -1891,7 +2179,6 @@ Usage: [TableFoot]: #type-tablefoot [TableHead]: #type-tablehead [Version]: #type-version -[`pandoc.utils.equals`]: #pandoc.utils.equals # Module text @@ -1978,71 +2265,83 @@ format, and functions to filter and modify a subtree. [`MetaBlocks (blocks)`]{#pandoc.metablocks} -: Meta blocks +: Creates a value to be used as a MetaBlocks value in meta + data; creates a copy of the input list via `pandoc.Blocks`, + discarding all non-list keys. Parameters: `blocks`: : blocks - Returns: [MetaBlocks] object + Returns: [Blocks](#type-blocks) [`MetaInlines (inlines)`]{#pandoc.metainlines} -: Meta inlines +: Creates a value to be used as a MetaInlines value in meta + data; creates a copy of the input list via `pandoc.Inlines`, + discarding all non-list keys. Parameters: `inlines`: : inlines - Returns: [MetaInlines] object + Returns: [Inlines](#types-inlines) [`MetaList (meta_values)`]{#pandoc.metalist} -: Meta list +: Creates a value to be used as a MetaList in meta data; + creates a copy of the input list via `pandoc.List`, + discarding all non-list keys. Parameters: `meta_values`: : list of meta values - Returns: [MetaList] object + Returns: [List] [`MetaMap (key_value_map)`]{#pandoc.metamap} -: Meta map +: Creates a value to be used as a MetaMap in meta data; creates + a copy of the input table, keeping only pairs with string + keys and discards all other keys. Parameters: `key_value_map`: : a string-indexed map of meta values - Returns: [MetaMap] object + Returns: table [`MetaString (str)`]{#pandoc.metastring} -: Creates string to be used in meta data. +: Creates a value to be used as a MetaString in meta data; this + is the identity function for boolean values and exists only + for completeness. Parameters: `str`: : string value - Returns: [MetaString] object + Returns: string [`MetaBool (bool)`]{#pandoc.metabool} -: Creates boolean to be used in meta data. +: Creates a value to be used as MetaBool in meta data; this is + the identity function for boolean values and exists only for + completeness. Parameters: `bool`: : boolean value - Returns: [MetaBool] object + Returns: boolean -## Blocks +## Block [`BlockQuote (content)`]{#pandoc.blockquote} @@ -2222,6 +2521,20 @@ format, and functions to filter and modify a subtree. Returns: [Table](#type-table) object +## Blocks + +[`Blocks (block_like_elements)`]{#pandoc.blocks} + +: Creates a [Blocks](#type-blocks) list. + + Parameters: + + `block_like_elements`: + : List where each element can be treated as a [Block] + value, or a single such value. + + Returns: [Blocks] list + ## Inline [`Cite (content, citations)`]{#pandoc.cite} @@ -2511,6 +2824,27 @@ format, and functions to filter and modify a subtree. Returns: [Underline](#type-underline) object +## Inlines + +[`Inlines (inline_like_elements)`]{#pandoc.inlines} + +: Converts its argument into an [Inlines](#type-inlines) list: + + - copies a list of [Inline] elements into a fresh list; any + string `s` within the list is treated as `pandoc.Str(s)`; + - turns a single [Inline] into a singleton list; + - splits a string into `Str`-wrapped words, treating + interword spaces as `Space`s or `SoftBreak`s. + + Parameters: + + `inline_like_elements`: + : List where each element can be treated as an [Inline] + values, or just a single such value. + + Returns: [Inlines] list + + ## Element components [`Attr ([identifier[, classes[, attributes]]])`]{#pandoc.attr} @@ -2530,6 +2864,33 @@ format, and functions to filter and modify a subtree. Returns: [Attr](#type-attr) object +[`Cell (blocks[, align[, rowspan[, colspan[, attr]]]])`]{#pandoc.attr} + +: Create a new table cell. + + Parameters: + + `blocks`: + : cell contents (list of [Blocks]) + + `align`: + : text alignment; defaults to `AlignDefault` (Alignment) + + `rowspan`: + : number of rows occupied by the cell; defaults to `1` + (integer) + + `colspan`: + : number of columns spanned by the cell; defaults to `1` + (integer) + + `attr`: + : cell attributes ([Attr](#type-attr)) + + Returns: + + - [Cell](#type-cell) object + [`Citation (id, mode[, prefix[, suffix[, note_num[, hash]]]])`]{#pandoc.citation} : Creates a single citation. @@ -2573,6 +2934,42 @@ format, and functions to filter and modify a subtree. Returns: [ListAttributes](#type-listattributes) object +[`Row ([cells[, attr]])`]{#pandoc.row} + +: Creates a table row. + + Parameters: + + `cells`: + : list of table cells in this row + + `attr`: + : row attributes + +[`TableFoot ([rows[, attr]])`]{#pandoc.tablefoot} + +: Creates a table foot. + + Parameters: + + `rows`: + : list of table rows + + `attr`: + : table foot attributes + +[`TableHead ([rows[, attr]])`]{#pandoc.tablehead} + +: Creates a table head. + + Parameters: + + `rows`: + : list of table rows + + `attr`: + : table head attributes + ## Legacy types [`SimpleTable (caption, aligns, widths, headers, rows)`]{#pandoc.simpletable} @@ -2732,7 +3129,33 @@ format, and functions to filter and modify a subtree. [`sha1`]{#pandoc.sha1} : Alias for [`pandoc.utils.sha1`](#pandoc.utils.sha1) - (DEPRECATED). + (DEPRECATED, use `pandoc.utils.sha1` instead). + +## Other constructors + +[`ReaderOptions (opts)`]{#pandoc.readeroptions} + +: Creates a new [ReaderOptions] value. + + Parameters + + `opts`: + : Either a table with a subset of the properties of a + [ReaderOptions] object, or another ReaderOptions object. + Uses the defaults specified in the manual for all + properties that are not explicitly specified. Throws an + error if a table contains properties which are not present + in a ReaderOptions object. ([ReaderOptions]|table) + + Returns: new [ReaderOptions] object + + Usage: + + -- copy of the reader options that were defined on the command line. + local cli_opts = pandoc.ReaderOptions(PANDOC_READER_OPTIONS) + + -- default reader options, but columns set to 66. + local short_colums_opts = pandoc.ReaderOptions {columns = 66} ## Helper functions @@ -2805,17 +3228,23 @@ Returns: the transformed inline element ### read {#pandoc.read} -`read (markup[, format])` +`read (markup[, format[, reader_options]])` Parse the given string into a Pandoc document. Parameters: `markup`: -: the markup to be parsed +: the markup to be parsed (string) `format`: -: format specification, defaults to `"markdown"`. +: format specification, defaults to `"markdown"` (string) + +`reader_options`: +: options passed to the reader; may be a ReaderOptions object or + a table with a subset of the keys and values of a + ReaderOptions object; defaults to the default values + documented in the manual. ([ReaderOptions]|table) Returns: pandoc document @@ -2828,6 +3257,8 @@ Usage: -- The inline element in that block is an `Emph` assert(block.content[1].t == "Emph") +[ReaderOptions]: #type-readeroptions + # Module pandoc.utils This module exposes internal pandoc functions and utility @@ -2882,12 +3313,13 @@ Test equality of AST elements. Elements in Lua are considered equal if and only if the objects obtained by unmarshaling are equal. +**This function is deprecated.** Use the normal Lua `==` equality +operator instead. + Parameters: `element1`, `element2`: -: Objects to be compared. Acceptable input types are [Pandoc], - [Meta], [MetaValue], [Block], [Inline], [Attr], - [ListAttributes], and [Citation]. +: Objects to be compared (any type) Returns: @@ -2927,9 +3359,21 @@ non-null, `Header` levels will be reorganized so that there are no gaps, and so that the base level is the level specified. +Parameters: + +`number_sections` +: whether section divs should get an additional `number` + attribute containing the section number. (boolean) + +`base_level` +: shift top-level headings to this level. (integer|nil) + +`blocks` +: list of blocks to process ([Blocks](#type-blocks)) + Returns: -- List of [Blocks](#type-block). +- [Blocks](#type-blocks). Usage: @@ -2939,6 +3383,38 @@ Usage: } local newblocks = pandoc.utils.make_sections(true, 1, blocks) +### references {#pandoc.references} + +`references (doc)` + +Get references defined inline in the metadata and via an external +bibliography. Only references that are actually cited in the +document (either with a genuine citation or with `nocite`) are +returned. URL variables are converted to links. + +The structure used represent reference values corresponds to that +used in CSL JSON; the return value can be use as `references` +metadata, which is one of the values used by pandoc and citeproc +when generating bibliographies. + +Parameters: + +`doc`: +: document ([Pandoc](#type-pandoc)) + +Returns: + +- list of references. (table) + +Usage: + + -- Include all cited references in document + function Pandoc (doc) + doc.meta.references = pandoc.utils.references(doc) + doc.meta.bibliography = nil + return doc + end + ### run\_json\_filter {#pandoc.utils.run_json_filter} `run_json_filter (doc, filter[, args])` @@ -3050,6 +3526,39 @@ Usage: -- create normal table block again table = pandoc.utils.from_simple_table(simple) +### type {#pandoc.utils.type} + +`type (value)` + +Pandoc-friendly version of Lua's default `type` function, +returning the type of a value. This function works with all types +listed in section [Lua type reference][], except if noted +otherwise. + +The function works by checking the metafield `__name`. If the +argument has a string-valued metafield `__name`, then it returns +that string. Otherwise it behaves just like the normal `type` +function. + +Parameters: + +`value` +: any Lua value + +Returns: + +- type of the given value (string) + +Usage: + + -- Prints one of 'string', 'boolean', 'Inlines', 'Blocks', + -- 'table', and 'nil', corresponding to the Haskell constructors + -- MetaString, MetaBool, MetaInlines, MetaBlocks, MetaMap, + -- and an unset value, respectively. + function Meta (meta) + print('type of metavalue `author`:', pandoc.utils.type(meta.author)) + end + # Module pandoc.mediabag The `pandoc.mediabag` module allows accessing pandoc's media @@ -3230,6 +3739,22 @@ methods and convenience functions. Returns: a new list containing all elements from list1 and list2 +[`pandoc.List:__eq (a, b)`]{#pandoc.list:__concat} + +: Compares two lists for equality. The lists are taken as equal + if and only if they are of the same type (i.e., have the same + non-nil metatable), have the same length, and if all elements + are equal. + + Parameters: + + `a`, `b`: + : any Lua object + + Returns: + + - `true` if the two lists are equal, `false` otherwise. + ## Methods [`pandoc.List:clone ()`]{#pandoc.list:clone} @@ -3442,7 +3967,7 @@ filepath Returns: -- `true` iff `filepath` is an absolute path, `false` otherwise. +- `true` if `filepath` is an absolute path, `false` otherwise. (boolean) ### is_relative (filepath) {#pandoc.path.is_relative} @@ -3456,7 +3981,7 @@ filepath Returns: -- `true` iff `filepath` is a relative path, `false` otherwise. +- `true` if `filepath` is a relative path, `false` otherwise. (boolean) ### join (filepaths) {#pandoc.path.join} diff --git a/doc/short-guide-to-pandocs-sources.md b/doc/short-guide-to-pandocs-sources.md new file mode 100644 index 000000000..722063e03 --- /dev/null +++ b/doc/short-guide-to-pandocs-sources.md @@ -0,0 +1,259 @@ +--- +title: Short guide to pandoc's sources +subtitle: Laying a path for code wanderers +author: Albert Krewinkel +date: 2021-06-07 +--- + +Pandoc, the universal document converter, can serve as a nice intro +into functional programming with Haskell. For many contributors, +including the author of this guide, pandoc was their first real +exposure to this language. Despite its impressive size of more than +60.000 lines of Haskell code (excluding the test suite), pandoc is +still very approachable due to its modular architecture. It can +serve as an interesting subject for learning. + +This guide exists to navigate the large amount of sources, to +lay-out a path that can be followed for learning, and to explain the +underlying concepts. + +A basic understanding of Haskell and of pandoc's functionality is +assumed. + +# Getting the code + +Pandoc has a publicly accessible git repository on GitHub: +<https://github.com/jgm/pandoc>. To get a local copy of the source: + + git clone https://github.com/jgm/pandoc + +The source for the main pandoc program is `app/pandoc.hs`. The +source for the pandoc library is in `src/`, the source for the tests +is in `test/`, and the source for the benchmarks is in `benchmark/`. + +Core type definitions are in the separate [*pandoc-types* repo]. +Get it with + + git clone https://github.com/jgm/pandoc-types + +The organization of library and test sources is identical to the +main repo. + +[*pandoc-types* repo]: https://github.com/jgm/pandoc-types + +# Document representation + +The way documents are represented in pandoc is part of its success. +Every document is read into one central data structure, the +so-called *abstract syntax tree* (AST). + +The AST is defined in module `Text.Pandoc.Definition` in package +[*pandoc-types*]. + +It is not necessary to understand the AST in detail, just check-out +the following points: + + * The [`Pandoc`][def-Pandoc] type serves as the central structure. + + * A document has metadata and a list of "block" elements. + + * There are various types of [blocks][def-Block]; some contain raw + text, others contain "Inline" elements. + + * [Inlines][def-Inline] are "running text", with many different + types. The most important contstructors are `Str` (a word), + `Space` (a space char), `Emph` (emphasized text), and `Strong` + (strongly emphasized text). It's worth checking their + definitions. + + * Element attributes are captured as [`Attr`][def-Attr], which is a + triple of the element identifier, its classes, and the key-value + pairs.^[For plans to change this see [jgm/pandoc-types#88].] + +[*pandoc-types*]: https://hackage.haskell.org/package/pandoc-types +[jgm/pandoc-types#88]: https://github.com/jgm/pandoc-types/issues/88 +[def-Pandoc]: https://hackage.haskell.org/package/pandoc-types/docs/src/Text.Pandoc.Definition.html#Pandoc +[def-Block]: https://hackage.haskell.org/package/pandoc-types/docs/src/Text.Pandoc.Definition.html#Block +[def-Inline]: https://hackage.haskell.org/package/pandoc-types/docs/src/Text.Pandoc.Definition.html#Inline +[def-Attr]: https://hackage.haskell.org/package/pandoc-types/docs/src/Text.Pandoc.Definition.html#Attr + +# Basic architecture + +Take a look at pandoc's source files. The code is below the `src` +directory, in the `Text.Pandoc` module. The basic flow is: + + 1. Document is parsed into the internal representation by a + *reader*; + + 2. the document AST is modified (optional); + + 3. then the internal respresentation is converted into the target + format by a *writer*. + +The [*readers*] can be found in `Text.Pandoc.Readers`, while the +[*writers*] are submodules of `Text.Pandoc.Writers`. The document +modification step is powerful and used in different ways, e.g., in +[*filters*]. + +These parts are the "muscles" of pandoc, which do the heavy lifting. +Everything else can be thought of as the bones and fibers to which +these parts are attached and which make them usable. + +# Writers + +Writers are usually simpler than readers and therefore easier to +grasp. + +Broadly speaking, there are three kind of writers: + + 1. Text writers: these are used for lightweight markup languages + and generate plain text output. Examples: Markdown, Org, + reStructuredText. + 2. XML writers, which convert the AST into structured XML. + Examples: HTML, JATS. + 3. Binary writers, which are like XML writers, but combine the + output with other data and zip it into a single file. Examples: + docx, epub. + + Most writers follow a common pattern and have three main functions: + docTo*Format*, blockTo*Format* and inlineTo*Format*. Each converts + the `Pandoc`, `Block`, and `Inline` elements, respectively. The + *XWiki* and *TEI* writers are comparatively simple and suitable + samples when taking a first look. + + Most writers are self-contained in that most of the conversion code + is within a single module. However, newer writers often use a + different setup: those are built around modules from an external + package. The details of how to serialize the document are not in + the writer module itself, but in an external module. The writer + only has to convert pandoc's AST into the document representation + used by the module. Good examples: commonmark, jira. + +## DocLayout + +All writers build on the `doclayout` package. It can be thought of +as a pretty printer with extra features suitable for lightweight +markup languages. E.g., multiple blank lines are collapsed into a +single blank line, unless multiple blank lines are specifically +requested. This simplifies the code significantly. + +See the repo at https://github.com/jgm/doclayout, and the [hackage +documentation](https://hackage.haskell.org/package/doclayout) + +# Readers + +The same distinction that applies to writers also applies to +readers. Readers for XML formats use XML parsing libraries, while +plain text formats are parsed with [parsec]. + +## Builders + +The plain type constructors from the [`Text.Pandoc.Definition`] +module can be difficult to use, which is why the module +[`Text.Pandoc.Builder`] exists. It offers functions to conveniently +build and combine AST elements. + +The most interesting and important types in `Builder` are +[`Blocks`][def-Blocks] and [`Inlines`][def-Inlines]. All type +constructors use simple lists for sequences of AST elements. +Building lists can be awkward and often comes with bad performance +characteristics, esp. when appending. The `Blocks` and `Inlines` +types are better suited for these operations and are therefore used +extensively in builder functions. + +The builder functions are named with the convention that the suffix +`With` is added if the first argument is an `Attr`; there is usually +another function without that suffix, creating an element with no +attributes. + +[def-Blocks]: https://hackage.haskell.org/package/pandoc-types/docs/src/Text.Pandoc.Builder.html#Blocks +[def-Inlines]: https://hackage.haskell.org/package/pandoc-types/docs/src/Text.Pandoc.Builder.html#Inlines +[parsec]: https://hackage.haskell.org/package/parsec + +# PandocMonad + +Looking at the readers and writers, one will notice that they all +operate within the `PandocMonad` type class. This class gives access +to options, file operations, and other shared information. The +typeclass has two main implementations: one operates in IO, so on +the "real world", while the other provides a pure functional +interface, suitable to "mock" an environment for testing. + +# Document modifications + +One of the big advantages of a central document structure is that it +allows document modifications via a unified interface. This section +describes the multiple ways in which the document can be altered. + +## Walkable + +Document traversal happens through the `Walkable` class in module +`Text.Pandoc.Walk` ([*pandoc-types* package]). + +## Transformations + +Transformations are simple modifications controllable through +command-line options. + +## Filters + +Filters allow to use Lua or any external language to perform +document transformations. + + +[`Text.Pandoc.Builder`]: https://hackage.haskell.org/package/pandoc-types/docs/Text-Pandoc-Builder.html +[`Text.Pandoc.Definition`]: https://hackage.haskell.org/package/pandoc-types/docs/Text-Pandoc-Definition.html + +# Module overview + +The library is structured as follows: + + - `Text.Pandoc` is a top-level module that exports what is needed + by most users of the library. Any patches that add new readers + or writers will need to make changes here, too. + - `Text.Pandoc.Definition` (in `pandoc-types`) defines the types + used for representing a pandoc document. + - `Text.Pandoc.Builder` (in `pandoc-types`) provides functions for + building pandoc documents programmatically. + - `Text.Pandoc.Generics` (in `pandoc-types`) provides functions allowing + you to promote functions that operate on parts of pandoc documents + to functions that operate on whole pandoc documents, walking the + tree automatically. + - `Text.Pandoc.Readers.*` are the readers, and `Text.Pandoc.Writers.*` + are the writers. + - `Text.Pandoc.Citeproc.*` contain the code for citation handling, + including an interface to the [citeproc] library. + - `Text.Pandoc.Data` is used to embed data files when the `embed_data_files` + cabal flag is used. + - `Text.Pandoc.Emoji` is a thin wrapper around [emojis]. + - `Text.Pandoc.Highlighting` contains the interface to the + skylighting library, which is used for code syntax highlighting. + - `Text.Pandoc.ImageSize` is a utility module containing functions for + calculating image sizes from the contents of image files. + - `Text.Pandoc.MIME` contains functions for associating MIME types + with extensions. + - `Text.Pandoc.Lua.*` implement Lua filters. + - `Text.Pandoc.Options` defines reader and writer options. + - `Text.Pandoc.PDF` contains functions for producing PDFs. + - `Text.Pandoc.Parsing` contains parsing functions used in multiple readers. + the needs of pandoc. + - `Text.Pandoc.SelfContained` contains functions for making an HTML + file "self-contained," by importing remotely linked images, CSS, + and JavaScript and turning them into `data:` URLs. + - `Text.Pandoc.Shared` is a grab-bag of shared utility functions. + - `Text.Pandoc.Writers.Shared` contains utilities used in writers only. + - `Text.Pandoc.Slides` contains functions for splitting a markdown document + into slides, using the conventions described in the MANUAL. + - `Text.Pandoc.Templates` defines pandoc's templating system. + - `Text.Pandoc.UTF8` contains functions for converting text to and from + UTF8 bytestrings (strict and lazy). + - `Text.Pandoc.Asciify` contains functions to derive ascii versions of + identifiers that use accented characters. + - `Text.Pandoc.UUID` contains functions for generating UUIDs. + - `Text.Pandoc.XML` contains functions for formatting XML. + + +<!-- +# Templating +## DocTemplates +--> |