diff options
author | fiddlosopher <fiddlosopher@788f1e2b-df1e-0410-8736-df70ead52e1b> | 2007-01-01 21:08:12 +0000 |
---|---|---|
committer | fiddlosopher <fiddlosopher@788f1e2b-df1e-0410-8736-df70ead52e1b> | 2007-01-01 21:08:12 +0000 |
commit | a9e32505debea9077c0cca49b84a8c0d363bf3e8 (patch) | |
tree | 09d62a433a71314ff2049c7c6fb8ad55de928255 | |
parent | 0c6c5d528be44a8a5f599aa114e0890bc7e5e684 (diff) | |
download | pandoc-a9e32505debea9077c0cca49b84a8c0d363bf3e8.tar.gz |
Merged changes from docbook branch since r363.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@386 788f1e2b-df1e-0410-8736-df70ead52e1b
-rw-r--r-- | README | 30 | ||||
-rw-r--r-- | debian/changelog | 2 | ||||
-rw-r--r-- | man/man1/pandoc.1 | 8 | ||||
-rw-r--r-- | src/Main.hs | 8 | ||||
-rw-r--r-- | src/Text/Pandoc/Shared.hs | 3 | ||||
-rw-r--r-- | src/Text/Pandoc/Writers/Docbook.hs | 236 | ||||
-rw-r--r-- | src/Text/Pandoc/Writers/HTML.hs | 4 | ||||
-rw-r--r-- | src/headers/DocbookHeader | 3 | ||||
-rw-r--r-- | src/templates/DefaultHeaders.hs | 4 | ||||
-rw-r--r-- | tests/generate.sh | 1 | ||||
-rw-r--r-- | tests/runtests.pl | 5 | ||||
-rw-r--r-- | tests/writer.docbook | 997 | ||||
-rw-r--r-- | web/index.txt | 5 | ||||
-rw-r--r-- | web/mkdemos.sh | 1 |
14 files changed, 1285 insertions, 22 deletions
@@ -6,8 +6,8 @@ Pandoc is a [Haskell] library for converting from one markup format to another, and a command-line tool that uses this library. It can read [markdown] and (subsets of) [reStructuredText], [HTML], and [LaTeX], and it can write [markdown], [reStructuredText], [HTML], [LaTeX], [RTF], -and [S5] HTML slide shows. Pandoc's version of markdown contains some -enhancements, like footnotes and embedded LaTeX. +[DocBook XML], and [S5] HTML slide shows. Pandoc's version of markdown +contains some enhancements, like footnotes and embedded LaTeX. In contrast to existing tools for converting markdown to HTML, which use regex substitutions, Pandoc has a modular design: it consists of a @@ -22,6 +22,7 @@ or output format requires only adding a reader or writer. [HTML]: http://www.w3.org/TR/html40/ [LaTeX]: http://www.latex-project.org/ [RTF]: http://en.wikipedia.org/wiki/Rich_Text_Format +[DocBook XML]: http://www.docbook.org/ [Haskell]: http://www.haskell.org/ (c) 2006 John MacFarlane (jgm at berkeley dot edu). Released under the @@ -107,17 +108,17 @@ To convert `hello.html` from html to markdown: pandoc -f html -t markdown hello.html Supported output formats include `markdown`, `latex`, `html`, `rtf` -(rich text format), `rst` (reStructuredText), and `s5` (which produces -an HTML file that acts like powerpoint). Supported input formats -include `markdown`, `html`, `latex`, and `rst`. Note that the `rst` -reader only parses a subset of reStructuredText syntax. For example, -it doesn't handle tables, definition lists, option lists, or footnotes. -It handles only the constructs expressible in unextended markdown. -But for simple documents it should be adequate. The `latex` and `html` -readers are also limited in what they can do. Because the `html` -reader is picky about the HTML it parses, it is recommended that you -pipe HTML through [HTML Tidy] before sending it to `pandoc`, or use the -`html2markdown` script described below. +(rich text format), `rst` (reStructuredText), `docbook` (DocBook +XML), and `s5` (which produces an HTML file that acts like powerpoint). +Supported input formats include `markdown`, `html`, `latex`, and `rst`. +Note that the `rst` reader only parses a subset of reStructuredText +syntax. For example, it doesn't handle tables, definition lists, option +lists, or footnotes. It handles only the constructs expressible in +unextended markdown. But for simple documents it should be adequate. +The `latex` and `html` readers are also limited in what they can do. +Because the `html` reader is picky about the HTML it parses, it is +recommended that you pipe HTML through [HTML Tidy] before sending it to +`pandoc`, or use the `html2markdown` script described below. If you don't specify a reader or writer explicitly, `pandoc` will try to determine the input and output format from the extensions of @@ -200,7 +201,8 @@ formats are `native`, `markdown`, `rst`, `html`, and `latex`. `-t`, `--to`, `-w`, or `--write` can be used to specify the output format -- the format Pandoc will be converting *to*. Available formats -are `native`, `html`, `s5`, `latex`, `markdown`, `rst`, and `rtf`. +are `native`, `html`, `s5`, `docbook`, `latex`, `markdown`, `rst`, and +`rtf`. `-s` or `--standalone` indicates that a standalone document is to be produced (with appropriate headers and footers), rather than a fragment. diff --git a/debian/changelog b/debian/changelog index 704f0430f..6790a106e 100644 --- a/debian/changelog +++ b/debian/changelog @@ -46,6 +46,8 @@ pandoc (0.3) unstable; urgency=low + Removed extra blanks after '-h' and '-D' output. + Added copyright message to '-v' output, modeled after FSF messages. + * Added docbook writer. + * Added implicit setting of default input and output format based on input and output filename extensions. These defaults are overridden if explicit input and output formats are specified using diff --git a/man/man1/pandoc.1 b/man/man1/pandoc.1 index 820be3276..a1df84208 100644 --- a/man/man1/pandoc.1 +++ b/man/man1/pandoc.1 @@ -6,8 +6,8 @@ pandoc \- general markup converter .SH DESCRIPTION \fBPandoc\fR converts files from one markup format to another. It can read markdown and (subsets of) reStructuredText, HTML, and LaTeX, and -it can write markdown, reStructuredText, HTML, LaTeX, RTF, and S5 HTML -slide shows. +it can write markdown, reStructuredText, HTML, LaTeX, RTF, DocBook +XML, and S5 HTML slide shows. .PP If no \fIinput\-file\fR is specified, input is read from STDIN. Otherwise, the \fIinput\-files\fR are concatenated (with a blank @@ -80,6 +80,8 @@ can be (HTML), .B latex (LaTeX), +.B docbook +(DocBook XML), .B s5 (S5 HTML and javascript slide show), or @@ -141,7 +143,7 @@ default header, which can be printed by using the \fB\-D\fR option). Implies \fB-s\fR. .TP .B \-D \fIFORMAT\fB, \-\-print-default-header=\fIFORMAT\fB -Print the default header for \fIFORMAT\fR (\fIhtml, s5, latex, +Print the default header for \fIFORMAT\fR (\fIhtml, s5, latex, docbook, markdown, rst, rtf\fR). .TP .B \-T \fISTRING\fB, \-\-title-prefix=\fISTRING\fB diff --git a/src/Main.hs b/src/Main.hs index 35a61c63e..b5320c258 100644 --- a/src/Main.hs +++ b/src/Main.hs @@ -37,12 +37,14 @@ import Text.Pandoc.Writers.RST ( writeRST ) import Text.Pandoc.Readers.RST ( readRST ) import Text.Pandoc.ASCIIMathML ( asciiMathMLScript ) import Text.Pandoc.Writers.HTML ( writeHtml ) +import Text.Pandoc.Writers.Docbook ( writeDocbook ) import Text.Pandoc.Writers.LaTeX ( writeLaTeX ) import Text.Pandoc.Readers.LaTeX ( readLaTeX ) import Text.Pandoc.Writers.RTF ( writeRTF ) import Text.Pandoc.Writers.Markdown ( writeMarkdown ) import Text.Pandoc.Writers.DefaultHeaders ( defaultHtmlHeader, - defaultRTFHeader, defaultS5Header, defaultLaTeXHeader ) + defaultRTFHeader, defaultS5Header, defaultLaTeXHeader, + defaultDocbookHeader ) import Text.Pandoc.Definition import Text.Pandoc.Shared import Text.Regex ( mkRegex, matchRegex ) @@ -79,6 +81,7 @@ writers :: [ ( String, ( WriterOptions -> Pandoc -> String, String ) ) ] writers = [("native" , (writeDoc, "")) ,("html" , (writeHtml, defaultHtmlHeader)) ,("s5" , (writeS5, defaultS5Header)) + ,("docbook" , (writeDocbook, defaultDocbookHeader)) ,("latex" , (writeLaTeX, defaultLaTeXHeader)) ,("markdown" , (writeMarkdown, "")) ,("rst" , (writeRST, "")) @@ -331,6 +334,8 @@ defaultWriterName x = Just ["text"] -> "markdown" Just ["md"] -> "markdown" Just ["markdown"] -> "markdown" + Just ["db"] -> "docbook" + Just ["xml"] -> "docbook" Just _ -> "html" main = do @@ -423,6 +428,7 @@ main = do writerSmart = smart && (not strict), writerTabStop = tabStop, + writerNotes = [], writerS5 = (writerName=="s5"), writerIncremental = incremental, writerNumberSections = numberSections, diff --git a/src/Text/Pandoc/Shared.hs b/src/Text/Pandoc/Shared.hs index 7e4f63ffa..eb6c7be78 100644 --- a/src/Text/Pandoc/Shared.hs +++ b/src/Text/Pandoc/Shared.hs @@ -322,7 +322,7 @@ containsPara (x:rest) = containsPara rest -- | Options for writers data WriterOptions = WriterOptions - { writerStandalone :: Bool -- ^ If @True@, writer header and footer + { writerStandalone :: Bool -- ^ Include header and footer , writerTitlePrefix :: String -- ^ Prefix for HTML titles , writerHeader :: String -- ^ Header for the document , writerIncludeBefore :: String -- ^ String to include before the body @@ -334,6 +334,7 @@ data WriterOptions = WriterOptions , writerStrictMarkdown :: Bool -- ^ Use strict markdown syntax , writerTabStop :: Int -- ^ Tabstop for conversion between -- spaces and tabs + , writerNotes :: [Block] -- ^ List of note blocks } deriving Show -- diff --git a/src/Text/Pandoc/Writers/Docbook.hs b/src/Text/Pandoc/Writers/Docbook.hs new file mode 100644 index 000000000..9924d50fe --- /dev/null +++ b/src/Text/Pandoc/Writers/Docbook.hs @@ -0,0 +1,236 @@ +{- +Copyright (C) 2006 John MacFarlane <jgm at berkeley dot edu> + +This program is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with this program; if not, write to the Free Software +Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA +-} + +{- | + Module : Text.Pandoc.Writers.Docbook + Copyright : Copyright (C) 2006 John MacFarlane + License : GNU GPL, version 2 or above + + Maintainer : John MacFarlane <jgm at berkeley dot edu> + Stability : alpha + Portability : portable + +Conversion of 'Pandoc' documents to Docbook XML. +-} +module Text.Pandoc.Writers.Docbook ( + writeDocbook + ) where +import Text.Pandoc.Definition +import Text.Pandoc.Shared +import Text.Pandoc.Writers.HTML ( stringToSmartHtml, stringToHtml ) +import Text.Html ( stringToHtmlString ) +import Text.Regex ( mkRegex, matchRegex ) +import Data.Char ( toLower ) +import Data.List ( isPrefixOf, partition ) +import Text.PrettyPrint.HughesPJ hiding ( Str ) + +-- | Data structure for defining hierarchical Pandoc documents +data Element = Blk Block + | Sec [Inline] [Element] deriving (Eq, Read, Show) + +-- | Returns true on Header block with level at least 'level' +headerAtLeast :: Int -> Block -> Bool +headerAtLeast level (Header x _) = x <= level +headerAtLeast level _ = False + +-- | Convert list of Pandoc blocks into list of Elements (hierarchical) +hierarchicalize :: [Block] -> [Element] +hierarchicalize [] = [] +hierarchicalize (block:rest) = + case block of + (Header level title) -> let (thisSection, rest') = break (headerAtLeast level) rest in + (Sec title (hierarchicalize thisSection)):(hierarchicalize rest') + x -> (Blk x):(hierarchicalize rest) + +-- | Convert list of authors to a docbook <author> section +authorToDocbook :: WriterOptions -> [Char] -> Doc +authorToDocbook options name = indentedInTags options "author" $ + if ',' `elem` name + then -- last name first + let (lastname, rest) = break (==',') name + firstname = removeLeadingSpace rest in + inTags "firstname" (text $ stringToXML options firstname) <> + inTags "surname" (text $ stringToXML options lastname) + else -- last name last + let namewords = words name + lengthname = length namewords + (firstname, lastname) = case lengthname of + 0 -> ("","") + 1 -> ("", name) + n -> (joinWithSep " " (take (n-1) namewords), last namewords) in + inTags "firstname" (text $ stringToXML options firstname) $$ + inTags "surname" (text $ stringToXML options lastname) + +-- | Convert Pandoc document to string in Docbook format. +writeDocbook :: WriterOptions -> Pandoc -> String +writeDocbook options (Pandoc (Meta title authors date) blocks) = + let head = if (writerStandalone options) + then text (writerHeader options) + else empty + meta = if (writerStandalone options) + then indentedInTags options "articleinfo" $ + (inTags "title" (inlinesToDocbook options title)) $$ + (vcat (map (authorToDocbook options) authors)) $$ + (inTags "date" (text date)) + else empty + blocks' = replaceReferenceLinks blocks + (noteBlocks, blocks'') = partition isNoteBlock blocks' + options' = options {writerNotes = noteBlocks} + elements = hierarchicalize blocks'' + body = text (writerIncludeBefore options') <> + vcat (map (elementToDocbook options') elements) $$ + text (writerIncludeAfter options') + body' = if writerStandalone options' + then indentedInTags options' "article" (meta $$ body) + else body in + render $ head $$ body' <> text "\n" + +-- | Put the supplied contents between start and end tags of tagType, +-- with specified attributes. +inTagsWithAttrib :: String -> [(String, String)] -> Doc -> Doc +inTagsWithAttrib tagType attribs contents = text ("<" ++ tagType ++ + (concatMap (\(a, b) -> " " ++ attributeStringToXML a ++ + "=\"" ++ attributeStringToXML b ++ "\"") attribs)) <> + if isEmpty contents + then text " />" -- self-closing tag + else text ">" <> contents <> text ("</" ++ tagType ++ ">") + +-- | Put the supplied contents between start and end tags of tagType. +inTags :: String -> Doc -> Doc +inTags tagType contents = inTagsWithAttrib tagType [] contents + +-- | Put the supplied contents in indented block btw start and end tags. +indentedInTags :: WriterOptions -> [Char] -> Doc -> Doc +indentedInTags options tagType contents = text ("<" ++ tagType ++ ">") $$ + nest 2 contents $$ text ("</" ++ tagType ++ ">") + +-- | Convert an Element to Docbook. +elementToDocbook :: WriterOptions -> Element -> Doc +elementToDocbook options (Blk block) = blockToDocbook options block +elementToDocbook options (Sec title elements) = + -- Docbook doesn't allow sections with no content, so insert some if needed + let elements' = if null elements + then [Blk (Para [])] + else elements in + indentedInTags options "section" $ + inTags "title" (wrap options title) $$ + vcat (map (elementToDocbook options) elements') + +-- | Convert a list of Pandoc blocks to Docbook. +blocksToDocbook :: WriterOptions -> [Block] -> Doc +blocksToDocbook options = vcat . map (blockToDocbook options) + +-- | Convert a list of lists of blocks to a list of Docbook list items. +listItemsToDocbook :: WriterOptions -> [[Block]] -> Doc +listItemsToDocbook options items = + vcat $ map (listItemToDocbook options) items + +-- | Convert a list of blocks into a Docbook list item. +listItemToDocbook :: WriterOptions -> [Block] -> Doc +listItemToDocbook options item = + let plainToPara (Plain x) = Para x + plainToPara y = y in + let item' = map plainToPara item in + indentedInTags options "listitem" (blocksToDocbook options item') + +-- | Convert a Pandoc block element to Docbook. +blockToDocbook :: WriterOptions -> Block -> Doc +blockToDocbook options Blank = text "" +blockToDocbook options Null = empty +blockToDocbook options (Plain lst) = wrap options lst +blockToDocbook options (Para lst) = + indentedInTags options "para" (wrap options lst) +blockToDocbook options (BlockQuote blocks) = + indentedInTags options "blockquote" (blocksToDocbook options blocks) +blockToDocbook options (CodeBlock str) = + text "<programlisting>" <> (cdata str) <> text "</programlisting>" +blockToDocbook options (BulletList lst) = + indentedInTags options "itemizedlist" $ listItemsToDocbook options lst +blockToDocbook options (OrderedList lst) = + indentedInTags options "orderedlist" $ listItemsToDocbook options lst +blockToDocbook options (RawHtml str) = text str -- raw XML block +blockToDocbook options HorizontalRule = empty -- not semantic +blockToDocbook options (Note _ _) = empty -- shouldn't occur +blockToDocbook options (Key _ _) = empty -- shouldn't occur +blockToDocbook options _ = indentedInTags options "para" (text "Unknown block type") + +-- | Put string in CDATA section +cdata :: String -> Doc +cdata str = text $ "<![CDATA[" ++ str ++ "]]>" + +-- | Take list of inline elements and return wrapped doc. +wrap :: WriterOptions -> [Inline] -> Doc +wrap options lst = fsep $ map (hcat . (map (inlineToDocbook options))) (splitBySpace lst) + +-- | Escape a string for XML (with "smart" option if specified). +stringToXML :: WriterOptions -> String -> String +stringToXML options = if writerSmart options + then stringToSmartHtml + else stringToHtml + +-- | Escape string to XML appropriate for attributes +attributeStringToXML :: String -> String +attributeStringToXML = gsub "\"" """ . codeStringToXML + +-- | Escape a literal string for XML. +codeStringToXML :: String -> String +codeStringToXML = gsub "<" "<" . gsub "&" "&" + +-- | Convert a list of inline elements to Docbook. +inlinesToDocbook :: WriterOptions -> [Inline] -> Doc +inlinesToDocbook options lst = hcat (map (inlineToDocbook options) lst) + +-- | Convert an inline element to Docbook. +inlineToDocbook :: WriterOptions -> Inline -> Doc +inlineToDocbook options (Str str) = text $ stringToXML options str +inlineToDocbook options (Emph lst) = + inTags "emphasis" (inlinesToDocbook options lst) +inlineToDocbook options (Strong lst) = + inTagsWithAttrib "emphasis" [("role", "strong")] + (inlinesToDocbook options lst) +inlineToDocbook options (Code str) = + inTags "literal" $ text (codeStringToXML str) +inlineToDocbook options (TeX str) = inlineToDocbook options (Code str) +inlineToDocbook options (HtmlInline str) = empty +inlineToDocbook options LineBreak = + text $ "<literallayout></literallayout>" +inlineToDocbook options Space = char ' ' +inlineToDocbook options (Link txt (Src src tit)) = + case (matchRegex (mkRegex "mailto:(.*)") src) of + Just [addr] -> inTags "email" $ text (codeStringToXML addr) + Nothing -> inTagsWithAttrib "ulink" [("url", src)] $ + inlinesToDocbook options txt +inlineToDocbook options (Link text (Ref ref)) = empty -- shouldn't occur +inlineToDocbook options (Image alt (Src src tit)) = + let titleDoc = if null tit + then empty + else indentedInTags options "objectinfo" $ + indentedInTags options "title" + (text $ stringToXML options tit) in + indentedInTags options "inlinemediaobject" $ + indentedInTags options "imageobject" $ + titleDoc $$ inTagsWithAttrib "imagedata" [("fileref", src)] empty +inlineToDocbook options (Image alternate (Ref ref)) = empty --shouldn't occur +inlineToDocbook options (NoteRef ref) = + let notes = writerNotes options + hits = filter (\(Note r _) -> r == ref) notes in + if null hits + then empty + else let (Note _ contents) = head hits in + indentedInTags options "footnote" $ blocksToDocbook options contents +inlineToDocbook options _ = empty diff --git a/src/Text/Pandoc/Writers/HTML.hs b/src/Text/Pandoc/Writers/HTML.hs index 4456a61b5..8de1de43f 100644 --- a/src/Text/Pandoc/Writers/HTML.hs +++ b/src/Text/Pandoc/Writers/HTML.hs @@ -28,7 +28,9 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Conversion of 'Pandoc' documents to HTML. -} module Text.Pandoc.Writers.HTML ( - writeHtml + writeHtml, + stringToSmartHtml, + stringToHtml ) where import Text.Pandoc.Definition import Text.Pandoc.Shared diff --git a/src/headers/DocbookHeader b/src/headers/DocbookHeader new file mode 100644 index 000000000..7b26b2c73 --- /dev/null +++ b/src/headers/DocbookHeader @@ -0,0 +1,3 @@ +<?xml version="1.0" encoding="utf-8" ?> +<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" + "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"> diff --git a/src/templates/DefaultHeaders.hs b/src/templates/DefaultHeaders.hs index 4c552cea7..0274aa30f 100644 --- a/src/templates/DefaultHeaders.hs +++ b/src/templates/DefaultHeaders.hs @@ -1,6 +1,7 @@ -- | Default headers for Pandoc writers. module Text.Pandoc.Writers.DefaultHeaders ( defaultLaTeXHeader, + defaultDocbookHeader, defaultHtmlHeader, defaultS5Header, defaultRTFHeader @@ -10,6 +11,9 @@ import Text.Pandoc.Writers.S5 defaultLaTeXHeader :: String defaultLaTeXHeader = "@LaTeXHeader@" +defaultDocbookHeader :: String +defaultDocbookHeader = "@DocbookHeader@" + defaultHtmlHeader :: String defaultHtmlHeader = "@HtmlHeader@" diff --git a/tests/generate.sh b/tests/generate.sh index 75b3bb9ee..4c236e654 100644 --- a/tests/generate.sh +++ b/tests/generate.sh @@ -7,4 +7,5 @@ ../pandoc -r native -s -w html -S testsuite.native > writer.smart.html ../pandoc -r native -s -w latex testsuite.native > writer.latex ../pandoc -r native -s -w rtf testsuite.native > writer.rtf +sed -e '/^, Header 1 \[Str "HTML",Space,Str "Blocks"\]/,/^, HorizontalRule/d' testsuite.native | ../pandoc -r native -w docbook -s > writer.docbook diff --git a/tests/runtests.pl b/tests/runtests.pl index f37ec31ba..ed624e359 100644 --- a/tests/runtests.pl +++ b/tests/runtests.pl @@ -52,6 +52,11 @@ foreach my $format (@writeformats) test_results("$format writer", "tmp.$extension", "writer.$format"); } +print "Testing docbook writer..."; +# remove HTML block tests, as this produces invalid docbook... +`sed -e '/^, Header 1 \\[Str "HTML",Space,Str "Blocks"\\]/,/^, HorizontalRule/d' testsuite.native | $script -r native -w docbook -s > tmp.docbook`; +test_results("docbook writer", "tmp.docbook", "writer.docbook"); + print "Testing s5 writer (basic)..."; `$script -r native -w s5 -s s5.native > tmp.html`; test_results("s5 writer (basic)", "tmp.html", "s5.basic.html"); diff --git a/tests/writer.docbook b/tests/writer.docbook new file mode 100644 index 000000000..2736bde74 --- /dev/null +++ b/tests/writer.docbook @@ -0,0 +1,997 @@ +<?xml version="1.0" encoding="utf-8" ?> +<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" + "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"> + +<article> + <articleinfo> + <title>Pandoc Test Suite</title> + <author> + <firstname>John</firstname> + <surname>MacFarlane</surname> + </author> + <author> + <firstname></firstname> + <surname>Anonymous</surname> + </author> + <date>July 17, 2006</date> + </articleinfo> + <para> + This is a set of tests for pandoc. Most of them are adapted from + John Gruber's markdown test suite. + </para> + <section> + <title>Headers</title> + <section> + <title>Level 2 with an + <ulink url="/url">embedded link</ulink></title> + <section> + <title>Level 3 with <emphasis>emphasis</emphasis></title> + <section> + <title>Level 4</title> + <section> + <title>Level 5</title> + <para> + </para> + </section> + </section> + </section> + </section> + </section> + <section> + <title>Level 1</title> + <section> + <title>Level 2 with <emphasis>emphasis</emphasis></title> + <section> + <title>Level 3</title> + <para> + with no blank line + </para> + </section> + </section> + <section> + <title>Level 2</title> + <para> + with no blank line + </para> + </section> + </section> + <section> + <title>Paragraphs</title> + <para> + Here's a regular paragraph. + </para> + <para> + In Markdown 1.0.0 and earlier. Version 8. This line turns into a + list item. Because a hard-wrapped line in the middle of a paragraph + looked like a list item. + </para> + <para> + Here's one with a bullet. * criminey. + </para> + <para> + There should be a hard line + break<literallayout></literallayout>here. + </para> + </section> + <section> + <title>Block Quotes</title> + <para> + E-mail style: + </para> + <blockquote> + <para> + This is a block quote. It is pretty short. + </para> + </blockquote> + <blockquote> + <para> + Code in a block quote: + </para> + <programlisting><![CDATA[sub status { + print "working"; +}]]></programlisting> + <para> + A list: + </para> + <orderedlist> + <listitem> + <para> + item one + </para> + </listitem> + <listitem> + <para> + item two + </para> + </listitem> + </orderedlist> + <para> + Nested block quotes: + </para> + <blockquote> + <para> + nested + </para> + </blockquote> + <blockquote> + <para> + nested + </para> + </blockquote> + </blockquote> + <para> + This should not be a block quote: 2 > 1. + </para> + <para> + Box-style: + </para> + <blockquote> + <para> + Example: + </para> + <programlisting><![CDATA[sub status { + print "working"; +}]]></programlisting> + </blockquote> + <blockquote> + <orderedlist> + <listitem> + <para> + do laundry + </para> + </listitem> + <listitem> + <para> + take out the trash + </para> + </listitem> + </orderedlist> + </blockquote> + <para> + Here's a nested one: + </para> + <blockquote> + <para> + Joe said: + </para> + <blockquote> + <para> + Don't quote me. + </para> + </blockquote> + </blockquote> + <para> + And a following paragraph. + </para> + </section> + <section> + <title>Code Blocks</title> + <para> + Code: + </para> + <programlisting><![CDATA[---- (should be four hyphens) + +sub status { + print "working"; +} + +this code block is indented by one tab]]></programlisting> + <para> + And: + </para> + <programlisting><![CDATA[ this code block is indented by two tabs + +These should not be escaped: \$ \\ \> \[ \{]]></programlisting> + </section> + <section> + <title>Lists</title> + <section> + <title>Unordered</title> + <para> + Asterisks tight: + </para> + <itemizedlist> + <listitem> + <para> + asterisk 1 + </para> + </listitem> + <listitem> + <para> + asterisk 2 + </para> + </listitem> + <listitem> + <para> + asterisk 3 + </para> + </listitem> + </itemizedlist> + <para> + Asterisks loose: + </para> + <itemizedlist> + <listitem> + <para> + asterisk 1 + </para> + </listitem> + <listitem> + <para> + asterisk 2 + </para> + </listitem> + <listitem> + <para> + asterisk 3 + </para> + </listitem> + </itemizedlist> + <para> + Pluses tight: + </para> + <itemizedlist> + <listitem> + <para> + Plus 1 + </para> + </listitem> + <listitem> + <para> + Plus 2 + </para> + </listitem> + <listitem> + <para> + Plus 3 + </para> + </listitem> + </itemizedlist> + <para> + Pluses loose: + </para> + <itemizedlist> + <listitem> + <para> + Plus 1 + </para> + </listitem> + <listitem> + <para> + Plus 2 + </para> + </listitem> + <listitem> + <para> + Plus 3 + </para> + </listitem> + </itemizedlist> + <para> + Minuses tight: + </para> + <itemizedlist> + <listitem> + <para> + Minus 1 + </para> + </listitem> + <listitem> + <para> + Minus 2 + </para> + </listitem> + <listitem> + <para> + Minus 3 + </para> + </listitem> + </itemizedlist> + <para> + Minuses loose: + </para> + <itemizedlist> + <listitem> + <para> + Minus 1 + </para> + </listitem> + <listitem> + <para> + Minus 2 + </para> + </listitem> + <listitem> + <para> + Minus 3 + </para> + </listitem> + </itemizedlist> + </section> + <section> + <title>Ordered</title> + <para> + Tight: + </para> + <orderedlist> + <listitem> + <para> + First + </para> + </listitem> + <listitem> + <para> + Second + </para> + </listitem> + <listitem> + <para> + Third + </para> + </listitem> + </orderedlist> + <para> + and: + </para> + <orderedlist> + <listitem> + <para> + One + </para> + </listitem> + <listitem> + <para> + Two + </para> + </listitem> + <listitem> + <para> + Three + </para> + </listitem> + </orderedlist> + <para> + Loose using tabs: + </para> + <orderedlist> + <listitem> + <para> + First + </para> + </listitem> + <listitem> + <para> + Second + </para> + </listitem> + <listitem> + <para> + Third + </para> + </listitem> + </orderedlist> + <para> + and using spaces: + </para> + <orderedlist> + <listitem> + <para> + One + </para> + </listitem> + <listitem> + <para> + Two + </para> + </listitem> + <listitem> + <para> + Three + </para> + </listitem> + </orderedlist> + <para> + Multiple paragraphs: + </para> + <orderedlist> + <listitem> + <para> + Item 1, graf one. + </para> + <para> + Item 1. graf two. The quick brown fox jumped over the lazy dog's + back. + </para> + </listitem> + <listitem> + <para> + Item 2. + </para> + </listitem> + <listitem> + <para> + Item 3. + </para> + </listitem> + </orderedlist> + </section> + <section> + <title>Nested</title> + <itemizedlist> + <listitem> + <para> + Tab + </para> + <itemizedlist> + <listitem> + <para> + Tab + </para> + <itemizedlist> + <listitem> + <para> + Tab + </para> + </listitem> + </itemizedlist> + </listitem> + </itemizedlist> + </listitem> + </itemizedlist> + <para> + Here's another: + </para> + <orderedlist> + <listitem> + <para> + First + </para> + </listitem> + <listitem> + <para> + Second: + </para> + <itemizedlist> + <listitem> + <para> + Fee + </para> + </listitem> + <listitem> + <para> + Fie + </para> + </listitem> + <listitem> + <para> + Foe + </para> + </listitem> + </itemizedlist> + </listitem> + <listitem> + <para> + Third + </para> + </listitem> + </orderedlist> + <para> + Same thing but with paragraphs: + </para> + <orderedlist> + <listitem> + <para> + First + </para> + </listitem> + <listitem> + <para> + Second: + </para> + <itemizedlist> + <listitem> + <para> + Fee + </para> + </listitem> + <listitem> + <para> + Fie + </para> + </listitem> + <listitem> + <para> + Foe + </para> + </listitem> + </itemizedlist> + </listitem> + <listitem> + <para> + Third + </para> + </listitem> + </orderedlist> + </section> + <section> + <title>Tabs and spaces</title> + <itemizedlist> + <listitem> + <para> + this is a list item indented with tabs + </para> + </listitem> + <listitem> + <para> + this is a list item indented with spaces + </para> + <itemizedlist> + <listitem> + <para> + this is an example list item indented with tabs + </para> + </listitem> + <listitem> + <para> + this is an example list item indented with spaces + </para> + </listitem> + </itemizedlist> + </listitem> + </itemizedlist> + </section> + </section> + <section> + <title>Inline Markup</title> + <para> + This is <emphasis>emphasized</emphasis>, and so + <emphasis>is this</emphasis>. + </para> + <para> + This is <emphasis role="strong">strong</emphasis>, and so + <emphasis role="strong">is this</emphasis>. + </para> + <para> + An <emphasis><ulink url="/url">emphasized link</ulink></emphasis>. + </para> + <para> + <emphasis role="strong"><emphasis>This is strong and em.</emphasis></emphasis> + </para> + <para> + So is <emphasis role="strong"><emphasis>this</emphasis></emphasis> + word. + </para> + <para> + <emphasis role="strong"><emphasis>This is strong and em.</emphasis></emphasis> + </para> + <para> + So is <emphasis role="strong"><emphasis>this</emphasis></emphasis> + word. + </para> + <para> + This is code: <literal>></literal>, <literal>$</literal>, + <literal>\</literal>, <literal>\$</literal>, + <literal><html></literal>. + </para> + </section> + <section> + <title>Smart quotes, ellipses, dashes</title> + <para> + "Hello," said the spider. "'Shelob' is my name." + </para> + <para> + 'A', 'B', and 'C' are letters. + </para> + <para> + 'Oak,' 'elm,' and 'beech' are names of trees. So is 'pine.' + </para> + <para> + 'He said, "I want to go."' Were you alive in the 70's? + </para> + <para> + Here is some quoted '<literal>code</literal>' and a + "<ulink url="http://example.com/?foo=1&bar=2">quoted link</ulink>". + </para> + <para> + Some dashes: one---two --- three--four -- five. + </para> + <para> + Dashes between numbers: 5-7, 255-66, 1987-1999. + </para> + <para> + Ellipses...and. . .and . . . . + </para> + </section> + <section> + <title>LaTeX</title> + <itemizedlist> + <listitem> + <para> + <literal>\cite[22-23]{smith.1899}</literal> + </para> + </listitem> + <listitem> + <para> + <literal>\doublespacing</literal> + </para> + </listitem> + <listitem> + <para> + <literal>$2+2=4$</literal> + </para> + </listitem> + <listitem> + <para> + <literal>$x \in y$</literal> + </para> + </listitem> + <listitem> + <para> + <literal>$\alpha \wedge \omega$</literal> + </para> + </listitem> + <listitem> + <para> + <literal>$223$</literal> + </para> + </listitem> + <listitem> + <para> + <literal>$p$</literal>-Tree + </para> + </listitem> + <listitem> + <para> + <literal>$\frac{d}{dx}f(x)=\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}$</literal> + </para> + </listitem> + <listitem> + <para> + Here's one that has a line break in it: + <literal>$\alpha + \omega \times x^2$</literal>. + </para> + </listitem> + </itemizedlist> + <para> + These shouldn't be math: + </para> + <itemizedlist> + <listitem> + <para> + To get the famous equation, write <literal>$e = mc^2$</literal>. + </para> + </listitem> + <listitem> + <para> + $22,000 is a <emphasis>lot</emphasis> of money. So is $34,000. (It + worked if "lot" is emphasized.) + </para> + </listitem> + <listitem> + <para> + Escaped <literal>$</literal>: $73 + <emphasis>this should be emphasized</emphasis> 23$. + </para> + </listitem> + </itemizedlist> + <para> + Here's a LaTeX table: + </para> + <para> + <literal>\begin{tabular}{|l|l|}\hline +Animal & Number \\ \hline +Dog & 2 \\ +Cat & 1 \\ \hline +\end{tabular}</literal> + </para> + </section> + <section> + <title>Special Characters</title> + <para> + Here is some unicode: + </para> + <itemizedlist> + <listitem> + <para> + I hat: Î + </para> + </listitem> + <listitem> + <para> + o umlaut: ö + </para> + </listitem> + <listitem> + <para> + section: § + </para> + </listitem> + <listitem> + <para> + set membership: ∈ + </para> + </listitem> + <listitem> + <para> + copyright: © + </para> + </listitem> + </itemizedlist> + <para> + AT&T has an ampersand in their name. + </para> + <para> + AT&T is another way to write it. + </para> + <para> + This & that. + </para> + <para> + 4 < 5. + </para> + <para> + 6 > 5. + </para> + <para> + Backslash: \ + </para> + <para> + Backtick: ` + </para> + <para> + Asterisk: * + </para> + <para> + Underscore: _ + </para> + <para> + Left brace: { + </para> + <para> + Right brace: } + </para> + <para> + Left bracket: [ + </para> + <para> + Right bracket: ] + </para> + <para> + Left paren: ( + </para> + <para> + Right paren: ) + </para> + <para> + Greater-than: > + </para> + <para> + Hash: # + </para> + <para> + Period: . + </para> + <para> + Bang: ! + </para> + <para> + Plus: + + </para> + <para> + Minus: - + </para> + </section> + <section> + <title>Links</title> + <section> + <title>Explicit</title> + <para> + Just a <ulink url="/url/">URL</ulink>. + </para> + <para> + <ulink url="/url/">URL and title</ulink>. + </para> + <para> + <ulink url="/url/">URL and title</ulink>. + </para> + <para> + <ulink url="/url/">URL and title</ulink>. + </para> + <para> + <ulink url="/url/">URL and title</ulink> + </para> + <para> + <ulink url="/url/">URL and title</ulink> + </para> + <para> + <email>nobody@nowhere.net</email> + </para> + <para> + <ulink url="">Empty</ulink>. + </para> + </section> + <section> + <title>Reference</title> + <para> + Foo <ulink url="/url/">bar</ulink>. + </para> + <para> + Foo <ulink url="/url/">bar</ulink>. + </para> + <para> + Foo <ulink url="/url/">bar</ulink>. + </para> + <para> + With <ulink url="/url/">embedded [brackets]</ulink>. + </para> + <para> + <ulink url="/url/">b</ulink> by itself should be a link. + </para> + <para> + Indented <ulink url="/url">once</ulink>. + </para> + <para> + Indented <ulink url="/url">twice</ulink>. + </para> + <para> + Indented <ulink url="/url">thrice</ulink>. + </para> + <para> + This should [not][] be a link. + </para> + <programlisting><![CDATA[[not]: /url]]></programlisting> + <para> + Foo <ulink url="/url/">bar</ulink>. + </para> + <para> + Foo <ulink url="/url/">biz</ulink>. + </para> + </section> + <section> + <title>With ampersands</title> + <para> + Here's a + <ulink url="http://example.com/?foo=1&bar=2">link with an ampersand in the URL</ulink>. + </para> + <para> + Here's a link with an amersand in the link text: + <ulink url="http://att.com/">AT&T</ulink>. + </para> + <para> + Here's an <ulink url="/script?foo=1&bar=2">inline link</ulink>. + </para> + <para> + Here's an + <ulink url="/script?foo=1&bar=2">inline link in pointy braces</ulink>. + </para> + </section> + <section> + <title>Autolinks</title> + <para> + With an ampersand: + <ulink url="http://example.com/?foo=1&bar=2">http://example.com/?foo=1&bar=2</ulink> + </para> + <itemizedlist> + <listitem> + <para> + In a list? + </para> + </listitem> + <listitem> + <para> + <ulink url="http://example.com/">http://example.com/</ulink> + </para> + </listitem> + <listitem> + <para> + It should. + </para> + </listitem> + </itemizedlist> + <para> + An e-mail address: <email>nobody@nowhere.net</email> + </para> + <blockquote> + <para> + Blockquoted: + <ulink url="http://example.com/">http://example.com/</ulink> + </para> + </blockquote> + <para> + Auto-links should not occur here: + <literal><http://example.com/></literal> + </para> + <programlisting><![CDATA[or here: <http://example.com/>]]></programlisting> + </section> + </section> + <section> + <title>Images</title> + <para> + From "Voyage dans la Lune" by Georges Melies (1902): + </para> + <para> + <inlinemediaobject> + <imageobject> + <objectinfo> + <title> + Voyage dans la Lune + </title> + </objectinfo> + <imagedata fileref="lalune.jpg" /> + </imageobject> + </inlinemediaobject> + </para> + <para> + Here is a movie + <inlinemediaobject> + <imageobject> + <imagedata fileref="movie.jpg" /> + </imageobject> + </inlinemediaobject> + icon. + </para> + </section> + <section> + <title>Footnotes</title> + <para> + Here is a footnote + reference,<footnote> + <para> + Here is the footnote. It can go anywhere after the footnote + reference. It need not be placed at the end of the document. + </para> + </footnote> + and + another.<footnote> + <para> + Here's the long note. This one contains multiple blocks. + </para> + <para> + Subsequent blocks are indented to show that they belong to the + footnote (as with list items). + </para> + <programlisting><![CDATA[ { <code> }]]></programlisting> + <para> + If you want, you can indent every line, but you can also be lazy + and just indent the first line of each block. + </para> + </footnote> + This should <emphasis>not</emphasis> be a footnote reference, + because it contains a space.[^my note] Here is an inline + note.<footnote> + <para> + This is <emphasis>easier</emphasis> to type. Inline notes may + contain <ulink url="http://google.com">links</ulink> and + <literal>]</literal> verbatim characters. + </para> + </footnote> + </para> + <blockquote> + <para> + Notes can go in + quotes.<footnote> + <para> + In quote. + </para> + </footnote> + </para> + </blockquote> + <orderedlist> + <listitem> + <para> + And in list + items.<footnote> + <para> + In list. + </para> + </footnote> + </para> + </listitem> + </orderedlist> + <para> + This paragraph should not be part of the note, as it is not + indented. + </para> + </section> + +</article> diff --git a/web/index.txt b/web/index.txt index 024133487..ec96b01dd 100644 --- a/web/index.txt +++ b/web/index.txt @@ -4,8 +4,8 @@ Pandoc is a [Haskell] library for converting from one markup format to another, and a command-line tool that uses this library. It can read [markdown] and (subsets of) [reStructuredText], [HTML], and [LaTeX], and it can write [markdown], [reStructuredText], [HTML], [LaTeX], [RTF], -and [S5] HTML slide shows. Pandoc's version of markdown contains some -enhancements, like footnotes and embedded LaTeX. +[DocBook XML], and [S5] HTML slide shows. Pandoc's version of markdown +contains some enhancements, like footnotes and embedded LaTeX. In contrast to existing tools for converting markdown to HTML, which use regex substitutions, Pandoc has a modular design: it consists of a @@ -73,6 +73,7 @@ kind. [HTML]: http://www.w3.org/TR/html40/ [LaTeX]: http://www.latex-project.org/ [RTF]: http://en.wikipedia.org/wiki/Rich_Text_Format +[DocBook XML]: http://www.docbook.org/ [Haskell]: http://www.haskell.org/ [GHC]: http://www.haskell.org/ghc/ [GPL]: http://www.gnu.org/copyleft/gpl.html diff --git a/web/mkdemos.sh b/web/mkdemos.sh index ac3d45106..b3e344413 100644 --- a/web/mkdemos.sh +++ b/web/mkdemos.sh @@ -16,6 +16,7 @@ pandoc -s README.tex -o example0.txt pandoc -s -w rst README -o example0.txt pandoc -s README -o example0.rtf pandoc -s -m -i -w s5 S5DEMO -o example0.html +pandoc -s -w docbook README -o example0.db html2markdown http://www.gnu.org/software/make/ -o example0.txt markdown2pdf README -o example0.pdf markdown2pdf -C myheader.tex README -o example0.pdf' |