From a9e32505debea9077c0cca49b84a8c0d363bf3e8 Mon Sep 17 00:00:00 2001 From: fiddlosopher Date: Mon, 1 Jan 2007 21:08:12 +0000 Subject: Merged changes from docbook branch since r363. git-svn-id: https://pandoc.googlecode.com/svn/trunk@386 788f1e2b-df1e-0410-8736-df70ead52e1b --- README | 30 +- debian/changelog | 2 + man/man1/pandoc.1 | 8 +- src/Main.hs | 8 +- src/Text/Pandoc/Shared.hs | 3 +- src/Text/Pandoc/Writers/Docbook.hs | 236 +++++++++ src/Text/Pandoc/Writers/HTML.hs | 4 +- src/headers/DocbookHeader | 3 + src/templates/DefaultHeaders.hs | 4 + tests/generate.sh | 1 + tests/runtests.pl | 5 + tests/writer.docbook | 997 +++++++++++++++++++++++++++++++++++++ web/index.txt | 5 +- web/mkdemos.sh | 1 + 14 files changed, 1285 insertions(+), 22 deletions(-) create mode 100644 src/Text/Pandoc/Writers/Docbook.hs create mode 100644 src/headers/DocbookHeader create mode 100644 tests/writer.docbook diff --git a/README b/README index af06a7e86..caee57852 100644 --- a/README +++ b/README @@ -6,8 +6,8 @@ Pandoc is a [Haskell] library for converting from one markup format to another, and a command-line tool that uses this library. It can read [markdown] and (subsets of) [reStructuredText], [HTML], and [LaTeX], and it can write [markdown], [reStructuredText], [HTML], [LaTeX], [RTF], -and [S5] HTML slide shows. Pandoc's version of markdown contains some -enhancements, like footnotes and embedded LaTeX. +[DocBook XML], and [S5] HTML slide shows. Pandoc's version of markdown +contains some enhancements, like footnotes and embedded LaTeX. In contrast to existing tools for converting markdown to HTML, which use regex substitutions, Pandoc has a modular design: it consists of a @@ -22,6 +22,7 @@ or output format requires only adding a reader or writer. [HTML]: http://www.w3.org/TR/html40/ [LaTeX]: http://www.latex-project.org/ [RTF]: http://en.wikipedia.org/wiki/Rich_Text_Format +[DocBook XML]: http://www.docbook.org/ [Haskell]: http://www.haskell.org/ (c) 2006 John MacFarlane (jgm at berkeley dot edu). Released under the @@ -107,17 +108,17 @@ To convert `hello.html` from html to markdown: pandoc -f html -t markdown hello.html Supported output formats include `markdown`, `latex`, `html`, `rtf` -(rich text format), `rst` (reStructuredText), and `s5` (which produces -an HTML file that acts like powerpoint). Supported input formats -include `markdown`, `html`, `latex`, and `rst`. Note that the `rst` -reader only parses a subset of reStructuredText syntax. For example, -it doesn't handle tables, definition lists, option lists, or footnotes. -It handles only the constructs expressible in unextended markdown. -But for simple documents it should be adequate. The `latex` and `html` -readers are also limited in what they can do. Because the `html` -reader is picky about the HTML it parses, it is recommended that you -pipe HTML through [HTML Tidy] before sending it to `pandoc`, or use the -`html2markdown` script described below. +(rich text format), `rst` (reStructuredText), `docbook` (DocBook +XML), and `s5` (which produces an HTML file that acts like powerpoint). +Supported input formats include `markdown`, `html`, `latex`, and `rst`. +Note that the `rst` reader only parses a subset of reStructuredText +syntax. For example, it doesn't handle tables, definition lists, option +lists, or footnotes. It handles only the constructs expressible in +unextended markdown. But for simple documents it should be adequate. +The `latex` and `html` readers are also limited in what they can do. +Because the `html` reader is picky about the HTML it parses, it is +recommended that you pipe HTML through [HTML Tidy] before sending it to +`pandoc`, or use the `html2markdown` script described below. If you don't specify a reader or writer explicitly, `pandoc` will try to determine the input and output format from the extensions of @@ -200,7 +201,8 @@ formats are `native`, `markdown`, `rst`, `html`, and `latex`. `-t`, `--to`, `-w`, or `--write` can be used to specify the output format -- the format Pandoc will be converting *to*. Available formats -are `native`, `html`, `s5`, `latex`, `markdown`, `rst`, and `rtf`. +are `native`, `html`, `s5`, `docbook`, `latex`, `markdown`, `rst`, and +`rtf`. `-s` or `--standalone` indicates that a standalone document is to be produced (with appropriate headers and footers), rather than a fragment. diff --git a/debian/changelog b/debian/changelog index 704f0430f..6790a106e 100644 --- a/debian/changelog +++ b/debian/changelog @@ -46,6 +46,8 @@ pandoc (0.3) unstable; urgency=low + Removed extra blanks after '-h' and '-D' output. + Added copyright message to '-v' output, modeled after FSF messages. + * Added docbook writer. + * Added implicit setting of default input and output format based on input and output filename extensions. These defaults are overridden if explicit input and output formats are specified using diff --git a/man/man1/pandoc.1 b/man/man1/pandoc.1 index 820be3276..a1df84208 100644 --- a/man/man1/pandoc.1 +++ b/man/man1/pandoc.1 @@ -6,8 +6,8 @@ pandoc \- general markup converter .SH DESCRIPTION \fBPandoc\fR converts files from one markup format to another. It can read markdown and (subsets of) reStructuredText, HTML, and LaTeX, and -it can write markdown, reStructuredText, HTML, LaTeX, RTF, and S5 HTML -slide shows. +it can write markdown, reStructuredText, HTML, LaTeX, RTF, DocBook +XML, and S5 HTML slide shows. .PP If no \fIinput\-file\fR is specified, input is read from STDIN. Otherwise, the \fIinput\-files\fR are concatenated (with a blank @@ -80,6 +80,8 @@ can be (HTML), .B latex (LaTeX), +.B docbook +(DocBook XML), .B s5 (S5 HTML and javascript slide show), or @@ -141,7 +143,7 @@ default header, which can be printed by using the \fB\-D\fR option). Implies \fB-s\fR. .TP .B \-D \fIFORMAT\fB, \-\-print-default-header=\fIFORMAT\fB -Print the default header for \fIFORMAT\fR (\fIhtml, s5, latex, +Print the default header for \fIFORMAT\fR (\fIhtml, s5, latex, docbook, markdown, rst, rtf\fR). .TP .B \-T \fISTRING\fB, \-\-title-prefix=\fISTRING\fB diff --git a/src/Main.hs b/src/Main.hs index 35a61c63e..b5320c258 100644 --- a/src/Main.hs +++ b/src/Main.hs @@ -37,12 +37,14 @@ import Text.Pandoc.Writers.RST ( writeRST ) import Text.Pandoc.Readers.RST ( readRST ) import Text.Pandoc.ASCIIMathML ( asciiMathMLScript ) import Text.Pandoc.Writers.HTML ( writeHtml ) +import Text.Pandoc.Writers.Docbook ( writeDocbook ) import Text.Pandoc.Writers.LaTeX ( writeLaTeX ) import Text.Pandoc.Readers.LaTeX ( readLaTeX ) import Text.Pandoc.Writers.RTF ( writeRTF ) import Text.Pandoc.Writers.Markdown ( writeMarkdown ) import Text.Pandoc.Writers.DefaultHeaders ( defaultHtmlHeader, - defaultRTFHeader, defaultS5Header, defaultLaTeXHeader ) + defaultRTFHeader, defaultS5Header, defaultLaTeXHeader, + defaultDocbookHeader ) import Text.Pandoc.Definition import Text.Pandoc.Shared import Text.Regex ( mkRegex, matchRegex ) @@ -79,6 +81,7 @@ writers :: [ ( String, ( WriterOptions -> Pandoc -> String, String ) ) ] writers = [("native" , (writeDoc, "")) ,("html" , (writeHtml, defaultHtmlHeader)) ,("s5" , (writeS5, defaultS5Header)) + ,("docbook" , (writeDocbook, defaultDocbookHeader)) ,("latex" , (writeLaTeX, defaultLaTeXHeader)) ,("markdown" , (writeMarkdown, "")) ,("rst" , (writeRST, "")) @@ -331,6 +334,8 @@ defaultWriterName x = Just ["text"] -> "markdown" Just ["md"] -> "markdown" Just ["markdown"] -> "markdown" + Just ["db"] -> "docbook" + Just ["xml"] -> "docbook" Just _ -> "html" main = do @@ -423,6 +428,7 @@ main = do writerSmart = smart && (not strict), writerTabStop = tabStop, + writerNotes = [], writerS5 = (writerName=="s5"), writerIncremental = incremental, writerNumberSections = numberSections, diff --git a/src/Text/Pandoc/Shared.hs b/src/Text/Pandoc/Shared.hs index 7e4f63ffa..eb6c7be78 100644 --- a/src/Text/Pandoc/Shared.hs +++ b/src/Text/Pandoc/Shared.hs @@ -322,7 +322,7 @@ containsPara (x:rest) = containsPara rest -- | Options for writers data WriterOptions = WriterOptions - { writerStandalone :: Bool -- ^ If @True@, writer header and footer + { writerStandalone :: Bool -- ^ Include header and footer , writerTitlePrefix :: String -- ^ Prefix for HTML titles , writerHeader :: String -- ^ Header for the document , writerIncludeBefore :: String -- ^ String to include before the body @@ -334,6 +334,7 @@ data WriterOptions = WriterOptions , writerStrictMarkdown :: Bool -- ^ Use strict markdown syntax , writerTabStop :: Int -- ^ Tabstop for conversion between -- spaces and tabs + , writerNotes :: [Block] -- ^ List of note blocks } deriving Show -- diff --git a/src/Text/Pandoc/Writers/Docbook.hs b/src/Text/Pandoc/Writers/Docbook.hs new file mode 100644 index 000000000..9924d50fe --- /dev/null +++ b/src/Text/Pandoc/Writers/Docbook.hs @@ -0,0 +1,236 @@ +{- +Copyright (C) 2006 John MacFarlane + +This program is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with this program; if not, write to the Free Software +Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA +-} + +{- | + Module : Text.Pandoc.Writers.Docbook + Copyright : Copyright (C) 2006 John MacFarlane + License : GNU GPL, version 2 or above + + Maintainer : John MacFarlane + Stability : alpha + Portability : portable + +Conversion of 'Pandoc' documents to Docbook XML. +-} +module Text.Pandoc.Writers.Docbook ( + writeDocbook + ) where +import Text.Pandoc.Definition +import Text.Pandoc.Shared +import Text.Pandoc.Writers.HTML ( stringToSmartHtml, stringToHtml ) +import Text.Html ( stringToHtmlString ) +import Text.Regex ( mkRegex, matchRegex ) +import Data.Char ( toLower ) +import Data.List ( isPrefixOf, partition ) +import Text.PrettyPrint.HughesPJ hiding ( Str ) + +-- | Data structure for defining hierarchical Pandoc documents +data Element = Blk Block + | Sec [Inline] [Element] deriving (Eq, Read, Show) + +-- | Returns true on Header block with level at least 'level' +headerAtLeast :: Int -> Block -> Bool +headerAtLeast level (Header x _) = x <= level +headerAtLeast level _ = False + +-- | Convert list of Pandoc blocks into list of Elements (hierarchical) +hierarchicalize :: [Block] -> [Element] +hierarchicalize [] = [] +hierarchicalize (block:rest) = + case block of + (Header level title) -> let (thisSection, rest') = break (headerAtLeast level) rest in + (Sec title (hierarchicalize thisSection)):(hierarchicalize rest') + x -> (Blk x):(hierarchicalize rest) + +-- | Convert list of authors to a docbook section +authorToDocbook :: WriterOptions -> [Char] -> Doc +authorToDocbook options name = indentedInTags options "author" $ + if ',' `elem` name + then -- last name first + let (lastname, rest) = break (==',') name + firstname = removeLeadingSpace rest in + inTags "firstname" (text $ stringToXML options firstname) <> + inTags "surname" (text $ stringToXML options lastname) + else -- last name last + let namewords = words name + lengthname = length namewords + (firstname, lastname) = case lengthname of + 0 -> ("","") + 1 -> ("", name) + n -> (joinWithSep " " (take (n-1) namewords), last namewords) in + inTags "firstname" (text $ stringToXML options firstname) $$ + inTags "surname" (text $ stringToXML options lastname) + +-- | Convert Pandoc document to string in Docbook format. +writeDocbook :: WriterOptions -> Pandoc -> String +writeDocbook options (Pandoc (Meta title authors date) blocks) = + let head = if (writerStandalone options) + then text (writerHeader options) + else empty + meta = if (writerStandalone options) + then indentedInTags options "articleinfo" $ + (inTags "title" (inlinesToDocbook options title)) $$ + (vcat (map (authorToDocbook options) authors)) $$ + (inTags "date" (text date)) + else empty + blocks' = replaceReferenceLinks blocks + (noteBlocks, blocks'') = partition isNoteBlock blocks' + options' = options {writerNotes = noteBlocks} + elements = hierarchicalize blocks'' + body = text (writerIncludeBefore options') <> + vcat (map (elementToDocbook options') elements) $$ + text (writerIncludeAfter options') + body' = if writerStandalone options' + then indentedInTags options' "article" (meta $$ body) + else body in + render $ head $$ body' <> text "\n" + +-- | Put the supplied contents between start and end tags of tagType, +-- with specified attributes. +inTagsWithAttrib :: String -> [(String, String)] -> Doc -> Doc +inTagsWithAttrib tagType attribs contents = text ("<" ++ tagType ++ + (concatMap (\(a, b) -> " " ++ attributeStringToXML a ++ + "=\"" ++ attributeStringToXML b ++ "\"") attribs)) <> + if isEmpty contents + then text " />" -- self-closing tag + else text ">" <> contents <> text ("") + +-- | Put the supplied contents between start and end tags of tagType. +inTags :: String -> Doc -> Doc +inTags tagType contents = inTagsWithAttrib tagType [] contents + +-- | Put the supplied contents in indented block btw start and end tags. +indentedInTags :: WriterOptions -> [Char] -> Doc -> Doc +indentedInTags options tagType contents = text ("<" ++ tagType ++ ">") $$ + nest 2 contents $$ text ("") + +-- | Convert an Element to Docbook. +elementToDocbook :: WriterOptions -> Element -> Doc +elementToDocbook options (Blk block) = blockToDocbook options block +elementToDocbook options (Sec title elements) = + -- Docbook doesn't allow sections with no content, so insert some if needed + let elements' = if null elements + then [Blk (Para [])] + else elements in + indentedInTags options "section" $ + inTags "title" (wrap options title) $$ + vcat (map (elementToDocbook options) elements') + +-- | Convert a list of Pandoc blocks to Docbook. +blocksToDocbook :: WriterOptions -> [Block] -> Doc +blocksToDocbook options = vcat . map (blockToDocbook options) + +-- | Convert a list of lists of blocks to a list of Docbook list items. +listItemsToDocbook :: WriterOptions -> [[Block]] -> Doc +listItemsToDocbook options items = + vcat $ map (listItemToDocbook options) items + +-- | Convert a list of blocks into a Docbook list item. +listItemToDocbook :: WriterOptions -> [Block] -> Doc +listItemToDocbook options item = + let plainToPara (Plain x) = Para x + plainToPara y = y in + let item' = map plainToPara item in + indentedInTags options "listitem" (blocksToDocbook options item') + +-- | Convert a Pandoc block element to Docbook. +blockToDocbook :: WriterOptions -> Block -> Doc +blockToDocbook options Blank = text "" +blockToDocbook options Null = empty +blockToDocbook options (Plain lst) = wrap options lst +blockToDocbook options (Para lst) = + indentedInTags options "para" (wrap options lst) +blockToDocbook options (BlockQuote blocks) = + indentedInTags options "blockquote" (blocksToDocbook options blocks) +blockToDocbook options (CodeBlock str) = + text "" <> (cdata str) <> text "" +blockToDocbook options (BulletList lst) = + indentedInTags options "itemizedlist" $ listItemsToDocbook options lst +blockToDocbook options (OrderedList lst) = + indentedInTags options "orderedlist" $ listItemsToDocbook options lst +blockToDocbook options (RawHtml str) = text str -- raw XML block +blockToDocbook options HorizontalRule = empty -- not semantic +blockToDocbook options (Note _ _) = empty -- shouldn't occur +blockToDocbook options (Key _ _) = empty -- shouldn't occur +blockToDocbook options _ = indentedInTags options "para" (text "Unknown block type") + +-- | Put string in CDATA section +cdata :: String -> Doc +cdata str = text $ "" + +-- | Take list of inline elements and return wrapped doc. +wrap :: WriterOptions -> [Inline] -> Doc +wrap options lst = fsep $ map (hcat . (map (inlineToDocbook options))) (splitBySpace lst) + +-- | Escape a string for XML (with "smart" option if specified). +stringToXML :: WriterOptions -> String -> String +stringToXML options = if writerSmart options + then stringToSmartHtml + else stringToHtml + +-- | Escape string to XML appropriate for attributes +attributeStringToXML :: String -> String +attributeStringToXML = gsub "\"" """ . codeStringToXML + +-- | Escape a literal string for XML. +codeStringToXML :: String -> String +codeStringToXML = gsub "<" "<" . gsub "&" "&" + +-- | Convert a list of inline elements to Docbook. +inlinesToDocbook :: WriterOptions -> [Inline] -> Doc +inlinesToDocbook options lst = hcat (map (inlineToDocbook options) lst) + +-- | Convert an inline element to Docbook. +inlineToDocbook :: WriterOptions -> Inline -> Doc +inlineToDocbook options (Str str) = text $ stringToXML options str +inlineToDocbook options (Emph lst) = + inTags "emphasis" (inlinesToDocbook options lst) +inlineToDocbook options (Strong lst) = + inTagsWithAttrib "emphasis" [("role", "strong")] + (inlinesToDocbook options lst) +inlineToDocbook options (Code str) = + inTags "literal" $ text (codeStringToXML str) +inlineToDocbook options (TeX str) = inlineToDocbook options (Code str) +inlineToDocbook options (HtmlInline str) = empty +inlineToDocbook options LineBreak = + text $ "" +inlineToDocbook options Space = char ' ' +inlineToDocbook options (Link txt (Src src tit)) = + case (matchRegex (mkRegex "mailto:(.*)") src) of + Just [addr] -> inTags "email" $ text (codeStringToXML addr) + Nothing -> inTagsWithAttrib "ulink" [("url", src)] $ + inlinesToDocbook options txt +inlineToDocbook options (Link text (Ref ref)) = empty -- shouldn't occur +inlineToDocbook options (Image alt (Src src tit)) = + let titleDoc = if null tit + then empty + else indentedInTags options "objectinfo" $ + indentedInTags options "title" + (text $ stringToXML options tit) in + indentedInTags options "inlinemediaobject" $ + indentedInTags options "imageobject" $ + titleDoc $$ inTagsWithAttrib "imagedata" [("fileref", src)] empty +inlineToDocbook options (Image alternate (Ref ref)) = empty --shouldn't occur +inlineToDocbook options (NoteRef ref) = + let notes = writerNotes options + hits = filter (\(Note r _) -> r == ref) notes in + if null hits + then empty + else let (Note _ contents) = head hits in + indentedInTags options "footnote" $ blocksToDocbook options contents +inlineToDocbook options _ = empty diff --git a/src/Text/Pandoc/Writers/HTML.hs b/src/Text/Pandoc/Writers/HTML.hs index 4456a61b5..8de1de43f 100644 --- a/src/Text/Pandoc/Writers/HTML.hs +++ b/src/Text/Pandoc/Writers/HTML.hs @@ -28,7 +28,9 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Conversion of 'Pandoc' documents to HTML. -} module Text.Pandoc.Writers.HTML ( - writeHtml + writeHtml, + stringToSmartHtml, + stringToHtml ) where import Text.Pandoc.Definition import Text.Pandoc.Shared diff --git a/src/headers/DocbookHeader b/src/headers/DocbookHeader new file mode 100644 index 000000000..7b26b2c73 --- /dev/null +++ b/src/headers/DocbookHeader @@ -0,0 +1,3 @@ + + diff --git a/src/templates/DefaultHeaders.hs b/src/templates/DefaultHeaders.hs index 4c552cea7..0274aa30f 100644 --- a/src/templates/DefaultHeaders.hs +++ b/src/templates/DefaultHeaders.hs @@ -1,6 +1,7 @@ -- | Default headers for Pandoc writers. module Text.Pandoc.Writers.DefaultHeaders ( defaultLaTeXHeader, + defaultDocbookHeader, defaultHtmlHeader, defaultS5Header, defaultRTFHeader @@ -10,6 +11,9 @@ import Text.Pandoc.Writers.S5 defaultLaTeXHeader :: String defaultLaTeXHeader = "@LaTeXHeader@" +defaultDocbookHeader :: String +defaultDocbookHeader = "@DocbookHeader@" + defaultHtmlHeader :: String defaultHtmlHeader = "@HtmlHeader@" diff --git a/tests/generate.sh b/tests/generate.sh index 75b3bb9ee..4c236e654 100644 --- a/tests/generate.sh +++ b/tests/generate.sh @@ -7,4 +7,5 @@ ../pandoc -r native -s -w html -S testsuite.native > writer.smart.html ../pandoc -r native -s -w latex testsuite.native > writer.latex ../pandoc -r native -s -w rtf testsuite.native > writer.rtf +sed -e '/^, Header 1 \[Str "HTML",Space,Str "Blocks"\]/,/^, HorizontalRule/d' testsuite.native | ../pandoc -r native -w docbook -s > writer.docbook diff --git a/tests/runtests.pl b/tests/runtests.pl index f37ec31ba..ed624e359 100644 --- a/tests/runtests.pl +++ b/tests/runtests.pl @@ -52,6 +52,11 @@ foreach my $format (@writeformats) test_results("$format writer", "tmp.$extension", "writer.$format"); } +print "Testing docbook writer..."; +# remove HTML block tests, as this produces invalid docbook... +`sed -e '/^, Header 1 \\[Str "HTML",Space,Str "Blocks"\\]/,/^, HorizontalRule/d' testsuite.native | $script -r native -w docbook -s > tmp.docbook`; +test_results("docbook writer", "tmp.docbook", "writer.docbook"); + print "Testing s5 writer (basic)..."; `$script -r native -w s5 -s s5.native > tmp.html`; test_results("s5 writer (basic)", "tmp.html", "s5.basic.html"); diff --git a/tests/writer.docbook b/tests/writer.docbook new file mode 100644 index 000000000..2736bde74 --- /dev/null +++ b/tests/writer.docbook @@ -0,0 +1,997 @@ + + + +
+ + Pandoc Test Suite + + John + MacFarlane + + + + Anonymous + + July 17, 2006 + + + This is a set of tests for pandoc. Most of them are adapted from + John Gruber's markdown test suite. + +
+ Headers +
+ Level 2 with an + <ulink url="/url">embedded link</ulink> +
+ Level 3 with <emphasis>emphasis</emphasis> +
+ Level 4 +
+ Level 5 + + +
+
+
+
+
+
+ Level 1 +
+ Level 2 with <emphasis>emphasis</emphasis> +
+ Level 3 + + with no blank line + +
+
+
+ Level 2 + + with no blank line + +
+
+
+ Paragraphs + + Here's a regular paragraph. + + + In Markdown 1.0.0 and earlier. Version 8. This line turns into a + list item. Because a hard-wrapped line in the middle of a paragraph + looked like a list item. + + + Here's one with a bullet. * criminey. + + + There should be a hard line + breakhere. + +
+
+ Block Quotes + + E-mail style: + +
+ + This is a block quote. It is pretty short. + +
+
+ + Code in a block quote: + + + + A list: + + + + + item one + + + + + item two + + + + + Nested block quotes: + +
+ + nested + +
+
+ + nested + +
+
+ + This should not be a block quote: 2 > 1. + + + Box-style: + +
+ + Example: + + +
+
+ + + + do laundry + + + + + take out the trash + + + +
+ + Here's a nested one: + +
+ + Joe said: + +
+ + Don't quote me. + +
+
+ + And a following paragraph. + +
+
+ Code Blocks + + Code: + + + + And: + + \[ \{]]> +
+
+ Lists +
+ Unordered + + Asterisks tight: + + + + + asterisk 1 + + + + + asterisk 2 + + + + + asterisk 3 + + + + + Asterisks loose: + + + + + asterisk 1 + + + + + asterisk 2 + + + + + asterisk 3 + + + + + Pluses tight: + + + + + Plus 1 + + + + + Plus 2 + + + + + Plus 3 + + + + + Pluses loose: + + + + + Plus 1 + + + + + Plus 2 + + + + + Plus 3 + + + + + Minuses tight: + + + + + Minus 1 + + + + + Minus 2 + + + + + Minus 3 + + + + + Minuses loose: + + + + + Minus 1 + + + + + Minus 2 + + + + + Minus 3 + + + +
+
+ Ordered + + Tight: + + + + + First + + + + + Second + + + + + Third + + + + + and: + + + + + One + + + + + Two + + + + + Three + + + + + Loose using tabs: + + + + + First + + + + + Second + + + + + Third + + + + + and using spaces: + + + + + One + + + + + Two + + + + + Three + + + + + Multiple paragraphs: + + + + + Item 1, graf one. + + + Item 1. graf two. The quick brown fox jumped over the lazy dog's + back. + + + + + Item 2. + + + + + Item 3. + + + +
+
+ Nested + + + + Tab + + + + + Tab + + + + + Tab + + + + + + + + + Here's another: + + + + + First + + + + + Second: + + + + + Fee + + + + + Fie + + + + + Foe + + + + + + + Third + + + + + Same thing but with paragraphs: + + + + + First + + + + + Second: + + + + + Fee + + + + + Fie + + + + + Foe + + + + + + + Third + + + +
+
+ Tabs and spaces + + + + this is a list item indented with tabs + + + + + this is a list item indented with spaces + + + + + this is an example list item indented with tabs + + + + + this is an example list item indented with spaces + + + + + +
+
+
+ Inline Markup + + This is emphasized, and so + is this. + + + This is strong, and so + is this. + + + An emphasized link. + + + This is strong and em. + + + So is this + word. + + + This is strong and em. + + + So is this + word. + + + This is code: >, $, + \, \$, + <html>. + +
+
+ Smart quotes, ellipses, dashes + + "Hello," said the spider. "'Shelob' is my name." + + + 'A', 'B', and 'C' are letters. + + + 'Oak,' 'elm,' and 'beech' are names of trees. So is 'pine.' + + + 'He said, "I want to go."' Were you alive in the 70's? + + + Here is some quoted 'code' and a + "quoted link". + + + Some dashes: one---two --- three--four -- five. + + + Dashes between numbers: 5-7, 255-66, 1987-1999. + + + Ellipses...and. . .and . . . . + +
+
+ LaTeX + + + + \cite[22-23]{smith.1899} + + + + + \doublespacing + + + + + $2+2=4$ + + + + + $x \in y$ + + + + + $\alpha \wedge \omega$ + + + + + $223$ + + + + + $p$-Tree + + + + + $\frac{d}{dx}f(x)=\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}$ + + + + + Here's one that has a line break in it: + $\alpha + \omega \times x^2$. + + + + + These shouldn't be math: + + + + + To get the famous equation, write $e = mc^2$. + + + + + $22,000 is a lot of money. So is $34,000. (It + worked if "lot" is emphasized.) + + + + + Escaped $: $73 + this should be emphasized 23$. + + + + + Here's a LaTeX table: + + + \begin{tabular}{|l|l|}\hline +Animal & Number \\ \hline +Dog & 2 \\ +Cat & 1 \\ \hline +\end{tabular} + +
+
+ Special Characters + + Here is some unicode: + + + + + I hat: Î + + + + + o umlaut: ö + + + + + section: § + + + + + set membership: ∈ + + + + + copyright: © + + + + + AT&T has an ampersand in their name. + + + AT&T is another way to write it. + + + This & that. + + + 4 < 5. + + + 6 > 5. + + + Backslash: \ + + + Backtick: ` + + + Asterisk: * + + + Underscore: _ + + + Left brace: { + + + Right brace: } + + + Left bracket: [ + + + Right bracket: ] + + + Left paren: ( + + + Right paren: ) + + + Greater-than: > + + + Hash: # + + + Period: . + + + Bang: ! + + + Plus: + + + + Minus: - + +
+
+ Links +
+ Explicit + + Just a URL. + + + URL and title. + + + URL and title. + + + URL and title. + + + URL and title + + + URL and title + + + nobody@nowhere.net + + + Empty. + +
+
+ Reference + + Foo bar. + + + Foo bar. + + + Foo bar. + + + With embedded [brackets]. + + + b by itself should be a link. + + + Indented once. + + + Indented twice. + + + Indented thrice. + + + This should [not][] be a link. + + + + Foo bar. + + + Foo biz. + +
+
+ With ampersands + + Here's a + link with an ampersand in the URL. + + + Here's a link with an amersand in the link text: + AT&T. + + + Here's an inline link. + + + Here's an + inline link in pointy braces. + +
+
+ Autolinks + + With an ampersand: + http://example.com/?foo=1&bar=2 + + + + + In a list? + + + + + http://example.com/ + + + + + It should. + + + + + An e-mail address: nobody@nowhere.net + +
+ + Blockquoted: + http://example.com/ + +
+ + Auto-links should not occur here: + <http://example.com/> + + ]]> +
+
+
+ Images + + From "Voyage dans la Lune" by Georges Melies (1902): + + + + + + + Voyage dans la Lune + + + + + + + + Here is a movie + + + + + + icon. + +
+
+ Footnotes + + Here is a footnote + reference, + + Here is the footnote. It can go anywhere after the footnote + reference. It need not be placed at the end of the document. + + + and + another. + + Here's the long note. This one contains multiple blocks. + + + Subsequent blocks are indented to show that they belong to the + footnote (as with list items). + + }]]> + + If you want, you can indent every line, but you can also be lazy + and just indent the first line of each block. + + + This should not be a footnote reference, + because it contains a space.[^my note] Here is an inline + note. + + This is easier to type. Inline notes may + contain links and + ] verbatim characters. + + + +
+ + Notes can go in + quotes. + + In quote. + + + +
+ + + + And in list + items. + + In list. + + + + + + + This paragraph should not be part of the note, as it is not + indented. + +
+ +
diff --git a/web/index.txt b/web/index.txt index 024133487..ec96b01dd 100644 --- a/web/index.txt +++ b/web/index.txt @@ -4,8 +4,8 @@ Pandoc is a [Haskell] library for converting from one markup format to another, and a command-line tool that uses this library. It can read [markdown] and (subsets of) [reStructuredText], [HTML], and [LaTeX], and it can write [markdown], [reStructuredText], [HTML], [LaTeX], [RTF], -and [S5] HTML slide shows. Pandoc's version of markdown contains some -enhancements, like footnotes and embedded LaTeX. +[DocBook XML], and [S5] HTML slide shows. Pandoc's version of markdown +contains some enhancements, like footnotes and embedded LaTeX. In contrast to existing tools for converting markdown to HTML, which use regex substitutions, Pandoc has a modular design: it consists of a @@ -73,6 +73,7 @@ kind. [HTML]: http://www.w3.org/TR/html40/ [LaTeX]: http://www.latex-project.org/ [RTF]: http://en.wikipedia.org/wiki/Rich_Text_Format +[DocBook XML]: http://www.docbook.org/ [Haskell]: http://www.haskell.org/ [GHC]: http://www.haskell.org/ghc/ [GPL]: http://www.gnu.org/copyleft/gpl.html diff --git a/web/mkdemos.sh b/web/mkdemos.sh index ac3d45106..b3e344413 100644 --- a/web/mkdemos.sh +++ b/web/mkdemos.sh @@ -16,6 +16,7 @@ pandoc -s README.tex -o example0.txt pandoc -s -w rst README -o example0.txt pandoc -s README -o example0.rtf pandoc -s -m -i -w s5 S5DEMO -o example0.html +pandoc -s -w docbook README -o example0.db html2markdown http://www.gnu.org/software/make/ -o example0.txt markdown2pdf README -o example0.pdf markdown2pdf -C myheader.tex README -o example0.pdf' -- cgit v1.2.3