Fixed bug in the markdown reader: HTML preceding a code block could cause the code block to be parsed as a paragraph.

(The problem was that the HTML parser used to eat all blank space after an HTML block, including the indentation of the code block.) Resolves Issue #39.

+ In Text.Pandoc.Readers.HTML, removed parsing of following space from rawHtmlBlock.
+ In Text.Pandoc.Readers.Markdown, modified rawHtmlBlocks so that indentation is eaten *only* on the first line after the HTML block. This means that in

      <div>
          foo
      </div>

  the foo won't be treated as a code block, but in

      <div>

          foo

      </div>

  it will. This seems the right approach for least surprise.

git-svn-id: https://pandoc.googlecode.com/svn/trunk@1164 788f1e2b-df1e-0410-8736-df70ead52e1b
author: fiddlosopher <fiddlosopher@788f1e2b-df1e-0410-8736-df70ead52e1b> 2007-12-31 01:02:44 +0000
committer: fiddlosopher <fiddlosopher@788f1e2b-df1e-0410-8736-df70ead52e1b> 2007-12-31 01:02:44 +0000
commit: e37df6db69fc1d7832db19316ca7beb9cd54a24b (patch)
tree: f8634bade2559c2cc510f294de7da2c1336b74f7 /Text/Pandoc/Readers
parent: ad5cbb78d0256a9394d73aa594a838278b7a8c81 (diff)
download: pandoc-e37df6db69fc1d7832db19316ca7beb9cd54a24b.tar.gz
2 files changed, 13 insertions, 7 deletions
diff --git a/Text/Pandoc/Readers/HTML.hs b/Text/Pandoc/Readers/HTML.hs
index 1d04c74e0..1fff4705f 100644
--- a/Text/Pandoc/Readers/HTML.hs
+++ b/Text/Pandoc/Readers/HTML.hs
@@ -207,9 +207,8 @@ htmlBlockElement = choice [ htmlScript, htmlStyle, htmlComment, xmlDec, definiti
 
 rawHtmlBlock = try $ do
   body <- htmlBlockElement <|> anyHtmlTag <|> anyHtmlEndTag
-  sp <- many space
   state <- getState
-  if stateParseRaw state then return (RawHtml (body ++ sp)) else return Null
+  if stateParseRaw state then return (RawHtml body) else return Null
 
 -- We don't want to parse </body> or </html> as raw HTML, since these
 -- are handled in parseHtml.
diff --git a/Text/Pandoc/Readers/Markdown.hs b/Text/Pandoc/Readers/Markdown.hs
index 6ff5ce17c..6455dcd9d 100644
--- a/Text/Pandoc/Readers/Markdown.hs
+++ b/Text/Pandoc/Readers/Markdown.hs
@@ -507,11 +507,18 @@ strictHtmlBlock = try $ do
              return $ tag ++ concat contents ++ end
 
 rawHtmlBlocks = do
-  htmlBlocks <- many1 rawHtmlBlock    
-  let combined = concatMap (\(RawHtml str) -> str) htmlBlocks
-  let combined' = if not (null combined) && last combined == '\n'
-                     then init combined  -- strip extra newline 
-                     else combined 
+  htmlBlocks <- many1 $ do (RawHtml blk) <- rawHtmlBlock
+                           sps <- do sp1 <- many spaceChar
+                                     sp2 <- option "" (blankline >> return "\n")
+                                     sp3 <- many spaceChar
+                                     sp4 <- option "" blanklines
+                                     return $ sp1 ++ sp2 ++ sp3 ++ sp4
+                           -- note: we want raw html to be able to
+                           -- precede a code block, when separated
+                           -- by a blank line
+                           return $ blk ++ sps
+  let combined = concat htmlBlocks
+  let combined' = if last combined == '\n' then init combined else combined
   return $ RawHtml combined'
 
 --
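
The whitespace rule that the new rawHtmlBlocks applies after each HTML tag can be sketched without Parsec. The following is a minimal model using only the Prelude, not the pandoc implementation: the names isSpaceChar, blankLine, blankLines, afterHtmlBlock and startsIndentedCode are hypothetical helpers introduced for illustration, and the four steps are assumed to mirror sp1 through sp4 in the diff above.

```haskell
-- Simplified model of the trailing-whitespace rule in rawHtmlBlocks.
-- All names here are illustrative assumptions, not pandoc functions.

-- | Characters treated as horizontal space (space or tab).
isSpaceChar :: Char -> Bool
isSpaceChar c = c == ' ' || c == '\t'

-- | Consume one blank line: optional spaces or tabs, then a newline.
-- Returns Nothing (consuming nothing) if the line has other content.
blankLine :: String -> Maybe String
blankLine s = case dropWhile isSpaceChar s of
  '\n' : rest -> Just rest
  _           -> Nothing

-- | Consume as many consecutive blank lines as possible.
blankLines :: String -> String
blankLines s = maybe s blankLines (blankLine s)

-- | Given the input that follows an HTML tag, return what is left
-- for the next block parser once the tag's trailing space is absorbed.
afterHtmlBlock :: String -> String
afterHtmlBlock s0 =
  let s1 = dropWhile isSpaceChar s0   -- sp1: spaces after the tag
      s2 = maybe s1 id (blankLine s1) -- sp2: at most one line ending
      s3 = dropWhile isSpaceChar s2   -- sp3: indentation of the first line
      s4 = blankLines s3              -- sp4: any further blank lines
  in s4

-- | Would the remaining input still look like an indented code block?
startsIndentedCode :: String -> Bool
startsIndentedCode s = take 4 s == "    " || take 1 s == "\t"
```

Under this model, afterHtmlBlock "\n     foo\n</div>" leaves "foo\n</div>", so the indentation is gone and foo reads as ordinary text. With a blank line in between, afterHtmlBlock "\n\n    foo\n\n</div>" leaves "    foo\n\n</div>", and startsIndentedCode holds for the remainder. That matches the two cases described in the commit message.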