author    John MacFarlane <jgm@berkeley.edu> 2021-02-22 14:17:22 -0800
committer John MacFarlane <jgm@berkeley.edu> 2021-02-22 14:17:22 -0800
commit    d30791a38166538be60a134196f1d2675275017d
tree      3294ac5a972807e28aa43e05d21bf9ce712f3f4b /src/Text/Pandoc
parent    5a73c5d3f8136c7fba7429c3ae3a8ae31c58030b
Fall back to latin1 if UTF-8 decoding fails...
...when handling a URL argument served with no charset in the MIME type. The assumption is that most pages that don't specify a charset in the MIME type are either UTF-8 or latin1. I think that's a good assumption, though I'm not sure.
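The fallback described above can be sketched in isolation. This is a minimal illustration, not the code in this commit: `decodeUtf8OrLatin1` is a hypothetical helper name, and it uses the pure `decodeUtf8'` and `decodeLatin1` functions from the `text` package instead of catching the decoding exception in `IO` as the patch does.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Hypothetical helper (not part of pandoc): try UTF-8 first and
-- fall back to Latin-1 if the bytes are not valid UTF-8.
decodeUtf8OrLatin1 :: B.ByteString -> T.Text
decodeUtf8OrLatin1 bs =
  case TE.decodeUtf8' bs of
    Right t -> t                    -- valid UTF-8
    Left _  -> TE.decodeLatin1 bs   -- every byte maps to a code point, so this cannot fail

main :: IO ()
main = do
  -- "café" with é encoded as UTF-8 (0xC3 0xA9): decoded directly.
  print (decodeUtf8OrLatin1 (B.pack [0x63, 0x61, 0x66, 0xC3, 0xA9]))
  -- "café" with é as the single Latin-1 byte 0xE9: invalid UTF-8, so the fallback runs.
  print (decodeUtf8OrLatin1 (B.pack [0x63, 0x61, 0x66, 0xE9]))
```

Both inputs decode to the same text. Because any byte sequence is valid Latin-1, the fallback always succeeds, so input in some other encoding that is not valid UTF-8 would be silently misread rather than rejected. That is the trade-off the commit message acknowledges.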
Diffstat (limited to 'src/Text/Pandoc')
-rw-r--r-- src/Text/Pandoc/App.hs | 8
1 files changed, 7 insertions, 1 deletions
diff --git a/src/Text/Pandoc/App.hs b/src/Text/Pandoc/App.hs
index 59af029b5..40fb34834 100644
--- a/src/Text/Pandoc/App.hs
+++ b/src/Text/Pandoc/App.hs
@@ -1,3 +1,4 @@
+{-# LANGUAGE LambdaCase #-}
 {-# LANGUAGE OverloadedStrings #-}
 {-# LANGUAGE CPP #-}
 {-# LANGUAGE ScopedTypeVariables #-}
@@ -352,7 +353,12 @@ readURI src = do
     Just "UTF-8" -> return $ UTF8.toText bs
     Just "ISO-8859-1" -> return $ T.pack $ B8.unpack bs
     Just charset -> throwError $ PandocUnsupportedCharsetError charset
-    Nothing -> return $ UTF8.toText bs
+    Nothing -> liftIO $ -- try first as UTF-8, then as latin1
+                 E.catch (return $! UTF8.toText bs)
+                   (\case
+                       TSE.DecodeError{} ->
+                         return $ T.pack $ B8.unpack bs
+                       e -> E.throwIO e)
 
 readFile' :: MonadIO m => FilePath -> m BL.ByteString
 readFile' "-" = liftIO BL.getContents