author John MacFarlane <jgm@berkeley.edu> 2020-09-06 16:25:16 -0700
committer John MacFarlane <jgm@berkeley.edu> 2020-09-21 10:15:50 -0700
commit e0984a43a99231e72c02a0a716c8d0315de9abdf
tree 8531ef58c2470d372ff2427a6ae09a6284461471 /src/Text/Pandoc/Readers
parent 89c577befb78b32a0884b6092e0415c0dcadab72
Add built-in citation support using new citeproc library.
This deprecates the use of the external pandoc-citeproc filter; citation processing is now built in to pandoc.

* Add dependency on citeproc library.
* Add Text.Pandoc.Citeproc module (and some associated unexported modules under Text.Pandoc.Citeproc). Exports `processCitations`. [API change]
* Add data files needed for Text.Pandoc.Citeproc: default.csl in the data directory, and a citeproc directory that is just used at compile-time. Note that we've added file-embed as a mandatory rather than a conditional dependency, because of the biblatex localization files. We might eventually want to use readDataFile for this, but it would take some code reorganization.
* Text.Pandoc.Logging: Add `CiteprocWarning` to `LogMessage` and use it in `processCitations`. [API change]
* Add tests from the pandoc-citeproc package as command tests (including some tests pandoc-citeproc did not pass).
* Remove instructions for building pandoc-citeproc from CI and release binary build instructions. We will no longer distribute pandoc-citeproc.
* Markdown reader: tweak abbreviation support. Don't insert a nonbreaking space after a potential abbreviation if it comes right before a note or citation. This messes up several things, including citeproc's moving of note citations.
* Add `csljson` as an input and output format. This allows pandoc to convert between `csljson` and other bibliography formats, and to generate formatted versions of CSL JSON bibliographies.
* Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API change]
* Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API change]
* Added `bibtex`, `biblatex` as input formats. This allows pandoc to convert between BibLaTeX and BibTeX and other bibliography formats, and to generate formatted versions of BibTeX/BibLaTeX bibliographies.
* Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and `readBibLaTeX`. [API change]
* Make "standalone" implicit if output format is a bibliography format. This is needed because pandoc readers for bibliography formats put the bibliographic information in the `references` field of metadata; and unless standalone is specified, metadata gets ignored. (TODO: This needs improvement. We should trigger standalone for the reader when the input format is bibliographic, and for the writer when the output format is markdown.)
* Carry over `citationNoteNum` to `citationNoteNumber`. This was just ignored in pandoc-citeproc.
* Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter. [API change] This runs the processCitations transformation. We need to treat it like a filter so it can be placed in the sequence of filter runs (after some, before others). In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`, so this special filter may be specified either way in a defaults file (or by `citeproc: true`, though this gives no control of positioning relative to other filters). TODO: we need to add something to the manual section on defaults files for this.
* Add deprecation warning if `pandoc-citeproc` filter is used.
* Add `--citeproc/-C` option to trigger citation processing. This behaves like a filter and will be positioned relative to filters as they appear on the command line.
* Rewrote the manual on citations, adding a dedicated Citations section which also includes some information formerly found in the pandoc-citeproc man page.
* Look for CSL styles in the `csl` subdirectory of the pandoc user data directory. This changes the old pandoc-citeproc behavior, which looked in `~/.csl`. Users can simply symlink `~/.csl` to the `csl` subdirectory of their pandoc user data directory if they want the old behavior.
* Add support for CSL bibliography entry formatting to LaTeX, HTML, Ms writers. Added CSL-related CSS to styles.html.
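
The point about treating citation processing as a filter is easiest to see with a toy model. The sketch below is not pandoc's code: `Doc`, `Step`, `resolveCitations`, `runSteps` and `addCitation` are hypothetical names, and a "document" is reduced to a list of string tokens. It only illustrates why the position of the citeproc step relative to other filters changes the result.

```haskell
-- Toy model (not pandoc's real types): a "document" is a list of tokens.
type Doc = [String]

-- Hypothetical stand-in for an entry in the filter list. Pandoc's real
-- Filter type refers to filter files; here an external filter is simply
-- a named pure function.
data Step
  = External String (Doc -> Doc)  -- stand-in for a Lua or JSON filter
  | Citeproc                      -- stand-in for the built-in citation step

-- Stand-in for citation processing: rewrite "@key" tokens as "(key)".
resolveCitations :: Doc -> Doc
resolveCitations = map render
  where
    render ('@' : key) = "(" ++ key ++ ")"
    render tok         = tok

-- Apply the steps left to right, in the order they appear in the list.
runSteps :: [Step] -> Doc -> Doc
runSteps steps doc = foldl apply doc steps
  where
    apply d (External _ f) = f d
    apply d Citeproc       = resolveCitations d

-- An external filter that replaces every "TODO" token with a citation.
addCitation :: Doc -> Doc
addCitation = concatMap expand
  where
    expand "TODO" = ["see", "@doe2020"]
    expand tok    = [tok]
```

In this model, `runSteps [External "add" addCitation, Citeproc] ["TODO"]` resolves the injected citation, while `runSteps [Citeproc, External "add" addCitation] ["TODO"]` leaves `@doe2020` unresolved, because the citation did not exist yet when the citeproc step ran.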
Diffstat (limited to 'src/Text/Pandoc/Readers')
-rw-r--r-- src/Text/Pandoc/Readers/BibTeX.hs 70
-rw-r--r-- src/Text/Pandoc/Readers/CslJson.hs 53
-rw-r--r-- src/Text/Pandoc/Readers/Markdown.hs 1
3 files changed, 124 insertions, 0 deletions
diff --git a/src/Text/Pandoc/Readers/BibTeX.hs b/src/Text/Pandoc/Readers/BibTeX.hs
new file mode 100644
index 000000000..c367e75a1
--- /dev/null
+++ b/src/Text/Pandoc/Readers/BibTeX.hs
@@ -0,0 +1,70 @@
+{-# LANGUAGE OverloadedStrings #-}
+{- |
+ Module : Text.Pandoc.Readers.BibTeX
+ Copyright : Copyright (C) 2020 John MacFarlane
+ License : GNU GPL, version 2 or above
+
+ Maintainer : John MacFarlane <jgm@berkeley.edu>
+ Stability : alpha
+ Portability : portable
+
+Parses BibTeX or BibLaTeX bibliographies into a Pandoc document
+with empty body and `references` and `nocite` fields
+in the metadata. A wildcard `nocite` is used so that
+if the document is rendered in another format, the
+entire bibliography will be printed.
+-}
+module Text.Pandoc.Readers.BibTeX
+ ( readBibTeX
+ , readBibLaTeX
+ )
+where
+
+import Text.Pandoc.Options
+import Text.Pandoc.Definition
+import Text.Pandoc.Builder (setMeta, cite, str)
+import Data.Text (Text)
+import Citeproc (Lang(..), parseLang)
+import Citeproc.Locale (getLocale)
+import Data.Maybe (fromMaybe)
+import Text.Pandoc.Error (PandocError(..))
+import Text.Pandoc.Class (PandocMonad, lookupEnv)
+import Text.Pandoc.Citeproc.BibTeX as BibTeX
+import Text.Pandoc.Citeproc.MetaValue (referenceToMetaValue)
+import Control.Monad.Except (throwError)
+
+-- | Read BibTeX from an input string and return a Pandoc document.
+-- The document will have only metadata, with an empty body.
+-- The metadata will contain a `references` field with the
+-- bibliography entries, and a `nocite` field with the wildcard `[@*]`.
+readBibTeX :: PandocMonad m => ReaderOptions -> Text -> m Pandoc
+readBibTeX = readBibTeX' BibTeX.Bibtex
+
+-- | Read BibLaTeX from an input string and return a Pandoc document.
+-- The document will have only metadata, with an empty body.
+-- The metadata will contain a `references` field with the
+-- bibliography entries, and a `nocite` field with the wildcard `[@*]`.
+readBibLaTeX :: PandocMonad m => ReaderOptions -> Text -> m Pandoc
+readBibLaTeX = readBibTeX' BibTeX.Biblatex
+
+readBibTeX' :: PandocMonad m => Variant -> ReaderOptions -> Text -> m Pandoc
+readBibTeX' variant _opts t = do
+  lang <- fromMaybe (Lang "en" (Just "US")) . fmap parseLang
+             <$> lookupEnv "LANG"
+  locale <- case getLocale lang of
+               Left e -> throwError $ PandocCiteprocError e
+               Right l -> return l
+  case BibTeX.readBibtexString variant locale (const True) t of
+    Left e -> throwError $ PandocParsecError t e
+    Right refs -> return $ setMeta "references"
+                              (map referenceToMetaValue refs)
+                         . setMeta "nocite"
+                             (cite [Citation {citationId = "*"
+                                             , citationPrefix = []
+                                             , citationSuffix = []
+                                             , citationMode = NormalCitation
+                                             , citationNoteNum = 0
+                                             , citationHash = 0}]
+                                   (str "[@*]"))
+                         $ Pandoc nullMeta []
+
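
The first step of `readBibTeX'` chooses a language from the `LANG` environment variable and falls back to en-US when it is unset. A minimal sketch of that fallback, using only `base`, follows. The `Lang` type here is a simplified stand-in for citeproc's, and `parseLangToy` is a hypothetical parser written for this example; it is an assumption and may not match how citeproc's `parseLang` treats the same input.

```haskell
import Data.Maybe (fromMaybe)

-- Simplified stand-in for citeproc's Lang: a language code and an
-- optional variant.
data Lang = Lang String (Maybe String)
  deriving (Eq, Show)

-- Toy parser (an assumption, not citeproc's parseLang): drop an encoding
-- suffix such as ".UTF-8", then split at the first '-' or '_'.
parseLangToy :: String -> Lang
parseLangToy raw =
  case break (`elem` "-_") (takeWhile (/= '.') raw) of
    (lang, [])          -> Lang lang Nothing
    (lang, _ : variant) -> Lang lang (Just variant)

-- Same shape as the expression in the reader: parse the value when the
-- variable is set, otherwise fall back to en-US.
langFromEnv :: Maybe String -> Lang
langFromEnv = fromMaybe (Lang "en" (Just "US")) . fmap parseLangToy
```

Note that the fallback only covers an unset variable. A value that is set but does not name a language with a known locale (in this toy model, `Just "C"` parses to `Lang "C" Nothing`) would presumably reach the `Left` branch of `getLocale` in the reader and be reported as a `PandocCiteprocError`.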
diff --git a/src/Text/Pandoc/Readers/CslJson.hs b/src/Text/Pandoc/Readers/CslJson.hs
new file mode 100644
index 000000000..377186b1e
--- /dev/null
+++ b/src/Text/Pandoc/Readers/CslJson.hs
@@ -0,0 +1,53 @@
+{-# LANGUAGE OverloadedStrings #-}
+{- |
+ Module : Text.Pandoc.Readers.CslJson
+ Copyright : Copyright (C) 2020 John MacFarlane
+ License : GNU GPL, version 2 or above
+
+ Maintainer : John MacFarlane <jgm@berkeley.edu>
+ Stability : alpha
+ Portability : portable
+
+Parses CSL JSON bibliographies into a Pandoc document
+with empty body and `references` and `nocite` fields
+in the metadata. A wildcard `nocite` is used so that
+if the document is rendered in another format, the
+entire bibliography will be printed.
+
+<https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html>.
+-}
+module Text.Pandoc.Readers.CslJson
+ ( readCslJson )
+where
+
+import Text.Pandoc.Options
+import Text.Pandoc.Definition
+import Text.Pandoc.Builder (setMeta, cite, str)
+import qualified Text.Pandoc.UTF8 as UTF8
+import Data.Text (Text)
+import qualified Data.Text as T
+import Text.Pandoc.Error (PandocError(..))
+import Text.Pandoc.Class (PandocMonad)
+import Text.Pandoc.Citeproc.CslJson (cslJsonToReferences)
+import Text.Pandoc.Citeproc.MetaValue (referenceToMetaValue)
+import Control.Monad.Except (throwError)
+
+-- | Read CSL JSON from an input string and return a Pandoc document.
+-- The document will have only metadata, with an empty body.
+-- The metadata will contain a `references` field with the
+-- bibliography entries, and a `nocite` field with the wildcard `[@*]`.
+readCslJson :: PandocMonad m => ReaderOptions -> Text -> m Pandoc
+readCslJson _opts t =
+  case cslJsonToReferences (UTF8.fromText t) of
+    Left e -> throwError $ PandocParseError $ T.pack e
+    Right refs -> return $ setMeta "references"
+                              (map referenceToMetaValue refs)
+                         . setMeta "nocite"
+                             (cite [Citation {citationId = "*"
+                                             , citationPrefix = []
+                                             , citationSuffix = []
+                                             , citationMode = NormalCitation
+                                             , citationNoteNum = 0
+                                             , citationHash = 0}]
+                                   (str "[@*]"))
+                         $ Pandoc nullMeta []
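
Both new readers return the same kind of result: a document whose body is empty and whose metadata carries `references` plus a wildcard `nocite`. The toy model below sketches that shape and why the commit message ties it to "standalone". `MetaValue`, `Doc`, `setField`, `bibliographyDoc` and `renderDoc` are hypothetical, simplified stand-ins, not pandoc's types or writers.

```haskell
-- Toy stand-ins for pandoc's metadata and document types (assumptions
-- made for illustration only).
data MetaValue = MetaList [MetaValue] | MetaText String
  deriving (Eq, Show)

-- Metadata fields followed by body blocks.
data Doc = Doc [(String, MetaValue)] [String]
  deriving (Eq, Show)

-- Set a metadata field, replacing any earlier value for the same key.
setField :: String -> MetaValue -> Doc -> Doc
setField key val (Doc meta body) =
  Doc ((key, val) : filter ((/= key) . fst) meta) body

-- Build a metadata-only document from reference ids, composing the two
-- setters the same way the readers do: `nocite` is applied first, then
-- `references`, starting from an empty document.
bibliographyDoc :: [String] -> Doc
bibliographyDoc refIds =
    setField "references" (MetaList (map MetaText refIds))
  . setField "nocite" (MetaText "[@*]")
  $ Doc [] []

-- Toy writer: metadata keys are only emitted in standalone mode.
renderDoc :: Bool -> Doc -> [String]
renderDoc standalone (Doc meta body)
  | standalone = map fst meta ++ body
  | otherwise  = body
```

In this model, `renderDoc False (bibliographyDoc ["doe2020"])` is empty because all the information lives in metadata, which is the situation the implicit "standalone" for bibliography output formats is meant to avoid.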
diff --git a/src/Text/Pandoc/Readers/Markdown.hs b/src/Text/Pandoc/Readers/Markdown.hs
index 77f28b21b..257788081 100644
--- a/src/Text/Pandoc/Readers/Markdown.hs
+++ b/src/Text/Pandoc/Readers/Markdown.hs
@@ -1665,6 +1665,7 @@ str = do
   abbrevs <- getOption readerAbbreviations
   if not (T.null result) && T.last result == '.' && result `Set.member` abbrevs
      then try (do ils <- whitespace
+                  notFollowedBy (() <$ cite <|> () <$ note)
                   -- ?? lookAhead alphaNum
                   -- replace space after with nonbreaking space
                   -- if softbreak, move before abbrev if possible (#4635)
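
The added `notFollowedBy` line is a negative lookahead: the nonbreaking space is only inserted when no citation or note starts right after the whitespace. A rough sketch of the same rule on plain strings, using only `base`, is given below. The abbreviation list, the prefix checks in `startsCiteOrNote`, and `joinWords` are assumptions made for this example; the real reader runs its `cite` and `note` parsers rather than comparing prefixes.

```haskell
import Data.List (isPrefixOf)

-- Toy abbreviation list; the reader takes its own from the
-- readerAbbreviations option.
abbreviations :: [String]
abbreviations = ["Mr.", "Dr.", "cf."]

-- Rough stand-in for "a citation or note starts here".
startsCiteOrNote :: String -> Bool
startsCiteOrNote rest = any (`isPrefixOf` rest) ["[@", "@", "[^"]

-- Join words with spaces, using a nonbreaking space (U+00A0) after an
-- abbreviation unless the next word starts a citation or note.
joinWords :: [String] -> String
joinWords (w : next : rest) = w ++ sep ++ joinWords (next : rest)
  where
    sep
      | w `elem` abbreviations && not (startsCiteOrNote next) = "\160"
      | otherwise                                             = " "
joinWords ws = concat ws
```

With this sketch, `joinWords ["Dr.", "Smith"]` keeps the two words together with U+00A0, while `joinWords ["cf.", "[@doe2020]"]` uses an ordinary space so that later citation processing sees a normal word boundary before the citation.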