author | John MacFarlane <jgm@berkeley.edu> | 2020-09-06 16:25:16 -0700 |
---|---|---|
committer | John MacFarlane <jgm@berkeley.edu> | 2020-09-21 10:15:50 -0700 |
commit | e0984a43a99231e72c02a0a716c8d0315de9abdf (patch) | |
tree | 8531ef58c2470d372ff2427a6ae09a6284461471 /src/Text/Pandoc/Readers | |
parent | 89c577befb78b32a0884b6092e0415c0dcadab72 (diff) | |
download | pandoc-e0984a43a99231e72c02a0a716c8d0315de9abdf.tar.gz |
Add built-in citation support using new citeproc library.
This deprecates the use of the external pandoc-citeproc
filter; citation processing is now built in to pandoc.
* Add dependency on citeproc library.
* Add Text.Pandoc.Citeproc module (and some associated unexported
modules under Text.Pandoc.Citeproc). Exports `processCitations`.
[API change]
* Add data files needed for Text.Pandoc.Citeproc: default.csl
in the data directory, and a citeproc directory that is just
used at compile-time. Note that we've added file-embed as a mandatory
rather than a conditional dependency, because of the biblatex
localization files. We might eventually want to use readDataFile
for this, but it would take some code reorganization.
* Text.Pandoc.Logging: Add `CiteprocWarning` to `LogMessage` and use it
in `processCitations`. [API change]
* Add tests from the pandoc-citeproc package as command tests (including
some tests pandoc-citeproc did not pass).
* Remove instructions for building pandoc-citeproc from CI and
release binary build instructions. We will no longer distribute
pandoc-citeproc.
* Markdown reader: tweak abbreviation support. Don't insert a
nonbreaking space after a potential abbreviation if it comes right before
a note or citation. Inserting one there messes up several things,
including citeproc's moving of note citations.
* Add `csljson` as an input and output format. This allows pandoc
to convert between `csljson` and other bibliography formats,
and to generate formatted versions of CSL JSON bibliographies.
* Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API
change]
* Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API
change]
* Added `bibtex`, `biblatex` as input formats. This allows pandoc
to convert between BibLaTeX and BibTeX and other bibliography formats,
and to generate formatted versions of BibTeX/BibLaTeX bibliographies.
* Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and
`readBibLaTeX`. [API change]
* Make "standalone" implicit if output format is a bibliography format.
This is needed because pandoc readers for bibliography formats put
the bibliographic information in the `references` field of metadata;
and unless standalone is specified, metadata gets ignored.
(TODO: This needs improvement. We should trigger standalone for the
reader when the input format is bibliographic, and for the writer
when the output format is markdown.)
* Carry over `citationNoteNum` to `citationNoteNumber`. This was just
ignored in pandoc-citeproc.
* Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter.
[API change] This runs the processCitations transformation.
We need to treat it like a filter so it can be placed
in the sequence of filter runs (after some, before others).
In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`,
so this special filter may be specified either way in a defaults file
(or by `citeproc: true`, though this gives no control of positioning
relative to other filters). TODO: we need to add something to the
manual section on defaults files for this.
* Add deprecation warning if `pandoc-citeproc` filter is used.
* Add `--citeproc/-C` option to trigger citation processing.
This behaves like a filter and will be positioned
relative to filters as they appear on the command line.
* Rewrote the manual on citations, adding a dedicated Citations
section which also includes some information formerly found in
the pandoc-citeproc man page.
* Look for CSL styles in the `csl` subdirectory of the pandoc user data
directory. This changes the old pandoc-citeproc behavior, which looked
in `~/.csl`. Users can simply symlink `~/.csl` to the `csl`
subdirectory of their pandoc user data directory if they want
the old behavior.
* Add support for CSL bibliography entry formatting to LaTeX, HTML,
Ms writers. Added CSL-related CSS to styles.html.
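
The positioning rule described above for `--citeproc` and `CiteprocFilter` (citation processing runs at its place in the filter sequence, after some filters and before others) can be pictured as a left-to-right fold over the filter list. The following is only a minimal sketch using `base`: the `Filter` and `Doc` types, `applyFilter`, `runFilters`, and the filter names `first.lua` and `last.lua` are simplified, hypothetical stand-ins and not pandoc's actual definitions.

```haskell
module Main where

import Data.List (foldl')

-- Hypothetical stand-in for pandoc's Filter type. The real type
-- distinguishes Lua and JSON filters; here they are lumped together.
data Filter
  = ExternalFilter String   -- an ordinary filter, identified by name
  | CiteprocFilter          -- built-in citation processing
  deriving (Show, Eq)

-- A "document" in this sketch is just the log of transformations
-- applied so far, so the ordering is easy to inspect.
type Doc = [String]

-- Apply a single filter by appending its name to the log.
applyFilter :: Doc -> Filter -> Doc
applyFilter doc (ExternalFilter name) = doc ++ [name]
applyFilter doc CiteprocFilter        = doc ++ ["citeproc"]

-- Filters run strictly in the order given, so citeproc sits
-- wherever it was listed relative to the other filters.
runFilters :: [Filter] -> Doc -> Doc
runFilters fs doc = foldl' applyFilter doc fs

main :: IO ()
main =
  print (runFilters
           [ExternalFilter "first.lua", CiteprocFilter, ExternalFilter "last.lua"]
           [])
  -- prints ["first.lua","citeproc","last.lua"]
```

Treating citation processing as one more element of the list is what lets a single ordering rule cover both the command line and the `filters` entry of a defaults file.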
Diffstat (limited to 'src/Text/Pandoc/Readers')
-rw-r--r-- | src/Text/Pandoc/Readers/BibTeX.hs | 70 | ||||
-rw-r--r-- | src/Text/Pandoc/Readers/CslJson.hs | 53 | ||||
-rw-r--r-- | src/Text/Pandoc/Readers/Markdown.hs | 1 |
3 files changed, 124 insertions, 0 deletions
diff --git a/src/Text/Pandoc/Readers/BibTeX.hs b/src/Text/Pandoc/Readers/BibTeX.hs
new file mode 100644
index 000000000..c367e75a1
--- /dev/null
+++ b/src/Text/Pandoc/Readers/BibTeX.hs
@@ -0,0 +1,70 @@
+{-# LANGUAGE OverloadedStrings #-}
+{- |
+   Module      : Text.Pandoc.Readers.BibTeX
+   Copyright   : Copyright (C) 2020 John MacFarlane
+   License     : GNU GPL, version 2 or above
+
+   Maintainer  : John MacFarlane <jgm@berkeley.edu>
+   Stability   : alpha
+   Portability : portable
+
+Parses BibTeX or BibLaTeX bibliographies into a Pandoc document
+with empty body and `references` and `nocite` fields
+in the metadata. A wildcard `nocite` is used so that
+if the document is rendered in another format, the
+entire bibliography will be printed.
+-}
+module Text.Pandoc.Readers.BibTeX
+  ( readBibTeX
+  , readBibLaTeX
+  )
+where
+
+import Text.Pandoc.Options
+import Text.Pandoc.Definition
+import Text.Pandoc.Builder (setMeta, cite, str)
+import Data.Text (Text)
+import Citeproc (Lang(..), parseLang)
+import Citeproc.Locale (getLocale)
+import Data.Maybe (fromMaybe)
+import Text.Pandoc.Error (PandocError(..))
+import Text.Pandoc.Class (PandocMonad, lookupEnv)
+import Text.Pandoc.Citeproc.BibTeX as BibTeX
+import Text.Pandoc.Citeproc.MetaValue (referenceToMetaValue)
+import Control.Monad.Except (throwError)
+
+-- | Read BibTeX from an input string and return a Pandoc document.
+-- The document will have only metadata, with an empty body.
+-- The metadata will contain a `references` field with the
+-- bibliography entries, and a `nocite` field with the wildcard `[@*]`.
+readBibTeX :: PandocMonad m => ReaderOptions -> Text -> m Pandoc
+readBibTeX = readBibTeX' BibTeX.Bibtex
+
+-- | Read BibLaTeX from an input string and return a Pandoc document.
+-- The document will have only metadata, with an empty body.
+-- The metadata will contain a `references` field with the
+-- bibliography entries, and a `nocite` field with the wildcard `[@*]`.
+readBibLaTeX :: PandocMonad m => ReaderOptions -> Text -> m Pandoc
+readBibLaTeX = readBibTeX' BibTeX.Biblatex
+
+readBibTeX' :: PandocMonad m => Variant -> ReaderOptions -> Text -> m Pandoc
+readBibTeX' variant _opts t = do
+  lang <- fromMaybe (Lang "en" (Just "US")) . fmap parseLang
+             <$> lookupEnv "LANG"
+  locale <- case getLocale lang of
+               Left e -> throwError $ PandocCiteprocError e
+               Right l -> return l
+  case BibTeX.readBibtexString variant locale (const True) t of
+    Left e -> throwError $ PandocParsecError t e
+    Right refs -> return $ setMeta "references"
+                              (map referenceToMetaValue refs)
+                         . setMeta "nocite"
+                             (cite [Citation {citationId = "*"
+                                             , citationPrefix = []
+                                             , citationSuffix = []
+                                             , citationMode = NormalCitation
+                                             , citationNoteNum = 0
+                                             , citationHash = 0}]
+                                             (str "[@*]"))
+                         $ Pandoc nullMeta []
+
diff --git a/src/Text/Pandoc/Readers/CslJson.hs b/src/Text/Pandoc/Readers/CslJson.hs
new file mode 100644
index 000000000..377186b1e
--- /dev/null
+++ b/src/Text/Pandoc/Readers/CslJson.hs
@@ -0,0 +1,53 @@
+{-# LANGUAGE OverloadedStrings #-}
+{- |
+   Module      : Text.Pandoc.Readers.CslJson
+   Copyright   : Copyright (C) 2020 John MacFarlane
+   License     : GNU GPL, version 2 or above
+
+   Maintainer  : John MacFarlane <jgm@berkeley.edu>
+   Stability   : alpha
+   Portability : portable
+
+Parses CSL JSON bibliographies into a Pandoc document
+with empty body and `references` and `nocite` fields
+in the metadata. A wildcard `nocite` is used so that
+if the document is rendered in another format, the
+entire bibliography will be printed.
+
+<https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html>.
+-}
+module Text.Pandoc.Readers.CslJson
+  ( readCslJson )
+where
+
+import Text.Pandoc.Options
+import Text.Pandoc.Definition
+import Text.Pandoc.Builder (setMeta, cite, str)
+import qualified Text.Pandoc.UTF8 as UTF8
+import Data.Text (Text)
+import qualified Data.Text as T
+import Text.Pandoc.Error (PandocError(..))
+import Text.Pandoc.Class (PandocMonad)
+import Text.Pandoc.Citeproc.CslJson (cslJsonToReferences)
+import Text.Pandoc.Citeproc.MetaValue (referenceToMetaValue)
+import Control.Monad.Except (throwError)
+
+-- | Read CSL JSON from an input string and return a Pandoc document.
+-- The document will have only metadata, with an empty body.
+-- The metadata will contain a `references` field with the
+-- bibliography entries, and a `nocite` field with the wildcard `[@*]`.
+readCslJson :: PandocMonad m => ReaderOptions -> Text -> m Pandoc
+readCslJson _opts t =
+  case cslJsonToReferences (UTF8.fromText t) of
+    Left e -> throwError $ PandocParseError $ T.pack e
+    Right refs -> return $ setMeta "references"
+                              (map referenceToMetaValue refs)
+                         . setMeta "nocite"
+                             (cite [Citation {citationId = "*"
+                                             , citationPrefix = []
+                                             , citationSuffix = []
+                                             , citationMode = NormalCitation
+                                             , citationNoteNum = 0
+                                             , citationHash = 0}]
+                                             (str "[@*]"))
+                         $ Pandoc nullMeta []
diff --git a/src/Text/Pandoc/Readers/Markdown.hs b/src/Text/Pandoc/Readers/Markdown.hs
index 77f28b21b..257788081 100644
--- a/src/Text/Pandoc/Readers/Markdown.hs
+++ b/src/Text/Pandoc/Readers/Markdown.hs
@@ -1665,6 +1665,7 @@ str = do
   abbrevs <- getOption readerAbbreviations
   if not (T.null result) && T.last result == '.' && result `Set.member` abbrevs
      then try (do ils <- whitespace
+                  notFollowedBy (() <$ cite <|> () <$ note)
                   -- ?? lookAhead alphaNum
                   -- replace space after with nonbreaking space
                   -- if softbreak, move before abbrev if possible (#4635)