Age | Commit message | Author | Files | Lines |
|
New module Text.Pandoc.Readers.Custom, exporting
readCustom [API change].
Users can now do `-f myreader.lua` and pandoc will treat the
script myreader.lua as a custom reader, which parses an input
string to a pandoc AST, using the pandoc module defined for
Lua filters.
A sample custom reader can be found in data/reader.lua.
Closes #7669.
|
|
Reasons:
- Performance: HsYAML is around 20 times slower in parsing
large YAML bibliographies (#6084).
- An issue was submitted to HsYAML, but it hasn't gotten
any attention. HsYAML seems borderline unmaintained; it hasn't
had a commit in over a year.
- Unfortunately this goes back on our attempts to free ourselves
from C dependencies (#4535). But I don't see a better alternative
until a better pure Haskell parser is available.
Closes #6084.
Notes:
- We've removed the FromYAML instances for all types that had
them, since this is a HsYAML-specific typeclass [API change].
(The yaml package just uses From/ToJSON.)
- Unlike HsYAML (in the configuration we were using), yaml
parses 'Y', 'N', 'Yes', 'No', 'On', 'Off' as boolean values.
Users may need to quote these when they are meant to be
interpreted as strings. Similarly, 'null' is parsed as
a YAML null value (and will be treated as an empty string
by pandoc rather than the string 'null'). Quoting it will
force it to be interpreted as a string.
- Some tests had to be adjusted accordingly.
- Pandoc now behaves better when the YAML metadata contains
escaping errors: instead of just falling back on treating
the section as a table, it raises a YAML parsing error.
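
The quoting note above can be illustrated with a small Python sketch. This is not the yaml package's resolver: `resolve_scalar` and the word lists are hypothetical, and lower-casing the input is a simplification. It only mimics the difference between unquoted and quoted scalars described in these notes.

```python
# Illustrative only: mimics the scalar handling described in the notes above.
# The names and word lists are assumptions, not the yaml package's API.
TRUE_WORDS = {"y", "yes", "on", "true"}
FALSE_WORDS = {"n", "no", "off", "false"}


def resolve_scalar(raw):
    """Resolve a YAML-like scalar to a bool, None, or a string."""
    # A quoted scalar is always kept as a string, minus its quotes.
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
        return raw[1:-1]
    lowered = raw.lower()
    if lowered in TRUE_WORDS:
        return True
    if lowered in FALSE_WORDS:
        return False
    if lowered == "null":
        # An unquoted null becomes a null value, not the string "null".
        return None
    return raw
```

So a metadata value written as `Yes` resolves to a boolean here, while `'Yes'` stays a string.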
|
|
|
|
+ Add sandbox feature for readers. When this option is used,
readers and writers only have access to input files (and
other files specified directly on command line). This restriction
is enforced in the type system.
+ Filters, PDF production, and custom writers are unaffected. This
feature only insulates the actual readers and writers, not
the pipeline around them in Text.Pandoc.App.
+ Note that when `--sandboxed` is specified, readers won't have
access to the resource path, nor will anything have access to
the user data directory.
+ Add module Text.Pandoc.Class.Sandbox, defining
`sandbox`. Exported via Text.Pandoc.Class. [API change]
Closes #5045.
|
|
This change has several parts:
- In Text.Pandoc.App, if the writer is docx, we fill the media
bag and attempt to convert any SVG images to PNG, adding these
to the media bag. The PNG backups have the same filenames as
the SVG images, but with an added .png extension. If the conversion
cannot be done (e.g. because rsvg-convert is not present),
a warning is emitted.
- In Text.Pandoc.Writers.Docx, we now use Word 2016's syntax for
including SVG images. If a PNG fallback is present in the media bag,
we include a link to that too.
It would be helpful if someone with an old Word version could test
to see that the documents we produce can be opened and viewed with
the PNG fallbacks. If not, then perhaps we can eliminate the
slightly complex code for producing these fallbacks.
Closes #4058.
|
|
|
|
Previously we used liftIO fairly liberally. The code has
been restructured to avoid this.
A small behavior change is that pandoc will now fall back
to latin1 encoding for inputs that can't be read as UTF-8.
This is what it did previously for content fetched from
the web and not marked as to content type. It makes sense
to do the same for local files.
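
A minimal Python sketch of the fallback described above, assuming the input arrives as raw bytes. `decode_input` is a hypothetical name; pandoc's own implementation is in Haskell and may differ in detail.

```python
def decode_input(raw: bytes) -> str:
    """Decode input as UTF-8, falling back to latin1 if that fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin1 maps every byte to a code point, so this step cannot fail.
        return raw.decode("latin-1")
```

The fallback never raises, which is why it is a safe last resort, though it can produce wrong characters for input in some other encoding.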
|
|
so we can run this with any instance of PandocMonad and MonadIO,
not just PandocIO.
|
|
|
|
Previously, when multiple file arguments were provided, pandoc
simply concatenated them and passed the contents to the readers,
which took a Text argument.
As a result, the readers had no way of knowing which file
was the source of any particular bit of text. This meant that
we couldn't report accurate source positions on errors or
include accurate source positions as attributes in the AST.
More seriously, it meant that we couldn't resolve resource
paths relative to the files containing them
(see e.g. #5501, #6632, #6384, #3752).
Add Text.Pandoc.Sources (exported module), with a `Sources` type
and a `ToSources` class. A `Sources` wraps a list of `(SourcePos,
Text)` pairs. [API change] A parsec `Stream` instance is provided for
`Sources`. The module also exports versions of parsec's `satisfy` and
other Char parsers that track source positions accurately from a
`Sources` stream (or any instance of the new `UpdateSourcePos` class).
Text.Pandoc.Parsing now exports these modified Char parsers instead of
the ones parsec provides. Modified parsers to use a `Sources` as stream
[API change].
The readers that previously took a `Text` argument have been
modified to take any instance of `ToSources`. So, they may still
be used with a `Text`, but they can also be used with a `Sources`
object.
In Text.Pandoc.Error, modified the constructor PandocParsecError
to take a `Sources` rather than a `Text` as first argument,
so parse error locations can be accurately reported.
T.P.Error: showPos, do not print "-" as source name.
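
The idea behind `Sources` can be sketched in Python as follows. This is an illustration, not the Haskell API: the sketch stores `(name, text)` pairs instead of `(SourcePos, Text)` pairs, and `locate` is a hypothetical helper that maps an offset in the concatenated input back to the file it came from.

```python
from typing import List, Tuple

# (source name, text); a stand-in for the (SourcePos, Text) pairs in Sources.
Source = Tuple[str, str]


def locate(sources: List[Source], offset: int) -> Tuple[str, int, int]:
    """Map an offset in the concatenated input to (name, line, column)."""
    for name, text in sources:
        if offset < len(text):
            before = text[:offset]
            line = before.count("\n") + 1
            # Column is 1-based, counted from the start of the current line.
            column = offset - (before.rfind("\n") + 1) + 1
            return (name, line, column)
        # Not in this source: skip past it and keep looking.
        offset -= len(text)
    raise IndexError("offset is past the end of the input")
```

With plain concatenation, only the offset survives; keeping the pairs is what lets an error be reported against the right file and line.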
|
|
Tabs in plain-text inputs are now handled correctly, even if the
`--file-scope` flag is used.
Closes: #6709
|
|
Update citeproc test.
|
|
[API change]
Use Lang from UnicodeCollation.Lang instead.
This is a richer implementation of BCP 47.
|
|
This allows the syntax `${HOME}` to be used, in fields that expect
file paths only. Any environment variable may be interpolated
in this way. A warning will be raised for undefined variables.
The special variable `USERDATA` is automatically set to the
user data directory in force when the defaults file is parsed.
(Note: it may be different from the eventual user data directory,
if the defaults file or further command line options change that.)
Closes #5982.
Closes #5977.
Closes #6108 (path not taken).
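
A rough Python sketch of the interpolation described above. The function name, the empty-string replacement for undefined variables, and `USERDATA` taking priority over an environment variable of the same name are all assumptions made for illustration.

```python
import re
import warnings

_VAR = re.compile(r"\$\{([^}]+)\}")


def interpolate(path, env, user_data_dir):
    """Expand ${NAME} references in a file path from a defaults file."""
    variables = dict(env)
    # USERDATA is set automatically to the current user data directory.
    variables["USERDATA"] = user_data_dir

    def replace(match):
        name = match.group(1)
        if name not in variables:
            # Assumption: warn and substitute an empty string.
            warnings.warn(f"undefined variable {name}")
            return ""
        return variables[name]

    return _VAR.sub(replace, path)
```

For example, `${USERDATA}/filters/f.lua` expands relative to whatever user data directory is in force when the defaults file is parsed.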
|
|
...when handling URL argument served with no charset in the mime type.
The assumption is that most pages that don't specify a charset
in the mime type are either UTF-8 or latin1. I think that's a good
assumption, though I'm not sure.
|
|
the character encoding. We can properly handle UTF-8 and
latin1 (ISO-8859-1); for others we raise an error.
See #5600.
|
|
[API change] This affects `readFile`, `getContents`, `writeFileWith`,
`writeFile`, `putStrWith`, `putStr`, `putStrLnWith`, `putStrLn`,
`hPutStrWith`, `hPutStr`, `hPutStrLnWith`, `hPutStrLn`, `hGetContents`.
This avoids the need to uselessly create a linked list of characters
when emitting output.
|
|
Exported by Text.Pandoc.App.
|
|
* JATS writer: keep code lines at 80 chars or below
* JATS writer: fix citations
|
|
|
|
|
|
Closes #6841.
|
|
now that we permit extensions on formats other
than markdown.
|
|
Closes #6730.
Previously the command would succeed, returning empty metadata,
with no errors or warnings.
API changes:
- Remove now unused CouldNotParseYamlMetadata constructor for
LogMessage (T.P.Logging).
- Add 'Maybe FilePath' parameter to yamlToMeta in T.P.Readers.Markdown.
|
|
This deprecates the use of the external pandoc-citeproc
filter; citation processing is now built in to pandoc.
* Add dependency on citeproc library.
* Add Text.Pandoc.Citeproc module (and some associated unexported
modules under Text.Pandoc.Citeproc). Exports `processCitations`.
[API change]
* Add data files needed for Text.Pandoc.Citeproc: default.csl
in the data directory, and a citeproc directory that is just
used at compile-time. Note that we've added file-embed as a mandatory
rather than a conditional dependency, because of the biblatex
localization files. We might eventually want to use readDataFile
for this, but it would take some code reorganization.
* Text.Pandoc.Logging: Add `CiteprocWarning` to `LogMessage` and use it
in `processCitations`. [API change]
* Add tests from the pandoc-citeproc package as command tests (including
some tests pandoc-citeproc did not pass).
* Remove instructions for building pandoc-citeproc from CI and
release binary build instructions. We will no longer distribute
pandoc-citeproc.
* Markdown reader: tweak abbreviation support. Don't insert a
nonbreaking space after a potential abbreviation if it comes right before
a note or citation. This messes up several things, including citeproc's
moving of note citations.
* Add `csljson` as an input and output format. This allows pandoc
to convert between `csljson` and other bibliography formats,
and to generate formatted versions of CSL JSON bibliographies.
* Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API
change]
* Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API
change]
* Added `bibtex`, `biblatex` as input formats. This allows pandoc
to convert between BibLaTeX and BibTeX and other bibliography formats,
and to generate formatted versions of BibTeX/BibLaTeX bibliographies.
* Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and
`readBibLaTeX`. [API change]
* Make "standalone" implicit if output format is a bibliography format.
This is needed because pandoc readers for bibliography formats put
the bibliographic information in the `references` field of metadata;
and unless standalone is specified, metadata gets ignored.
(TODO: This needs improvement. We should trigger standalone for the
reader when the input format is bibliographic, and for the writer
when the output format is markdown.)
* Carry over `citationNoteNum` to `citationNoteNumber`. This was just
ignored in pandoc-citeproc.
* Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter.
[API change] This runs the processCitations transformation.
We need to treat it like a filter so it can be placed
in the sequence of filter runs (after some, before others).
In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`,
so this special filter may be specified either way in a defaults file
(or by `citeproc: true`, though this gives no control of positioning
relative to other filters). TODO: we need to add something to the
manual section on defaults files for this.
* Add deprecation warning if `pandoc-citeproc` filter is used.
* Add `--citeproc/-C` option to trigger citation processing.
This behaves like a filter and will be positioned
relative to filters as they appear on the command line.
* Rewrote the manual on citations, adding a dedicated Citations
section which also includes some information formerly found in
the pandoc-citeproc man page.
* Look for CSL styles in the `csl` subdirectory of the pandoc user data
directory. This changes the old pandoc-citeproc behavior, which looked
in `~/.csl`. Users can simply symlink `~/.csl` to the `csl`
subdirectory of their pandoc user data directory if they want
the old behavior.
* Add support for CSL bibliography entry formatting to LaTeX, HTML,
Ms writers. Added CSL-related CSS to styles.html.
|
|
This commit adds the option `--no-check-certificate`, which disables certificate
checking when resources are fetched by HTTP.
Co-authored-by: Cécile Chemin <cecile.chemin@insee.fr>
Co-authored-by: Juliette Fourcot <juliette.fourcot@insee.fr>
|
|
* Use implicit Prelude
The previous behavior was introduced as a fix for #4464. It seems that
this change alone did not fix the issue, and `stack ghci` and `cabal
repl` only work with GHC 8.4.1 or newer, as no custom Prelude is loaded
for these versions. Given this, it seems cleaner to revert to the
implicit Prelude.
* PandocMonad: remove outdated check for base version
Only base versions 4.9 and later are supported, so the check for
`MIN_VERSION_base(4,8,0)` is unnecessary.
* Always use custom prelude
Previously, the custom prelude was used only with older GHC versions, as
a workaround for problems with ghci. The ghci problems are resolved by
replacing package `base` with `base-noprelude`, allowing for consistent
use of the custom prelude across all GHC versions.
|
|
* Update copyright year
* Copyright: add notes for Lua and Jira modules
|
|
...so it can affect things like include-in-header.
See #5982.
|
|
This adds a new function to the API: Text.Pandoc.Shared.findM.
|
|
|
|
All warnings are either fixed or, if more appropriate, HLint is
configured to ignore them. HLint suggestions remain.
* Ignore "Use camelCase" warnings in Lua and legacy code
* Fix or ignore remaining HLint warnings
* Remove redundant brackets
* Remove redundant `return`s
* Remove redundant as-pattern
* Fuse mapM_/map
* Use `.` to shorten code
* Remove redundant `fmap`
* Remove unused LANGUAGE pragmas
* Hoist `not` in Text.Pandoc.App
* Use fewer imports for `Text.DocTemplates`
* Remove redundant `do`s
* Remove redundant `$`s
* Jira reader: remove unnecessary parentheses
|
|
* Avoid duplicating the dash case
* Pull common functions out of case branches
* Make sure list lengths are only calculated once
* Use unless
* Simplify parseURIReference' and avoid an unnecessary call to length
* Use <$> instead of reimplementing it
* Use swap instead of reimplementing it
* Remove eta-expansion that's been unnecessary since 90f5dd8
* Use tailDef instead of reimplementing it
* Use second instead of fmap, per @tarleb
|
|
For YAML metadata parsing. A step in the direction of #5914.
No API change.
|
|
`Nothing` means: nothing specified.
`Just []` means: an empty list specified (e.g. in defaults).
Potentially these could lead to different behavior: see #5888.
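
One way the two cases could differ, sketched in Python with `None` standing in for `Nothing` and an empty list for `Just []`. The function and the choice it makes are hypothetical; the commit message does not say which behavior pandoc adopts (see #5888).

```python
from typing import List, Optional


def resolve_list_option(explicit: Optional[List[str]],
                        inherited: List[str]) -> List[str]:
    """Pick the value of a hypothetical list-valued option."""
    if explicit is None:
        # Nothing: the option was never mentioned, so keep the old value.
        return inherited
    # Just []: the option was given, even if empty, so it replaces the old value.
    return explicit
```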
|
|
PR #5884.
+ Use pandoc-types 1.20 and texmath 0.12.
+ Text is now used instead of String, with a few exceptions.
+ In the MediaBag module, some of the types using Strings
were switched to use FilePath instead (not Text).
+ In the Parsing module, new parsers `manyChar`, `many1Char`,
`manyTillChar`, `many1TillChar`, `many1Till`, `manyUntil`,
`manyUntilChar` have been added: these are like their
unsuffixed counterparts but pack some or all of their output.
+ `glob` in Text.Pandoc.Class still takes String since it seems
to be intended as an interface to Glob, which uses strings.
It seems to be used only once in the package, in the EPUB writer,
so that is not hard to change.
|
|
Previously, if a document contained two YAML metadata blocks
that set the same field, the conflict would be resolved in favor
of the first. Now it is resolved in favor of the second (due to
a change in pandoc-types).
This makes the behavior more uniform with other things in pandoc
(such as reference links and `--metadata-file`).
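
The new precedence can be pictured with plain dictionaries in Python. This is only an analogy for the behavior described above; `merge_metadata` is a hypothetical name, and pandoc's metadata is not a Python dict.

```python
def merge_metadata(blocks):
    """Merge metadata blocks in document order; later blocks win."""
    merged = {}
    for block in blocks:
        # update() overwrites existing keys, so the last block to set
        # a field determines its final value.
        merged.update(block)
    return merged
```

Fields that only one block sets are unaffected; only conflicting fields change hands.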
|
|
This changes `applyFilters` from Text.Pandoc.Filter so
that it does a left fold rather than a right fold, applying
the filters in the order listed. [behavior change]
The command-line arguments are accumulated in order instead
of reverse order.
A first step towards #5881.
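
The difference between the two folds can be shown in Python, with filters modeled as functions from document to document. `apply_filters` and `apply_filters_right` are stand-ins written for this illustration, not the Haskell `applyFilters`.

```python
from functools import reduce


def apply_filters(filters, doc):
    """Left fold: run the filters in the order they are listed."""
    return reduce(lambda acc, f: f(acc), filters, doc)


def apply_filters_right(filters, doc):
    """Right fold, for contrast: the last listed filter runs first."""
    return reduce(lambda acc, f: f(acc), reversed(filters), doc)
```

With a right fold, the argument list has to be accumulated in reverse to get the listed order back, which is the bookkeeping this change removes.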
|
|
PDF output will not be output to the terminal, but can be
sent to stdout using either `-o -` or a pipe.
The intermediate format will be determined based on
the setting of `--pdf-engine`.
Closes #5751.
|
|
- Add FromYAML instances to Opt and to all subsidiary types.
- Remove the use of HsYAML-aeson, which doesn't give good
position information on errors.
- Rename some fields in Opt to better match cli options or
reflect what they contain [API change]:
+ optMetadataFile -> optMetadataFiles
+ optPDFEngineArgs -> optPDFEngineOpts
+ optWrapText -> optWrap
- Add IpynbOutput enumerated type to Text.Pandoc.App.Opts.
Use this instead of a string for optIpynbOutput.
- Add FromYAML instance for Filter in Text.Pandoc.Filters.
With these changes parsing of defaults files should be
complete and should give decent error messages.
Now (unlike before) we get an error if an unknown field
is used.
|
|
on conflicting fields. This changes earlier behavior (but not in
a release), where first took precedence.
Note that this may seem inconsistent with the behavior of
multiple YAML blocks within a document, where the first takes
precedence. Still, it is convenient to be able to override
defaults with options later on the command line.
|
|
This will allow `to:` and `from:` in defaults files.
|
|
[API change]
The current behavior of the `--metadata` option stays the same.
|
|
We now just use optShiftHeadingLevelBy, to avoid redundancy.
|
|
to match the option.
|
|
+ An error is now raised if you try to specify (enable or
disable) an extension that does not affect the given
format, e.g. `docx+pipe_tables`.
+ The `--list-extensions[=FORMAT]` option now lists only
extensions that affect the given FORMAT.
+ Text.Pandoc.Error: Add constructors `PandocUnknownReaderError`,
`PandocUnknownWriterError`, `PandocUnsupportedExtensionError`.
[API change]
+ Text.Pandoc.Extensions now exports `getAllExtensions`,
which returns the extensions that affect a given format
(whether enabled by default or not). [API change]
+ Text.Pandoc.Extensions: change type of `parseFormatSpec`
from `Either ParseError (String, Extensions -> Extensions)`
to `Either ParseError (String, [Extension], [Extension])`
[API change].
+ Text.Pandoc.Readers: change type of `getReader` so it returns
a value in the PandocMonad instance rather than an Either
[API change]. Exceptions for unknown formats and unsupported
extensions are now raised by this function and need not be handled by
the calling function.
+ Text.Pandoc.Writers: change type of `getWriter` so it returns
a value in the PandocMonad instance rather than an Either
[API change]. Exceptions for unknown formats and unsupported
extensions are now raised by this function and need not be handled by
the calling function.
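
The new shape of `parseFormatSpec` and the extension check can be sketched in Python. Everything here is illustrative: the regular expression, the `SUPPORTED` table, and the exception class are assumptions, and the real set of extensions per format comes from `getAllExtensions`.

```python
import re

# Hypothetical table; the real sets come from getAllExtensions.
SUPPORTED = {
    "markdown": {"pipe_tables", "smart", "footnotes"},
    "docx": {"empty_paragraphs", "styles"},
}


class UnsupportedExtensionError(Exception):
    """Stand-in for PandocUnsupportedExtensionError."""


def parse_format_spec(spec):
    """Split 'name+ext1-ext2' into (name, enabled, disabled)."""
    match = re.fullmatch(r"([^+-]*)((?:[+-][A-Za-z0-9_]+)*)", spec)
    if match is None:
        raise ValueError(f"cannot parse format spec: {spec}")
    name, modifiers = match.groups()
    enabled, disabled = [], []
    for sign, ext in re.findall(r"([+-])([A-Za-z0-9_]+)", modifiers):
        (enabled if sign == "+" else disabled).append(ext)
    return name, enabled, disabled


def check_extensions(spec):
    """Raise if the spec names an extension that does not affect the format."""
    name, enabled, disabled = parse_format_spec(spec)
    for ext in enabled + disabled:
        if ext not in SUPPORTED.get(name, set()):
            raise UnsupportedExtensionError(f"{name} does not support {ext}")
    return name, enabled, disabled
```

Returning the two lists rather than a function makes it possible to inspect which extensions were requested, which is what the error check needs.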
|
|
|
|
Deprecate --base-heading-level.
The new option does everything the old one does, but also
allows negative shifts. It also promotes the document
metadata (if not null) to a level-1 heading with a +1 shift,
and demotes an initial level-1 heading to document metadata
with a -1 shift. This supports converting documents that
use an initial level-1 heading for the document title.
Closes #5615.
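
A simplified Python model of the shift described above. Blocks are tuples such as `("header", level, text)` or `("para", text)`; this representation, the function name, clearing the title after promotion, and turning headings that would fall below level 1 into paragraphs are assumptions made for the sketch.

```python
def shift_heading_levels(title, blocks, shift):
    """Return (title, blocks) after shifting heading levels by `shift`."""
    blocks = list(blocks)
    # With a -1 shift, an initial level-1 heading becomes the document title.
    if shift == -1 and blocks and blocks[0][:2] == ("header", 1):
        title = blocks[0][2]
        blocks = blocks[1:]
    shifted = []
    for block in blocks:
        if block[0] == "header":
            level = block[1] + shift
            if level >= 1:
                shifted.append(("header", level, block[2]))
            else:
                # Assumption: a heading below level 1 becomes a paragraph.
                shifted.append(("para", block[2]))
        else:
            shifted.append(block)
    # With a +1 shift, a non-null title is promoted to a level-1 heading.
    if shift == 1 and title is not None:
        shifted.insert(0, ("header", 1, title))
        title = None
    return title, shifted
```

The two special cases are inverses of each other, so a document that uses a level-1 heading as its title can be converted in either direction.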
|
|
add UnusualConversion to LogMessage [API change]
|
|
|