aboutsummaryrefslogtreecommitdiff
path: root/src/Text/Pandoc/Readers/Org.hs
AgeCommit message (Collapse)AuthorFilesLines
2021-05-09Change reader types, allowing better tracking of source positions.John MacFarlane1-7/+4
Previously, when multiple file arguments were provided, pandoc simply concatenated them and passed the contents to the readers, which took a Text argument. As a result, the readers had no way of knowing which file was the source of any particular bit of text. This meant that we couldn't report accurate source positions on errors or include accurate source positions as attributes in the AST. More seriously, it meant that we couldn't resolve resource paths relative to the files containing them (see e.g. #5501, #6632, #6384, #3752). Add Text.Pandoc.Sources (exported module), with a `Sources` type and a `ToSources` class. A `Sources` wraps a list of `(SourcePos, Text)` pairs. [API change] A parsec `Stream` instance is provided for `Sources`. The module also exports versions of parsec's `satisfy` and other Char parsers that track source positions accurately from a `Sources` stream (or any instance of the new `UpdateSourcePos` class). Text.Pandoc.Parsing now exports these modified Char parsers instead of the ones parsec provides. Modified parsers to use a `Sources` as stream [API change]. The readers that previously took a `Text` argument have been modified to take any instance of `ToSources`. So, they may still be used with a `Text`, but they can also be used with a `Sources` object. In Text.Pandoc.Error, modified the constructor PandocParsecError to take a `Sources` rather than a `Text` as first argument, so parse error locations can be accurately reported. T.P.Error: showPos, do not print "-" as source name.
2021-01-08Update copyright notices for 2021 (#7012)Albert Krewinkel1-1/+1
2020-03-22Finer grained imports of Text.Pandoc.Class submodules (#6203)Albert Krewinkel1-1/+1
This should speed-up recompilation after changes in `Text.Pandoc.Class`, as the number of modules affected by a change will be smaller in general. It also offers faster insights into the parts of `T.P.Class` used within a module.
2020-03-15Use implicit Prelude (#6187)Albert Krewinkel1-2/+0
* Use implicit Prelude The previous behavior was introduced as a fix for #4464. It seems that this change alone did not fix the issue, and `stack ghci` and `cabal repl` only work with GHC 8.4.1 or newer, as no custom Prelude is loaded for these versions. Given this, it seems cleaner to revert to the implicit Prelude. * PandocMonad: remove outdated check for base version Only base versions 4.9 and later are supported, the check for `MIN_VERSION_base(4,8,0)` is therefore unnecessary. * Always use custom prelude Previously, the custom prelude was used only with older GHC versions, as a workaround for problems with ghci. The ghci problems are resolved by replacing package `base` with `base-noprelude`, allowing for consistent use of the custom prelude across all GHC versions.
2020-03-13Update copyright year (#6186)Albert Krewinkel1-1/+1
* Update copyright year * Copyright: add notes for Lua and Jira modules
2019-12-19Org reader: report errors properlyAlbert Krewinkel1-2/+1
Errors during parsing are now returned in full and no longer replaced by a custom message.
2019-11-12Switch to new pandoc-types and use Text instead of String [API change].despresc1-2/+2
PR #5884. + Use pandoc-types 1.20 and texmath 0.12. + Text is now used instead of String, with a few exceptions. + In the MediaBag module, some of the types using Strings were switched to use FilePath instead (not Text). + In the Parsing module, new parsers `manyChar`, `many1Char`, `manyTillChar`, `many1TillChar`, `many1Till`, `manyUntil`, `mantyUntilChar` have been added: these are like their unsuffixed counterparts but pack some or all of their output. + `glob` in Text.Pandoc.Class still takes String since it seems to be intended as an interface to Glob, which uses strings. It seems to be used only once in the package, in the EPUB writer, so that is not hard to change.
2019-03-01Remove license boilerplate.John MacFarlane1-18/+0
The haddock module header contains essentially the same information, so the boilerplate is redundant and just one more thing to get out of sync.
2019-02-04Add missing copyright notices and remove license boilerplate (#5112)Albert Krewinkel1-2/+2
Quite a few modules were missing copyright notices. This commit adds copyright notices everywhere via haddock module headers. The old license boilerplate comment is redundant with this and has been removed. Update copyright years to 2019. Closes #4592.
2018-03-18Use NoImplicitPrelude and explicitly import Prelude.John MacFarlane1-0/+2
This seems to be necessary if we are to use our custom Prelude with ghci. Closes #4464.
2018-01-05Update copyright notices to include 2018Albert Krewinkel1-2/+2
2017-06-20Move CR filtering from tabFilter to the readers.John MacFarlane1-1/+2
The readers previously assumed that CRs had been filtered from the input. Now we strip the CRs in the readers themselves, before parsing. (The point of this is just to simplify the parsers.) Shared now exports a new function `crFilter`. [API change] And `tabFilter` no longer filters CRs.
2017-06-10Changed all readers to take Text instead of String.John MacFarlane1-2/+5
Readers: Renamed StringReader -> TextReader. Updated tests. API change.
2017-05-13Update dates in copyright noticesAlbert Krewinkel1-2/+2
This follows the suggestions given by the FSF for GPL licensed software. <https://www.gnu.org/prep/maintain/html_node/Copyright-Notices.html>
2017-03-12Issue warning for duplicate header identifiers.John MacFarlane1-0/+2
As noted in the previous commit, an autogenerated identifier may still coincide with an explicit identifier that is given for a header later in the document, or with an identifier on a div, span, link, or image. This commit adds a warning in this case, so users can supply an explicit identifier. * Added `DuplicateIdentifier` to LogMessage. * Modified HTML, Org, MediaWiki readers so their custom state type is an instance of HasLogMessages. This is necessary for `registerHeader` to issue warnings. See #1745.
2017-03-04Stylish-haskell automatic formatting changes.John MacFarlane1-9/+9
2017-01-25Unify Errors.Jesse Rosenthal1-1/+2
2017-01-25Working on readers.Jesse Rosenthal1-7/+13
2016-07-01Org reader: refactor comment tree handlingAlbert Krewinkel1-38/+1
Comment trees were handled after parsing, as pattern matching on lists is easier than matching on sequences. The new method of reading documents as trees allows for more elegant subtree removal.
2016-06-03Org reader: support smart quotes export optionAlbert Krewinkel1-2/+3
Reading of smart quotes can be toggled using the `'` option.
2016-05-25Org reader: extract blocks parser to moduleAlbert Krewinkel1-844/+9
Block parsing code is moved to a separate module. This is part of the Org-mode reader cleanup effort.
2016-05-25Org reader: extract inline parser to moduleAlbert Krewinkel1-756/+41
Inline parsing code is moved to a separate module. Parsers for block starts are extracted as well, as those are used in the `endline` parser. This is part of the Org-mode reader cleanup effort.
2016-05-25Org reader: extract parsing function to moduleAlbert Krewinkel1-76/+10
The Org-mode reader uses many functions defined in the `Text.Pandoc.Parsing` utility module. Some of the functions are overwritten with versions adapted to Org-mode idiosyncrasies. These special functions, as well as the normal Pandoc versions, are combined in a single module to increase the ease of use. This leads to decoupling of Org-mode and Pandoc and hence to slightly cleaner code. The downside is code-bloat due to repeated import/export statements.
2016-05-23Org reader: respect drawer export settingAlbert Krewinkel1-11/+65
The `d` export option can be used to control which drawers are exported and which are discarded. Basic support for this option is added here.
2016-05-22Org reader/writer: use CUSTOM_ID in propertiesAlbert Krewinkel1-3/+4
The `ID` property is reserved for internal use by Org-mode and should not be used. The `CUSTOM_ID` property is to be used instead, it is converted to the `ID` property for certain export format. The reader and writer erroneously used `ID`. This is corrected by using `CUSTOM_ID` where appropriate.
2016-05-20Org reader: add :PROPERTIES: drawer supportAlbert Krewinkel1-28/+56
Headers can have optional `:PROPERTIES:` drawers associated with them. These drawers contain key/value pairs like the header's `id`. The reader adds all listed pairs to the header's attributes; `id` and `class` attributes are handled specially to match the way `Attr` are defined. This also changes behavior of how drawers of unknown type are handled. Instead of including all unknown drawers, those are not read/exported, thereby matching current Emacs behavior. This closes #1877.
2016-05-19Org reader: add support for ATTR_HTML attributesAlbert Krewinkel1-7/+28
Arbitrary key-value pairs can be added to some block types using a `#+ATTR_HTML` line before the block. Emacs Org-mode only includes these when exporting to HTML, but since we cannot make this distinction here, the attributes are always added. The functionality is now supported for figures. This closes #1906.
2016-05-19Org reader: use custom `anyLine`Albert Krewinkel1-3/+10
Additional state changes need to be made after a newline is parsed, otherwise markup may not be recognized correctly. This fixes a bug where markup after certain block-types would not be recognized. E.g. `/emph/` in the following snippet was not parsed as emphasized. foo # comment /emph/
2016-05-19Org reader: refactor block attribute handlingAlbert Krewinkel1-79/+77
A parser state attribute was used to keep track of block attributes defined in meta-lines. Global state is undesirable, so block attributes are no longer saved as part of the parser state. Old functions and the respective part of the parser state are removed.
2016-05-11Org reader: parse but ignore export optionsAlbert Krewinkel1-2/+35
All known export options are parsed but ignored.
2016-05-11Org reader: add support for sub/superscript export optionsAlbert Krewinkel1-3/+25
Org-mode allows to specify export settings via `#+OPTIONS` lines. Disabling simple sub- and superscripts is one of these export options, this options is now supported.
2016-05-11Org reader: move parser state into separate moduleAlbert Krewinkel1-158/+57
The org reader code has become large and confusing. Extracting smaller parts into submodules should help to clean things up.
2016-05-09Org reader: fix inline-LaTeX regressionAlbert Krewinkel1-4/+9
The last fix for whitespace handling of inline LaTeX commands was incorrect, preventing correct recognition of inline LaTeX commands which contain spaces. This fix ensures that only trailing whitespace is cut off.
2016-05-05Merge pull request #2898 from tarleb/org-table-refactoringJohn MacFarlane1-59/+59
Org reader: table parsing code refactoring and fixes
2016-05-04Org reader: fix spacing after LaTeX-style symbolsAlbert Krewinkel1-5/+7
The org-reader was droping space after unescaped LaTeX-style symbol commands: `\ForAll \Auml` resulted in `∀Ä` but should give `∀ Ä` instead. This seems to be because the LaTeX-reader treats the command-terminating space as part of the command. Dropping the trailing space from the symbol-command fixes this issue.
2016-05-04Org reader: fix handling of empty table cells, rowsAlbert Krewinkel1-13/+17
This fixes Org mode parsing of some corner cases regarding empty cells and rows. Empty cells weren't parsed correctly, e.g. `|||` should be two empty cells, but would be parsed as a single cell containing a pipe character. Empty rows where parsed as alignment rows and dropped from the output. This fixes #2616.
2016-05-04Org reader: refactor rows-to-table conversionAlbert Krewinkel1-25/+25
This refactores the codes conversing a list table lines to an org table ADT. The old code was simplified and is now slightly less ugly.
2016-05-04Org reader: stop padding short table rowsAlbert Krewinkel1-24/+20
Emacs Org-mode doesn't add any padding to table rows. The first row (header or first body row) is used to determine the column count, no other magic is performed. The org reader was padding rows to the length of the longest table row. This was done due to a misunderstanding of how Org handles tables. This feature reflected how Org-mode handles tables when pressing <TAB>. The Org exporter however, which is what the reader should implement, doesn't do any of this. So this was a mis-feature that made the reader more complex and reduced comparability. It was hence removed.
2016-04-26Ignore leading space in org code blocksEmanuel Evans1-4/+20
Fixes #2862 Also fix up tab handling for leading whitespace in code blocks.
2016-02-20Merge pull request #2646 from tarleb/org-figure-with-no-nameJohn MacFarlane1-3/+3
Prefix even empty figure names with "fig:"
2016-01-31Org reader: Refactor link-target processingAlbert Krewinkel1-29/+29
Cleanup of the code for link target handling. Most notably, the canonicalization of a link is handled by a separate function. This fixes #2684.
2016-01-22Changed type of Shared.uniqueIdent argument from [String] to Set String.John MacFarlane1-2/+3
This avoids performance problems in documents with many identically named headers. Closes #2671.
2016-01-11Prefix even empty figure names with "fig:"Albert Krewinkel1-3/+3
The convention used by pandoc for figures is to mark them by prefixing the name with "fig:". The org reader failed to do this if a figure had no name. The test for this was broken as well. This fixes #2643.
2016-01-07Fix function dropping subtrees tagged :noexport:Albert Krewinkel1-2/+4
Continue scanning for comment subtrees beyond only the first block. Note to self: when writing an recursive function, don't forget to, you know, actually recurse. Shout to @mrvdb for noticing this. This fixes #2628.
2015-12-12Modified readers to emit SoftBreak when appropriate.John MacFarlane1-1/+1
2015-11-13Merge pull request #2526 from tarleb/org-definition-lists-fixJohn MacFarlane1-1/+4
Org reader: Require whitespace around def list markers
2015-11-13Org reader: Require whitespace around def list markersAlbert Krewinkel1-1/+4
Definition list markers (i.e. double colons `::`) must be surrounded by whitespace to start a definition item. This rule was not checked before, resulting in bugs with footnotes and some link types. Thanks to @conklech for noticing and reporting this issue. This fixes #2518.
2015-11-13Org reader: Fix emphasis rules for smart parsingAlbert Krewinkel1-4/+9
Smart quotes, ellipses, and dashes should behave like normal quotes, single dashes, and dots with respect to text markup parsing. The parser state was not updated properly in all cases, which has been fixed. Thanks to @conklech for reporting this issue. This fixes #2513.
2015-11-09Restored Text.Pandoc.Compat.Monoid.John MacFarlane1-0/+1
Don't use custom prelude for latest ghc. This is a better approach to making 'stack ghci' and 'cabal repl' work. Instead of using NoImplicitPrelude, we only use the custom prelude for older ghc versions. The custom prelude presents a uniform API that matches the current base version's prelude. So, when developing (presumably with latest ghc), we don't use a custom prelude at all and hence have no trouble with ghci. The custom prelude no longer exports (<>): we now want to match the base 4.8 prelude behavior.
2015-11-09Revert "Use -XNoImplicitPrelude and 'import Prelude' explicitly."John MacFarlane1-1/+0
This reverts commit c423dbb5a34c2d1195020e0f0ca3aae883d0749b.