path: root/src/Text
Age | Commit message | Author | Files | Lines
2016-08-28Docx reader: Handle anchor spans with content in headers.Jesse Rosenthal1-7/+8
Previously, we would only be able to figure out internal links to a header in a docx if the anchor span was empty. We change that to read the inlines out of the first anchor span in a header. This still leaves another problem: what to do if there are multiple anchor spans in a header. That will be addressed in a future commit.
2016-08-27Translate NARROW NO-BREAK SPACE into LaTeX.Vaclav Zeman1-0/+1
Translate NARROW NO-BREAK SPACE (U+202F) into LaTeX's `\,`.
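A minimal sketch of such a character mapping, for illustration only. The names `escapeChar` and `escapeLaTeX` are hypothetical and are not taken from the LaTeX writer:

```haskell
-- Hypothetical sketch: translate one special character into a LaTeX command.
escapeChar :: Char -> String
escapeChar '\x202F' = "\\,"  -- NARROW NO-BREAK SPACE becomes LaTeX's thin space
escapeChar c        = [c]    -- every other character is passed through unchanged

-- Apply the mapping to a whole string.
escapeLaTeX :: String -> String
escapeLaTeX = concatMap escapeChar
```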
2016-08-27Man writer: allow section numbers that are not a single digit.John MacFarlane1-5/+5
Closes #3089.
2016-08-20Org writer: translate language identifiersAlbert Krewinkel1-5/+23
Pandoc and Org-mode use different programming language identifiers. An additional translation between those identifiers is added to avoid unexpected behavior. This fixes a problem where language-specific source code would sometimes be output as example code.
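The idea of an identifier translation can be sketched as a simple lookup. The function name `translateLang` and the specific table entries below are assumptions made for illustration; the commit itself does not list them:

```haskell
import Data.Char (toLower)

-- Illustrative translation from Pandoc-style language identifiers to the
-- names Org-mode expects. The entries are assumptions for this sketch.
translateLang :: String -> String
translateLang lang = case map toLower lang of
  "c"          -> "C"
  "cpp"        -> "C++"
  "commonlisp" -> "lisp"
  "r"          -> "R"
  "bash"       -> "sh"
  _            -> lang  -- unknown identifiers are passed through unchanged
```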
2016-08-18Org writer: ensure link targets are paths or URLsAlbert Krewinkel1-5/+23
Org-mode treats links as document internal searches unless the link target looks like a URL or file path, either relative or absolute. This change ensures that this is always the case.
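One way to picture this check is sketched below. It is not the writer's actual code: the function names, the list of URL schemes, and the choice of a `file:` prefix for other targets are all assumptions, and special cases such as internal anchors are ignored:

```haskell
import Data.List (isPrefixOf)

-- Does the target already look like a URL? (The scheme list is an assumption.)
looksLikeUrl :: String -> Bool
looksLikeUrl t =
  any (`isPrefixOf` t) ["http://", "https://", "ftp://", "mailto:", "file:"]

-- Does the target look like an absolute or relative file path?
looksLikePath :: String -> Bool
looksLikePath t = any (`isPrefixOf` t) ["/", "./", "../"]

-- Leave URLs and paths alone; otherwise mark the target explicitly as a
-- file so that Org-mode does not treat it as a document-internal search.
orgTarget :: String -> String
orgTarget t
  | looksLikeUrl t || looksLikePath t = t
  | otherwise                         = "file:" ++ t
```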
2016-08-18Org writer: ensure blank line after figureAlbert Krewinkel1-1/+1
An Org-mode figure should be surrounded by blank lines. The figure would be recognized regardless, but images in the following line would unintentionally be treated as figures as well.
2016-08-18Org writer: remove blank line after figure captionAlbert Krewinkel1-3/+2
Org-mode only treats an image as a figure if it is directly preceded by a caption.
2016-08-15Docx Writer: change dynamic style keyJesse Rosenthal1-1/+1
Use "custom-style" instead of "docx-style." This allows it to be used in other formats like ODT in the future.
2016-08-15Docx writer: Inject text properties as well.Jesse Rosenthal1-3/+20
2016-08-15Docx Writer: Keep track of dynamic text props too.Jesse Rosenthal1-0/+3
2016-08-15Docx writer: Allow dynamic styles on spans.Jesse Rosenthal1-1/+5
This enables dynamic styling on spans. It uses the same prefix as we used on divs ("docx-style" for the moment). It does not yet inject the style into styles.xml.
2016-08-15Docx writer: Inject new paragraph propertiesJesse Rosenthal1-4/+23
This injects new dynamic paragraph properties into the style file. Nothing occurs if the prop already exists in the style file.
2016-08-15StyleMap: export functions on StyleMap instancesJesse Rosenthal1-0/+2
We're going to want `getMap` in the Docx Writer.
2016-08-15Docx Writer: Have state keep track of dynamic styles.Jesse Rosenthal1-2/+6
We want to be able to inject these into our styles.xml.
2016-08-13Docx Writer: Implement user-defined styles.Jesse Rosenthal1-0/+6
Divs with a "docx-style" key in the attributes will apply the corresponding key to the contained blocks.
2016-08-13Docx parser: Use xml convenience functionsJesse Rosenthal1-38/+27
The functions `isElem` and `elemName` (defined in Docx/Util.hs) make the code a lot cleaner than the original XML.Light functions, but they had been used inconsistently. This puts them in wherever applicable.
2016-08-11Merge pull request #3048 from tarleb/latex-mini-fixJohn MacFarlane1-1/+1
LaTeX reader: drop duplicate `*` in bibtexKeyChars
2016-08-09Merge pull request #3065 from tarleb/org-verse-indentJohn MacFarlane1-1/+10
Org reader: preserve indentation of verse lines
2016-08-09Org reader: ensure image sources are proper linksAlbert Krewinkel3-39/+53
Image sources, such as those in plain images, image links, or figures, must be proper URIs or relative file paths to be recognized as images. This restriction is now enforced for all image sources. This also fixes the reader's use of uncleaned image sources, which led to `file:` prefixes not being removed from figure images (e.g. `[[file:image.jpg]]` producing the broken image `<img src="file:image.jpg"/>`). Thanks to @bsag for noticing this bug.
2016-08-08Org reader: preserve indentation of verse linesAlbert Krewinkel1-1/+10
Leading spaces in verse lines are converted to non-breaking spaces, so indentation is preserved. This fixes #3064.
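The conversion can be sketched in a few lines. The name `preserveIndent` is hypothetical, and the sketch works on plain strings rather than on the reader's parsed inlines:

```haskell
-- Replace each leading space of a verse line with U+00A0 (NO-BREAK SPACE),
-- leaving the rest of the line untouched.
preserveIndent :: String -> String
preserveIndent line = map (const '\x00A0') spaces ++ rest
  where
    (spaces, rest) = span (== ' ') line
```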
2016-08-06MediaWiki reader: properly interpret XML tags in pre environments.John MacFarlane1-3/+2
They are meant to be interpreted as literal text in textile. Closes #3042.
2016-08-06Improved mediawiki reader's treatment of verbatim constructions.John MacFarlane1-7/+13
Previously these yielded strings of alternating Code and Space elements; we now incorporate the spaces into the Code. Emphasis etc. is still possible inside these. Closes #3055.
2016-08-06Fix for unquoted attribute values in mediawiki tables.John MacFarlane1-1/+1
Previously an unquoted attribute value in a table row could cause parsing problems. Fixes #3053 (well, proper rowspans and colspans aren't created, but that's a bigger limitation with the current Pandoc document model for tables).
2016-08-06Fix out of index error in handleErrorMatthew Pickering1-4/+8
In the LaTeX parser, when includes are processed, the text of the included file is inserted directly into the parse stream. This caused problems when there was an error in the included file (and the included file was longer than the original file), as the error would be reported at this position. The error handling tries to display the line and position where the error occurred. It works by keeping a copy of the input and finding the place in the input that corresponds to the position of the error. In the scenario described above, the input would be the original source file, but the error position would be the position of the error in the included file. The fix is to not try to show the exact line when doing so would cause an out-of-bounds error.
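A bounds-checked lookup of this kind might look as follows. This is a sketch under assumptions, not the code from `handleError`; the names `lineAt` and `describeError` and the message layout are invented for illustration:

```haskell
-- Return the 1-based line n of the input, or Nothing when the reported
-- position lies outside the input (e.g. it refers to an included file).
lineAt :: Int -> String -> Maybe String
lineAt n input
  | n < 1     = Nothing
  | otherwise = case drop (n - 1) (lines input) of
      (l:_) -> Just l
      []    -> Nothing

-- Build an error message, adding the source line only when it is available.
describeError :: Int -> String -> String -> String
describeError n msg input =
  "Error at line " ++ show n ++ ": " ++ msg
    ++ maybe "" ("\n" ++) (lineAt n input)
```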
2016-08-06LaTeX writer: don't use * for unnumbered paragraph, subparagraph.John MacFarlane1-2/+2
The starred variants don't exist. This helps with part of #3058...it gets rid of the spurious *s. But we still have numbers on the 4th and 5th level headers.
2016-07-29LaTeX reader: drop duplicate `*` in bibtexKeyCharsAlbert Krewinkel1-1/+1
2016-07-22Merge pull request #3033 from tarleb/github-readmeJohn MacFarlane2-3/+3
PoC: GitHub-optimized README
2016-07-22Textile reader: disallow empty URL in explicit link.John MacFarlane1-1/+1
Closes #3036.
2016-07-22Textile reader: support `bc..` extended code blocks.John MacFarlane1-5/+25
Also, remove trailing newline in code blocks (consistently with Markdown reader).
2016-07-20Rename README to MANUAL.txtAlbert Krewinkel2-3/+3
2016-07-20LaTeX reader: be more forgiving of non-standard characters.John MacFarlane1-1/+1
E.g. `^` outside of math. Some custom environments give these a meaning, so we should try not to fall over when we encounter them.
2016-07-20LaTeX reader: more robust parsing of unknown environments.John MacFarlane1-2/+9
We no longer fail on things like `^` inside options for tikz. Closes #3026.
2016-07-20RST reader: use Div for admonitions.John MacFarlane1-8/+6
Previously blockquotes were used. Now a Div is used with class `admonition` and (if relevant) one of the following: `attention`, `caution`, `danger`, `error`, `hint`, `important`, `note`, `tip`, `warning`. `sidebar` is also put into a Div. Note: This will change rendering of RST documents! It should provide much more flexibility. Closes #3031.
2016-07-19Textile reader: improve definition list parsing.John MacFarlane1-6/+13
- Allow multiple terms (which we concatenate with linebreaks).
- Fix exponential parsing bug (closes #3020 for real this time).
2016-07-18Textile reader: improved table parsing.John MacFarlane1-22/+62
We now handle cell and row attributes, mostly by skipping them. However, alignments are now handled properly. Since in pandoc alignment is per-column, not per-cell, we try to divine column alignments from cell alignments. Table captions are also now parsed, and textile indicators for thead and tfoot no longer cause parse failure. (However, a row designated as tfoot will just be a regular row in pandoc.)
2016-07-15Don't require haddock-library 1.4.John MacFarlane1-0/+4
Instead use CPP to work around version differences.
2016-07-15Use liftM since otherwise Functor type constraint needed in ghc 7.8.John MacFarlane1-1/+1
2016-07-14Fixed compiler warnings.John MacFarlane6-14/+11
2016-07-14Haddock reader - support math.John MacFarlane1-0/+4
The Haddock document model added elements for math in 1.4.
2016-07-14Docx Writer: Use actual creation time as doc propJesse Rosenthal1-4/+3
Previously, we had used the user-supplied date, if available, for Word's document creation metadata. This could lead to weird results, as in cases where the user post-dates a document (so the modification might be prior to the creation). Here we use the actual computer time to set the document creation.
2016-07-14Shared: improve year sanity check in normalizeDateJesse Rosenthal1-6/+6
Previously we parsed a list of dates, took the first one, and then tested its year range. That meant that if the first one failed, we returned nothing, regardless of what the others did. Now we test for sanity before running `msum` over the list of Maybe values. Anything failing the test will be Nothing, so will not be a candidate.
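The ordering of the sanity check and `msum` can be sketched as follows. This is an illustration, not the code in `Shared`: the names `rejectBadYear` and `normalizeDateSketch` are hypothetical, the format list is an assumed subset, and the 1601 to 9999 range is the one stated in the related commit in this log:

```haskell
import Control.Monad (msum)
import Data.Time (Day, defaultTimeLocale, parseTimeM, showGregorian, toGregorian)

-- Keep a parsed date only if its year is in the accepted range.
rejectBadYear :: Day -> Maybe Day
rejectBadYear day = case toGregorian day of
  (y, _, _) | y >= 1601 && y <= 9999 -> Just day
  _                                  -> Nothing

-- Try each format in turn. A candidate that parses but has a bad year
-- becomes Nothing before msum runs, so a later candidate can still win.
normalizeDateSketch :: String -> Maybe String
normalizeDateSketch s = fmap showGregorian (msum candidates)
  where
    formats    = ["%F", "%d %B %Y", "%B %d, %Y"]  -- assumed subset of formats
    candidates =
      [ parseTimeM True defaultTimeLocale f s >>= rejectBadYear
      | f <- formats
      ]
```

Because each candidate is filtered before `msum` is applied, one out-of-range parse no longer forces the whole result to `Nothing`.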
2016-07-14Shared: normalizeDate should reject illegal years.Jesse Rosenthal1-5/+10
We only allow years between 1601 and 9999, inclusive. ISO 8601 actually says that years are supposed to start with 1583, but MS Word only allows 1601-9999. This should prevent corrupted Word files when the date is out of that range or is parsed incorrectly.
2016-07-14Shared: Add further formats for `normalizeDate`Jesse Rosenthal1-1/+2
We want to avoid illegal dates -- in particular years with greater than four digits. We attempt to parse series of digits first as `%Y%m%d`, then `%Y%m`, and finally `%Y`.
2016-07-14Removed some redundant class constraints.John MacFarlane1-3/+3
2016-07-14Merge pull request #3019 from tarleb/org-verbatim-fixJohn MacFarlane1-2/+4
Org reader: fix parsing of verbatim inlines
2016-07-14Fixed exponential parsing bug in textile reader.John MacFarlane1-0/+1
Closes #3020.
2016-07-14Org reader: fix parsing of verbatim inlinesAlbert Krewinkel1-2/+4
Org rules for allowed characters before or after markup chars were not checked for verbatim text. This resulted in wrong parsing outcomes if the verbatim text contained e.g. space-enclosed markup characters as part of the text (`=is_substr = True=`). Forcing the parser to update the positions of allowed/forbidden markup border characters fixes this. This fixes #3016.
2016-07-05Merge pull request #3014 from tarleb/org-writer-divJohn MacFarlane1-7/+41
Org writer: improve Div handling
2016-07-05Org writer: improve Div handlingAlbert Krewinkel1-7/+41
Div block handling is changed to make the output look more like idiomatic Org mode:
- Div-wrapped content is output as-is if the div's attribute is the null attribute.
- Div containers with an id but neither classes nor key-value pairs are unwrapped and the id is added as an anchor.
- Divs with classes associated with greater block elements are wrapped in a `#+BEGIN`...`#+END` block.
- The old behavior for Divs with more complex attributes is kept.
2016-07-04Org reader: replace ugly code with view patternAlbert Krewinkel1-5/+4
Some less-than-smart code required a pragma switching off overlapping pattern warnings in order to compile cleanly. Using view patterns makes the code easier to read and also doesn't require overlapping pattern checks to be disabled.
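For readers unfamiliar with the extension, here is a generic example of a view pattern. It is not the code touched by this commit; the function `classify` and the prefixes it matches are invented for illustration:

```haskell
{-# LANGUAGE ViewPatterns #-}
import Data.List (stripPrefix)

-- Classify a line by its prefix. Each clause applies stripPrefix to the
-- argument and matches on the result, so the clauses are plainly distinct.
classify :: String -> String
classify (stripPrefix "#+BEGIN_" -> Just name) = "block start: " ++ name
classify (stripPrefix "#+END_"   -> Just name) = "block end: " ++ name
classify _                                     = "other"
```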