pandoc/src/Text/Pandoc/Readers/Docx, branch master

Conversion between markup formats
https://git.pashev.ru/pandoc/atom?h=master
Updated 2021-12-29T00:31:54Z

Use `splitDirectories` instead of `splitPath`.
John MacFarlane <jgm@berkeley.edu>, published 2021-12-29T00:31:54Z, updated 2021-12-29T00:31:54Z
urn:sha1:d960282b105a6469c760b4308a3b81da723b7256

We were using `splitPath` in two places in the code where `splitDirectories` should have been used. This caused a test for `..` in paths in `extractMedia` to fail, so that images with `..` in the path name could be extracted outside the directory specified by `extractMedia`. It also caused a test for `media` in resource paths in the docx reader to fail.

Docx reader: don't let first line indents trigger block quotes.
John MacFarlane <jgm@berkeley.edu>, published 2021-11-02T21:02:24Z, updated 2021-11-02T21:04:38Z
urn:sha1:938d55784486f42d80cc4c2fcfe6ae905be382cd

This fixes a regression introduced in pandoc 2.15 by PR #7606.

Closes #7655.

Docx reader: fix handling of empty fields
Milan Bracke <mbracke@antidot.net>, published 2021-06-24T07:27:28Z, updated 2021-10-19T02:15:40Z
urn:sha1:465c28d28e1017040a41653edb6248056f178d3b

Some fields have only an instrText and no content. Pandoc didn't understand these, which caused other fields to be misinterpreted, because it looked as if a field was still open when it wasn't.

Docx parser: implement PAGEREF fields
Milan Bracke <mbracke@antidot.net>, published 2021-06-11T07:26:09Z, updated 2021-10-19T02:15:40Z
urn:sha1:6acc82c5d2885c596c52e6c35bed8fe08f535066

These fields, often used in tables of contents, can be a hyperlink.

Docx reader: fix handling of nested fields
Milan Bracke <mbracke@antidot.net>, published 2021-06-14T13:00:36Z, updated 2021-10-19T02:15:40Z
urn:sha1:193f6bfebaa43d0d6749d10a4e7ca78a0d31361d

Fields delimited by fldChar elements can contain other fields. Before, the nested fields would be ignored, except for the nested field's end marker, which would be taken as the end of the parent field.

To fix this issue, fields needed to be treated as containing ParParts instead of Runs, since a Run can't represent sufficiently complex structures. This also affected Hyperlinks, since they can originate from a field.

Avoid blockquote when parent style has more indent
Milan Bracke <mbracke@antidot.net>, published 2021-10-01T09:34:14Z, updated 2021-10-10T23:27:32Z
urn:sha1:0f98cbff4b61b8e79f386f77d18b3218f1214b25

When a paragraph had an indentation different from that of its parent (named) style, it used to be treated as a blockquote. But this only makes sense when the paragraph has more indentation than the style. So this commit adds a check for the indentation of the parent style.

Docx reader: Add placeholder for word diagram
Ezwal <15009992+Ezwal@users.noreply.github.com>, published 2021-09-29T13:42:37Z, updated 2021-09-30T19:44:44Z
urn:sha1:472b33095e1feb42fa96e32271888a3152e36cea

Improve docx reader's robustness in extracting images.
John MacFarlane <jgm@berkeley.edu>, published 2021-08-19T17:49:20Z, updated 2021-08-19T17:50:34Z
urn:sha1:ef4efa5373a419edbb99355808ddc63d35ddef20

The docx reader made a couple of assumptions about how docx containers were laid out that were not always true, with the result that some images in documents were not found or extracted.

Closes #7511.

Docx reader: handle absolute URIs in Relationship Target.
John MacFarlane <jgm@berkeley.edu>, published 2021-06-12T20:56:09Z, updated 2021-06-12T20:56:09Z
urn:sha1:cfa26e3ca0346397f41af9aed5b4cd1d86be1220

Closes #7374.

Docx reader: Support new table features.
Emily Bourke <undergroundquizscene@protonmail.com>, published 2020-06-18T08:53:32Z, updated 2021-05-28T18:15:23Z
urn:sha1:56b211120c62a01f8aba1c4512acfe4677d8c7d0

* Column spans
* Row spans
  - The spec says that if the `val` attribute is omitted, its value should be assumed to be `continue`, and that its values are restricted to {`restart`, `continue`}. If the attribute has any other value, I think it seems reasonable to default it to `continue`.
    It might cause problems if the spec is extended in the future by adding a third possible value, in which case this would probably give incorrect behaviour rather than an error.
* Allow multiple header rows
* Include table description in simple caption
  - The table description element is like alt text for a table (along with the table caption element). It seems like we should include this somewhere, but I’m not 100% sure how – I’m pairing it with the simple caption for the moment. (Should it maybe go in the block caption instead?)
* Detect table captions
  - Check for caption paragraph style /and/ either the simple or complex table field. This means the caption detection fails for captions which don’t contain a field, as in an example doc I added as a test. However, I think it’s better to be too conservative: a missed table caption will still show up as a paragraph next to the table, whereas if I incorrectly classify something else as a table caption it could cause havoc by pairing it up with a table it’s not at all related to, or dropping it entirely.
* Update tests and add new ones

Partially fixes: #6316
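The row-span rule in the table-features entry (an omitted `val` means `continue`, only `restart` and `continue` are defined, and anything else falls back to `continue`) can be illustrated as a small total function. This is a minimal sketch, not pandoc's code: the `VMerge` type and `parseVMerge` function are hypothetical names, the attribute is assumed to be the `val` of WordprocessingML's `vMerge` cell property, and its value is assumed to have already been read from the XML as a `Maybe String`.

```haskell
-- Hypothetical sketch of the row-span (vMerge) defaulting rule;
-- not taken from pandoc's Docx reader.

data VMerge = Restart | Continue
  deriving (Show, Eq)

-- | Interpret the optional `val` attribute of a vMerge element.
-- An omitted attribute (Nothing) and any unrecognised value both
-- default to Continue, as the commit message proposes.
parseVMerge :: Maybe String -> VMerge
parseVMerge (Just "restart")  = Restart
parseVMerge (Just "continue") = Continue
parseVMerge _                 = Continue  -- omitted or unrecognised value

main :: IO ()
main = do
  print (parseVMerge Nothing)            -- Continue
  print (parseVMerge (Just "restart"))   -- Restart
  print (parseVMerge (Just "continue"))  -- Continue
  print (parseVMerge (Just "bogus"))     -- Continue
```

Because the wildcard case accepts any unrecognised value, a future third value would be silently read as `continue` instead of raising an error, which is the trade-off the commit message points out. A stricter variant could return `Either String VMerge` for unknown values.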
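The first entry in this log turns on a difference between two functions in `System.FilePath` from the `filepath` package: `splitPath` keeps the trailing separator on each component, while `splitDirectories` strips it, so a membership test for `..` or `media` only matches with the latter. The sketch below demonstrates the difference; `hasParentRef` and `hasParentRefBuggy` are hypothetical helpers for illustration, not pandoc's actual checks.

```haskell
import System.FilePath (splitDirectories, splitPath)

-- Hypothetical helpers illustrating the bug fixed in the first entry;
-- pandoc's real checks may be structured differently.

-- Buggy: splitPath keeps trailing separators, so "../" never equals "..".
hasParentRefBuggy :: FilePath -> Bool
hasParentRefBuggy p = ".." `elem` splitPath p

-- Fixed: splitDirectories drops the separators, so ".." is matched.
hasParentRef :: FilePath -> Bool
hasParentRef p = ".." `elem` splitDirectories p

main :: IO ()
main = do
  let p = "media/../../evil.png"
  print (splitPath p)          -- ["media/","../","../","evil.png"]
  print (splitDirectories p)   -- ["media","..","..","evil.png"]
  print (hasParentRefBuggy p)  -- False
  print (hasParentRef p)       -- True
  -- The same issue affects a test for a "media" component:
  print ("media" `elem` splitPath "word/media/image1.png")         -- False
  print ("media" `elem` splitDirectories "word/media/image1.png")  -- True
```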