<feed xmlns='http://www.w3.org/2005/Atom'>
<title>pandoc/src/Text/Pandoc/Readers/Docx, branch master</title>
<subtitle>Conversion between markup formats</subtitle>
<id>https://git.pashev.ru/pandoc/atom?h=master</id>
<link rel='self' href='https://git.pashev.ru/pandoc/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.pashev.ru/pandoc/'/>
<updated>2021-12-29T00:31:54Z</updated>
<entry>
<title>Use `splitDirectories` instead of `splitPath`.</title>
<updated>2021-12-29T00:31:54Z</updated>
<author>
<name>John MacFarlane</name>
<email>jgm@berkeley.edu</email>
</author>
<published>2021-12-29T00:31:54Z</published>
<link rel='alternate' type='text/html' href='https://git.pashev.ru/pandoc/commit/?id=d960282b105a6469c760b4308a3b81da723b7256'/>
<id>urn:sha1:d960282b105a6469c760b4308a3b81da723b7256</id>
<content type='text'>
We were using `splitPath` in two places in the code
where `splitDirectories` should have been used.

This led to a test for `..` in paths in `extractMedia`
failing, so that images with `..` in the path name
could be extracted outside the directory specified
by `extractMedia`.

It also led a test for `media` in resource paths to fail
in the docx reader.
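
To illustrate the difference, here is a minimal Python analogue (not
pandoc's Haskell code). The helper names `split_path` and
`split_directories` are hypothetical stand-ins that mimic how the two
`System.FilePath` functions treat relative paths: the first keeps the
trailing separator on each component, so a membership test for `..`
never matches.

```python
import re


def split_path(path):
    """Mimic splitPath for relative paths: components keep their trailing '/'."""
    return re.findall(r"[^/]+/?", path)


def split_directories(path):
    """Mimic splitDirectories for relative paths: components have no separator."""
    return [part for part in path.split("/") if part]


def has_parent_ref_buggy(path):
    # The component is "../", not "..", so this check silently fails.
    return ".." in split_path(path)


def has_parent_ref_fixed(path):
    return ".." in split_directories(path)
```

With the path `media/../evil.png`, only the second check reports the
`..` component.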
</content>
</entry>
<entry>
<title>Docx reader: don't let first line indents trigger block quotes.</title>
<updated>2021-11-02T21:04:38Z</updated>
<author>
<name>John MacFarlane</name>
<email>jgm@berkeley.edu</email>
</author>
<published>2021-11-02T21:02:24Z</published>
<link rel='alternate' type='text/html' href='https://git.pashev.ru/pandoc/commit/?id=938d55784486f42d80cc4c2fcfe6ae905be382cd'/>
<id>urn:sha1:938d55784486f42d80cc4c2fcfe6ae905be382cd</id>
<content type='text'>
This fixes a regression introduced in pandoc 2.15 by PR #7606.
Closes #7655.
</content>
</entry>
<entry>
<title>Docx reader: fix handling of empty fields</title>
<updated>2021-10-19T02:15:40Z</updated>
<author>
<name>Milan Bracke</name>
<email>mbracke@antidot.net</email>
</author>
<published>2021-06-24T07:27:28Z</published>
<link rel='alternate' type='text/html' href='https://git.pashev.ru/pandoc/commit/?id=465c28d28e1017040a41653edb6248056f178d3b'/>
<id>urn:sha1:465c28d28e1017040a41653edb6248056f178d3b</id>
<content type='text'>
Some fields have only an instrText and no content. Pandoc didn't
understand these, which caused other fields to be misread because a
field appeared to be still open when it wasn't.
</content>
</entry>
<entry>
<title>Docx parser: implement PAGEREF fields</title>
<updated>2021-10-19T02:15:40Z</updated>
<author>
<name>Milan Bracke</name>
<email>mbracke@antidot.net</email>
</author>
<published>2021-06-11T07:26:09Z</published>
<link rel='alternate' type='text/html' href='https://git.pashev.ru/pandoc/commit/?id=6acc82c5d2885c596c52e6c35bed8fe08f535066'/>
<id>urn:sha1:6acc82c5d2885c596c52e6c35bed8fe08f535066</id>
<content type='text'>
These fields, often used in tables of contents, can be a hyperlink.
</content>
</entry>
<entry>
<title>Docx reader: fix handling of nested fields</title>
<updated>2021-10-19T02:15:40Z</updated>
<author>
<name>Milan Bracke</name>
<email>mbracke@antidot.net</email>
</author>
<published>2021-06-14T13:00:36Z</published>
<link rel='alternate' type='text/html' href='https://git.pashev.ru/pandoc/commit/?id=193f6bfebaa43d0d6749d10a4e7ca78a0d31361d'/>
<id>urn:sha1:193f6bfebaa43d0d6749d10a4e7ca78a0d31361d</id>
<content type='text'>
Fields delimited by fldChar elements can contain other fields. Before,
the nested fields would be ignored, except for the end, which would be
considered the end of the parent field.

To fix this issue, fields needed to be treated as containing ParParts
instead of Runs, since a Run can't represent complex enough structures.
This also impacted Hyperlinks since they can originate from a field.
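
The nesting problem can be sketched with a stack. This is a
simplified Python illustration, not the reader's actual Haskell
implementation: the event tuples and the `parse_fields` name are
assumptions made for the example, with `begin` and `end` standing in
for fldChar markers.

```python
def parse_fields(events):
    """Group a flat list of (kind, value) events into nested fields.

    kind is one of "begin", "instr", "end", or "text" (hypothetical names).
    """
    root = {"instr": None, "parts": []}
    stack = [root]
    for kind, value in events:
        if kind == "begin":
            # Open a new field inside whatever field is currently open.
            field = {"instr": None, "parts": []}
            stack[-1]["parts"].append(field)
            stack.append(field)
        elif kind == "instr":
            stack[-1]["instr"] = value
        elif kind == "end":
            if len(stack) == 1:
                raise ValueError("field end without matching begin")
            # Close only the innermost field, not its parent.
            stack.pop()
        else:
            stack[-1]["parts"].append(value)
    return root["parts"]
```

Because each `end` pops only the innermost field, the end of a nested
field is no longer mistaken for the end of its parent.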
</content>
</entry>
<entry>
<title>Avoid blockquote when parent style has more indent</title>
<updated>2021-10-10T23:27:32Z</updated>
<author>
<name>Milan Bracke</name>
<email>mbracke@antidot.net</email>
</author>
<published>2021-10-01T09:34:14Z</published>
<link rel='alternate' type='text/html' href='https://git.pashev.ru/pandoc/commit/?id=0f98cbff4b61b8e79f386f77d18b3218f1214b25'/>
<id>urn:sha1:0f98cbff4b61b8e79f386f77d18b3218f1214b25</id>
<content type='text'>
When a paragraph has an indentation different from the parent (named)
style, it used to be considered a blockquote. But this only makes sense
when the paragraph has more indentation. So this commit adds a check
for the indentation of the parent style.
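
A minimal Python sketch of the check, assuming indentation is
available as a plain number (or None when unset). The function and
parameter names are hypothetical, and the reader's real logic covers
more cases than this.

```python
def is_block_quote(paragraph_indent, parent_style_indent):
    """Treat a paragraph as a block quote only if it is indented
    further than its parent (named) style."""
    if paragraph_indent is None:
        return False
    # Assumption: a parent style without an indent counts as zero.
    parent = parent_style_indent or 0
    return paragraph_indent > parent
```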
</content>
</entry>
<entry>
<title>Docx reader: Add placeholder for word diagram</title>
<updated>2021-09-30T19:44:44Z</updated>
<author>
<name>Ezwal</name>
<email>15009992+Ezwal@users.noreply.github.com</email>
</author>
<published>2021-09-29T13:42:37Z</published>
<link rel='alternate' type='text/html' href='https://git.pashev.ru/pandoc/commit/?id=472b33095e1feb42fa96e32271888a3152e36cea'/>
<id>urn:sha1:472b33095e1feb42fa96e32271888a3152e36cea</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Improve docx reader's robustness in extracting images.</title>
<updated>2021-08-19T17:50:34Z</updated>
<author>
<name>John MacFarlane</name>
<email>jgm@berkeley.edu</email>
</author>
<published>2021-08-19T17:49:20Z</published>
<link rel='alternate' type='text/html' href='https://git.pashev.ru/pandoc/commit/?id=ef4efa5373a419edbb99355808ddc63d35ddef20'/>
<id>urn:sha1:ef4efa5373a419edbb99355808ddc63d35ddef20</id>
<content type='text'>
The docx reader made a couple of assumptions about how docx
containers were laid out that were not always true, with
the result that some images in documents did not get
found/extracted.

Closes #7511.
</content>
</entry>
<entry>
<title>Docx reader: handle absolute URIs in Relationship Target.</title>
<updated>2021-06-12T20:56:09Z</updated>
<author>
<name>John MacFarlane</name>
<email>jgm@berkeley.edu</email>
</author>
<published>2021-06-12T20:56:09Z</published>
<link rel='alternate' type='text/html' href='https://git.pashev.ru/pandoc/commit/?id=cfa26e3ca0346397f41af9aed5b4cd1d86be1220'/>
<id>urn:sha1:cfa26e3ca0346397f41af9aed5b4cd1d86be1220</id>
<content type='text'>
Closes #7374.
</content>
</entry>
<entry>
<title>Docx reader: Support new table features.</title>
<updated>2021-05-28T18:15:23Z</updated>
<author>
<name>Emily Bourke</name>
<email>undergroundquizscene@protonmail.com</email>
</author>
<published>2020-06-18T08:53:32Z</published>
<link rel='alternate' type='text/html' href='https://git.pashev.ru/pandoc/commit/?id=56b211120c62a01f8aba1c4512acfe4677d8c7d0'/>
<id>urn:sha1:56b211120c62a01f8aba1c4512acfe4677d8c7d0</id>
<content type='text'>
* Column spans
* Row spans
  - The spec says that if the `val` attribute is omitted, its value
    should be assumed to be `continue`, and that its values are
    restricted to {`restart`, `continue`}. If the attribute has any
    other value, I think it seems reasonable to default it to
    `continue`. It might cause problems if the spec is extended in the
    future by adding a third possible value, in which case this would
    probably give incorrect behaviour, and wouldn't error.
* Allow multiple header rows
* Include table description in simple caption
  - The table description element is like alt text for a table (along
    with the table caption element). It seems like we should include
    this somewhere, but I’m not 100% sure how – I’m pairing it with the
    simple caption for the moment. (Should it maybe go in the block
    caption instead?)
* Detect table captions
  - Check for caption paragraph style /and/ either the simple or
    complex table field. This means the caption detection fails for
    captions which don’t contain a field, as in an example doc I added
    as a test. However, I think it’s better to be too conservative: a
    missed table caption will still show up as a paragraph next to the
    table, whereas if I incorrectly classify something else as a table
    caption it could cause havoc by pairing it up with a table it’s
    not at all related to, or dropping it entirely.
* Update tests and add new ones

Partially fixes: #6316
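
The row-span defaulting rule described above can be sketched in
Python as follows. This is an illustration under stated assumptions,
not the reader's Haskell code: `NO_VMERGE`, `normalize_vmerge`, and
`row_spans` are hypothetical names, and a column is modelled as a list
holding one merge value per row.

```python
NO_VMERGE = object()  # assumption: marks a cell with no merge element at all


def normalize_vmerge(val):
    """Map the `val` attribute to "restart" or "continue".

    A missing value (None) and any unrecognised value both fall back
    to "continue".
    """
    return "restart" if val == "restart" else "continue"


def row_spans(column):
    """Return the row span of each merged group in one table column."""
    spans = []
    for cell in column:
        continues = cell is not NO_VMERGE and normalize_vmerge(cell) == "continue"
        if continues and spans:
            spans[-1] += 1  # extend the cell above
        else:
            spans.append(1)  # start a new cell
    return spans
```

As the message notes, an unknown value is silently treated as
`continue` rather than reported as an error.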
</content>
</entry>
</feed>
