pandoc - MSM's mirror of Pandoc

Age	Commit message	Author
2024-02-28	Docx writer: don't copy over footnotePr in settings.xml...	John MacFarlane
	from reference.docx. Closes #9522.
2024-02-28	Docx reader: ensure that table captions are counted.	John MacFarlane
	Normally these occur outside the table element itself, but they should still be parsed as captions in this case. Closes #9518.
2024-02-03	Docx writer: restore ability to center-justify table.	John MacFarlane
	The fix to #5947 caused all tables to be left indented. This was necessary to avoid extra indentation in table cells when a table appeared in a list item. This change makes the changes conditional, so that they only affect tables in list items. Closes #9393.
2023-12-19	fix(docx): sort inline elements in schema order	Edwin Török
	Fixes #9273
	```
	[
	  {
	    "Description": "The element has unexpected child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:b'.",
	    "Path": {
	      "NamespacesDefinitions": [
	        "xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""
	      ],
	      "Namespaces": { },
	      "XPath": "/w:document[1]/w:body[1]/w:p[1]/w:r[7]/w:rPr[1]",
	      "PartUri": "/word/document.xml"
	    },
	    "Id": "Sch_UnexpectedElementContentExpectingComplex",
	    "ErrorType": "Schema"
	  }
	]
	```
	Signed-off-by: Edwin Török <edwin@etorok.net>
2023-12-18	fix(docx): fix validation error on endnotePr	Edwin Török
	Copying `endnotePr` causes validation errors, because it now references something that doesn't exist in the document:
	```
	{
	  "FilePath": "test/docx/golden/custom_style_reference.docx",
	  "ValidationErrors": "[{\"Description\":\"Element 'w:endnote' referenced by 'endnote@http://schemas.openxmlformats.org/wordprocessingml/2006/main:id' does not exist in part '/MainDocumentPart/EndnotesPart'. The reference value is '0'.\",\"Path\":{\"NamespacesDefinitions\":[\"xmlns:w=\\\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\\\"\"],\"Namespaces\":{},\"XPath\":\"/w:settings[1]/w:endnotePr[1]/w:endnote[2]\",\"PartUri\":\"/word/settings.xml\"},\"Id\":\"Sem_MissingReferenceElement\",\"ErrorType\":\"Semantic\"},{\"Description\":\"Element 'w:endnote' referenced by 'endnote@http://schemas.openxmlformats.org/wordprocessingml/2006/main:id' does not exist in part '/MainDocumentPart/EndnotesPart'. The reference value is '-1'.\",\"Path\":{\"NamespacesDefinitions\":[\"xmlns:w=\\\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\\\"\"],\"Namespaces\":{},\"XPath\":\"/w:settings[1]/w:endnotePr[1]/w:endnote[1]\",\"PartUri\":\"/word/settings.xml\"},\"Id\":\"Sem_MissingReferenceElement\",\"ErrorType\":\"Semantic\"}]"
	}
	```
	For now, don't copy this element: it wasn't copied before, and it doesn't seem necessary to fix the ordering problems we had with settings.
	Fixes: c9bf4da74 ("Docx writer: ensure that elements in settings are ordered correctly.")
	Signed-off-by: Edwin Török <edwin@etorok.net>
2023-12-18	fix(docx): fix validation error on w:tblHeader	Edwin Török
	```
	{
	  "FilePath": "test/docx/golden/tables.docx",
	  "ValidationErrors": "[{\"Description\":\"The attribute 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:val' has invalid value 'true'. The Enumeration constraint failed.\",\"Path\":{\"NamespacesDefinitions\":[\"xmlns:w=\\\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\\\"\"],\"Namespaces\":{},\"XPath\":\"/w:document[1]/w:body[1]/w:tbl[1]/w:tr[1]/w:trPr[1]/w:tblHeader[1]\",\"PartUri\":\"/word/document.xml\"},\"Id\":\"Sch_AttributeValueDataTypeDetailed\",\"ErrorType\":\"Schema\"}]"
	}
	```
	This one might actually be a bug in Open-XML-SDK, similar to the issue linked here, or a subtle difference between standard versions: https://github.com/dotnet/Open-XML-SDK/issues/780
	Signed-off-by: Edwin Török <edwin@etorok.net>
2023-12-18	fix(docx): use left vs start consistently	Edwin Török
	They are equivalent, but OOXML-Validator complains:
	```
	{
	  "FilePath": "test/docx/golden/tables_separated_with_rawblock.docx",
	  "ValidationErrors": "[{\"Description\":\"The attribute 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:val' has invalid value 'start'. The Enumeration constraint failed.\",\"Path\":{\"NamespacesDefinitions\":[\"xmlns:w=\\\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\\\"\"],\"Namespaces\":{},\"XPath\":\"/w:document[1]/w:body[1]/w:tbl[2]/w:tblPr[1]/w:jc[1]\",\"PartUri\":\"/word/document.xml\"},\"Id\":\"Sch_AttributeValueDataTypeDetailed\",\"ErrorType\":\"Schema\"},{\"Description\":\"The attribute 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:val' has invalid value 'start'. The Enumeration constraint failed.\",\"Path\":{\"NamespacesDefinitions\":[\"xmlns:w=\\\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\\\"\"],\"Namespaces\":{},\"XPath\":\"/w:document[1]/w:body[1]/w:tbl[1]/w:tblPr[1]/w:jc[1]\",\"PartUri\":\"/word/document.xml\"},\"Id\":\"Sch_AttributeValueDataTypeDetailed\",\"ErrorType\":\"Schema\"}]"
	}
	```
	pandoc already uses 'left' elsewhere, so be consistent: we still produce the transitional schema, not the strict one, which would have the 'start' value.
	Signed-off-by: Edwin Török <edwin@etorok.net>
2023-12-18	fix(docx): fix validation error on inline w:i/w:iCs order	Edwin Török
	From `make validate-docx-golden-tests2`:
	```
	{
	  "FilePath": "test/docx/golden/definition_list.docx",
	  "ValidationErrors": "[{\"Description\":\"The element has unexpected child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:i'.\",\"Path\":{\"NamespacesDefinitions\":[\"xmlns:w=\\\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\\\"\"],\"Namespaces\":{},\"XPath\":\"/w:document[1]/w:body[1]/w:p[3]/w:r[3]/w:rPr[1]\",\"PartUri\":\"/word/document.xml\"},\"Id\":\"Sch_UnexpectedElementContentExpectingComplex\",\"ErrorType\":\"Schema\"}]"
	},
	```
	Signed-off-by: Edwin Török <edwin@etorok.net>
2023-12-18	fix(docx): fix OOXMLValidator error on KeywordTok output	Edwin Török
	xmllint doesn't warn about this (maybe because the tag is empty?), but the order doesn't match wml.xsd:
	```
	<w:rPr>
	  <w:color w:val="007020"/>
	  <w:b/>
	</w:rPr>
	```
	And OOXMLValidatorCLI does warn about it:
	```
	{
	  "Description": "The element has unexpected child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:b'.",
	  "Path": {
	    "NamespacesDefinitions": [
	      "xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""
	    ],
	    "Namespaces": { },
	    "XPath": "/w:styles[1]/w:style[40]/w:rPr[1]",
	    "PartUri": "/word/styles.xml"
	  },
	  "Id": "Sch_UnexpectedElementContentExpectingComplex",
	  "ErrorType": "Schema"
	}
	```
	Signed-off-by: Edwin Török <edwin@etorok.net>
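The ordering constraint behind this and the related `w:rPr` fixes can be illustrated with a minimal Python sketch. It is not pandoc's Haskell implementation: `RPR_ORDER` is an abbreviated, assumed subset of the child order that wml.xsd prescribes, and `sort_rpr_children` and `local_name` are hypothetical helpers. Elements missing from the subset are simply moved to the end, which a real writer could not do.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Abbreviated, assumed subset of the w:rPr child order from wml.xsd.
RPR_ORDER = ["rStyle", "rFonts", "b", "bCs", "i", "iCs", "color", "sz", "szCs", "u"]


def local_name(elem):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return elem.tag.split("}", 1)[-1]


def sort_rpr_children(rpr):
    """Reorder the children of a w:rPr element in place (stable sort)."""
    def rank(child):
        name = local_name(child)
        return RPR_ORDER.index(name) if name in RPR_ORDER else len(RPR_ORDER)

    children = sorted(list(rpr), key=rank)
    for child in list(rpr):
        rpr.remove(child)
    rpr.extend(children)
    return rpr


# The invalid ordering quoted in the commit message: color before b.
rpr = ET.fromstring(
    f'<w:rPr xmlns:w="{W}"><w:color w:val="007020"/><w:b/></w:rPr>'
)
sort_rpr_children(rpr)
print([local_name(c) for c in rpr])  # ['b', 'color']
```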
2023-12-18	fix(docx): fix validation error on w:annotationRef	Edwin Török
	annotationRef is not valid for `w:rPr`, only for `w:r` according to wml.xsd. See https://github.com/jgm/pandoc/issues/9269 Signed-off-by: Edwin Török <edwin@etorok.net>
2023-12-18	fix(docx): fix validation error in w:nsid	Edwin Török
	The length here seems to refer to length in bytes (so twice as long in hex):
	```
	./tmp/numbering-pretty.xml:4: element nsid: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}nsid', attribute '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}val': [facet 'length'] The value 'A990' has a length of '2'; this differs from the allowed length of '4'.
	```
	[This](https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.nsid?view=openxml-2.8.1) also documents the longer values.
	Signed-off-by: Edwin Török <edwin@etorok.net>
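The bytes-versus-hex-digits point above can be shown with a small Python sketch. `nsid_value` is a hypothetical helper, and left-padding with zeros is an assumption made for illustration; the commit message does not say how pandoc derives the longer value.

```python
def nsid_value(n):
    """Format n as 8 hex digits, i.e. the 4 bytes the length facet requires."""
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError("nsid must fit in 4 bytes")
    return format(n, "08X")


# 'A990' is only 2 bytes (4 hex digits); padded, it becomes 4 bytes.
print(nsid_value(0xA990))  # 0000A990
```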
2023-12-18	Docx writer: fixed validation errors in tables.	John MacFarlane
	Closes #9266.
2023-12-18	Docx writer: fix validation error.	John MacFarlane
	The elements in pPr in lists were not properly ordered. This doesn't seem to cause problems for Word, but it makes validation fail and may pose problems for other consumers of docx. Closes #9265.
2023-12-17	Docx writer: ensure that elements in settings are ordered correctly.	John MacFarlane
	The elements must occur in a specific order. This was being messed up when integrating a custom reference.docx. Closes #9264.
2023-12-17	test/docx/golden: regenerate	Edwin Török
	Using `make test TESTARGS=--accept` Signed-off-by: Edwin Török <edwin@etorok.net>
2023-12-08	Docx writer: Use different style for block quotes in notes.	John MacFarlane
	Using "Footnote Block Text" for the style name, so it can be given a different font size if footnotes are. Closes #9243.
2023-12-08	Docx writer: allow embedded fonts to be used in reference.docx.	John MacFarlane
	Closes #6728.
2023-11-29	Docx reader: unwrap content of shaped textboxes...	Stephan Meijer
	* #9214 text in shape format test document
	* #9214 support Text in Shape Format
	* #9214 remove irrelevant code
2023-08-18	Docx reader: omit "Table NN" from caption.	John MacFarlane
	Closes #9002.
2023-08-14	Docx reader: Avoid spurious block quotes in list items.	John MacFarlane
	The docx reader was overzealous in detecting indented paragraphs as block quotes, leading to list items sometimes being put in block quotes (especially in docx created by Google Docs). Closes #8836.
2023-05-09	Rename test/docx/block_quotes_parse_indent.native for consistency	Stephan Meijer

2023-05-08	Introduce support for Intense Quote in Docx conversion	Stephan Meijer
	This commit introduces support for the Intense Quote in Docx Conversion. Previously this was converted to a regular paragraph, but Intense Quote should be interpreted as a Quote in conversion.
2023-03-17	Update docx golden tests for style changes.	John MacFarlane

2023-03-17	Docx writer: include abstract title.	John MacFarlane
	Closes #8702. Uses localized term for abstract.
2023-01-13	Support complex figures. [API change]	Albert Krewinkel
	Thanks and credit go to Aner Lucero, who laid the groundwork for this feature in the 2021 GSoC project. He contributed many changes, including modifications to the readers for HTML, JATS, and LaTeX, and to the HTML and JATS writers.
	Shared (Albert Krewinkel):
	- The new function `figureDiv`, exported from `Text.Pandoc.Shared`, offers a standardized way to convert a figure into a Div element.
	Readers (Aner Lucero):
	- HTML reader: `<figure>` elements are parsed as figures, with the caption taken from the respective `<figcaption>` elements.
	- JATS reader: The `<fig>` and `<caption>` elements are parsed into figure elements, even if the content is more complex.
	- LaTeX reader: support for figures with non-image contents and for subfigures.
	- Markdown reader: paragraphs containing just an image are treated as figures if the `implicit_figures` extension is enabled. The identifier is used as the figure's identifier and the image description is also used as figure caption; all other attributes are treated as belonging to the image.
	Writers (Aner Lucero, Albert Krewinkel):
	- DokuWiki, Haddock, Jira, Man, MediaWiki, Ms, Muse, PPTX, RTF, TEI, ZimWiki writers: Figures are rendered like Div elements.
	- Asciidoc writer: The figure contents are unwrapped; each image in the figure becomes a separate figure.
	- Classic custom writers: Figures are passed to the global function `Figure(caption, contents, attr)`, where `caption` and `contents` are strings and `attr` is a table of key-value pairs.
	- ConTeXt writer: Figures are wrapped in a "placefigure" environment with `\startplacefigure`/`\endplacefigure`, adding the features caption and listing title as properties. Subfigures are placed in a single row with the `\startfloatcombination` environment.
	- DocBook writer: Uses `mediaobject` elements, unless the figure contains subfigures or tables, in which case the figure content is unwrapped.
	- Docx writer: figures with multiple content blocks are rendered as tables with style `FigureTable`; like before, single-image figures are still output as paragraphs with style `Figure` or `Captioned Figure`, depending on whether a caption is attached.
	- DokuWiki writer: Caption and "alt-text" are no longer combined. The alt text of a figure will now be lost in the conversion.
	- FB2 writer: The figure caption is added as alt text to the images in the figure; pre-existing alt texts are kept.
	- ICML writer: Only single-image figures are supported. The contents of figures with additional elements get unwrapped.
	- HTML writer: the alt text is no longer constructed from the caption, as was the case with implicit figures. This reduces duplication, but comes at the risk of images that are missing alt texts. Authors should take care to provide alt texts for all images. Some readers, most notably the Markdown reader with the `implicit_figures` extension, add a caption that's identical to the image description. The writer checks for this and adds an `aria-hidden` attribute to the `<figcaption>` element in that case.
	- JATS writer: The `<fig>` and `<caption>` elements are used to write figures.
	- LaTeX writer: complex figures, e.g. with non-image contents and subfigures, are supported. The `subfigure` template variable is set if the document contains subfigures, triggering the conditional loading of the subcaption package. Contents of figures that contain tables become unwrapped, as longtable environments are not allowed within figures.
	- Markdown writer: figures are output as implicit figures if possible, via HTML if the `raw_html` extension is enabled, and as Div elements otherwise.
	- OpenDocument writer: A separate paragraph is generated for each block element in a figure, each with style `FigureWithCaption`. Behavior for single-image figures therefore remains unchanged.
	- Org writer: Only the first element in a figure is given a caption; additional block elements in the figure are appended without any caption being added.
	- RST writer: Single-image figures are supported as before; the contents of more complex images become nested in a container of type `float`.
	- Texinfo writer: Figures are rendered as float with type `figure`.
	- Textile writer: Figures are rendered with the help of HTML elements.
	- XWiki: Figures are placed in a group.
	Co-authored-by: Aner Lucero <4rgento@gmail.com>
2022-12-20	Shared: use LineBreak as default block sep in blocksToInlines	Albert Krewinkel
	This change also affects the `pandoc.utils.blocks_to_inlines` Lua function. Closes: #8499
2022-12-08	Shared: change defaultBlocksSeparator to PARAGRAPH SEPARATOR	Albert Krewinkel
	This Unicode char (U+2029) is intended as a semantic separator between paragraphs; it is cleaner and less intrusive than the pilcrow sign that we used before. This also changes the default `sep` value used in the `pandoc.utils.blocks_to_inlines` Lua function.
2022-11-19	Docx reader: Support parsing of highlighted text.	John MacFarlane

2022-02-04	Docx zotero/mendeley/endnote: add comma before locator in suffix.	John MacFarlane

2022-02-04	Add mendeley citation tests.	John MacFarlane

2022-02-03	Docx reader: add bibliographic entries for zotero ADDIN.	John MacFarlane
	Bibliographic data embedded in citation items is added to the `references` metadata field. Closes #7840.
2022-02-03	Add zotero test with +citations.	John MacFarlane
	So far, though, we still don't include the references in the metadata.
2022-02-03	Add zotero citation test with docx-citations.	John MacFarlane

2022-01-18	Docx writer: Separate tables even with RawBlocks between (#7844)	Michael Hoffmann
	Adjacent docx tables need to be separated by an empty paragraph. If there's a RawBlock between tables which renders to nothing, be sure to still insert the empty paragraph so that they will not collapse together. Fixes #7724
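The rule described above can be sketched in Python as follows. This is only an illustration, not the writer's Haskell code: blocks are modelled as hypothetical dicts with a `type` key, `render` is a caller-supplied function returning an XML string, and `<w:p/>` stands in for the empty separator paragraph.

```python
def render_blocks(blocks, render):
    """Render blocks, inserting an empty paragraph between adjacent tables.

    A block that renders to nothing (for example a RawBlock for another
    format) is skipped without resetting the "previous block was a table"
    state, so the two tables around it are still separated.
    """
    out = []
    prev_was_table = False
    for block in blocks:
        xml = render(block)
        if not xml:
            continue
        is_table = block["type"] == "Table"
        if is_table and prev_was_table:
            out.append("<w:p/>")
        out.append(xml)
        prev_was_table = is_table
    return out


def demo_render(block):
    # Hypothetical renderer: raw blocks produce no output at all.
    return {"Table": "<w:tbl/>", "RawBlock": "", "Para": "<w:p>x</w:p>"}[block["type"]]


blocks = [{"type": "Table"}, {"type": "RawBlock"}, {"type": "Table"}]
print(render_blocks(blocks, demo_render))  # ['<w:tbl/>', '<w:p/>', '<w:tbl/>']
```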
2022-01-11	Docx writer: Handle bullets correctly in lists by not reusing numIds (#7822)	Michael Hoffmann
	Make sure that we only create one bullet per list item in docx. In particular, when a div is a list item, its contained paragraphs will now no longer wrongly get individual bullets. This is accomplished by making sure that for each list, we only use the associated numId once. Any repeated use would add incorrect bullets to the document. Closes #7689
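A simplified Python sketch of the "one bullet per list item" idea: only the first paragraph of an item carries the list's numbering id, and the remaining paragraphs carry none. The tuple representation and the `item_paragraphs` helper are assumptions for illustration; the actual writer works on its own Haskell data types.

```python
def item_paragraphs(paragraphs, num_id):
    """Pair each paragraph of one list item with the numId it should carry.

    Only the first paragraph gets num_id; the rest get None, so a consumer
    would draw a single bullet for the whole item.
    """
    return [(text, num_id if i == 0 else None) for i, text in enumerate(paragraphs)]


# A list item that wraps a div with two paragraphs.
print(item_paragraphs(["first", "second"], 1001))
# [('first', 1001), ('second', None)]
```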
2021-11-02	Docx reader: don't let first line indents trigger block quotes.	John MacFarlane
	This fixes a regression introduced in pandoc 2.15 by PR #7606. Closes #7655.
2021-10-29	Docx writer: add IDs to native_numbering test	Tristan Stenner

2021-10-29	Update test golden master for docx native numbering	Tristan Stenner

2021-10-18	Docx reader: fix handling of empty fields	Milan Bracke
	Some fields have only an instrText and no content. Pandoc didn't understand these, which caused other fields to be misparsed because a field appeared to be still open when it wasn't.
2021-10-18	Docx parser: implement PAGEREF fields	Milan Bracke
	These fields, often used in tables of contents, can be a hyperlink.
2021-10-18	Docx reader: fix handling of nested fields	Milan Bracke
	Fields delimited by fldChar elements can contain other fields. Before, the nested fields would be ignored, except for the end, which would be considered the end of the parent field. To fix this issue, fields needed to be considered containing ParParts instead of Runs, since a Run can't represent complex enough structures. This also impacted Hyperlinks since they can originate from a field.
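The nesting problem described above is a classic stack-based parse. The Python sketch below works on a hypothetical, simplified event stream (`begin`, `instr`, `separate`, `text`, `end`) rather than real `w:fldChar` XML, and its dict-based tree is an assumption, not the reader's `ParPart` representation. It also accepts fields that have an instruction but no content.

```python
def parse_fields(events):
    """Turn a flat list of (kind, value) events into a nested field tree."""
    root = {"instr": None, "content": []}
    stack = [root]
    for kind, value in events:
        if kind == "begin":
            field = {"instr": "", "content": []}
            stack[-1]["content"].append(field)
            stack.append(field)
        elif kind == "instr":
            if len(stack) == 1:
                raise ValueError("instruction text outside of a field")
            stack[-1]["instr"] += value
        elif kind == "separate":
            pass  # marks the switch from instruction to result; nothing to store
        elif kind == "text":
            stack[-1]["content"].append(value)
        elif kind == "end":
            if len(stack) == 1:
                raise ValueError("field end without matching begin")
            stack.pop()  # closes only the innermost open field
        else:
            raise ValueError(f"unknown event kind: {kind}")
    if len(stack) != 1:
        raise ValueError("unclosed field")
    return root["content"]


events = [
    ("begin", None), ("instr", " HYPERLINK x "), ("separate", None),
    ("text", "a"),
    ("begin", None), ("instr", " PAGEREF y "), ("end", None),  # empty nested field
    ("text", "b"),
    ("end", None),
]
tree = parse_fields(events)
print(tree[0]["instr"], len(tree[0]["content"]))
```

With the event list above, the inner `end` closes only the PAGEREF field, so the trailing text still belongs to the outer HYPERLINK field.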
2021-10-10	Avoid blockquote when parent style has more indent	Milan Bracke
	When a paragraph has an indentation different from the parent (named) style, it used to be considered a blockquote. But this only makes sense when the paragraph has more indentation. So this commit adds a check for the indentation of the parent style.
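The added check can be summarised in a few lines of Python. `is_block_quote` is a hypothetical helper, indentation values are assumed to be left indents in twips with `None` meaning "not set", and the real reader applies further conditions that this sketch omits.

```python
def is_block_quote(par_indent, style_indent):
    """Treat a paragraph as a block quote only if it is indented further
    than its parent (named) style, not merely differently."""
    if par_indent is None:
        return False
    return par_indent > (style_indent or 0)


print(is_block_quote(720, 0))    # True: more indent than the style
print(is_block_quote(360, 720))  # False: less indent than the style
```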
2021-09-30	Docx reader: Add placeholder for word diagram	Ezwal

2021-09-12	Docx writer: make id used in native_numbering predictable.	John MacFarlane
	If the image has the id IMAGEID, then we use the id ref_IMAGEID for the figure number. Closes #7551.
	This allows one to create a filter that adds a figure number with figure name, e.g.
	<w:fldSimple w:instr=" REF ref_superfig "><w:r><w:t>Figure X</w:t></w:r></w:fldSimple>
	For this to work, it must be possible to predict the figure number id from the image id. If images lack an id, an id of the form `ref_fig1` is used.
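A filter author could build such a field along these lines. This Python sketch only assembles the XML string quoted in the commit message; `figure_ref_id` and `ref_field` are hypothetical names, and the numbering used for images without an id (`index`) is an assumption based on the `ref_fig1` form mentioned above.

```python
from xml.sax.saxutils import escape, quoteattr


def figure_ref_id(image_id, index=1):
    """Predict the figure-number id from the image id."""
    return f"ref_{image_id}" if image_id else f"ref_fig{index}"


def ref_field(image_id, placeholder="Figure X"):
    """Build a w:fldSimple REF field pointing at the figure number."""
    instr = f" REF {figure_ref_id(image_id)} "
    return (
        f"<w:fldSimple w:instr={quoteattr(instr)}>"
        f"<w:r><w:t>{escape(placeholder)}</w:t></w:r>"
        "</w:fldSimple>"
    )


print(ref_field("superfig"))
```

The placeholder text is whatever Word shows until the field is updated.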
2021-08-27	Ensure we have unique ids for wp:docPr and pic:cNvPr elements.	John MacFarlane
	This will, I hope, fix #7527 and #7503.
2021-06-29	Docx writer: Add table numbering for captioned tables.	John MacFarlane
	The numbers are added using fields, so that Word can create a list of tables that will update automatically.
2021-06-29	Docx writer: support figure numbers.	John MacFarlane
	These are set up in such a way that they will work with Word's automatic table of figures. Closes #7392.
2021-05-28	Docx reader: Support new table features.	Emily Bourke
	* Column spans
	* Row spans
	  - The spec says that if the `val` attribute is omitted, its value should be assumed to be `continue`, and that its values are restricted to {`restart`, `continue`}. If the attribute has any other value, I think it seems reasonable to default it to `continue`. It might cause problems if the spec is extended in the future by adding a third possible value, in which case this would probably give incorrect behaviour, and wouldn't error.
	* Allow multiple header rows
	* Include table description in simple caption
	  - The table description element is like alt text for a table (along with the table caption element). It seems like we should include this somewhere, but I’m not 100% sure how – I’m pairing it with the simple caption for the moment. (Should it maybe go in the block caption instead?)
	* Detect table captions
	  - Check for caption paragraph style /and/ either the simple or complex table field. This means the caption detection fails for captions which don’t contain a field, as in an example doc I added as a test. However, I think it’s better to be too conservative: a missed table caption will still show up as a paragraph next to the table, whereas if I incorrectly classify something else as a table caption it could cause havoc by pairing it up with a table it’s not at all related to, or dropping it entirely.
	* Update tests and add new ones
	Partially fixes: #6316
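The row-span defaulting rule described in this commit can be written as a tiny Python function. `vmerge_state` is a hypothetical name, and `None` is used here to model an omitted `val` attribute.

```python
def vmerge_state(val):
    """Normalise the val attribute of a vertical-merge element.

    An omitted attribute (None) means 'continue'; any value other than
    'restart' is also treated as 'continue' rather than raising an error.
    """
    return "restart" if val == "restart" else "continue"


print(vmerge_state(None))       # continue
print(vmerge_state("restart"))  # restart
print(vmerge_state("other"))    # continue
```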
2021-05-28	Docx reader: Read table column widths.	Emily Bourke

2021-05-15	Docx writer: copy over more settings from reference.docx.	John MacFarlane
	From settings.xml in the reference-doc, we now include: `zoom`, `embedSystemFonts`, `doNotTrackMoves`, `defaultTabStop`, `drawingGridHorizontalSpacing`, `drawingGridVerticalSpacing`, `displayHorizontalDrawingGridEvery`, `displayVerticalDrawingGridEvery`, `characterSpacingControl`, `savePreviewPicture`, `mathPr`, `themeFontLang`, `decimalSymbol`, `listSeparator`, `autoHyphenation`, `compat`. Closes #7240.
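The allow-list approach can be sketched in Python as follows. The element names come from the commit message; `copied_settings` is a hypothetical helper, and the sketch only selects elements in the order they appear in the input, without enforcing the ordering the schema requires.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Element names listed in the commit message.
COPIED = {
    "zoom", "embedSystemFonts", "doNotTrackMoves", "defaultTabStop",
    "drawingGridHorizontalSpacing", "drawingGridVerticalSpacing",
    "displayHorizontalDrawingGridEvery", "displayVerticalDrawingGridEvery",
    "characterSpacingControl", "savePreviewPicture", "mathPr",
    "themeFontLang", "decimalSymbol", "listSeparator", "autoHyphenation",
    "compat",
}


def copied_settings(settings_xml):
    """Return the children of w:settings that are on the allow-list."""
    root = ET.fromstring(settings_xml)
    return [child for child in root if child.tag == f"{{{W}}}{child.tag.split('}', 1)[-1]}"
            and child.tag.split("}", 1)[-1] in COPIED]


reference = (
    f'<w:settings xmlns:w="{W}">'
    '<w:zoom w:percent="100"/><w:endnotePr/><w:defaultTabStop w:val="720"/>'
    "</w:settings>"
)
print([c.tag.split("}", 1)[-1] for c in copied_settings(reference)])
# ['zoom', 'defaultTabStop']
```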