pandoc - MSM's mirror of Pandoc

Age	Commit message (Collapse)	Author
2024-02-28	Docx writer: don't copy over footnotePr in settings.xml...	John MacFarlane
	rom reference.docx. Closes #9522.
2023-12-18	fix(docx): fix OOXMLValidator error on KeywordTok output	Edwin Török
	xmllint doesn't warn about this (maybe because the tag is empty?), but the order doesn't match wml.xsd: ``` <w:rPr> <w:color w:val="007020"/> <w:b/> </w:rPr> ``` And OOXMLValidatorCLI does warn about it: ``` { "Description": "The element has unexpected child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:b'.", "Path": { "NamespacesDefinitions": [ "xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\"" ], "Namespaces": { }, "XPath": "/w:styles[1]/w:style[40]/w:rPr[1]", "PartUri": "/word/styles.xml" }, "Id": "Sch_UnexpectedElementContentExpectingComplex", "ErrorType": "Schema" } ``` Signed-off-by: Edwin Török <edwin@etorok.net>
2023-12-18	fix(docx): fix validation error in w:nsid	Edwin Török
	The length here seems to refer to length in bytes (so twice as long in hex): ``` ./tmp/numbering-pretty.xml:4: element nsid: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}nsid', attribute '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}val': [facet 'length'] The value 'A990' has a length of '2'; this differs from the allowed length of '4'. ``` [This](https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.nsid?view=openxml-2.8.1) also documents the longer values. Signed-off-by: Edwin Török <edwin@etorok.net>
2023-12-17	Docx writer: ensure that elements in settings are ordered correctly.	John MacFarlane
	The elements must occur in a specific order. This was being messed up when integrating a custom reference.docx. Closes #9264.
2023-12-17	test/docx/golden: regenerate	Edwin Török
	Using `make test TESTARGS=--accept` Signed-off-by: Edwin Török <edwin@etorok.net>
2023-12-08	Docx writer: Use different style for block quotes in notes.	John MacFarlane
	Using "Footnote Block Text" for the style name, so it can be given a different font size if footnotes are. Closes #9243.
2023-12-08	Docx writer: allow embedded fonts to be used in reference.docx.	John MacFarlane
	Closes #6728.
2023-03-17	Update docx golden tests for style changes.	John MacFarlane

2023-01-13	Support complex figures. [API change]	Albert Krewinkel
	Thanks and credit go to Aner Lucero, who laid the groundwork for this feature in the 2021 GSoC project. He contributed many changes, including modifications to the readers for HTML, JATS, and LaTeX, and to the HTML and JATS writers. Shared (Albert Krewinkel): - The new function `figureDiv`, exported from `Text.Pandoc.Shared`, offers a standardized way to convert a figure into a Div element. Readers (Aner Lucero): - HTML reader: `<figure>` elements are parsed as figures, with the caption taken from the respective `<figcaption>` elements. - JATS reader: The `<fig>` and `<caption>` elements are parsed into figure elements, even if the contents is more complex. - LaTeX reader: support for figures with non-image contents and for subfigures. - Markdown reader: paragraphs containing just an image are treated as figures if the `implicit_figures` extension is enabled. The identifier is used as the figure's identifier and the image description is also used as figure caption; all other attributes are treated as belonging to the image. Writers (Aner Lucero, Albert Krewinkel): - DokuWiki, Haddock, Jira, Man, MediaWiki, Ms, Muse, PPTX, RTF, TEI, ZimWiki writers: Figures are rendered like Div elements. - Asciidoc writer: The figure contents is unwrapped; each image in the the figure becomes a separate figure. - Classic custom writers: Figures are passed to the global function `Figure(caption, contents, attr)`, where `caption` and `contents` are strings and `attr` is a table of key-value pairs. - ConTeXt writer: Figures are wrapped in a "placefigure" environment with `\startplacefigure`/`\endplacefigure`, adding the features caption and listing title as properties. Subfigures are place in a single row with the `\startfloatcombination` environment. - DocBook writer: Uses `mediaobject` elements, unless the figure contains subfigures or tables, in which case the figure content is unwrapped. - Docx writer: figures with multiple content blocks are rendered as tables with style `FigureTable`; like before, single-image figures are still output as paragraphs with style `Figure` or `Captioned Figure`, depending on whether a caption is attached. - DokuWiki writer: Caption and "alt-text" are no longer combined. The alt text of a figure will now be lost in the conversion. - FB2 writer: The figure caption is added as alt text to the images in the figure; pre-existing alt texts are kept. - ICML writer: Only single-image figures are supported. The contents of figures with additional elements gets unwrapped. - HTML writer: the alt text is no longer constructed from the caption, as was the case with implicit figures. This reduces duplication, but comes at the risk of images that are missing alt texts. Authors should take care to provide alt texts for all images. Some readers, most notably the Markdown reader with the `implicit_figures` extension, add a caption that's identical to the image description. The writer checks for this and adds an `aria-hidden` attribute to the `<figcaption>` element in that case. - JATS writer: The `<fig>` and `<caption>` elements are used write figures. - LaTeX writer: complex figures, e.g. with non-image contents and subfigures, are supported. The `subfigure` template variable is set if the document contains subfigures, triggering the conditional loading of the subcaption package. Contants of figures that contain tables are become unwrapped, as longtable environments are not allowed within figures. - Markdown writer: figures are output as implicit figures if possible, via HTML if the `raw_html` extension is enabled, and as Div elements otherwise. - OpenDocument writer: A separate paragraph is generated for each block element in a figure, each with style `FigureWithCaption`. Behavior for single-image figures therefore remains unchanged. - Org writer: Only the first element in a figure is given a caption; additional block elements in the figure are appended without any caption being added. - RST writer: Single-image figures are supported as before; the contents of more complex images become nested in a container of type `float`. - Texinfo writer: Figures are rendered as float with type `figure`. - Textile writer: Figures are rendered with the help of HTML elements. - XWiki: Figures are placed in a group. Co-authored-by: Aner Lucero <4rgento@gmail.com>
2021-10-29	Docx writer: add IDs to native_numbering test	Tristan Stenner

2021-10-29	Update test golden master for docx native numbering	Tristan Stenner

2021-09-12	Docx writer: make id used in native_numbering predictable.	John MacFarlane
	If the image has the id IMAGEID, then we use the id ref_IMAGEID for the figure number. Closes #7551. This allows one to create a filter that adds a figure number with figure name, e.g. <w:fldSimple w:instr=" REF ref_superfig "><w:r><w:t>Figure X</w:t></w:r></w:fldSimple> For this to be possible it must be possible to predict the figure number id from the image id. If images lack an id, an id of the form `ref_fig1` is used.
2021-08-27	Ensure we have unique ids for wp:docPr and pic:cNvPr elements.	John MacFarlane
	This will, I hope, fix #7527 and #7503.
2021-06-29	Docx writer: Add table numbering for captioned tables.	John MacFarlane
	The numbers are added using fields, so that Word can create a list of tables that will update automatically.
2021-06-29	Docx writer: support figure numbers.	John MacFarlane
	These are set up in such a way that they will work with Word's automatic table of figures. Closes #7392.
2021-05-15	Docx writer: copy over more settings from referenc.odcx.	John MacFarlane
	From settings.xml in the reference-doc, we now include: `zoom`, `embedSystemFonts`, `doNotTrackMoves`, `defaultTabStop`, `drawingGridHorizontalSpacing`, `drawingGridVerticalSpacing`, `displayHorizontalDrawingGridEvery`, `displayVerticalDrawingGridEvery`, `characterSpacingControl`, `savePreviewPicture`, `mathPr`, `themeFontLang`, `decimalSymbol`, `listSeparator`, `autoHyphenation`, `compat`. Closes #7240.
2021-05-15	docx writer: Remove rsids from settings.docx.	John MacFarlane
	Word will add these when revisions are made. But it's pointless to start out with a set of them.
2021-05-11	Improve integration of settings from reference.docx.	John MacFarlane
	The settings we can carry over from a reference.docx are autoHyphenation, consecutiveHyphenLimit, hyphenationZone, doNotHyphenateCap, evenAndOddHeaders, and proofState. Previously this was implemented in a buggy way, so that the reference doc's values AND the new values were included. This change allows users to create a reference.docx that sets w:proofState for spelling or grammar to "dirty," so that spell/grammar checking will be triggered on the generated docx. Closes #1209.
2021-03-17	Docx writer: make nsid in abstractNum deterministic.	John MacFarlane
	Previously we assigned a random number (though in a deterministic way). But changes in the random package mean we get different results now on different architectures, even with the same random seed. We don't need random values; so now we just assign a value based on the list number id, which is guaranteed to be unique to the list marker.
2021-02-16	Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...	John MacFarlane
	..and add new definitions isomorphic to xml-light's, but with Text instead of String. This allows us to keep most of the code in existing readers that use xml-light, but avoid lots of unnecessary allocation. We also add versions of the functions from xml-light's Text.XML.Light.Output and Text.XML.Light.Proc that operate on our modified XML types, and functions that convert xml-light types to our types (since some of our dependencies, like texmath, use xml-light). Update golden tests for docx and pptx. OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`. Docx: Do a manual traversal to unwrap sdt and smartTag. This is faster, and needed to pass the tests. Benchmarks: A = prior to 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8) B = as of 8ca191604dcd13af27c11d2da225da646ebce6fc (Feb 8) C = this commit \| Reader \| A \| B \| C \| \| ------- \| ----- \| ------ \| ----- \| \| docbook \| 18 ms \| 12 ms \| 10 ms \| \| opml \| 65 ms \| 62 ms \| 35 ms \| \| jats \| 15 ms \| 11 ms \| 9 ms \| \| docx \| 72 ms \| 69 ms \| 44 ms \| \| odt \| 78 ms \| 41 ms \| 28 ms \| \| epub \| 64 ms \| 61 ms \| 56 ms \| \| fb2 \| 14 ms \| 5 ms \| 4 ms \|
2021-02-10	Add new unexported module T.P.XMLParser.	John MacFarlane
	This exports functions that uses xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2). Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module. The new parser also reports errors, which we report when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus. These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes. Add entity defs to docbook-reader.docbook. Update golden tests for docx.
2021-01-12	Docx writer: handle table header using styles.	John MacFarlane
	Instead of hard-coding the border and header cell vertical alignment, we now let this be determined by the Table style, making use of Word's "conditional formatting" for the table's first row. For headerless tables, we use the tblLook element to tell Word not to apply conditional first-row formatting. Closes #7008.
2020-11-26	Docx writer: Fix bullets/lists indentation	cholonam
	Fix appearance of bullets/numbered lists (the first level is slightly indented to the right instead of right on the margin). New golden files have been tested using Word 2010 on Windows 10.
2020-07-22	Docx writer: support --number-sections.	John MacFarlane
	Closes #1413.
2020-05-16	Docx writer: enable column and row bands for tables.	John MacFarlane
	This change will not have any effect with the default style. However, it enables users to use a style (via a reference.docx) that turns on row and/or column bands. Closes #6371.
2019-11-16	Change styles in reference.docx.	John MacFarlane
	All headings now have a uniform color. Level-1 headings no longer set `w:themeShade="B5"`. Level-2 headings are now 14 point rather than 16 point. Level-3 headings are now 12 point rather than 14 point. Level-4 headings are italic rather than bold. Closes #5820.
2019-11-14	Change reference.docx to use more normal block quotes.	John MacFarlane
	Indented left and right, same font and size. Previously it was unindented, smaller font and different typeface. See #5820.
2019-03-11	docx writer: avoid extra copy of abstractNum and num elements...	John MacFarlane
	...in numbering.xml. This caused pandoc-produced docx files to be uneditable using Word Online. The problem was that recent versions of reference.docx include samples of various kinds of text, including lists. The numering elements for these were getting copied over to the new docx, where they clashed with the autogenerated elements produced by pandoc. This didn't confuse Desktop Word, but it did confuse Word Online. Closes #5358.
2018-10-09	Docx writer: added framework for custom properties.	John MacFarlane
	So far, we don't actually write any custom properties, but we have the infrastructure to add this. See #3034.
2018-02-23	Docx test: adjust test for fix of bug	laptop1\Andrew
	This commit adjusts the test cases for the Docx writer after the fix of #3930. - Adjusted test cases with inline images. The inline images now have the correct sizing, title and description. - Modified the test case to include an image multiple times with different sizing each time. - Tested on Windows 8.1 with Word 2007 (12.0.6705.5000) The files are not corrupted and display exactly what is expected.
2018-01-27	Docx writer tests: Use new golden framework	Jesse Rosenthal
	These are based off the reader tests, with some removed (where the reader output was identical, based on different docx inputs). There are still more to be added. In particular, tests for custom-styles need to be added. All golden docx files have been checked in MS Word 2013 (windows). There is no corruption. There is questionable output in the `tables` test: the three tables seemed to be joined. This will be addressed in a future commit, and the golden docx file will be changed.