pandoc - MSM's mirror of Pandoc

Age	Commit message	Author
2024-02-20	Class: openUrl TLS negotiation fixes.	John MacFarlane
	With the release of TLS 2.0.0, the TLS library started requiring Extended Main Secret for the TLS handshake. This caused problems connecting to zotero's server and others that do not support TLS 1.3. This commit relaxes this requirement. Closes #9483.
2024-02-19	Minor code cleanup.	John MacFarlane

2023-12-06	Revert "Use base64 instead of base64-bytestring."	John MacFarlane
	This reverts commit 6625e9655ed2bb0c4bd4dd91b5959a103deab1cb. base64 is currently buggy on 32-bit systems. Closes #9233.
2023-11-17	T.P.Class.IO.openURL improvements for data uris.	John MacFarlane
	- Only treat as base64 if ';base64' is present.
	- Otherwise treat as UTF-8 (not 100% reliable but should cover most other cases).
	- Strip off ';base64' (or ';charset=...' or whatever) from mime type.
	This last change addresses #9195 (problems with data URIs in conversion to docx).
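The three rules above can be sketched as follows. This is an illustrative Python sketch, not pandoc's Haskell implementation: `parse_data_uri` is a hypothetical name, and percent-decoding the payload before use is an assumption of the sketch.

```python
import base64
from urllib.parse import unquote_to_bytes


def parse_data_uri(uri: str) -> tuple:
    """Split a data: URI into (mime type, raw bytes). Illustrative only."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Header and content are separated by the first comma.
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URI has no comma between header and content")
    # The header looks like "image/png;charset=utf-8;base64"; the mime type
    # is the part before the first ';', so all parameters are stripped off.
    mime, *params = header.split(";")
    raw = unquote_to_bytes(payload)  # assumption: payload may be percent-encoded
    if "base64" in params:
        return mime, base64.b64decode(raw)
    # No ';base64' marker: keep the bytes as they are (treated as UTF-8 text).
    return mime, raw
```

With this sketch, `parse_data_uri("data:text/plain;charset=utf-8,hi%20there")` yields the mime type `text/plain` without the `charset` parameter.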
2023-07-20	Fix new variant of the vulnerability in CVE-2023-35936.	John MacFarlane
	Guilhem Moulin noticed that the fix to CVE-2023-35936 was incomplete. An attacker could get around it by double-encoding the malicious extension to create or override arbitrary files.
	    $ echo '![](data://image/png;base64,cHJpbnQgImhlbGxvIgo=;.lua+%252f%252e%252e%252f%252e%252e%252fb%252elua)' >b.md
	    $ .cabal/bin/pandoc b.md --extract-media=bar
	    <p><img src="bar/2a0eaa89f43fada3e6c577beea4f2f8f53ab6a1d.lua+%2f%2e%2e%2f%2e%2e%2fb%2elua" /></p>
	    $ cat b.lua
	    print "hello"
	    $ find bar
	    bar/
	    bar/2a0eaa89f43fada3e6c577beea4f2f8f53ab6a1d.lua+
	This commit adds a test case for this more complex attack and fixes the vulnerability. (The fix is quite simple: if the URL-unescaped filename or extension contains a '%', we just use the sha1 hash of the contents as the canonical name, just as we do if the filename contains '..'.)
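The check described in this commit message can be sketched as follows. This is an illustrative Python sketch, not pandoc's Haskell code: `canonical_media_name` is a hypothetical helper, it only considers `/` separators, and keeping only a plain alphanumeric extension on the hashed name is an assumption of the sketch.

```python
import hashlib
import posixpath
from urllib.parse import unquote


def canonical_media_name(path: str, contents: bytes) -> str:
    """Return a safe relative name for a media item (hypothetical helper)."""
    unescaped = unquote(path)  # undo ONE level of percent-encoding
    suspicious = (
        unescaped.startswith("/")          # absolute path
        or ".." in unescaped.split("/")    # parent-directory segment
        or "%" in unescaped                # still escaped: double-encoding
    )
    if not suspicious:
        return path
    # Fall back to a name derived only from the contents.
    digest = hashlib.sha1(contents).hexdigest()
    ext = posixpath.splitext(unescaped)[1]
    if not ext[1:].isalnum():  # assumption: drop anything but a plain extension
        ext = ""
    return digest + ext
```

A double-encoded name such as `.lua+%252f%252e%252e%252fb%252elua` still contains `%` after one round of unescaping, so the sketch never uses it as a file name.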
2023-06-27	Removed unused import.	John MacFarlane

2023-06-27	PandocMonad: Use toTextM in `readFileFromDirs`.	John MacFarlane

2023-06-27	Fix toTextM for Windows.	John MacFarlane
	We forgot to filter out CRs as we do in toText.
2023-06-27	Text.Pandoc.Class: add `toTextM`.	John MacFarlane
	This is like `Text.Pandoc.UTF8.toText`, except:
	- it takes a file path as first argument, in addition to bytestring contents
	- it raises an informative error with source position if the contents are not UTF8-encoded
	[API change]
	This replaces `utf8ToText` in `Text.Pandoc.App.Input`. See #8884.
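The behaviour described here, together with the CR filtering mentioned in the later Windows fix, can be sketched roughly as follows. This is an illustrative Python sketch, not the Haskell `toTextM`: the name `to_text`, the use of `ValueError`, and the byte-based column count are assumptions.

```python
def to_text(source_path: str, contents: bytes) -> str:
    """Decode UTF-8 bytes, reporting where decoding failed (illustrative)."""
    try:
        text = contents.decode("utf-8")
    except UnicodeDecodeError as err:
        # Work out a 1-based line and column for the first offending byte.
        prefix = contents[: err.start]
        line = prefix.count(b"\n") + 1
        column = err.start - (prefix.rfind(b"\n") + 1) + 1
        bad = contents[err.start]
        raise ValueError(
            f"{source_path}:{line}:{column}: invalid UTF-8 byte 0x{bad:02x}"
        ) from err
    # Drop carriage returns so Windows line endings become plain newlines.
    return text.replace("\r", "")
```

Passing the file path alongside the bytes is what allows the error message to name the file that could not be decoded.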
2023-06-23	More fixes to 5e381e3.	John MacFarlane
	These changes recognize that parseURI does not unescape the path. Another change is that the canonical form of the path used as the MediaBag key retains percent-encoding, if present; we only unescape the string when writing to a file. See #8918. Some tests are needed before the issue can be closed.
2023-06-23	Fix bug in 5e381e3878b5da87ee7542f7e51c3c1a7fd84b89	John MacFarlane
	In the new code a comma mysteriously turned into a period. This would have prevented proper separation of the mime type and content in data uris. Thanks to @hseg for catching this.
2023-06-20	Fix a security vulnerability in MediaBag and T.P.Class.IO.writeMedia.	John MacFarlane
	This vulnerability, discovered by Entroy C, allows users to write arbitrary files to any location by feeding pandoc a specially crafted URL in an image element. The vulnerability is serious for anyone using pandoc to process untrusted input. The vulnerability does not affect pandoc when run with the `--sandbox` flag.
2023-06-19	Add Extracting log message for `--extract-media`.	John MacFarlane
	This message will also be triggered when media is being extracted to a temporary location, e.g. in PDF production.
2023-01-16	T.P.Class.IO: export function `writeMedia` [API change]	Albert Krewinkel
	This is useful for the `pandoc.mediabag` module.
2023-01-10	Update copyright years, it's 2023!	Albert Krewinkel

2022-10-31	Fix import.	John MacFarlane

2022-10-31	Add explicit imports to fix compiler warnings.	John MacFarlane

2022-10-31	First stab at mtl 2.3 compliance.	John MacFarlane
	This will no doubt produce a bunch of warnings and hence CI failures, which we'll need to work around with explicit imports.
2022-10-19	T.P.Class: make `getPOSIXTime`, `getZonedTime` sensitive to...	John MacFarlane
	`SOURCE_DATE_EPOCH` environment variable if set. (`getTimestamp` was already sensitive.) This ensures that EPUB builds are reproducible. Closes #7093.
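The idea can be sketched as follows. This is an illustrative Python sketch, not pandoc's Haskell code: the function names are hypothetical, and both the fallback to the wall clock on an unparsable value and the use of UTC for the derived date are assumptions.

```python
import os
import time
from datetime import datetime, timezone


def current_posix_time(environ=os.environ) -> float:
    """Seconds since the epoch, honouring SOURCE_DATE_EPOCH if it is set."""
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        try:
            return float(int(value))
        except ValueError:
            pass  # assumption: ignore a malformed value and use the clock
    return time.time()


def current_utc_datetime(environ=os.environ) -> datetime:
    """Date and time derived from the same (possibly pinned) timestamp."""
    return datetime.fromtimestamp(current_posix_time(environ), tz=timezone.utc)
```

Because every time query goes through the same pinned value, two builds with the same `SOURCE_DATE_EPOCH` embed identical timestamps.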
2022-10-03	Rename T.P.Network.HTTP -> T.P.URI.	John MacFarlane
	This is still an unexported internal module. Export `urlEncode`, `escapeURI`, `isURI`, `schemes`, `uriPathToPath`. Re-export `escapeURI` and `isURI` from T.P.Shared (as they were exported before); drop exports of `schemes` and `uriPathToPath` [API change]. With this change, T.P.Class no longer depends on T.P.Shared.
2022-10-03	Separate out T.P.Data, T.P.Translations from T.P.Class. (#8348)	John MacFarlane
	This makes T.P.Class more self-contained, and suitable for extraction into a separate package if desired. [API changes]
	- T.P.Data is now an exported module, providing `readDataFile`, `readDefaultDataFile` (both formerly provided by T.P.Class), and also `getDataFileNames` (formerly unexported in T.P.App.CommandLineOptions).
	- T.P.Translations is now an exported module (along with T.P.Translations.Types), providing `readTranslations`, `getTranslations`, `setTranslations`, `translateTerm`, `lookupTerm`, `Term(..)`, and `Translations`.
	- T.P.Class: `readDataFile`, `readDefaultDataFile`, `setTranslations`, and `translateTerm` are no longer exported. `checkUserDataDir` is now exported.
	- Text.Pandoc now exports Text.Pandoc.Data and `setTranslations` and `translateTerm`.
2022-09-27	Fix small whitespace things.	John MacFarlane

2022-08-21	Fix regression with data uris in 2.19.1.	John MacFarlane
	In 2.19.1 we used the base64URL encoding rather than base64. This works in Safari, apparently, but not in other browsers. Closes #8239.
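The difference between the two alphabets can be seen in a short sketch. This is illustrative Python, not pandoc's code, and `data_uri` is a hypothetical helper: the standard alphabet uses `+` and `/`, while the URL-safe variant replaces them with `-` and `_`.

```python
import base64


def data_uri(mime: str, contents: bytes) -> str:
    """Build a data: URI using the standard base64 alphabet (illustrative)."""
    encoded = base64.b64encode(contents).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Three bytes chosen so that every output character differs between alphabets.
payload = bytes([0xFB, 0xFF, 0xFE])
standard = base64.b64encode(payload).decode("ascii")
url_safe = base64.urlsafe_b64encode(payload).decode("ascii")
```

For payloads whose encoding happens to contain neither `+` nor `/`, the two encodings are identical, which is one way such a regression can go unnoticed.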
2022-08-14	Use base64 instead of base64-bytestring.	John MacFarlane
	It is supposed to be faster and more standards-compliant.
2022-08-02	fillMediaBag: Keep attributes of original image on Span	Albert Krewinkel
	Images that cannot be fetched are replaced with a Span that contains the image's description. The span now also retains all original image attributes and inherits all attributes of the image. Furthermore, the classes `image` and `placeholder` are added, and path and title are stored in attributes `original-image-src` and `original-image-title`, respectively. Closes: #8099
2022-06-10	Allow placing custom readers and writers in data subdir (#8112)	Albert Krewinkel
	* PandocMonad: add new function `findFileWithDataFallback` [API Change]
	* Custom readers: allow files to be placed in "readers" data dir
	* Custom writers: allow files to be placed in "writers" data dir
2022-02-04	MediaBag: improve detection of absolute paths.	John MacFarlane
	Previously we used System.FilePath's isRelative to determine when paths are relative (since absolute paths need to get a new name based on the sha1 hash). But this has an OS-specific behavior and actually returns True on Windows for paths like `/media/file.png`. This ought to fix #7881.
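One way to avoid depending on the host OS is to test the path's shape directly. The following is an illustrative Python sketch under that assumption; `looks_absolute` is a hypothetical name and the exact rules are not taken from pandoc's source.

```python
import re

# A Windows drive prefix such as "C:\" or "c:/".
_DRIVE_PREFIX = re.compile(r"^[A-Za-z]:[\\/]")


def looks_absolute(path: str) -> bool:
    """Decide whether a path is absolute without asking the host OS."""
    if path.startswith(("/", "\\")):
        return True  # rooted on POSIX, and treated as rooted on Windows too
    return _DRIVE_PREFIX.match(path) is not None
```

With a check like this, `/media/file.png` is classified the same way on every platform, so it always receives a hash-based name.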
2022-02-04	Revert "T.P.Class.IO.adjustImagePath: avoid double slash."	John MacFarlane
	This reverts commit 3dcb526b9b084976bfb5ef2f02a6bf009fd78750.
2022-02-04	T.P.Class.IO.adjustImagePath: avoid double slash.	John MacFarlane
	Previously if the directory argument ended in slash, we'd get a doubled slash in the path. This may help with #7881.
2022-01-28	Don't read files outside of user data directory	Even Brenden
	If a file path does not exist relative to the working directory, but it does exist relative to the user data directory, and it exists outside of the user data directory, do not read it. This applies to readDataFile and readMetadataFile in PandocMonad and, by extension, any module that uses these by passing them relative paths.
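The containment test implied here can be sketched as follows. This is an illustrative Python sketch for POSIX-style paths, not pandoc's Haskell code; `is_inside` is a hypothetical helper.

```python
import os


def is_inside(directory: str, candidate: str) -> bool:
    """True if `candidate`, resolved relative to `directory`, stays inside it."""
    base = os.path.realpath(directory)
    # Resolve "..", "." and symlinks before comparing.
    target = os.path.realpath(os.path.join(base, candidate))
    return os.path.commonpath([base, target]) == base
```

A reader would call such a check only after the lookup relative to the working directory has failed, and would refuse to read the file when it returns `False`.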
2022-01-28	Handle consecutive ".."s in makeCanonical	Even Brenden
	As an example, prior to this commit, "../../file" would evaluate to "file", when it should be unchanged.
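The intended behaviour can be sketched as follows. This is an illustrative Python sketch for relative paths only, not the Haskell `makeCanonical`; the name `make_canonical` and the handling of empty and `.` segments are assumptions.

```python
def make_canonical(path: str) -> str:
    """Collapse "dir/.." pairs but keep leading ".." segments (illustrative)."""
    out = []
    for part in path.split("/"):
        if part in ("", "."):
            continue  # skip empty and current-directory segments
        if part == ".." and out and out[-1] != "..":
            out.pop()  # ".." cancels the preceding real directory
        else:
            out.append(part)  # a leading or repeated ".." is preserved
    return "/".join(out)
```

The key point is the `out[-1] != ".."` test: a `..` may only cancel a real directory name, never another `..`.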
2022-01-21	Search for metadata files in $DATADIR/metadata (#7851)	Even Brenden
	If files specified with `--metadata-file` are not found in the working directory, look in `$DATADIR/metadata`. Expose new `readMetadataFile` function from Text.Pandoc.Class [API change]. Expose new `PandocCouldNotFindMetadataFileError` constructor for `PandocError` from Text.Pandoc.Error [API change]. Closes #5876.
2022-01-08	writeMedia: unescape percent-encoding in creating file path.	John MacFarlane
	Closes #7819 (problem with spaces in image filenames when creating PDFs).
2022-01-02	Copyright notices: update for 2022	Albert Krewinkel

2021-11-02	Docx writer: use getTimestamp for modification times in reference.docx.	John MacFarlane
	This ensures that when `SOURCE_DATE_EPOCH` is set, the modification times of files taken from the reference.docx will be set deterministically, allowing for reproducible builds. Closes #7654.
2021-08-28	Add `--sandbox` option.	John MacFarlane
	+ Add sandbox feature for readers. When this option is used, readers and writers only have access to input files (and other files specified directly on command line). This restriction is enforced in the type system.
	+ Filters, PDF production, custom writers are unaffected. This feature only insulates the actual readers and writers, not the pipeline around them in Text.Pandoc.App.
	+ Note that when `--sandbox` is specified, readers won't have access to the resource path, nor will anything have access to the user data directory.
	+ Add module Text.Pandoc.Class.Sandbox, defining `sandbox`. Exported via Text.Pandoc.Class. [API change]
	Closes #5045.
2021-08-24	Text.Pandoc.Class: add readStdinStrict method to PandocMonad.	John MacFarlane
	[API change]
2021-08-24	Class: Generalize type of extractMedia.	John MacFarlane
	It was uselessly restricted to PandocIO, instead of any instance of PandocMonad and MonadIO. [API change]
2021-08-24	Text.Pandoc.Filter: Generalize type of applyFilters...	John MacFarlane
	from PandocIO to any instance of MonadIO and PandocMonad. [API change]
2021-08-22	PandocIO: derive MonadCatch, MonadThrow, MonadMask.	John MacFarlane
	This will allow us to use withTempDir.
2021-07-09	Always use / when adding directory to image path with extractMedia.	John MacFarlane
	Even on Windows. May help with #7431.
2021-06-10	Fix MediaBag regressions.	John MacFarlane
	With the 2.14 release `--extract-media` stopped working as before; there could be mismatches between the paths in the rendered document and the extracted media.
	This patch makes several changes (while keeping the same API).
	The `mediaPath` in 2.14 was always constructed from the SHA1 hash of the media contents. Now, we preserve the original path unless it's an absolute path or contains `..` segments (in that case we use a path based on the SHA1 hash of the contents).
	When constructing a path from the SHA1 hash, we always use the original extension, if there is one. Otherwise we look up an appropriate extension for the mime type.
	`mediaDirectory` and `mediaItems` now use the `mediaPath`, rather than the mediabag key, for the first component of the tuple. This makes more sense, I think, and fits with the documentation of these functions; eventually, though, we should rework the API so that `mediaItems` returns both the keys and the MediaItems.
	Rewriting of source paths in `extractMedia` has been fixed.
	`fillMediaBag` has been modified so that it doesn't modify image paths (that was part of the problem in #7345).
	We now do path normalization (e.g. `\` separators on Windows) only in writing the media; the paths are left unchanged in the image links (sensibly, since they might be URLs and not file paths).
	These changes should restore the original behavior from before 2.14. Closes #7345.
2021-06-03	T.P.Class.IO: normalise path in writeMedia.	John MacFarlane
	This ensures that we get `\` separators on Windows.
2021-05-30	Have LoadedResource use relative paths.	John MacFarlane
	The immediate reason for this is to allow the test output of #3752 to work on both Windows and Linux.
2021-05-25	PandocMonad: add info message in `downloadOrRead`...	John MacFarlane
	indicating what path local resources have been loaded from.
2021-05-24	MediaBag improvements.	John MacFarlane
	In the current dev version, we will sometimes add a version of an image with a hashed name, keeping the original version with the original name, which would lead to undesirable duplication. This change separates the media's filename from the media's canonical name (which is the path of the link in the document itself). Filenames are based on SHA1 hashes and assigned automatically.
	In Text.Pandoc.MediaBag:
	- Export MediaItem type [API change].
	- Change MediaBag type to a map from Text to MediaItem [API change].
	- `lookupMedia` now returns a `MediaItem` [API change].
	- Change `insertMedia` so it sets the `mediaPath` to a filename based on the SHA1 hash of the contents. This will be used when contents are extracted.
	In Text.Pandoc.Class.PandocMonad:
	- Remove `fetchMediaResource` [API change].
	Lua MediaBag module has been changed minimally. In the future it would be better, probably, to give Lua access to the full MediaItem type.
2021-05-19	Remove unused pragma.	John MacFarlane

2021-05-18	Use fetchItem instead of downloadOrRead in fetchMediaResource.	John MacFarlane

2021-05-18	Text.Pandoc.MediaBag: change type to use a Text key...	John MacFarlane
	instead of `[FilePath]`. We normalize the path and use `/` separators for consistency.
2021-04-17	Update to released unicode-collation, latest citeproc dev version.	John MacFarlane
	Update citeproc test.