path: root/src/Text/Pandoc/Class
Age | Commit message | Author
2024-02-20Class: openUrl TLS negotiation fixes.John MacFarlane
With the release of TLS 2.0.0, the TLS library started requiring Extended Main Secret for the TLS handshake. This caused problems connecting to Zotero's server and to other servers that do not support TLS 1.3. This commit relaxes this requirement. Closes #9483.
2024-02-19Minor code cleanup.John MacFarlane
2023-12-06Revert "Use base64 instead of base64-bytestring."John MacFarlane
This reverts commit 6625e9655ed2bb0c4bd4dd91b5959a103deab1cb. base64 is currently buggy on 32-bit systems. Closes #9233.
2023-11-17T.P.Class.IO.openURL improvements for data uris.John MacFarlane
- Only treat as base64 if ';base64' is present.
- Otherwise treat as UTF-8 (not 100% reliable but should cover most other cases).
- Strip off ';base64' (or ';charset=...' or whatever) from mime type.
This last change addresses #9195 (problems with data URIs in conversion to docx).
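The three rules above can be illustrated with a minimal sketch. It is not the code in `T.P.Class.IO.openURL`; the `DataUri` type and the `parseDataUri` and `splitParams` helpers are hypothetical names introduced here, and the payload is left undecoded.

```haskell
import Data.Char (toLower)
import Data.List (isPrefixOf)

-- | Result of splitting a data URI (hypothetical type for this sketch).
data DataUri = DataUri
  { duMime     :: String  -- ^ mime type with all parameters stripped
  , duIsBase64 :: Bool    -- ^ True only when ";base64" was present
  , duPayload  :: String  -- ^ still-encoded payload after the comma
  } deriving (Eq, Show)

-- | Split "data:<mime>[;param]*,<payload>" into its parts.
parseDataUri :: String -> Maybe DataUri
parseDataUri uri
  | "data:" `isPrefixOf` map toLower (take 5 uri) =
      case break (== ',') (drop 5 uri) of
        (meta, ',' : payload) ->
          let (mime, params) = break (== ';') meta
          in Just DataUri
               { duMime     = mime
               , duIsBase64 = ";base64" `elem` splitParams (map toLower params)
               , duPayload  = payload
               }
        _ -> Nothing
  | otherwise = Nothing

-- | Split ";a;b" into [";a", ";b"]; the input is empty or starts with ';'.
splitParams :: String -> [String]
splitParams [] = []
splitParams (c : rest) =
  let (p, more) = break (== ';') rest
  in (c : p) : splitParams more
```

A caller would base64-decode `duPayload` only when `duIsBase64` is set and otherwise treat it as UTF-8 text, mirroring the first two bullets.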
2023-07-20Fix new variant of the vulnerability in CVE-2023-35936.John MacFarlane
Guilhem Moulin noticed that the fix to CVE-2023-35936 was incomplete. An attacker could get around it by double-encoding the malicious extension to create or override arbitrary files.
    $ echo '![](data://image/png;base64,cHJpbnQgImhlbGxvIgo=;.lua+%252f%252e%252e%252f%252e%252e%252fb%252elua)' >b.md
    $ .cabal/bin/pandoc b.md --extract-media=bar
    <p><img src="bar/2a0eaa89f43fada3e6c577beea4f2f8f53ab6a1d.lua+%2f%2e%2e%2f%2e%2e%2fb%2elua" /></p>
    $ cat b.lua
    print "hello"
    $ find bar
    bar/
    bar/2a0eaa89f43fada3e6c577beea4f2f8f53ab6a1d.lua+
This commit adds a test case for this more complex attack and fixes the vulnerability. (The fix is quite simple: if the URL-unescaped filename or extension contains a '%', we just use the sha1 hash of the contents as the canonical name, just as we do if the filename contains '..'.)
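The check described in the parenthetical can be sketched as follows. This is an illustration under stated assumptions, not the code in the commit: `unescapeOnce` and `needsHashedName` are hypothetical names, and the decoder handles only single-byte ASCII escapes.

```haskell
import Data.Char (chr, digitToInt, isHexDigit)
import Data.List (isInfixOf)

-- | Decode exactly one layer of percent-encoding.
-- Malformed escapes are kept as-is; only single bytes are handled.
unescapeOnce :: String -> String
unescapeOnce ('%' : a : b : rest)
  | isHexDigit a && isHexDigit b =
      chr (16 * digitToInt a + digitToInt b) : unescapeOnce rest
unescapeOnce (c : rest) = c : unescapeOnce rest
unescapeOnce []         = []

-- | True when the name should be replaced by one derived from the
-- hash of the contents: after one round of unescaping it still
-- contains a '%' (a sign of double encoding) or a ".." sequence.
needsHashedName :: String -> Bool
needsHashedName name =
  let decoded = unescapeOnce name
  in '%' `elem` decoded || ".." `isInfixOf` decoded
```

With this rule a double-encoded name such as `.lua+%252f%252e%252e%252fb%252elua` decodes to a string that still contains `%`, so it is rejected before a second decoding step could turn it into a path traversal.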
2023-06-27Removed unused import.John MacFarlane
2023-06-27PandocMonad: Use toTextM in `readFileFromDirs`.John MacFarlane
2023-06-27Fix toTextM for Windows.John MacFarlane
We forgot to filter out CRs as we do in toText.
2023-06-27Text.Pandoc.Class: add `toTextM`.John MacFarlane
This is like `Text.Pandoc.UTF8.toText`, except:
- it takes a file path as first argument, in addition to bytestring contents
- it raises an informative error with source position if the contents are not UTF8-encoded
[API change]
This replaces `utf8ToText` in `Text.Pandoc.App.Input`. See #8884.
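A rough sketch of the behavior described for `toTextM`, assuming the `text` and `bytestring` packages. The name `toTextSketch` is hypothetical, it returns an `Either` rather than raising a pandoc error, and it reports only the file path, not the source position that the real function includes.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8')

-- | Decode file contents as UTF-8, naming the file on failure,
-- and drop carriage returns so Windows line endings become "\n".
toTextSketch :: FilePath -> B.ByteString -> Either String T.Text
toTextSketch fp bs =
  case decodeUtf8' bs of
    Left err -> Left (fp ++ ": contents are not valid UTF-8: " ++ show err)
    Right t  -> Right (T.filter (/= '\r') t)
```

Passing the file path alongside the bytes is what allows the error to say which input was badly encoded.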
2023-06-23More fixes to 5e381e3.John MacFarlane
These changes recognize that parseURI does not unescape the path. Another change is that the canonical form of the path used as the MediaBag key retains percent-encoding, if present; we only unescape the string when writing to a file. See #8918. Some tests are needed before the issue can be closed.
2023-06-23Fix bug in 5e381e3878b5da87ee7542f7e51c3c1a7fd84b89John MacFarlane
In the new code a comma mysteriously turned into a period. This would have prevented proper separation of the mime type and content in data uris. Thanks to @hseg for catching this.
2023-06-20Fix a security vulnerability in MediaBag and T.P.Class.IO.writeMedia.John MacFarlane
This vulnerability, discovered by Entroy C, allows users to write arbitrary files to any location by feeding pandoc a specially crafted URL in an image element. The vulnerability is serious for anyone using pandoc to process untrusted input. The vulnerability does not affect pandoc when run with the `--sandbox` flag.
2023-06-19Add Extracting log message for `--extract-media`.John MacFarlane
This message will also be triggered when media is being extracted to a temporary location, e.g. in PDF production.
2023-01-16T.P.Class.IO: export function `writeMedia` [API change]Albert Krewinkel
This is useful for the `pandoc.mediabag` module.
2023-01-10Update copyright years, it's 2023!Albert Krewinkel
2022-10-31Fix import.John MacFarlane
2022-10-31Add explicit imports to fix compiler warnings.John MacFarlane
2022-10-31First stab at mtl 2.3 compliance.John MacFarlane
This will no doubt produce a bunch of warnings and hence CI failures, which we'll need to work around with explicit imports.
2022-10-19T.P.Class: make `getPOSIXTime`, `getZonedTime` sensitive to...John MacFarlane
`SOURCE_DATE_EPOCH` environment variable if set. (`getTimestamp` was already sensitive.) This ensures that EPUB builds are reproducible. Closes #7093.
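The idea can be sketched as follows, assuming the `time` package. The names `resolveEpoch` and `reproduciblePOSIXTime` are hypothetical, and falling back to the clock when the variable does not parse is a choice made for this sketch, not necessarily what pandoc does.

```haskell
import Data.Time.Clock.POSIX (POSIXTime, getPOSIXTime)
import System.Environment (lookupEnv)
import Text.Read (readMaybe)

-- | Prefer the value of SOURCE_DATE_EPOCH (seconds since the epoch)
-- when it is set and parses as an integer; otherwise use the clock.
resolveEpoch :: Maybe String -> POSIXTime -> POSIXTime
resolveEpoch mval now =
  case mval >>= readMaybe of
    Just secs -> fromInteger secs
    Nothing   -> now

-- | IO wrapper: read the environment variable and the current time.
reproduciblePOSIXTime :: IO POSIXTime
reproduciblePOSIXTime =
  resolveEpoch <$> lookupEnv "SOURCE_DATE_EPOCH" <*> getPOSIXTime
```

Keeping the decision in a pure function makes it easy to check that two runs with the same environment produce the same timestamp.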
2022-10-03Rename T.P.Network.HTTP -> T.P.URI.John MacFarlane
This is still an unexported internal module. Export `urlEncode`, `escapeURI`, `isURI`, `schemes`, `uriPathToPath`. Re-export `escapeURI` and `isURI` from T.P.Shared (as they were exported before); drop exports of `schemes` and `uriPathToPath` [API change]. With this change, T.P.Class no longer depends on T.P.Shared.
2022-10-03Separate out T.P.Data, T.P.Translations from T.P.Class. (#8348)John MacFarlane
This makes T.P.Class more self-contained, and suitable for extraction into a separate package if desired. [API changes]
- T.P.Data is now an exported module, providing `readDataFile`, `readDefaultDataFile` (both formerly provided by T.P.Class), and also `getDataFileNames` (formerly unexported in T.P.App.CommandLineOptions).
- T.P.Translations is now an exported module (along with T.P.Translations.Types), providing `readTranslations`, `getTranslations`, `setTranslations`, `translateTerm`, `lookupTerm`, `Term(..)`, and `Translations`.
- T.P.Class: `readDataFile`, `readDefaultDataFile`, `setTranslations`, and `translateTerm` are no longer exported. `checkUserDataDir` is now exported.
- Text.Pandoc now exports Text.Pandoc.Data and `setTranslations` and `translateTerm`.
2022-09-27Fix small whitespace things.John MacFarlane
2022-08-21Fix regression with data uris in 2.19.1.John MacFarlane
In 2.19.1 we used the base64URL encoding rather than base64. This works in Safari, apparently, but not in other browsers. Closes #8239.
2022-08-14Use base64 instead of base64-bytestring.John MacFarlane
It is supposed to be faster and more standards-compliant.
2022-08-02fillMediaBag: Keep attributes of original image on SpanAlbert Krewinkel
Images that cannot be fetched are replaced with a Span that contains the image's description. The span now also retains all attributes of the original image. Furthermore, the classes `image` and `placeholder` are added, and path and title are stored in attributes `original-image-src` and `original-image-title`, respectively. Closes: #8099
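A minimal sketch of how the placeholder span's attributes could be built. It uses a local `String`-based `Attr` triple instead of the pandoc-types definition, `placeholderAttr` is a hypothetical name, and the ordering of classes and key-value pairs is an assumption.

```haskell
-- | Simplified stand-in for pandoc's attribute triple:
-- (identifier, classes, key-value pairs).
type Attr = (String, [String], [(String, String)])

-- | Build the attributes of the placeholder span from the image's
-- own attributes, its source path, and its title.
placeholderAttr :: Attr -> String -> String -> Attr
placeholderAttr (ident, classes, kvs) src title =
  ( ident
  , "image" : "placeholder" : classes
  , kvs ++ [ ("original-image-src", src)
           , ("original-image-title", title) ]
  )
```

Because the original source and title survive as attributes, a later filter could still recover what the image was meant to be.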
2022-06-10Allow placing custom readers and writers in data subdir (#8112)Albert Krewinkel
* PandocMonad: add new function `findFileWithDataFallback` [API Change]
* Custom readers: allow files to be placed in "readers" data dir
* Custom writers: allow files to be placed in "writers" data dir
2022-02-04MediaBag: improve detection of absolute paths.John MacFarlane
Previously we used System.FilePath's isRelative to determine when paths are relative (since absolute paths need to get a new name based on the sha1 hash). But this has an OS-specific behavior and actually returns True on Windows for paths like `/media/file.png`. This ought to fix #7881.
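One way to make the check independent of the host OS is to consult both flavours of the `filepath` API explicitly. This is a sketch of the idea, not necessarily the code in the commit; `looksAbsolute` is a hypothetical name.

```haskell
import qualified System.FilePath.Posix as Posix
import qualified System.FilePath.Windows as Windows

-- | Treat a path as absolute if either the POSIX or the Windows
-- rules consider it absolute, regardless of the OS we run on.
looksAbsolute :: FilePath -> Bool
looksAbsolute fp = Posix.isAbsolute fp || Windows.isAbsolute fp
```

Under the Windows rules alone, `/media/file.png` has no drive and so counts as relative; adding the POSIX check catches it.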
2022-02-04Revert "T.P.Class.IO.adjustImagePath: avoid double slash."John MacFarlane
This reverts commit 3dcb526b9b084976bfb5ef2f02a6bf009fd78750.
2022-02-04T.P.Class.IO.adjustImagePath: avoid double slash.John MacFarlane
Previously, if the directory argument ended in a slash, we'd get a doubled slash in the path. This may help with #7881.
2022-01-28Don't read files outside of user data directoryEven Brenden
If a file path does not exist relative to the working directory but does exist relative to the user data directory, and the file it resolves to lies outside of the user data directory, do not read it. This applies to readDataFile and readMetadataFile in PandocMonad and, by extension, any module that uses these by passing them relative paths.
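The containment test can be approximated purely on path segments. This is a sketch under the assumption that the path is relative and already split on separators; `escapesBase` is a hypothetical name, and symlinks are not considered.

```haskell
-- | True when following the segments of a relative path would, at
-- some point, step above the directory it is resolved against.
escapesBase :: [String] -> Bool
escapesBase = go 0
  where
    go :: Int -> [String] -> Bool
    go _ [] = False
    go depth (seg : segs)
      | seg == ".."            = depth == 0 || go (depth - 1) segs
      | seg == "." || null seg = go depth segs
      | otherwise              = go (depth + 1) segs
```

A path such as `templates/../../secret` is rejected because the second `..` is reached at depth zero, even though it begins inside the data directory.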
2022-01-28Handle consecutive ".."s in makeCanonicalEven Brenden
As an example, prior to this commit, "../../file" would evaluate to "file", when it should be unchanged.
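The intended behavior can be sketched with a small segment-collapsing function. This is not pandoc's `makeCanonical`; `splitSegments`, `collapse` and `canonicalize` are hypothetical names, and only `/` is treated as a separator.

```haskell
import Data.List (intercalate)

-- | Split a path on '/'.
splitSegments :: String -> [String]
splitSegments s =
  case break (== '/') s of
    (seg, [])       -> [seg]
    (seg, _ : rest) -> seg : splitSegments rest

-- | Drop "." segments and let ".." cancel the preceding segment,
-- but only when that segment is not itself "..".
collapse :: [String] -> [String]
collapse = reverse . foldl step []
  where
    step acc "." = acc
    step (prev : rest) ".." | prev /= ".." = rest
    step acc seg = seg : acc

-- | Canonicalize a '/'-separated relative path.
canonicalize :: String -> String
canonicalize = intercalate "/" . collapse . splitSegments
```

The guard `prev /= ".."` is the part that matters here: without it, the second `..` in `../../file` would cancel the first and the result would collapse to `file`.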
2022-01-21Search for metadata files in $DATADIR/metadata (#7851)Even Brenden
If files specified with `--metadata-file` are not found in the working directory, look in `$DATADIR/metadata`. Expose new `readMetadataFile` function from Text.Pandoc.Class [API change]. Expose new `PandocCouldNotFindMetadataFileError` constructor for `PandocError` from Text.Pandoc.Error [API change]. Closes #5876.
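The lookup order can be sketched as follows, assuming the `filepath` package. `metadataCandidates` and `firstExisting` are hypothetical names; the existence check is passed in as a function so the sketch does not depend on real files, and POSIX separators are used for determinism.

```haskell
import System.FilePath.Posix ((</>))

-- | Candidate locations, in search order: the path as given
-- (relative to the working directory), then the "metadata"
-- subdirectory of the user data directory, if there is one.
metadataCandidates :: Maybe FilePath -> FilePath -> [FilePath]
metadataCandidates mDataDir name =
  name : [ dir </> "metadata" </> name | Just dir <- [mDataDir] ]

-- | Return the first candidate for which the check succeeds.
firstExisting :: Monad m
              => (FilePath -> m Bool) -> [FilePath] -> m (Maybe FilePath)
firstExisting _ [] = pure Nothing
firstExisting exists (p : ps) = do
  ok <- exists p
  if ok then pure (Just p) else firstExisting exists ps
```

In a real program the check would be something like `doesFileExist` from the `directory` package, and a `Nothing` result would correspond to the `PandocCouldNotFindMetadataFileError` case.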
2022-01-08writeMedia: unescape percent-encoding in creating file path.John MacFarlane
Closes #7819 (problem with spaces in image filenames when creating PDFs).
2022-01-02Copyright notices: update for 2022Albert Krewinkel
2021-11-02Docx writer: use getTimestamp for modification times in reference.docx.John MacFarlane
This ensures that when `SOURCE_DATE_EPOCH` is set, the modification times of files taken from the reference.docx will be set deterministically, allowing for reproducible builds. Closes #7654.
2021-08-28Add `--sandbox` option.John MacFarlane
+ Add sandbox feature for readers. When this option is used, readers and writers only have access to input files (and other files specified directly on command line). This restriction is enforced in the type system.
+ Filters, PDF production, custom writers are unaffected. This feature only insulates the actual readers and writers, not the pipeline around them in Text.Pandoc.App.
+ Note that when `--sandbox` is specified, readers won't have access to the resource path, nor will anything have access to the user data directory.
+ Add module Text.Pandoc.Class.Sandbox, defining `sandbox`. Exported via Text.Pandoc.Class. [API change]
Closes #5045.
2021-08-24Text.Pandoc.Class: add readStdinStrict method to PandocMonad.John MacFarlane
[API change]
2021-08-24Class: Generalize type of extractMedia.John MacFarlane
It was uselessly restricted to PandocIO, instead of any instance of PandocMonad and MonadIO. [API change]
2021-08-24Text.Pandoc.Filter: Generalize type of applyFilters...John MacFarlane
from PandocIO to any instance of MonadIO and PandocMonad. [API change]
2021-08-22PandocIO: derive MonadCatch, MonadThrow, MonadMask.John MacFarlane
This will allow us to use withTempDir.
2021-07-09Always use / when adding directory to image path with extractMedia.John MacFarlane
Even on Windows. May help with #7431.
2021-06-10Fix MediaBag regressions.John MacFarlane
With the 2.14 release `--extract-media` stopped working as before; there could be mismatches between the paths in the rendered document and the extracted media. This patch makes several changes (while keeping the same API).
The `mediaPath` in 2.14 was always constructed from the SHA1 hash of the media contents. Now, we preserve the original path unless it's an absolute path or contains `..` segments (in that case we use a path based on the SHA1 hash of the contents).
When constructing a path from the SHA1 hash, we always use the original extension, if there is one. Otherwise we look up an appropriate extension for the mime type.
`mediaDirectory` and `mediaItems` now use the `mediaPath`, rather than the mediabag key, for the first component of the tuple. This makes more sense, I think, and fits with the documentation of these functions; eventually, though, we should rework the API so that `mediaItems` returns both the keys and the MediaItems.
Rewriting of source paths in `extractMedia` has been fixed.
`fillMediaBag` has been modified so that it doesn't modify image paths (that was part of the problem in #7345).
We now do path normalization (e.g. `\` separators on Windows) only in writing the media; the paths are left unchanged in the image links (sensibly, since they might be URLs and not file paths).
These changes should restore the original behavior from before 2.14. Closes #7345.
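The path-selection rule described above can be sketched as follows, assuming the `filepath` package. `mediaPathFor` is a hypothetical name; the hash is passed in as a ready-made hex string, and the mime-type lookup is reduced to an optional extension argument.

```haskell
import System.FilePath.Posix (isAbsolute, splitDirectories, takeExtension)

-- | Choose the path under which a media item is stored.
-- Arguments: hex digest of the contents, an extension suggested by
-- the mime type (without the dot), and the original path.
mediaPathFor :: String -> Maybe String -> FilePath -> FilePath
mediaPathFor hashHex mimeExt orig
  | isAbsolute orig || ".." `elem` splitDirectories orig = hashHex ++ ext
  | otherwise = orig
  where
    -- Prefer the original extension; fall back to the mime-based one.
    ext = case takeExtension orig of
            "" -> maybe "" ('.' :) mimeExt
            e  -> e
```

Ordinary relative paths pass through untouched, which is what keeps the paths in the rendered document and the extracted files in agreement.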
2021-06-03T.P.Class.IO: normalise path in writeMedia.John MacFarlane
This ensures that we get `\` separators on Windows.
2021-05-30Have LoadedResource use relative paths.John MacFarlane
The immediate reason for this is to allow the test output of #3752 to work on both Windows and Linux.
2021-05-25PandocMonad: add info message in `downloadOrRead`...John MacFarlane
indicating what path local resources have been loaded from.
2021-05-24MediaBag improvements.John MacFarlane
In the current dev version, we will sometimes add a version of an image with a hashed name, keeping the original version with the original name, which would lead to undesirable duplication. This change separates the media's filename from the media's canonical name (which is the path of the link in the document itself). Filenames are based on SHA1 hashes and assigned automatically.
In Text.Pandoc.MediaBag:
- Export MediaItem type [API change].
- Change MediaBag type to a map from Text to MediaItem [API change].
- `lookupMedia` now returns a `MediaItem` [API change].
- Change `insertMedia` so it sets the `mediaPath` to a filename based on the SHA1 hash of the contents. This will be used when contents are extracted.
In Text.Pandoc.Class.PandocMonad:
- Remove `fetchMediaResource` [API change].
Lua MediaBag module has been changed minimally. In the future it would be better, probably, to give Lua access to the full MediaItem type.
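The separation of canonical name (the map key) from filename (`mediaPath`) can be illustrated with a simplified model, assuming the `containers` package. This is not the Text.Pandoc.MediaBag API: the `Sketch`-suffixed names are hypothetical, `String` stands in for `Text` and for the byte contents, and the hashing function is a parameter rather than SHA1.

```haskell
import qualified Data.Map as M

-- | Simplified media item: mime type, filename used on extraction,
-- and the contents.
data MediaItem = MediaItem
  { mediaMimeType :: String
  , mediaPath     :: FilePath
  , mediaContents :: String
  } deriving (Eq, Show)

-- | Map from canonical name (the link path in the document) to item.
newtype MediaBag = MediaBag (M.Map String MediaItem)

-- | Canonical keys always use '/' separators.
canonicalKey :: FilePath -> String
canonicalKey = map (\c -> if c == '\\' then '/' else c)

-- | Insert an item; its filename is derived from the contents by the
-- supplied function (a stand-in for a SHA1-based name).
insertMediaSketch :: (String -> FilePath) -> FilePath -> String -> String
                  -> MediaBag -> MediaBag
insertMediaSketch nameFromContents fp mime contents (MediaBag m) =
  MediaBag (M.insert (canonicalKey fp) item m)
  where
    item = MediaItem mime (nameFromContents contents) contents

-- | Look an item up by its canonical name.
lookupMediaSketch :: FilePath -> MediaBag -> Maybe MediaItem
lookupMediaSketch fp (MediaBag m) = M.lookup (canonicalKey fp) m
```

Because the key is the link path and the filename is a function of the contents, inserting the same image under the same link does not create a second copy under a different name.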
2021-05-19Remove unused pragma.John MacFarlane
2021-05-18Use fetchItem instead of downloadOrRead in fetchMediaResource.John MacFarlane
2021-05-18Text.Pandoc.MediaBag: change type to use a Text key...John MacFarlane
instead of `[FilePath]`. We normalize the path and use `/` separators for consistency.
2021-04-17Update to released unicode-collation, latest citeproc dev version.John MacFarlane
Update citeproc test.