summaryrefslogtreecommitdiff
path: root/src/Text/Pandoc/Parsing.hs
AgeCommit message (Collapse)Author
2023-01-10Update copyright years, it's 2023!Albert Krewinkel
2022-10-20Text.Pandoc.Parsing: remove `nested` [API change].John MacFarlane
It was not being used, and in fact it was a bad idea from the beginning, as it had no hope of solving the problem it was introduced to solve.
2022-10-18Revert "T.P.Parsing: export `registerIdentifier`."John MacFarlane
This reverts commit 20492d523c8324e36781cfbbc8092c796f94b151.
2022-10-18T.P.Parsing: export `registerIdentifier`.John MacFarlane
[API change] Use this in the HTML reader to register identifiers to avoid duplicates created by `auto_identifiers`.
2022-10-17T.P.Error: Remove PandocParsecError constructor from PandocError. (#8385)John MacFarlane
Henceforth we just use `PandocParseError`. T.P.Parsing now exports `fromParsecError`, which can be used to turn a parsec ParseError into a regular PandocParseError (the appearance to the user should be unchanged in every case). [API change] Closes #8382.
2022-10-16T.P.Parsing: export errorMessages, messageString.John MacFarlane
[API change]
2022-10-16T.P.Parsing: Remove gratuitious renaming of Parsec types.John MacFarlane
We were exporting Parser, ParserT as synonyms of Parsec, ParsecT. There is no good reason for this and it can cause confusion. Also, when possible, we replace imports of Text.Parsec with T.P.Parsing. The idea is to make it easier, at some point, to switch to megaparsec or another parsing engine if we want to. T.P.Parsing new exports: Stream(..), updatePosString, SourceName, Parsec, ParsecT [API change]. Removed exports: Parser, ParserT [API change].
2022-03-25Rename T.P.Parsing.Combinators -> T.P.Parsing.General.John MacFarlane
Because many of the exported things aren't combinators... Also remove redundant explot of indentWith from T.P.Parsing.Lists.
2022-03-25T.P.Parsing: use explicit imports.John MacFarlane
2022-03-24[API change] Unify grid table parsing (#7971)Albert Krewinkel
Grid table parsing in Markdown and rst are updated use the same functions. Functions are generalized to meet requirements for both formats. This change also lays the ground for further generalizations in table parsers, including support for advanced table features. API changes in Text.Pandoc.Parsing: - Parse results of functions `tableWith'` and `gridTableWith'` are now a `mf TableComponents` instead of a quadruple of alignments, column widths, header rows and body rows. Additional exports from Text.Pandoc.Parsing: - `tableWith'` - `TableComponents` - `TableNormalization` - `toTableComponents` - `toTableComponents'`
2022-03-11Parsing: partition module into (internal) submodules (#7962)Albert Krewinkel
2022-01-02Copyright notices: update for 2022Albert Krewinkel
2021-07-29parseFromString: preserve at least the source directory.John MacFarlane
Previously we just set the source name to "chunk" when parsing from strings, to avoid misleading source positions. This had the side effect that `rebase_relative_paths` would break inside sections that were parsed as strings. So, now we use "ORIGINAL_SOURCE_PATH_chunk" instead of just "chunk". Closes #7464.
2021-06-22Fix unneeded importJohn MacFarlane
2021-06-21Improve emailAddress in Text.Pandoc.Parsing.John MacFarlane
Previously the parser would accept characters in domains that are illegal in domains, and this sometimes caused it to gobble bits of the following text. Closes #7398. Note that this change, by itself, caused some txt2tag reader tests to fail. txt2tags allows bare email addresses with a following form query. So, in addition to the change to emailAddress, we modify the txt2tags parser so it can still handle these cases.
2021-05-13Implement curly-brace syntax for Markdown citation keys.John MacFarlane
The change provides a way to use citation keys that contain special characters not usable with the standard citation key syntax. Example: `@{foo_bar{x}'}` for the key `foo_bar{x}`. Closes #6026. The change requires adding a new parameter to the `citeKey` parser from Text.Pandoc.Parsing [API change]. Markdown reader: recognize @{..} syntax for citatinos. Markdown writer: use @{..} syntax for citations when needed. Update manual with curly-brace syntax for citations. Closes #6026.
2021-05-09T.P.Parsing: improve include file functions.John MacFarlane
Remove old `insertIncludedFileF`. [API change] Give `insertIncludedFile` a more general type, allowing it to be used where `insertIncludedFileF` was.
2021-05-09Change reader types, allowing better tracking of source positions.John MacFarlane
Previously, when multiple file arguments were provided, pandoc simply concatenated them and passed the contents to the readers, which took a Text argument. As a result, the readers had no way of knowing which file was the source of any particular bit of text. This meant that we couldn't report accurate source positions on errors or include accurate source positions as attributes in the AST. More seriously, it meant that we couldn't resolve resource paths relative to the files containing them (see e.g. #5501, #6632, #6384, #3752). Add Text.Pandoc.Sources (exported module), with a `Sources` type and a `ToSources` class. A `Sources` wraps a list of `(SourcePos, Text)` pairs. [API change] A parsec `Stream` instance is provided for `Sources`. The module also exports versions of parsec's `satisfy` and other Char parsers that track source positions accurately from a `Sources` stream (or any instance of the new `UpdateSourcePos` class). Text.Pandoc.Parsing now exports these modified Char parsers instead of the ones parsec provides. Modified parsers to use a `Sources` as stream [API change]. The readers that previously took a `Text` argument have been modified to take any instance of `ToSources`. So, they may still be used with a `Text`, but they can also be used with a `Sources` object. In Text.Pandoc.Error, modified the constructor PandocParsecError to take a `Sources` rather than a `Text` as first argument, so parse error locations can be accurately reported. T.P.Error: showPos, do not print "-" as source name.
2021-04-29Further improvements in smart quotes.John MacFarlane
Improves heuristic for detection of an "open double quote." Closes #2103.
2021-04-28Smarter smart quotes.John MacFarlane
Treat a leading " with no closing " as a left curly quote. This supports the practice, in fiction, of continuing paragraphs quoting the same speaker without an end quote. It also helps with quotes that break over lines in line blocks. Closes #7216.
2021-03-21Simplify T.P.Asciify and export toAsciiText [API change].John MacFarlane
Instead of encoding a giant (and incomplete) map, we now just use unicode-transforms to normalize the text to a canonical decomposition, and manipulate the result. The new `toAsciiText` is equivalent to the old `T.pack . mapMaybe toAsciiChar . T.unpack` but should be faster.
2021-03-20Support `yaml_metadata_block` extension form commonmark, gfm.John MacFarlane
This is a bit more limited than with markdown, as documented in the manual: - The YAML block must be the first thing in the input. - The leaf notes are parsed in isolation from the rest of the document. So, for example, you can't use reference links if the references are defined later in the document. Closes #6537.
2021-03-20Text.Pandoc.Parsing: remove F type synonym.John MacFarlane
Muse and Org were defining their own F anyway, with their own state. We therefore move this definition to the Markdown reader.
2021-03-19T.P.Shared: Remove ToString, ToText typeclasses [API change].John MacFarlane
T.P.Parsing: revise type of readWithM so that it takes a Text rather than a polymorphic ToText value. These typeclasses were there to ease the transition from String to Text. They are no longer needed, and they may clash with more useful versions under the same name. This will require a bump to 2.13.
2021-02-22Text.Pandoc.UTF8: change IO functions to return Text, not String.John MacFarlane
[API change] This affects `readFile`, `getContents`, `writeFileWith`, `writeFile`, `putStrWith`, `putStr`, `putStrLnWith`, `putStrLn`. `hPutStrWith`, `hPutStr`, `hPutStrLnWith`, `hPutStrLn`, `hGetContents`. This avoids the need to uselessly create a linked list of characters when emiting output.
2021-01-08Update copyright notices for 2021 (#7012)Albert Krewinkel
2021-01-07T.P.Parsing: modify gridTableWith' for headerless tables.John MacFarlane
If the table lacks a header, the header row should be an empty list. Previously we got a list of empty cells, which caused an empty header to be emitted instead of no header. In LaTeX/PDF output that meant we got a double top line with space between. @tarleb @despres - please let me know if this is problematic for some reason I'm not grasping.
2020-12-07Parsing: Small code improvements.John MacFarlane
2020-12-07Parsing: More minor performance improvements.John MacFarlane
2020-12-07Small efficiency improvement in uri parserJohn MacFarlane
2020-12-07Parsing: in nonspaceChar use satisfy instead of oneOf.John MacFarlane
For efficiency.
2020-11-17Markdown reader: fix regression with example list references.John MacFarlane
This affects example list references followed by dashes. Introduced by commit b8d17f7. Closes #6855.
2020-10-08Export ParseError from T.P.Parsing.John MacFarlane
2020-10-07Raise informative errors when YAML metadata parsing fails.John MacFarlane
Closes #6730. Previously the command would succeed, returning empty metadata, with no errors or warnings. API changes: - Remove now unused CouldNotParseYamlMetadata constructor for LogMessage (T.P.Logging). - Add 'Maybe FilePath' parameter to yamlToMeta in T.P.Readers.Markdown.
2020-09-21Markdown reader: Set citationNoteNum accurately in citations.John MacFarlane
This also changes stateLastNoteNumber -> stateNoteNumber.
2020-09-21Parsing: add stateInNote and stateLastNoteNumber to ParserState.John MacFarlane
These will be used to populate note numbers for citations.
2020-09-13Fix hlint suggestions, update hlint.yaml (#6680)Christian Despres
* Fix hlint suggestions, update hlint.yaml Most suggestions were redundant brackets. Some required LambdaCase. The .hlint.yaml file had a small typo, and didn't ignore camelCase suggestions in certain modules.
2020-06-23Markdown reader: Don't require blank line after grid table.John MacFarlane
This fixes #6481, allowing grid tables to be enclosed in fenced divs with no intervening blank lines.
2020-04-15Use the new builders, modify readers to preserve empty headersdespresc
The Builder.simpleTable now only adds a row to the TableHead when the given header row is not null. This uncovered an inconsistency in the readers: some would unconditionally emit a header filled with empty cells, even if the header was not present. Now every reader has the conditional behaviour. Only the XWiki writer depended on the header row being always present; it now pads its head as necessary.
2020-04-15Adapt to the newest Table type, fix some previous adaptation issuesdespresc
- Writers.Native is now adapted to the new Table type. - Inline captions should now be conditionally wrapped in a Plain, not a Para block. - The toLegacyTable function now lives in Writers.Shared.
2020-04-15Implement the new Table typedespresc
2020-03-22Finer grained imports of Text.Pandoc.Class submodules (#6203)Albert Krewinkel
This should speed-up recompilation after changes in `Text.Pandoc.Class`, as the number of modules affected by a change will be smaller in general. It also offers faster insights into the parts of `T.P.Class` used within a module.
2020-03-15Use implicit Prelude (#6187)Albert Krewinkel
* Use implicit Prelude The previous behavior was introduced as a fix for #4464. It seems that this change alone did not fix the issue, and `stack ghci` and `cabal repl` only work with GHC 8.4.1 or newer, as no custom Prelude is loaded for these versions. Given this, it seems cleaner to revert to the implicit Prelude. * PandocMonad: remove outdated check for base version Only base versions 4.9 and later are supported, the check for `MIN_VERSION_base(4,8,0)` is therefore unnecessary. * Always use custom prelude Previously, the custom prelude was used only with older GHC versions, as a workaround for problems with ghci. The ghci problems are resolved by replacing package `base` with `base-noprelude`, allowing for consistent use of the custom prelude across all GHC versions.
2020-03-13Update copyright year (#6186)Albert Krewinkel
* Update copyright year * Copyright: add notes for Lua and Jira modules
2020-02-13Add highlight directive to the rST reader (#6140)Lucas Escot
2020-02-07Apply linter suggestions. Add fix_spacing to lint target in Makefile.John MacFarlane
2020-02-07Resolve HLint warningsAlbert Krewinkel
All warnings are either fixed or, if more appropriate, HLint is configured to ignore them. HLint suggestions remain. * Ignore "Use camelCase" warnings in Lua and legacy code * Fix or ignore remaining HLint warnings * Remove redundant brackets * Remove redundant `return`s * Remove redundant as-pattern * Fuse mapM_/map * Use `.` to shorten code * Remove redundant `fmap` * Remove unused LANGUAGE pragmas * Hoist `not` in Text.Pandoc.App * Use fewer imports for `Text.DocTemplates` * Remove redundant `do`s * Remove redundant `$`s * Jira reader: remove unnecessary parentheses
2019-11-14Parsing: Rename takeWhileP -> take1WhileP and clean it up.John MacFarlane
(It doesn't match the empty sequence.)
2019-11-12Switch to new pandoc-types and use Text instead of String [API change].despresc
PR #5884. + Use pandoc-types 1.20 and texmath 0.12. + Text is now used instead of String, with a few exceptions. + In the MediaBag module, some of the types using Strings were switched to use FilePath instead (not Text). + In the Parsing module, new parsers `manyChar`, `many1Char`, `manyTillChar`, `many1TillChar`, `many1Till`, `manyUntil`, `mantyUntilChar` have been added: these are like their unsuffixed counterparts but pack some or all of their output. + `glob` in Text.Pandoc.Class still takes String since it seems to be intended as an interface to Glob, which uses strings. It seems to be used only once in the package, in the EPUB writer, so that is not hard to change.
2019-11-11Fix typos (#5896)Brian Wignall