path: root/src/Text/Pandoc/Readers/HTML
Age | Commit message | Author
2023-10-19HTML reader: allow th to close td and vice versa.John MacFarlane
Closes #9090.
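To illustrate the rule in the commit title, here is a minimal sketch of an implicit-close check for table cells. The name `closesCell` is hypothetical and this is not the reader's actual `closes` definition; it only models the case where an opening `td` or `th` ends a still-open `td` or `th`.

```haskell
-- Hypothetical helper (not pandoc's code): does an opening tag `new`
-- implicitly close the currently open tag `open`?
-- Only the table-cell case from the commit message is modelled here.
closesCell :: String -> String -> Bool
closesCell new open = new `elem` cells && open `elem` cells
  where
    cells = ["td", "th"]
```

Under this sketch, `closesCell "th" "td"` and `closesCell "td" "th"` both hold, while an unrelated tag such as `p` does not close a cell.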
2023-09-16HTML reader: parse task lists using input elements (#9066)Seth Speaks
Allow the HTML reader to parse task lists of the sort produced by pandoc. Closes #9047
2023-08-05HTML reader: properly calculate RowHeadColumns.John MacFarlane
The previous algorithm did not handle rowspans; this one does. Closes #8984.
2023-08-05HTML reader: require unanimity for RowHeadColumns.John MacFarlane
Previously we used the max. #8634 switched to the min, but this had bad results. This commit sets the RowHeadColumns to the consensus value from all rows, or 0 if there is no consensus. See #8984.
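The consensus rule described above can be sketched as follows. This is an assumption-laden illustration, not the reader's implementation: `consensus` is a hypothetical name, and the input is taken to be the per-row count of leading header columns.

```haskell
-- Hypothetical helper (not pandoc's code): given the number of leading
-- header columns found in each row, return the value all rows agree on,
-- or 0 when the rows disagree or there are no rows.
consensus :: [Int] -> Int
consensus [] = 0
consensus (n : ns)
  | all (== n) ns = n
  | otherwise     = 0
```

Unlike taking the max or the min, this never picks a value that only some rows support.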
2023-08-05HTML reader: fix bug in calculation of RowHeadColumns.John MacFarlane
We were adding up cells, not colspans. Note: there may still be incorrect results in the presence of rowspans. See https://github.com/jgm/pandoc/issues/8984#issuecomment-1666467926
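The difference between counting cells and summing colspans can be shown with a simplified model. The `Cell` type and both function names below are hypothetical stand-ins, not pandoc's table types, and rowspans are ignored, as the commit message itself warns.

```haskell
-- Simplified, hypothetical cell model (not pandoc's AST).
data Cell = Cell
  { isHeader :: Bool  -- True for a th cell, False for a td cell
  , colSpan  :: Int   -- number of grid columns the cell covers
  }

-- Counts leading header cells, which undercounts when a cell spans columns.
leadingHeaderCells :: [Cell] -> Int
leadingHeaderCells = length . takeWhile isHeader

-- Sums the colspans of the leading header cells instead.
rowHeadColumns :: [Cell] -> Int
rowHeadColumns = sum . map colSpan . takeWhile isHeader
```

For a row that starts with a header cell of colspan 2 followed by a header cell of colspan 1, the first function gives 2 while the second gives 3, the number of grid columns actually occupied.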
2023-08-05Revert "Update TableBody RowHeadColumns calculation: change from max to min (#8634)"John MacFarlane
This reverts commit f257c97170ba8db3b771135b98b198d5de2bdb5b. For the reason, see #8984. The change caused the "grid shape" of some tables to change.
2023-06-24Update TableBody RowHeadColumns calculation: change from max to min (#8634)Ruqi
This change sets RowHeadColumns to the minimum value of each row, which gives better results in cases where rows have different numbers of leading th tags.
2023-02-18HTML writer: allow "track" element to be treated as block-level HTML.John MacFarlane
Closes #8629.
2023-01-10Update copyright years, it's 2023!Albert Krewinkel
2022-10-16T.P.Parsing: Remove gratuitous renaming of Parsec types.John MacFarlane
We were exporting Parser, ParserT as synonyms of Parsec, ParsecT. There is no good reason for this and it can cause confusion. Also, when possible, we replace imports of Text.Parsec with T.P.Parsing. The idea is to make it easier, at some point, to switch to megaparsec or another parsing engine if we want to. T.P.Parsing new exports: Stream(..), updatePosString, SourceName, Parsec, ParsecT [API change]. Removed exports: Parser, ParserT [API change].
2022-10-03Rename T.P.Readers.LaTeX.Types -> T.P.TeX.John MacFarlane
2022-02-09Fix parsing of epub footnotes.John MacFarlane
Closes #7884.
2022-01-02Copyright notices: update for 2022Albert Krewinkel
2021-11-24HTML reader: parse attributes on links and images.John MacFarlane
Closes #6970.
2021-09-23HTML reader: handle empty tbody element in table.John MacFarlane
Closes #7589.
2021-08-10HTML reader: treat comments as blank when parsing.John MacFarlane
This modifies pBlank. Previously comments could sometimes flummox the parser. Closes #7482.
2021-07-06HTML reader: add col, colgroup to 'closes' definitionsJohn MacFarlane
2021-05-30HTML reader: fix column width regression.John MacFarlane
Column widths specified with a style attribute were off by a factor of 100 in 2.14. Closes #7334.
2021-05-22Handle relative lengths (e.g. `2*`) in HTML column widths.John MacFarlane
See <https://www.w3.org/TR/html4/types.html#h-6.6>. "A relative length has the form "i*", where "i" is an integer. When allotting space among elements competing for that space, user agents allot pixel and percentage lengths first, then divide up remaining available space among relative lengths. Each relative length receives a portion of the available space that is proportional to the integer preceding the "*". The value "*" is equivalent to "1*". Thus, if 60 pixels of space are available after the user agent allots pixel and percentage space, and the competing relative lengths are 1*, 2*, and 3*, the 1* will be alloted 10 pixels, the 2* will be alloted 20 pixels, and the 3* will be alloted 30 pixels." Closes #4063.
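The allocation rule quoted from the HTML 4 specification can be worked through in a short sketch. The names `parseRelative` and `allot` are hypothetical, and this is not how the reader itself computes column widths; it only reproduces the arithmetic in the quoted passage.

```haskell
import Data.Char (isDigit)

-- Hypothetical helper: parse a relative length such as "2*".
-- A bare "*" is treated as "1*", as the quoted specification says.
parseRelative :: String -> Maybe Int
parseRelative s = case span isDigit s of
  ("", "*")     -> Just 1
  (digits, "*") -> Just (read digits)
  _             -> Nothing

-- Hypothetical helper: divide the remaining space among relative lengths
-- in proportion to their integer weights.
allot :: Double -> [Int] -> [Double]
allot available weights
  | total == 0 = map (const 0) weights
  | otherwise  = [available * fromIntegral w / fromIntegral total | w <- weights]
  where
    total = sum weights
```

With 60 pixels remaining and weights parsed from `1*`, `2*`, and `3*`, `allot 60 [1, 2, 3]` yields 10, 20, and 30, matching the example in the specification.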
2021-05-22Revert "HTML reader: simplify col width parsing"John MacFarlane
This reverts commit f76fe2ab56606528d4710cc6c40bceb5788c3906.
2021-05-22HTML reader: simplify col width parsingAlbert Krewinkel
2021-03-19Protect partial uses of maximum with NonEmpty.John MacFarlane
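As a general illustration of the technique named in this commit title: `maximum` throws on an empty list, but it is total on `NonEmpty`. A minimal sketch, where `safeMaximum` and its default argument are assumptions rather than code from the commit:

```haskell
import qualified Data.List.NonEmpty as NE

-- Hypothetical helper: return the largest element, or a caller-supplied
-- default when the list is empty. NE.nonEmpty yields Nothing for [],
-- so maximum is only ever applied to a non-empty structure.
safeMaximum :: Ord a => a -> [a] -> a
safeMaximum def xs = maybe def maximum (NE.nonEmpty xs)
```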
2021-03-15Remove an unneeded importJohn MacFarlane
2021-03-15Use foldl' instead of foldl everywhere.John MacFarlane
2021-01-08Update copyright notices for 2021 (#7012)Albert Krewinkel
2020-12-10HTML reader: retain attribute prefixes and avoid duplicates.John MacFarlane
Previously we stripped attribute prefixes, reading `xml:lang` as `lang` for example. This resulted in two duplicate `lang` attributes when `xml:lang` and `lang` were both used. This commit causes the prefixes to be retained, and also avoids invalid duplicate attributes. Closes #6938.
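A minimal sketch of the behaviour described above, under stated assumptions: attributes are modelled as plain key/value pairs, `dedupeAttrs` is a hypothetical name, and keeping the first occurrence of a repeated key is an assumption, since the commit message does not say which duplicate wins.

```haskell
import Data.Function (on)
import Data.List (nubBy)

-- Simplified attribute model; keys keep their prefix (e.g. "xml:lang").
type Attr = (String, String)

-- Hypothetical helper: drop later attributes whose key was already seen.
-- Because prefixes are retained, "xml:lang" and "lang" are distinct keys.
dedupeAttrs :: [Attr] -> [Attr]
dedupeAttrs = nubBy ((==) `on` fst)
```

With prefixes kept, `xml:lang` and `lang` no longer collapse into the same key, so only a literally repeated key is removed.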
2020-11-27HTML reader tests: improve test coverage of new featuresAlbert Krewinkel
2020-11-27HTML reader: support body headers, row head columnsAlbert Krewinkel
Closes: #6312
2020-11-26HTML reader: improve support for table headers, footer, attributesAlbert Krewinkel
- `<tfoot>` elements are no longer added to the table body but used as table footer.
- Separate `<tbody>` elements are no longer combined into one.
- Attributes on `<thead>`, `<tbody>`, `<th>`/`<td>`, and `<tfoot>` elements are preserved.
2020-11-26HTML reader: allow finer grained options for tag omissionAlbert Krewinkel
2020-11-24HTML reader: support row or column-spanning table cellsAlbert Krewinkel
2020-11-24HTML reader: support blocks in captionAlbert Krewinkel
2020-11-24HTML reader: extract table parsing into separate moduleAlbert Krewinkel
2020-11-23HTML reader: extract submodulesAlbert Krewinkel
Reducing module size should reduce memory use during compilation. This is preparatory work to tackle support for more table features.