author John MacFarlane <jgm@berkeley.edu> 2022-12-28 17:16:30 -0800
committer John MacFarlane <jgm@berkeley.edu> 2022-12-28 17:16:30 -0800
commit ce7d1d1c2029d7a248c1a84958d464dd45f332a2
tree 60038da09005f02a3dd4dc74d36faded729f4c49
parent 6c96340bf63df36c91d11d405af96da8b736eb56
Man writer: use UTF-8 by default for non-ascii characters.
Only use groff escapes if `--ascii` has been specified on the command line (`writerPreferAscii`). Closes #8507.
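
The switch described above can be illustrated with a small standalone sketch. It is not pandoc's code: `Opts`, `preferAscii`, `escapeWith`, and `unicodeEscape` are hypothetical stand-ins for `WriterOptions`, `writerPreferAscii`, and `escapeString`, and the toy escaper assumes every non-ASCII character is written as a groff `\[uXXXX]` escape. The real writer also uses named escapes such as `\[:o]` and escapes other roff-special characters, which this sketch leaves out.

```haskell
import Data.Char (isAscii, ord, toUpper)
import Numeric (showHex)

-- Toy stand-ins: the constructor names mirror the diff, but this is
-- an illustration only, not pandoc's implementation.
data EscapeMode = AsciiOnly | AllowUTF8
  deriving (Eq, Show)

-- Hypothetical, cut-down options record with only the field used here.
newtype Opts = Opts { preferAscii :: Bool }

-- Render a character as a groff \[uXXXX] escape, padded to at least
-- four uppercase hexadecimal digits.
unicodeEscape :: Char -> String
unicodeEscape c = "\\[u" ++ pad (map toUpper (showHex (ord c) "")) ++ "]"
  where
    pad s = replicate (4 - length s) '0' ++ s

-- AllowUTF8 passes text through unchanged; AsciiOnly rewrites every
-- non-ASCII character as an escape.
escapeWith :: EscapeMode -> String -> String
escapeWith AllowUTF8 = id
escapeWith AsciiOnly = concatMap esc
  where
    esc c
      | isAscii c = [c]
      | otherwise = unicodeEscape c

-- Same shape as the new escString: the mode depends on the option.
escString :: Opts -> String -> String
escString opts =
  escapeWith (if preferAscii opts then AsciiOnly else AllowUTF8)

main :: IO ()
main = do
  -- Default: UTF-8 is kept as-is.
  putStrLn (escString (Opts False) "set membership: \8712")
  -- With the ASCII preference set: set membership: \[u2208]
  putStrLn (escString (Opts True) "set membership: \8712")
```

With the flag off, the string is returned untouched, which matches the updated expectations in `test/writer.man`; with it on, the output contains only ASCII.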
-rw-r--r-- MANUAL.txt 4
-rw-r--r-- src/Text/Pandoc/Writers/Man.hs 4
-rw-r--r-- test/writer.man 18
3 files changed, 14 insertions, 12 deletions
diff --git a/MANUAL.txt b/MANUAL.txt
index c8d949364..330faff37 100644
--- a/MANUAL.txt
+++ b/MANUAL.txt
@@ -990,9 +990,9 @@ header when requesting a document from a URL:
: Use only ASCII characters in output. Currently supported for XML
and HTML formats (which use entities instead of UTF-8 when this
option is selected), CommonMark, gfm, and Markdown (which use
- entities), roff ms (which use hexadecimal escapes), and to a
+ entities), roff man and ms (which use hexadecimal escapes), and to a
limited degree LaTeX (which uses standard commands for accented
- characters when possible). roff man output uses ASCII by default.
+ characters when possible).
`--reference-links`
diff --git a/src/Text/Pandoc/Writers/Man.hs b/src/Text/Pandoc/Writers/Man.hs
index 4e1651e53..859378dce 100644
--- a/src/Text/Pandoc/Writers/Man.hs
+++ b/src/Text/Pandoc/Writers/Man.hs
@@ -83,7 +83,9 @@ pandocToMan opts (Pandoc meta blocks) = do
Just tpl -> renderTemplate tpl context
escString :: WriterOptions -> Text -> Text
-escString _ = escapeString AsciiOnly -- for better portability
+escString opts = escapeString (if writerPreferAscii opts
+ then AsciiOnly
+ else AllowUTF8)
-- | Return man representation of notes.
notesToMan :: PandocMonad m => WriterOptions -> [[Block]] -> StateT WriterState m (Doc Text)
diff --git a/test/writer.man b/test/writer.man
index 752852322..c476c35aa 100644
--- a/test/writer.man
+++ b/test/writer.man
@@ -541,11 +541,11 @@ Ellipses\&...and\&...and\&....
.SH LaTeX
.IP \[bu] 2
.IP \[bu] 2
-2\[u2005]+\[u2005]2\[u2004]=\[u2004]4
+2 + 2 = 4
.IP \[bu] 2
-\f[I]x\f[R]\[u2004]\[mo]\[u2004]\f[I]y\f[R]
+\f[I]x\f[R] ∈ \f[I]y\f[R]
.IP \[bu] 2
-\f[I]\[*a]\f[R]\[u2005]\[AN]\[u2005]\f[I]\[*w]\f[R]
+\f[I]α\f[R] ∧ \f[I]ω\f[R]
.IP \[bu] 2
223
.IP \[bu] 2
@@ -557,7 +557,7 @@ $$\[rs]frac{d}{dx}f(x)=\[rs]lim_{h\[rs]to 0}\[rs]frac{f(x+h)-f(x)}{h}$$
.RE
.IP \[bu] 2
Here\[cq]s one that has a line break in it:
-\f[I]\[*a]\f[R]\[u2005]+\[u2005]\f[I]\[*w]\f[R]\[u2005]\[tmu]\[u2005]\f[I]x\f[R]^2^.
+\f[I]α\f[R] + \f[I]ω\f[R] × \f[I]x\f[R]^2^.
.PP
These shouldn\[cq]t be math:
.IP \[bu] 2
@@ -578,15 +578,15 @@ Here\[cq]s a LaTeX table:
.PP
Here is some unicode:
.IP \[bu] 2
-I hat: \[^I]
+I hat: Î
.IP \[bu] 2
-o umlaut: \[:o]
+o umlaut: ö
.IP \[bu] 2
-section: \[sc]
+section: §
.IP \[bu] 2
-set membership: \[mo]
+set membership: ∈
.IP \[bu] 2
-copyright: \[co]
+copyright: ©
.PP
AT&T has an ampersand in their name.
.PP