author Albert Krewinkel <albert@zeitkraut.de> 2022-06-16 17:40:33 +0200
committer Albert Krewinkel <albert@zeitkraut.de> 2022-06-16 17:40:33 +0200
commit 37fc412daa145ab666f9c400781435152a6ed72c
tree df938aafb161cf99f8691d89e3b57810a78b6fc4
parent 0bd8a0e3d1fda091ad77c110d8006f540c69d0c1
doc/lua-filters.html: add list of common pitfalls
A list with common filtering and Lua pitfalls is added to the "debugging" section. Closes: #6077
-rw-r--r-- doc/lua-filters.md 44
1 files changed, 44 insertions, 0 deletions
diff --git a/doc/lua-filters.md b/doc/lua-filters.md
index f64c77298..6a302622d 100644
--- a/doc/lua-filters.md
+++ b/doc/lua-filters.md
@@ -416,6 +416,50 @@ should add/modify your `LUA_PATH` and `LUA_CPATH` to include the
correct locations; [see detailed instructions
here](https://studio.zerobrane.com/doc-remote-debugging).
+## Common pitfalls
+
+AST elements not updated
+: A filtered element will only be updated if the filter
+ function returns a new element to replace it. A function like
+ the one below has no effect, as it returns no value:
+
+ ``` lua
+ function Str (str)
+   str.text = string.upper(str.text)
+ end
+ ```
+
+ The correct version returns the modified element:
+
+ ``` lua
+ function Str (str)
+   str.text = string.upper(str.text)
+   return str
+ end
+ ```
+
+Pattern behavior is locale dependent
+: The character classes in Lua's pattern library depend on the
+ current locale. For example, the character `©` is treated as
+ punctuation, and matched by the pattern `%p`, on CP-1252
+ locales, but not on systems using a UTF-8 locale.
+
+ A reliable way to ensure unified handling of patterns and
+ character classes is to use the "C" locale by adding
+ `os.setlocale 'C'` to the top of the Lua script.
+
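The locale advice can be checked with a minimal sketch in plain Lua, outside of pandoc. The helper name `is_punct` is hypothetical and only serves the illustration; `\169` is the single byte that encodes `©` in CP-1252.

```lua
-- Pin the locale first so character classes behave the same on every system.
os.setlocale('C')

-- Hypothetical helper: true if `s` is exactly one character matched by %p.
local function is_punct(s)
  return s:match('^%p$') ~= nil
end

print(is_punct('!'))     -- ASCII punctuation
print(is_punct('\169'))  -- byte 0xA9, which is '©' in CP-1252
```

With the "C" locale set, only ASCII punctuation is matched by `%p`, so the second check no longer depends on the system the filter runs on.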
+String library is not Unicode aware
+: Lua's `string` library treats each byte as a single
+ character. A function like `string.upper` will not have the
+ intended effect when applied to words with non-ASCII
+ characters. Similarly, a pattern like `[☃]` will match *any*
+ of the bytes `\226`, `\152`, and `\131`, but **won't** match
+ the "snowman" Unicode character as a unit.
+
+ Use the [pandoc.text](#module-text) module for Unicode-aware
+ transformation, and consider using the lpeg or re
+ library for pattern matching.
+
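The byte-wise behavior described above can be reproduced with a short sketch in plain Lua. The strings use decimal escapes so that the example does not depend on the encoding of the script file. The final block assumes the script runs inside pandoc, where the `pandoc.text` module is available, and is skipped otherwise.

```lua
os.setlocale('C')  -- make the byte-wise behavior reproducible

local snowman = '\226\152\131'  -- UTF-8 encoding of U+2603, the snowman

-- The string library counts bytes, not characters.
print(#snowman)

-- A bracket class built from a multi-byte character is a set of bytes.
-- It matches the first byte of the euro sign, which happens to be \226 too.
local class = '[' .. snowman .. ']'
print(('\226\130\172'):find(class))

-- string.upper maps only ASCII letters; the two bytes of 'é' stay unchanged.
print(string.upper('caf\195\169'))

-- Inside pandoc, the Unicode-aware variant from pandoc.text can be used.
if pandoc and pandoc.text then
  print(pandoc.text.upper('caf\195\169'))
end
```

The match on the euro sign is a false positive: the pattern was meant to find a snowman, yet it reports a hit on an unrelated character that merely shares a byte with it.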
# Examples
The following filters are presented as examples. A repository of