set-char-mask(2)
set-char-mask - Set character word mask
n set-char-mask "flags" ["value"]
set-char-mask returns or modifies the setting of MicroEmacs internal character tables. The argument n defines the action to be taken, as follows:-
-1
0
1
The first argument "flags" determines the required character set as follows:-
M
The "value" for the M flag must be a string containing pairs of characters, an internal character set character followed by its display character set equivalent. For example, if the current internal character set is CP-1252 and the display character set is CP-437 (a common DOS code page), then 'e' acute must be mapped from 0xe9 in CP-1252 to 0x82 in CP-437 so "value" should contain "\xe9\x82" as well as many other mappings. All characters in the display character set should be mapped if possible rather than just the letters most commonly used by the current language as this creates the best support for any text entered.
Some display character sets may not have all the characters available in the internal character set, for instance DOS code page CP-437 does not have an upper-case 'E' grave. In this case an ordinary 'E' should be used as a sensible replacement, i.e. "`EE" (where `E is an 'E' grave) as this is the best that can be done given the limitations of the current display character set.
This flag cannot be altered incrementally. Any call to alter this set resets all the character tables, so the character mapping must be performed first and in a single call. No other set may be altered in the same call.
c
The mapping is used by MicroEmacs on Windows and UNIX XTerm systems to better map the system clipboard text to the current display character set. For example XTerm typically uses an A-F code page which does not support a euro currency symbol, however CP-1252 based fonts can be installed and used correctly by MicroEmacs allowing support for the euro and many other characters. This mapping then allows MicroEmacs to correctly handle these characters when copied between different applications. The mapping table is also used by the &uni(4) functions, expand-iso-accents(3) and change-buffer-charset(3) commands.
d
p
P
I
D
a
Note that the returned character list will pair all lower-case characters with their upper-case equivalent letters first.
l
u
h
A
s
1, 2, 3 & 4
k
As with flag M, this mapping cannot be altered incrementally. Any call to set it first resets the mapping table, so the mapping must be performed in a single call. No other set may be altered in the same call. When setting, the "value" must supply pairs of characters, the keyboard character followed by the character to map it to, typically an ASCII character.
Unless stated otherwise, multiple flags may be specified at the same time returning a combined character set or setting multiple properties for the given "value" characters.
For many UNIX XTerm fonts the best characters to use for $box-chars(5) (used in drawing osd(2) dialogs) lie in the range 0x0B to 0x19. For example the vertical bar is '\x19', the top left hand corner is '\x0D' etc. These characters are by default set to be not displayable or pokable which renders them useless. They can be made displayable and pokable as follows:-
set-char-mask "dp" "\x19\x0D\x0C\x0E\x0B\x18\x15\x0F\x16\x17\x12"
MicroEmacs variables have a '$', '#', '%', ':' or '.' character prepended to their name and may also contain a '-' character in the body of their name. It is preferable for these characters to be part of the variable 'word' so that commands like forward-kill-word(2) work correctly. This may be achieved by adding these characters to user set 2 and setting the buffer-mask variable to include set 2, as follows:
    set-char-mask "2" "$#%:.-"

    define-macro fhook-emf
        set-variable $buffer-mask "luh2"
        .
        .
    !emacro
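The effect of extending the word set can be illustrated with a small Python model. This is only a sketch and not MicroEmacs code: `word_end` is a hypothetical helper, and the default word set is assumed here to be the ASCII letters and digits.

```python
import string

# Assumption: the default word set is ASCII letters and digits.
DEFAULT_WORD_CHARS = set(string.ascii_letters + string.digits)

# Word set after adding the variable prefix characters and '-' (user set 2).
EXTENDED_WORD_CHARS = DEFAULT_WORD_CHARS | set("$#%:.-")


def word_end(text, start, word_chars):
    """Return the index just past the run of word characters at `start`."""
    end = start
    while end < len(text) and text[end] in word_chars:
        end += 1
    return end


name = "$buffer-mask"

# With the default set, a word starting at 'b' stops at the '-'.
default_end = word_end(name, 1, DEFAULT_WORD_CHARS)

# With the extended set, the whole variable name is a single word.
extended_end = word_end(name, 0, EXTENDED_WORD_CHARS)

print(name[1:default_end])   # buffer
print(name[0:extended_end])  # $buffer-mask
```

In this model a word-based kill starting on the '$' would remove the complete variable name only once the extra characters belong to the word set.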
For the examples below only the following subset of characters will be used:-
    Character               Win CP-1252   Cmd CP-850   DOS CP-437
    Capital A (A)           A             A            A
    Capital A grave (`A)    \xC0          \xB7         No equivalent
    Capital A acute ('A)    \xC1          \x90         No equivalent
    Small a (a)             a             a            a
    Small a grave (`a)      \xE0          \x85         \x85
    Small a acute ('a)      \xE1          \xA0         \xA0
As the spell checker for French operates in Windows CP-1252, the character font mapping (flag M) must be set up correctly for spell checking to work. When CP-1252 is also used as the display character set, the mapping is the empty string because the internal and display character sets are fully in sync. For both Windows Console CP-850 and DOS code page CP-437 the mappings should be set as follows:-
    ; CP-850 mapping setup
    set-char-mask "M" "\xC0\xB7\xC1\x90\xE0\x85\xE1\xA0"
    ; CP-437 mapping setup
    set-char-mask "M" "\xC0A\xC1AAA\xE0\x85\xE1\xA0"
As all the characters in this CP-1252 subset have equivalents in CP-850, the mapping for the Windows console is a simple 1-to-1 lossless character list. However, the missing capital A's in CP-437 cause problems. For the command change-buffer-charset(3) it is preferable for a mapping of `A to be given, otherwise the document being converted may become corrupted and unreadable. Therefore a mapping of `A to A is given to alleviate this problem; similarly 'A is also mapped to A, at the cost of some loss of information.
This leads to a further problem with the conversion of CP-437 back to CP-1252. If the mapping of the 'A's were left as just "\xC0A\xC1A", the last mapping ('A to A) would also be the back conversion for A, i.e. ALL A's would be converted back to 'A's. To solve this problem, a further seemingly pointless mapping of A to A is given to correct the back conversion.
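The back-conversion problem can be reproduced with a short Python model of the pair list. This is a sketch under stated assumptions rather than the MicroEmacs implementation: `build_tables` and `convert` are hypothetical helpers, and later pairs are assumed to overwrite earlier ones in both directions, as the text above describes.

```python
def build_tables(pairs):
    """Build forward (internal -> display) and back (display -> internal)
    tables from a flat string of character pairs.

    Assumption: a later pair overwrites an earlier one in both tables.
    """
    if len(pairs) % 2:
        raise ValueError("value must contain whole character pairs")
    forward, back = {}, {}
    for i in range(0, len(pairs), 2):
        internal, display = pairs[i], pairs[i + 1]
        forward[internal] = display
        back[display] = internal
    return forward, back


def convert(text, table):
    """Translate text through a table, leaving unmapped characters alone."""
    return "".join(table.get(ch, ch) for ch in text)


# CP-437 mapping without the extra A-to-A pair.
_, back_bad = build_tables("\xC0A\xC1A\xE0\x85\xE1\xA0")

# CP-437 mapping as given above, including the A-to-A pair.
forward_ok, back_ok = build_tables("\xC0A\xC1AAA\xE0\x85\xE1\xA0")

# Without the extra pair, every plain A comes back as A acute (0xC1).
assert convert("A", back_bad) == "\xC1"

# With the extra pair, a plain A survives the round trip.
assert convert("A", back_ok) == "A"

# The conversion is still lossy: A grave goes to A and stays A.
assert convert(convert("\xC0\xE0", forward_ok), back_ok) == "A\xE0"
```

The final assertion shows the information loss mentioned above: the small a grave is restored, but the capital A grave comes back as a plain A.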
While ISO-8859-1 (Latin 1) supports a very similar set of characters to CP-1252, it lacks some accented 'S', 'Y' & 'Z' characters, which must be mapped to their plain letter equivalents.
For languages which use accented characters, the alphabetic character set must be extended to include these characters for letter based commands like forward-word(2) and upper-case-word(2) to operate correctly. However, the letter set should be fully extended for each code page regardless of the language being used as an 'a' acute should always be considered a letter even though it is unlikely to occur. The addition of extra letters must achieve two goals, firstly to define whether a character is a letter, enabling commands like forward-word to work correctly. The second is to provide an upper case to lower case character mapping, enabling commands like upper-case-word to work correctly. This is achieved with a single call to set-char-mask using the a flag as follows:-
set-char-mask "a" "\xC0\xE0\xC1\xE1"
Note that this flag always expects an internal character set based string, this allows the same map character list to be used regardless of the display character set being used, i.e. the above line can be used for CP-1252, CP-850, CP-437 & ISO-8859-1 code pages. But it does mean that the internal to display character set mapping (flag M) must already have been provided.
Similar mapping problems are encountered with the a flag as with flag M above. The problem is not immediately obvious because the mapping is always given in the internal character set, which supports the widest set of characters, but when CP-437 is the display character set the string effectively becomes "A\x85A\xA0". As can be seen, A is mapped last to 'a, so an upper to lower case operation will convert an A to 'a. A similar solution is used: a further mapping of A to a is given to correct the default case mapping for both A and a, i.e. the following line should always be used instead:-
set-char-mask "a" "\xC0\xE0\xC1\xE1Aa"
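The reason for the trailing "Aa" pair can be seen in a small Python model. This is an illustrative sketch only: `build_case_tables` is a hypothetical helper, the `CP437` dictionary holds just the subset of mappings used in this example, and the translation of the pair string through the flag M mapping is assumed to happen before the case tables are filled.

```python
def build_case_tables(pairs, to_display):
    """Build upper -> lower and lower -> upper tables from a string of
    upper/lower pairs given in the internal character set.

    Assumption: each character is first translated through the internal
    to display mapping, and later pairs overwrite earlier ones.
    """
    lower_of, upper_of = {}, {}
    for i in range(0, len(pairs), 2):
        upper = to_display.get(pairs[i], pairs[i])
        lower = to_display.get(pairs[i + 1], pairs[i + 1])
        lower_of[upper] = lower
        upper_of[lower] = upper
    return lower_of, upper_of


# Internal (CP-1252) to display (CP-437) mapping for the example subset.
CP437 = {"\xC0": "A", "\xC1": "A", "\xE0": "\x85", "\xE1": "\xA0"}

# Without the final pair, A lower-cases to a acute (0xA0 in CP-437).
lower_bad, _ = build_case_tables("\xC0\xE0\xC1\xE1", CP437)
assert lower_bad["A"] == "\xA0"

# With the final "Aa" pair, the plain letters map to each other again.
lower_ok, upper_ok = build_case_tables("\xC0\xE0\xC1\xE1Aa", CP437)
assert lower_ok["A"] == "a"
assert upper_ok["a"] == "A"

# The accented lower-case letters still upper-case to a plain A.
assert upper_ok["\x85"] == "A"
```

When the display character set has all four accented letters, as CP-850 does, the extra pair changes nothing in this model, which is why it is safe to include it in every configuration.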
forward-word(2), upper-case-word(2), change-buffer-charset(3), &uni(4), $buffer-mask(5), screen-poke(2), spell(2), $buffer-tab-width(5).
(c) Copyright JASSPA 2025
Last Modified: 2024/05/12
Generated On: 2025/09/29