change-buffer-charset(3)
change-buffer-charset - Convert buffer between two character sets
buffer-is-utf8 - Detect if the current buffer is UTF-8 encoded
change-buffer-charset
n change-buffer-charset "from-charset" "fto-charset"
n buffer-is-utf8 variable
change-buffer-charset opens a dialog allowing the user to select a From and a To character set. If the Convert button is selected, the current buffer is converted to the destination character set. The command assumes that the current buffer is written in the From character set; no attempt is made to verify this.
The command also provides the ability to convert to and from UTF-8. MicroEmacs cannot directly support Unicode, UTF-8 or any multi-byte code page; for various reasons it supports only a single-byte code page, restricting the user to working with 255 characters at any one time. However, change-buffer-charset does let the user load a UTF-8 file, losslessly convert the loaded file to the current code page, edit the file and then convert it back to UTF-8 before saving.
The lossless conversion makes use of two special prefix characters, 0x01 and 0x02, followed by 3 or 5 hexadecimal digits respectively. It is important to understand that, to the rest of the system, these 4- or 6-character strings are just ordinary text. As a result, operations like kill-rectangle(2) could split a string and invalidate it in any later conversion. This is not an ideal solution, but it is better than losing the characters entirely. The prefix characters are rendered as [u] and [U] if extended character rendering is enabled; see bit 0x10000 of $system(5) for more information.
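The prefix scheme described above can be illustrated with a short Python sketch. This is not the charsutl.emf implementation. The names escape and unescape are hypothetical, and so are the choices of upper-case, zero-padded digits, of using the 3-digit form for code points up to 0xFFF and the 5-digit form up to 0xFFFFF, and of rejecting anything larger. The set passed as supported stands in for the current code page and is assumed not to contain 0x01 or 0x02.

```python
import string

# Hypothetical sketch of a prefix-escape scheme; NOT the charsutl.emf code.
SHORT_PREFIX = "\x01"  # assumed to be followed by exactly 3 hex digits
LONG_PREFIX = "\x02"   # assumed to be followed by exactly 5 hex digits
WIDTHS = {SHORT_PREFIX: 3, LONG_PREFIX: 5}


def escape(text, supported):
    """Replace every character not in `supported` with a prefixed hex string."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in supported:
            out.append(ch)
        elif cp <= 0xFFF:
            out.append(f"{SHORT_PREFIX}{cp:03X}")
        elif cp <= 0xFFFFF:
            out.append(f"{LONG_PREFIX}{cp:05X}")
        else:
            # Assumption: code points above 5 hex digits cannot be represented.
            raise ValueError(f"code point U+{cp:X} does not fit in 5 hex digits")
    return "".join(out)


def unescape(text):
    """Turn prefixed hex strings back into the characters they stand for."""
    out = []
    i = 0
    while i < len(text):
        width = WIDTHS.get(text[i])
        if width is None:
            # Ordinary character: copy it through unchanged.
            out.append(text[i])
            i += 1
            continue
        digits = text[i + 1:i + 1 + width]
        if len(digits) != width or any(c not in string.hexdigits for c in digits):
            # A split or edited escape string can no longer be converted.
            raise ValueError(f"damaged escape string at offset {i}")
        out.append(chr(int(digits, 16)))
        i += 1 + width
    return "".join(out)
```

With supported set to printable ASCII, unescape(escape(s, supported)) returns s unchanged. Cutting the escaped text in the middle of an escape string makes unescape raise ValueError, which mirrors the kill-rectangle(2) hazard noted above.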
change-buffer-charset also allows characters that are not supported by the destination character set to be either replaced with the unsupported character (0x07) or removed.
The current character set is configured using the user-setup(3) dialog (see Display Font Set). This in turn uses the command set-char-mask(2) to create the low-level character conversion tables.
The change-buffer-charset dialog is not opened if both from-charset and fto-charset are given on the command line. The argument n is then used to set the options of the conversion, where the bits are defined as follows:
0x01
0x02
0x04
0x08
A value of "display" can be given for either from-charset or fto-charset to indicate the user's current display character set; see user-setup(3).
The buffer-is-utf8 command attempts to determine if the current buffer is UTF-8 encoded or not, setting the given variable to either:
0
1
-1
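One common way to make this kind of determination is to check whether the bytes decode as well-formed UTF-8 and whether any multi-byte sequence is present. The Python sketch below shows that approach only. It is not the algorithm used by charsutl.emf, the function name classify_utf8 is hypothetical, and its three string results are illustrative labels that are not claimed to correspond to the values 0, 1 and -1 listed above.

```python
def classify_utf8(data: bytes) -> str:
    """Classify raw bytes as 'not-utf8', 'ascii-only' or 'utf8'.

    Hypothetical helper for illustration; NOT the buffer-is-utf8 macro.
    """
    try:
        # A strict decode fails on any malformed or truncated sequence.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf8"
    if data.isascii():
        # Valid, but with no multi-byte sequence the bytes read the same
        # in UTF-8 and in any ASCII-compatible single-byte code page.
        return "ascii-only"
    return "utf8"
```

The separate 'ascii-only' result reflects that pure ASCII text gives no evidence either way, so a caller can decide for itself whether a conversion is worth running.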
change-buffer-charset and buffer-is-utf8 are macros defined in charsutl.emf.
(c) Copyright JASSPA 2025
Last Modified: 2025/09/07
Generated On: 2025/09/29