translate-buffer(2)
NAME
SYNOPSIS
DESCRIPTION
translate-buffer converts the text in the current buffer to another format. The conversion performed is controlled by the given numeric argument n, a bit mask whose bits are defined as follows:
0x003
The lower 2 bits define the base conversion, see below.
0x004
When converting from UTF-16 or UTF-32, this bit gives the byte order of the buffer: big-endian if set, little-endian if clear.
0x008
When set and converting from a Unicode UTF format, the start of the buffer is checked for a Unicode 'BOM' (byte order mark) character. If found, it is removed but flagged to be re-added on save, see bit 0x08 of variable
$buffer-xlate(5).
0x100
When set, MicroEmacs attempts to auto-detect the Unicode UTF format of the current buffer, setting bits 0x01 to 0x08 appropriately. MicroEmacs will not attempt to detect when a lossless text format might be best; it always chooses a UTF format. Note that when using this bit, bits 0x01 through 0x08 must be clear.
0x200
Output the converted text to a new buffer with the given buffer-name rather than replacing the text of the current buffer. When this bit is not set, the current buffer must be clean and unedited.
0x400
Used with bit 0x200, when set the new buffer is not automatically displayed.
The base conversion, determined by the lower 2 bits, must be one of the following:
0x00
Convert a Reduced-binary (see
rbin(2m)) formatted buffer to a lossless text format. In this mode line termination is always a single \x0A byte (UNIX style EOL) and a minimum set of bytes (\x00, \x01, \x02 and \x07) are protected by encoding each to the byte sequence "\x0100#": byte \x01 followed by 2 ASCII '0' (\x30) characters, followed by # which is the value of the byte being protected, i.e. '0', '1', '2' or '7'. In this format every byte is preserved so it can be treated more like a text file.
0x01
Convert a Unicode UTF-8 formatted file, loaded in rbin format, to a Unicode preserving text mode. In this mode line termination is auto-detected and handled as normal but the same special bytes as lossless text are protected. Any Unicode character which has a value of U+0080 or more is then handled in one of 4 ways. If the character is supported by the current MicroEmacs charset, it is converted to that single byte value. Otherwise, if the hex value of the character is less than 0x1000, it is converted to the byte sequence "\x01###" where ### is the hex value of the character in ASCII text; if less than 0x100000, it is converted to the byte sequence "\x02#####"; if the value is 0x100000 or greater, the character is replaced with a literal \x07 byte (MicroEmacs's undefined character) and the value is lost. At the time of writing there is no use of characters in this Unicode plane so this should not be an issue. If the file is invalid, e.g. has invalid character encodings, the command throws an error and the conversion halts.
0x02
As with conversion 0x01 above except the source must be in UTF-16 format, see bit 0x04 for details on setting the endianness.
0x03
As with conversion 0x01 above except the source must be in UTF-32 format, see bit 0x04 for details on setting the endianness.
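The escape scheme described for the base conversions can be illustrated with a short Python sketch. This is not MicroEmacs source code: the function name encode_code_point and the charset argument (a mapping from code point to single byte value, standing in for the current MicroEmacs charset) are hypothetical, and the use of upper-case, zero-padded hex digits is an assumption, as the text above does not state the case used. Line termination handling is not modelled.

```python
# Illustrative sketch only; names and hex digit case are assumptions.
PROTECTED = {0x00, 0x01, 0x02, 0x07}   # bytes protected in both text modes
UNDEFINED = b"\x07"                    # MicroEmacs's undefined character


def encode_code_point(cp, charset=None):
    """Return the bytes a single code point would become in the buffer.

    charset is a hypothetical dict mapping a Unicode code point to the
    single byte value used by the current MicroEmacs charset.
    """
    charset = charset or {}
    if cp in PROTECTED:
        # "\x0100#": byte 0x01, two ASCII '0's, then the protected value.
        return b"\x01" + format(cp, "03X").encode("ascii")
    if cp < 0x80:
        return bytes([cp])             # plain ASCII is left unchanged
    if cp in charset:
        return bytes([charset[cp]])    # supported by the current charset
    if cp < 0x1000:
        return b"\x01" + format(cp, "03X").encode("ascii")   # "\x01###"
    if cp < 0x100000:
        return b"\x02" + format(cp, "05X").encode("ascii")   # "\x02#####"
    return UNDEFINED                   # 0x100000 or greater: value is lost
```

Under these assumptions, U+03B1 becomes the 4 byte sequence \x01 '3' 'B' '1' and U+20AC becomes the 6 byte sequence \x02 '0' '2' '0' 'A' 'C', unless the charset mapping supplies a single byte for them.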
When omitted, the default value of n is 0x100, which means the buffer must be a UTF file loaded in rbin format; MicroEmacs will then attempt to detect the format being used and convert the current buffer to a lossless text form.
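As an aid to reading the bit definitions, the following Python sketch shows how a value for n could be composed from the documented bits. The constant names and the make_arg helper are hypothetical and not part of MicroEmacs. Rejecting bit 0x400 without bit 0x200 is a choice made for this sketch, based on the description of 0x400 as being used with 0x200.

```python
# Illustrative sketch only; constant and function names are hypothetical.
BASE_RBIN = 0x00     # reduced-binary to lossless text
BASE_UTF8 = 0x01
BASE_UTF16 = 0x02
BASE_UTF32 = 0x03
BIG_ENDIAN = 0x004   # UTF-16/UTF-32 source is big-endian
CHECK_BOM = 0x008    # check for and remove a leading BOM
AUTO_DETECT = 0x100  # auto-detect the UTF format
NEW_BUFFER = 0x200   # write the result to a new buffer
NO_DISPLAY = 0x400   # do not display the new buffer


def make_arg(base=BASE_RBIN, big_endian=False, check_bom=False,
             auto_detect=False, new_buffer=False, no_display=False):
    """Combine the documented bits into a numeric argument n."""
    if base not in (BASE_RBIN, BASE_UTF8, BASE_UTF16, BASE_UTF32):
        raise ValueError("base conversion must fit in the lower 2 bits")
    if auto_detect and (base or big_endian or check_bom):
        # With 0x100 set, bits 0x01 through 0x08 must be clear.
        raise ValueError("auto-detect requires bits 0x01 to 0x08 clear")
    if no_display and not new_buffer:
        raise ValueError("0x400 is only meaningful together with 0x200")
    n = base
    if big_endian:
        n |= BIG_ENDIAN
    if check_bom:
        n |= CHECK_BOM
    if auto_detect:
        n |= AUTO_DETECT
    if new_buffer:
        n |= NEW_BUFFER
    if no_display:
        n |= NO_DISPLAY
    return n
```

For example, make_arg(auto_detect=True) gives 0x100, the documented default, and make_arg(base=BASE_UTF16, big_endian=True, check_bom=True) gives 0x00E.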
When translate-buffer is used, the resultant buffer will have
xlate(2m) buffer mode enabled and variable
$buffer-xlate(5) set so that MicroEmacs can correctly convert the buffer back to its original format when it is written back out.
EXAMPLE
25 find-file "unicode_file.txt"
translate-buffer
The following can be used to load any file and preserve all bytes, i.e. lossless text mode:
25 find-file "any.file"
0 translate-buffer
NOTES
In lossless text mode, long strings without a UNIX new line character will be split at just under the 64KB length. These lines are flagged as not having a line terminator (see bit 0x010 of variable
$line-flags(5) for more details) so these line breaks do not affect the saved output.
It is important to avoid breaking the encoded byte sequences when editing these 'lossless' buffers. The 4 or 6 byte sequences are simple text bytes and any addition or removal of bytes will alter their value.
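The effect described in the note above can be seen with a small Python sketch that reads the escape sequences back. It is not how MicroEmacs itself decodes a buffer: the decode_line function is hypothetical, it accepts hex digits in either case as an assumption, and it returns raw byte values for anything that is not an escape sequence.

```python
# Illustrative sketch only; not the MicroEmacs decoder.
HEX_DIGITS = b"0123456789ABCDEFabcdef"


def decode_line(data):
    """Return a list of values read from one line of buffer bytes.

    A 0x01 byte is read with the next 3 hex digits and a 0x02 byte with
    the next 5; every other byte is returned as its raw byte value.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead in (0x01, 0x02):
            width = 3 if lead == 0x01 else 5
            digits = data[i + 1:i + 1 + width]
            if len(digits) != width or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("broken escape sequence at offset %d" % i)
            out.append(int(digits, 16))
            i += 1 + width
        else:
            out.append(lead)
            i += 1
    return out


intact = b"a\x01" + b"3B1" + b"b"    # 'a', escaped U+03B1, 'b'
edited = b"a\x01" + b"33B1" + b"b"   # one extra '3' typed inside the escape
print(decode_line(intact))
print(decode_line(edited))
```

In the edited line the sequence is still well formed, so no error is raised, but it now reads as the value 0x33B followed by a literal '1'. Deleting a digit instead would either raise the error or pull the following text byte into the value.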
SEE ALSO
(c) Copyright JASSPA 2025
Last Modified: 2025/03/25
Generated On: 2025/09/29