Step 1: Determine the UTF-8 encoding bit layout
The character has the Unicode code point U+0004. In UTF-8, it is encoded using 1 byte because its codepoint is in the range of
0x0000
to0x007f
.
Therefore we know that the UTF-8 encoding will be done over 7 bits within the final 8 bits and that it will have the format:0xxxxxxx
Where thex
are the payload bits.UTF-8 Encoding bit layout by codepoint range Codepoint Range Bytes Bit pattern Payload length U+0000 - U+007F 1 0xxxxxxx 7 bits U+0080 - U+07FF 2 110xxxxx 10xxxxxx 11 bits U+0800 - U+FFFF 3 1110xxxx 10xxxxxx 10xxxxxx 16 bits U+10000 - U+10FFFF 4 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx 21 bits Step 2: Obtain the payload bits:
Convert the hexadecimal code point U+0004 to binary:
00000100
. Those are the payload bits.Step 3: Fill in the bits to match the bit pattern:
Obtain the final bytes by arranging the paylod bits to match the bit layout:
00000100
<control>·U+0004
Character Information
Character Representations
Click elements to copyEncoding | Hex | Binary |
---|---|---|
UTF8 | 04 | 00000100 |
UTF16 (big Endian) | 00 04 | 00000000 00000100 |
UTF16 (little Endian) | 04 00 | 00000100 00000000 |
UTF32 (big Endian) | 00 00 00 04 | 00000000 00000000 00000000 00000100 |
UTF32 (little Endian) | 04 00 00 00 | 00000100 00000000 00000000 00000000 |
Description
The Unicode character U+0004, also known as the Control Character LF (Line Feed), is a fundamental element in digital text, facilitating line breaks and formatting. Typically used in text processing systems, this character signifies a new line or carriage return, ensuring proper text formatting across various platforms and devices. Although it may not hold direct linguistic or cultural significance, U+0004 plays an essential role as a technical component in digital communication. This character belongs to the Basic Latin Unicode block (U+0000 to U+007F), which includes 128 essential characters ranging from control codes and special symbols to common alphanumeric characters used in programming languages, text documents, and various other applications. The Basic Latin Unicode block serves as a foundation for many other Unicode blocks and continues to be an integral part of modern digital communication, despite its historical roots in the ASCII character set. In terms of its usage, U+0004 is most commonly found in text processing systems where it denotes line breaks or new lines. This enables proper formatting and readability of digital text across different platforms and devices. While it may not carry traditional cultural or linguistic meaning, the character serves as a critical technical element for ensuring seamless communication and data exchange in today's digital world.
How to type the symbol on Windows
Hold Alt and type 0004 on the numpad. Or use Character Map.