Step 1: Determine the UTF-8 encoding bit layout
The character ስ has the Unicode code point U+1235. In UTF-8, it is encoded using 3 bytes because its code point is in the range 0x0800 to 0xFFFF.
Therefore the encoding carries 16 payload bits inside 24 bits (3 bytes), in the format: 1110xxxx 10xxxxxx 10xxxxxx
where the x positions are the payload bits.
UTF-8 encoding bit layout by code point range
Code point range | Bytes | Bit pattern | Payload length |
---|---|---|---|
U+0000 - U+007F | 1 | 0xxxxxxx | 7 bits |
U+0080 - U+07FF | 2 | 110xxxxx 10xxxxxx | 11 bits |
U+0800 - U+FFFF | 3 | 1110xxxx 10xxxxxx 10xxxxxx | 16 bits |
U+10000 - U+10FFFF | 4 | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx | 21 bits |
Step 2: Obtain the payload bits
Convert the hexadecimal code point U+1235 to binary: 00010010 00110101. Those are the payload bits.
Step 3: Fill in the bits to match the bit pattern
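Obtain the final bytes by arranging the payload bits to match the bit layout. Split the 16 payload bits into groups of 4, 6, and 6 bits (0001, 001000, 110101) and place each group after the corresponding byte prefix:
11100001 10001000 10110101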
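The three steps above can be sketched in a few lines of Python. This is a minimal illustration that only covers the 3-byte range; the function name `encode_utf8_3byte` is made up for this example, and the result is compared against Python's built-in encoder as a sanity check.

```python
def encode_utf8_3byte(code_point: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF as three UTF-8 bytes by hand."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("this sketch only handles the 3-byte range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    byte1 = 0b11100000 | (code_point >> 12)           # 1110xxxx: top 4 payload bits
    byte2 = 0b10000000 | ((code_point >> 6) & 0x3F)   # 10xxxxxx: middle 6 payload bits
    byte3 = 0b10000000 | (code_point & 0x3F)          # 10xxxxxx: low 6 payload bits
    return bytes([byte1, byte2, byte3])


encoded = encode_utf8_3byte(0x1235)
print(" ".join(f"{b:08b}" for b in encoded))   # 11100001 10001000 10110101
print(" ".join(f"{b:02X}" for b in encoded))   # E1 88 B5
print(encoded == "\u1235".encode("utf-8"))     # True
```

In practice `str.encode("utf-8")` does all of this for every range; the hand-written version is only meant to make the bit shuffling visible.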
ETHIOPIC SYLLABLE SE · U+1235
Character Information
Character Representations
Encoding | Hex | Binary |
---|---|---|
UTF-8 | E1 88 B5 | 11100001 10001000 10110101 |
UTF-16 (big-endian) | 12 35 | 00010010 00110101 |
UTF-16 (little-endian) | 35 12 | 00110101 00010010 |
UTF-32 (big-endian) | 00 00 12 35 | 00000000 00000000 00010010 00110101 |
UTF-32 (little-endian) | 35 12 00 00 | 00110101 00010010 00000000 00000000 |
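The rows of this table can be reproduced with Python's built-in codecs. The sketch below is one way to do it; the `CODECS` mapping and the `representations` helper are names chosen for this example, while the codec strings themselves are standard Python codec names (the `-be` and `-le` variants do not emit a byte order mark).

```python
# Labels mirror the table rows; values are Python's built-in codec names.
CODECS = {
    "UTF-8": "utf-8",
    "UTF-16 (big-endian)": "utf-16-be",
    "UTF-16 (little-endian)": "utf-16-le",
    "UTF-32 (big-endian)": "utf-32-be",
    "UTF-32 (little-endian)": "utf-32-le",
}


def representations(char: str) -> dict:
    """Return {label: (hex string, binary string)} for one character."""
    rows = {}
    for label, codec in CODECS.items():
        data = char.encode(codec)
        hex_form = " ".join(f"{b:02X}" for b in data)
        bin_form = " ".join(f"{b:08b}" for b in data)
        rows[label] = (hex_form, bin_form)
    return rows


for label, (hex_form, bin_form) in representations("\u1235").items():
    print(f"{label} | {hex_form} | {bin_form} |")
```

Because U+1235 lies in the Basic Multilingual Plane, its UTF-16 form is a single 16-bit unit equal to the code point itself, and the little-endian rows are just the big-endian bytes in reverse order.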
Description
U+1235, ETHIOPIC SYLLABLE SE, is a character in the Unicode Standard that belongs to the Ethiopic script. The script has been used for centuries to write Amharic, the working language of Ethiopia's federal government and one of the country's most widely spoken languages. The character is a building block of the writing system: it is the sixth-order form of the consonant s and combines with other syllables to form words.
In digital text, U+1235 lets Ethiopic texts be represented accurately, so readers worldwide can access works written in Amharic and other languages that use the script. It also helps preserve linguistic heritage and supports communication among communities that rely on written forms of their languages for cultural, religious, and educational purposes.
The character is encoded in Unicode so that Ethiopic text is represented consistently across digital platforms such as websites, documents, and software applications, which supports internationalization and multilingual computing.
How to type the ስ symbol on Windows
Hold Alt and type 4661 (the decimal value of hexadecimal 1235) on the numeric keypad, or pick the character from Character Map.
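The number typed on the numpad is the code point written in decimal rather than hexadecimal. A short sketch of that conversion, using a hypothetical helper named `alt_code` that is not part of any Windows API:

```python
def alt_code(char: str) -> int:
    """Return the decimal code point of a single character."""
    return ord(char)


print(alt_code("\u1235"))             # 4661
print(int("1235", 16))                # 4661, the same value parsed from hex
print(f"U+{alt_code(chr(4661)):04X}")  # U+1235
```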