Understanding Character Encoding: UTF-8, ASCII, Unicode
A clear explanation of character encoding - ASCII, Unicode, UTF-8, UTF-16, and why it matters for developers.
Why Character Encoding Matters
Every text character stored or transmitted by a computer must be mapped to a number. Character encoding defines this mapping. Getting it wrong causes the dreaded mojibake (garbled text like "â€™" instead of "'").
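Mojibake happens when bytes written in one encoding are decoded with another. A quick Python sketch of the classic case, UTF-8 bytes read as Windows-1252:

```python
# A right single quote (U+2019) encoded as UTF-8...
right_quote = "\u2019"
raw_bytes = right_quote.encode("utf-8")   # b'\xe2\x80\x99'

# ...then wrongly decoded as Windows-1252 produces mojibake.
garbled = raw_bytes.decode("windows-1252")
print(garbled)  # â€™
```

Three bytes in, three wrong characters out: each UTF-8 byte is misread as its own Windows-1252 character.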
ASCII (1963)
ASCII (American Standard Code for Information Interchange) maps 128 characters (0-127) to 7-bit codes:
- A = 65, B = 66, ... Z = 90
- a = 97, b = 98, ... z = 122
- 0 = 48, 1 = 49, ... 9 = 57
ASCII covers only basic English letters, digits, and punctuation, which is not sufficient for global text.
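The code values above can be checked with Python's built-in `ord()` and `chr()`, which convert between characters and their numeric codes:

```python
# Characters map to numeric codes and back.
print(ord("A"))  # 65
print(ord("a"))  # 97
print(ord("0"))  # 48
print(chr(90))   # Z
```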
Unicode
Unicode is a universal character set that assigns a unique "code point" to every character in every writing system:
- 'A' = U+0041
- '€' = U+20AC
- '中' = U+4E2D
- '😀' = U+1F600
Unicode defines what characters exist; encodings (UTF-8, UTF-16, etc.) define how to store them.
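A short Python sketch printing the code point of each example character in the standard U+XXXX notation:

```python
# Print each character's Unicode code point in U+XXXX form.
for ch in "A€中😀":
    print(f"{ch!r} = U+{ord(ch):04X}")
# 'A' = U+0041
# '€' = U+20AC
# '中' = U+4E2D
# '😀' = U+1F600
```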
UTF-8 (The Web Standard)
UTF-8 encodes Unicode using 1-4 bytes per character:
- ASCII characters (0-127): 1 byte - backward compatible!
- Latin extended, common symbols: 2 bytes
- Chinese, Japanese, Korean: 3 bytes
- Emoji, rare scripts: 4 bytes
UTF-8 is used by 98%+ of websites. Always use UTF-8 for web content and declare it in your HTML:
<meta charset="UTF-8">
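The byte counts listed above can be verified in Python by encoding sample characters to UTF-8 (note 'é' as a 2-byte Latin-extended example; it is an assumption for illustration):

```python
# UTF-8 uses 1-4 bytes per character depending on the code point.
for ch in ("A", "é", "中", "😀"):
    encoded = ch.encode("utf-8")
    print(ch, len(encoded), encoded.hex())
# A 1 41
# é 2 c3a9
# 中 3 e4b8ad
# 😀 4 f09f9880
```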
HTML Entities
When you can't use a character directly in HTML, use entities:
&amp; → &
&lt; → <
&gt; → >
&copy; → ©
&#x1F600; → 😀
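Python's standard-library `html` module handles this conversion in both directions, a quick sketch:

```python
import html

# Escape reserved characters for safe embedding in HTML...
print(html.escape("<b> & </b>"))   # &lt;b&gt; &amp; &lt;/b&gt;

# ...and decode entities back to plain characters.
print(html.unescape("&copy; 2024"))  # © 2024
```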
Use our HTML Entity Encoder/Decoder and Unicode Escape tool.