
Character Set (Encoding)

A character set is a group of characters that can be used on the internet, for example numbers (0-9), English letters (A-Z), and some special characters such as ! $ + - ( ) @ < > . ASCII was one of the earliest widely adopted character encoding standards.

Character set encoding refers to a set of characters and the way these characters are stored in memory. A coded character set is a character set in which each character corresponds to a unique number, called its code point. A code unit is the smallest group of bits that an encoding uses to represent a character or part of a character, and its size depends on the encoding:

  • A code unit in US-ASCII consists of 7 bits.
  • A code unit in UTF-8, EBCDIC and GB18030 consists of 8 bits.
  • A code unit in UTF-16 consists of 16 bits.
  • A code unit in UTF-32 consists of 32 bits.
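
The code unit sizes above can be checked with a short Python sketch. It is only an illustration: the helper name code_units and the sample characters are assumptions made for this example and are not part of any standard. It encodes each character with Python's built-in codecs and divides the byte length by the size of one code unit, using the little-endian variants of UTF-16 and UTF-32 so that no byte order mark is added.

```python
def code_units(text: str, encoding: str, unit_bytes: int) -> int:
    """Return how many code units `text` occupies in `encoding`.

    `unit_bytes` is the size of one code unit in bytes
    (1 for UTF-8, 2 for UTF-16, 4 for UTF-32).
    """
    return len(text.encode(encoding)) // unit_bytes


# Hypothetical sample characters: A, é, €, and an emoji outside the
# Basic Multilingual Plane.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    print(
        f"U+{ord(ch):04X}",               # code point: the character's unique number
        code_units(ch, "utf-8", 1),       # 8-bit code units
        code_units(ch, "utf-16-le", 2),   # 16-bit code units
        code_units(ch, "utf-32-le", 4),   # 32-bit code units
    )
```

A single character may need more than one code unit. In this sketch the euro sign takes three UTF-8 code units, the emoji takes two UTF-16 code units, and every character takes exactly one UTF-32 code unit.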