[prev] 14 [next]

Huffman coding

A Huffman code is a prefix code that is commonly used for lossless data compression.

Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code (sometimes called "prefix-free codes")

  • the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol
  • The algorithm derives a variable-length code table for encoding source symbols, using estimated probability or frequency of occurrence for each possible value of the source symbol.
  • More common symbols are generally represented using fewer bits than less common symbols.
  • computes frequency f(c) for each character c
  • encodes high-frequency characters with short code
  • no code word is a prefix of another code word
  • uses optimal encoding tree to determine the code words