[prev] 15 [next]

Text Compression using Huffman coding

Code … mapping of each character to a binary code word

Prefix code … binary code such that no code word is prefix of another code word

Encoding tree

  • represents a prefix code
  • each leaf stores a character
  • code word given by the path from the root to the leaf   (0 for left child, 1 for right child)

Text compression problem

Given a text T, find a prefix code that yields the shortest encoding of T

  • short codewords for frequent characters
  • long code words for rare characters