You edit CodeTree.java to implement adaptive compression. There will be a file HuffmanTree.java that will be provided for you to edit for the static compression.
We will change "up to the root ..." to "up toward the root ...". Thanks
for pointing this out. What was intended is that the internal node off
which the (leaf) symbol node hangs gets splayed upwards to the root.
If the input file is long, it does not really matter if limited splaying is
used or splaying that internal node all the way to the root is done each
time the symbol (off which it hangs) is accessed. We will look at our
own compressor to see which it does, so that short file compression
will not differ materially from yours. Then we will tell you.
UPDATE 11 April --
Splay all the way to the root. This is to conform with
the description in your text, even though it does not matter for long inputs.
Our compressor will do this.
The read() method in FileInputStream reads a single byte (8 bits), and the write(int b) method in FileOutputStream writes a single byte. So it does not matter if the actual file is text, image or whatever. It would be a mistake to assume that what you will get as inputs are characters. FileReader in the lecture slides reads in characters (16 bits), so is not suitable.
The provided classes BitInputStream
and
BitOutputStream
are only to be used for reading and writing the
compressed file.
For reading and writing the uncompressed file, the classes
FileInputStream
and FileOutputStream
are to be
used --
these classes have methods for reading and writing one byte at a time
(see previous FAQ entry).
2. For the adaptive splaying compression, are we meant to construct the tree with 257 leaves all the time, or construct a tree which varies and only includes the characters which are input?
1. The code tree has to be somehow passed to the decompressor. Please read the assignment to see how that is done.
2. You need to construct the entire tree all the time. If you construct a partial tree based on the contents of the file, the decompressor has no way of knowing what tree you started with. (For adaptive splay compression, you do not encode the tree in the compressed file.) There is also no real saving in constructing an initial partial tree. See answer to next question.
In principle yes. In practice the only extra overhead is the code tree size, likely to be a lot smaller than the coded input. So, it does not matter either way.
2. Also, what is the weighted path length of the Huffman tree for an empty file? Following the formula and dividing by the total number of bytes gives Infinity (whereas 0 might be a more meaningful value).
1. We will check your WPL against ours along these lines so that real number arithmetic effects are ignored. Say ours is W1 and yours is W2. If the difference is less than 1% they are considered to be equal.
2. The empty file is an anomaly. It will not be a test case.
They are standard Java IO classes. To use them, you need the following
import
line at the beginning of your file (already included
in the provided templates):
import java.io.*;
Your browser may have automatically converted the text files from Unix to Windows format (which is a bit larger). To avoid this, when you download the files, right click on the link to save the file. Testing your program on a CSE computer also avoids the problem.
Shhh, don't tell him! ;)