Assignment 2 FAQ

This page was last updated: 12:37 Tuesday, 2 May, 2006 by cs2011.hw2@cse.unsw.edu.au

  1. I am a little confused as to whether we are supposed to implement both the adaptive and static compression techniques. The spec does say "This project will involve writing a Java class to implement Adaptive Data Compression *and* Static Compression", but it also says that we are only supposed to edit the file CodeTree.java. So are we supposed to code two versions of CodeTree.java, or just implement one version?

    You edit CodeTree.java to implement adaptive compression. A separate file, HuffmanTree.java, will be provided for you to edit for the static compression.

  2. In the last lecture I asked how splaying a node will bring it to the root, but before I could even finish my sentence, the lecturer said that splaying a leaf will not move it to the root, but rather just upwards, closer to the root. Yet the spec clearly says, "The splay() method should restructure the tree by splaying the specified node up to the root of the tree". So which one is it?

    We will change "up to the root ..." to "up toward the root ...". Thanks for pointing this out. What was intended is that the internal node off which the (leaf) symbol node hangs gets splayed upwards toward the root. If the input file is long, it does not really matter whether limited splaying is used or that internal node is splayed all the way to the root each time the symbol hanging off it is accessed. We will check which of the two our own compressor does, so that compression of short files will not differ materially from yours, and then tell you.
    UPDATE 11 April -- Splay all the way to the root. This is to conform with the description in your text, even though it does not matter for long inputs. Our compressor will do this.
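    For reference, here is a minimal sketch of splaying a node all the way to the root on a generic binary tree with parent pointers. The Node class and method names are illustrative only, not the assignment's CodeTree API:

    class Node {
        Node left, right, parent;
    }

    class SplaySketch {
        // Rotate x above its parent, preserving the left-to-right
        // order of the leaves.
        static void rotate(Node x) {
            Node p = x.parent, g = p.parent;
            if (p.left == x) {
                p.left = x.right;
                if (x.right != null) x.right.parent = p;
                x.right = p;
            } else {
                p.right = x.left;
                if (x.left != null) x.left.parent = p;
                x.left = p;
            }
            p.parent = x;
            x.parent = g;
            if (g != null) {
                if (g.left == p) g.left = x; else g.right = x;
            }
        }

        // Splay x all the way to the root, as the 11 April update requires.
        static void splay(Node x) {
            while (x.parent != null) {
                Node p = x.parent, g = p.parent;
                if (g == null) {
                    rotate(x);                 // zig: one rotation at the root
                } else if ((g.left == p) == (p.left == x)) {
                    rotate(p); rotate(x);      // zig-zig: rotate parent first
                } else {
                    rotate(x); rotate(x);      // zig-zag: rotate x twice
                }
            }
        }
    }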

  3. What kind of files should we expect as inputs for compression: only text, or images as well? On what units of input should our algorithm work?

    The read() method in FileInputStream reads a single byte (8 bits), and the write(int b) method in FileOutputStream writes a single byte. So it does not matter whether the actual file is text, image or anything else. It would be a mistake to assume that the inputs are characters. The FileReader class in the lecture slides reads characters (16 bits), so it is not suitable.
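    As a minimal illustration, the following copies a file one byte at a time using only these standard classes (the file names are made up for the example):

    import java.io.*;

    class ByteCopy {
        public static void main(String[] args) throws IOException {
            FileInputStream in = new FileInputStream("input.dat");
            FileOutputStream out = new FileOutputStream("output.dat");
            int b;
            // read() returns the next byte as 0..255, or -1 at end of file
            while ((b = in.read()) != -1) {
                out.write(b);   // write(int) writes the low-order 8 bits
            }
            in.close();
            out.close();
        }
    }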

  4. What's the reason for using readBit? I don't understand why the byte is broken into 8 bits; wouldn't it be more efficient to just pass in the whole byte?

    The provided classes BitInputStream and BitOutputStream are only to be used for reading and writing the compressed file. The reason they exist is that codewords are variable-length bit strings that do not line up with byte boundaries, so the compressed file must be read and written one bit at a time. For reading and writing the uncompressed file, the classes FileInputStream and FileOutputStream are to be used -- these classes have methods for reading and writing one byte at a time (see the previous FAQ entry).
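    To illustrate the idea (this is NOT the provided BitOutputStream, just a hypothetical sketch of what such a class does internally), a bit writer packs individual bits into bytes before writing them out:

    import java.io.*;

    // Hypothetical illustration only -- use the provided BitOutputStream
    // in your assignment.
    class TinyBitWriter {
        private OutputStream out;
        private int buffer = 0;   // bits collected so far
        private int count = 0;    // number of bits in the buffer

        TinyBitWriter(OutputStream out) { this.out = out; }

        // Collect bits until a full byte is available, then write it.
        // (A real class would also pad and flush the final partial byte.)
        void writeBit(int bit) throws IOException {
            buffer = (buffer << 1) | (bit & 1);
            if (++count == 8) {
                out.write(buffer);
                buffer = 0;
                count = 0;
            }
        }
    }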

  5. 1. For the Huffman compression, after we've done the compression and then go to decompress, are we meant to reconstruct the initial tree using the input file, or is there some other way we're expected to do it (e.g. writing the frequency of each character in the header of the compressed file, or something like that)?

    2. For the adaptive splaying compression, are we meant to construct the tree with 257 leaves all the time, or construct a tree which varies and only includes the characters which appear in the input?

    1. The code tree has to be somehow passed to the decompressor. Please read the assignment to see how that is done.

    2. You need to construct the entire tree every time. If you construct a partial tree based on the contents of the file, the decompressor has no way of knowing what tree you started with. (For adaptive splay compression, you do not encode the tree in the compressed file.) There is also no real saving in constructing an initial partial tree; see the answer to the next question. A sketch of building the full initial tree follows.
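    Here is a minimal sketch of building a fixed initial tree over all 257 symbols (the 256 byte values plus one end-of-file marker). The Node class is hypothetical, not the assignment's CodeTree; the point is that the shape is the same on every run, so the decompressor can rebuild it independently:

    // Hypothetical Node class for illustration only.
    class Node {
        int symbol;        // 0..256 for leaves, -1 for internal nodes
        Node left, right;
        Node(int symbol) { this.symbol = symbol; }
        Node(Node l, Node r) { symbol = -1; left = l; right = r; }
    }

    class InitialTree {
        static final int EOF_SYMBOL = 256;   // the 257th leaf

        // Build a balanced tree whose leaves are the symbols lo..hi inclusive.
        static Node build(int lo, int hi) {
            if (lo == hi) return new Node(lo);
            int mid = (lo + hi) / 2;
            return new Node(build(lo, mid), build(mid + 1, hi));
        }
        // Start from build(0, EOF_SYMBOL): 257 leaves, identical shape
        // for compressor and decompressor.
    }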

  6. Can we discard characters in the ASCII set that did not appear in the file when we construct the Huffman tree?

    In principle, yes. In practice, the only extra overhead is the size of the code tree, which is likely to be much smaller than the coded input. So it does not matter either way.

  7. 1. Does it matter if my weighted path length is slightly different to the sample ones? Even setting the height of the root node to -1, I'm still about 0.003 off. However, my actual file sizes are exactly the same so I'm fairly confident that my Huffman tree is correct.

    2. Also, what is the weighted path length of the Huffman tree for an empty file? Following the formula and dividing by the total number of bytes gives Infinity (whereas 0 might be a more meaningful value).

    1. We will check your WPL against ours in a way that ignores real-number arithmetic effects. Say ours is W1 and yours is W2: if they differ by less than 1% of W1, they are considered equal. (A sketch of this check appears after answer 2 below.)

    2. The empty file is an anomaly. It will not be a test case.
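    A sketch of the WPL computation and the 1% tolerance check described in answer 1 above (the array names are illustrative):

    class WplCheck {
        // Weighted path length per byte: the sum over all symbols of
        // frequency * codeword length, divided by the total number of
        // bytes in the file.
        static double wpl(long[] freq, int[] depth, long totalBytes) {
            double sum = 0;
            for (int s = 0; s < freq.length; s++) {
                sum += (double) freq[s] * depth[s];
            }
            return sum / totalBytes;   // undefined when totalBytes == 0 (the empty file)
        }

        // W1 (ours) and W2 (yours) are considered equal if they differ
        // by less than 1% of W1.
        static boolean closeEnough(double w1, double w2) {
            return Math.abs(w1 - w2) < 0.01 * w1;
        }
    }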

  8. Can you please tell me where the FileInputStream and FileOutputStream classes are? Or are we supposed to write them ourselves?

    They are standard Java IO classes. To use them, you need the following import line at the beginning of your file (already included in the provided templates):

    import java.io.*;
    

  9. My Huffman compressor compresses the image files properly, but for the text files my compressed file seems to be around 20 bytes larger than the sample one.

    Your browser may have automatically converted the text files from Unix to Windows format. Windows marks line breaks with two bytes (CR LF) instead of one, so the converted file is a bit larger. To avoid this, right-click on the link and save the file when downloading. Testing your program on a CSE computer also avoids the problem.

  10. Hey, those JPEG files are pictures of the lecturer!

    Shhh, don't tell him! ;)

