Proj1 FAQ

1. Clarification

2. Q&A

Yes. For example, I can ask your program to cluster the top-75 results of a query into 10 clusters.

We will make sure it does not occur in the test dataset. Unfortunately, it does occur in the 'australia-10-3-complete' case (at the last step, where all entries are 0.0).

No. I use double to record all the similarity values, and you should too. The values are printed with 4 digits after the decimal point only to make the output matrix readable.
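To illustrate the point about precision, here is a minimal sketch in Python (whose float is a double-precision value). The helper name format_row and the space separator are assumptions made for this example, not the project's required output format. It rounds only when printing and leaves the stored value untouched.

```python
def format_row(row):
    """Render one matrix row with 4 digits after the decimal point.

    Rounding happens only in the returned string; the numbers in `row`
    keep their full double precision.
    """
    return " ".join(f"{value:.4f}" for value in row)


sim = 2.0 / 3.0                      # stored at full precision
printed = format_row([sim, 1.0, 0.0])
print(printed)                       # 0.6667 1.0000 0.0000

# The stored value is not the rounded one that appears in the output.
print(sim == float(printed.split()[0]))   # False
```

If you round the stored values themselves to 4 digits, the rounding error carries into every later merge step and the matrix may drift from the expected output.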

The single most likely reason is that you always update the matrix using the plain average of two similarity values from the last-round matrix. This is WRONG. Please double-check the formula used in the average-link algorithm.

For those who understand how to do it correctly: yes, it is possible to calculate the new matrix based only on the matrix obtained in the last round. You will need to use a weighted average.
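The following is a minimal sketch of one such weighted average, not the official solution. It assumes that average-link similarity between two clusters is the mean of all pairwise similarities across them; please check this against the formula given for the project. The function names and the small 4-item matrix are made up for illustration.

```python
def average_link(cluster_x, cluster_y, pairwise):
    """Mean pairwise similarity between two clusters (assumed definition)."""
    total = sum(pairwise[i][j] for i in cluster_x for j in cluster_y)
    return total / (len(cluster_x) * len(cluster_y))


def merged_similarity(sim_ac, sim_bc, size_a, size_b):
    """Similarity of the merged cluster (A and B) to C, computed from the
    last-round entries only, with each entry weighted by its cluster size."""
    return (size_a * sim_ac + size_b * sim_bc) / (size_a + size_b)


# Hypothetical item-to-item similarities for four items.
pairwise = [
    [1.0, 0.8, 0.6, 0.2],
    [0.8, 1.0, 0.4, 0.1],
    [0.6, 0.4, 1.0, 0.3],
    [0.2, 0.1, 0.3, 1.0],
]

a, b, c = [0, 1], [2], [3]           # A has two items, B and C have one each
sim_ac = average_link(a, c, pairwise)
sim_bc = average_link(b, c, pairwise)

weighted = merged_similarity(sim_ac, sim_bc, len(a), len(b))
direct = average_link(a + b, c, pairwise)   # recomputed from the raw items
naive = (sim_ac + sim_bc) / 2               # the plain average

print(f"{weighted:.4f} {direct:.4f} {naive:.4f}")
```

Under this assumption, the weighted value agrees with the value recomputed from the raw items, while the plain average does not whenever the two merged clusters differ in size. This means you need to keep track of each cluster's size alongside the matrix.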