Deadline: 11:59pm Tue 12 January 2016
Total Marks: 4 ( A chance to make up an extra mark for missed labs )
In the first part of the lab you will perform an empirical analysis of two sorting algorithms. You can split up into groups of 2 or 3 to do this task.
For parts 2 and 3 of the lab you will do it on your own. If you want to start the lab in advance at home it is better to start with parts 2 and 3 and actually run the timing experiments for part 1 in the lab class. You can still prepare for part 1 by reading the instructions and making sure you can generate data sets.
Warning: to be able to compare your timing results for part 1, you must use the same (or same type) of computer to gather all your timing data. If you do half on your own computer and the rest on the cse machines, you will not be able to compare your results. In other words you need to use one type of machine such as a computer from the leaf lab or the spoons lab etc or the wagner server or your own computer etc for the whole task. Note: It will not matter if you change machines within the tuba or drum lab etc.
Change into your lab04 directory and run the following command:
cp /home/cs1927/public_html/16x1/labs/lab04/files/* .
Note the "." at the end of the command, meaning "into the current directory".
If you've done the above correctly, you should find the following files now in the directory:
genOrdered.c |
main program for generating ordered string data |
genRandom.c |
main program for generating random string data |
lists.c |
A list implementation that you will need to implement the insertion sort algorithm in |
lists.h |
A list interface that does not need to be modified |
isMinHeapOrdered.c |
A stub file for you to write your function isMinHeapOrdered |
isMinHeapOrdered.h |
A tree based heap interface that should not be changed |
testIsMinHeapOrdered.c |
A test file to test your implementation of isMinHeapOrdered. You should add some of your own tests |
isMinHeap.c |
A stub file for you to write your function isMinHeap |
isMinHeap.h |
An array based heap interface that should not be changed |
testIsMinHeapOrdered.c |
A test file to test your implementation of isMinHeap. You should add some of your own tests |
Please write all your results for this section in and open office spreadsheet file called
timing.ods
and write your answers in a file called conclusions.txt
To run open office spreadsheet application type at command line
soffice -calc
This can be done in pairs or groups of 3.
For this part of the lab you are to run experiments to distinguish between three unknown sorting algorithms. These can be run by typing
~cs1927/bin/sortA
~cs1927/bin/sortB
~cs1927/bin/sortC
We know that the sorting algorithms are either insertion sort, naive bubble sort, bubble sort with early exit, selection sort or bogo sort (you may need to research bogo sort to find out what it is and how it behaves).
The programs will read in strings separated by newlines until it reaches the endo of file or Ctrl^d.
For example you could run the program as follows and manually type in input and then press control d.
~cs1927/bin/sortA zzzzz aabzz cccdd qwere ZZZZZctrl^d
Now examine the code in genRandom.c.
Notice that the error messages are printed on standard error using fprintf(stderr, ...) so that you can redirect the output to a file without missing the error messages appearing on the screen. The C function fprintf() is similar to printf() except that the output is sent to the specified file. In this case, the output file is the standard error stream which by default is the terminal on which the program is run. This means that the output would normally appear along with the rest of the output of your program. However, note that using Unix redirection (via >) will channel the standard output of your program to a file, the output printed on standard error will still appear on the screen (and is not channelled to the file by redirection). printf(...) is the same as fprintf(stdout, ...).
Test the program on a small input to make sure that it functions correctly.
./genRandom 5 alqbd mzuag zuaks pqxoa dpsdp
Once you are convinced that the program is working correctly, use it to generate test data. The following example shows you how to generate 1000 samples and have them placed in a file called test1.
% ./genRandom 1000 > test1 % more test1 asdfe poiua zzzxe aaior fdsdq ... (rest of output deleted)
Let us now use the test file we have created (test1 in the example above). Try the following:
~cs1927/bin/sortA < test1
You will see the data followed by the sorted output scroll quickly past on the screen. If you want to save the output to a file, you can use Unix redirection as follows.
~cs1927/bin/sortA < test1 > out1
The resulting output now appears in the file out1. If you don't care to see the output at all, you can redirect the output to /dev/null
~cs1927/bin/sortA < test1 > /dev/null
Let us now time the sorting of the data file using the Unix time command.
% time ~cs1927/bin/sortA < test1 > /dev/null real 0m0.107s user 0m0.036s sys 0m0.021s
Note: We are only interested in the user time.
Alternatively, for large data sets that are too big to save , you could run your tests as follows:
./genRandom 100000 | time ~cs1927/bin/sortA > /dev/null
Use the genRandom program to generate data sets. As the random data generated will be in different orders each time, it is very important that we run the experiment many times and take the average for each data set size. For example we would need to run say 5 experiments containing 1000 samples each , another 5 containing 10000 samples each and another 5 containing 100000 each etc . Use the genOrdered to create ordered data of sizes 1000, 10000 and 100 000 etc. Note: genOrdered uses math.h so compile like
gcc -Wall -Werror -O -o genOrdered genOrdered.c -lm
Write your own program to generate reverseOrdered data, or use the linux sort command.
man sortNote:These sizes are just a guide, if the measured times are too small etc you might need bigger data sizes. For example user times of say 0.004s are not very meaningful as they are too small. Use the Unix time command to complete the first three columns of the following table. Create an open office spreadsheet file called timing.ods and enter the values of the user times for the table in this file so that you can show them to your tutor and submit the file at the end of the lab.
If possible do not delete the random test files, you will need them again to use with the other sorting programs.
Run |
1 000 |
10 000 |
100 000 |
etc... |
1 |
||||
2 |
||||
3 |
||||
4 |
||||
5 |
||||
Average |
Create similar tables for ordered and reverse ordered data
Create a graph of your averages. The graph should be a scatter graph of size vs time
Repeat these experiments for other sorting algorithms.
Write a short paragraph explaining conclusions can you draw about the run-time complexity of the algorithms from your results.
Devise another test to further identify the algorithms. Please include a description of the data you used and explain how this helped you differentiate between the algorithms. You may assume that the algorithms have been implemented on arrays in the standards ways we looked at in lectures.
This exercise is good practice for the prac exam
In the filelists.c
implement the function to perform an
insertion sort on a list. You should sort items into ascending order.
link insertionSort(link list);
This can be described as follows:
You may write additional static helper functions
On your own, write the function isMinHeapOrderedin the file isMinHeapOrdered.cthat decides whether a tree-based structure is in min heap-order. In other words, whether it is in heap order where smaller items have a higher priority. For this question you do not need to check that the tree is complete. Write some tests to check your function. Note: This is a simple question from a past exam.
On your own, write a function isMinHeap in the file isMinHeap.c that decides whether a given array-based heap is in min heap-order. You must assume we have an array-based implementation where the array at index 0 is not used and that the items in the 'heap' begin at index 1.
1927 classrun 16x1 give lab04 timing.ods conclusions.txt lists.c isMinHeapOrdered.c isMinHeap.c