Lab Exercises Week 4

Objectives

Assessment

Deadline: 11:59pm Tue 12 January 2016

Total Marks: 4 ( A chance to make up an extra mark for missed labs )

Introduction

In the first part of the lab you will perform an empirical analysis of two sorting algorithms. You can split up into groups of 2 or 3 to do this task.

For parts 2 and 3 of the lab you will do it on your own. If you want to start the lab in advance at home it is better to start with parts 2 and 3 and actually run the timing experiments for part 1 in the lab class. You can still prepare for part 1 by reading the instructions and making sure you can generate data sets.

Warning: to be able to compare your timing results for part 1, you must use the same (or same type) of computer to gather all your timing data. If you do half on your own computer and the rest on the cse machines, you will not be able to compare your results. In other words you need to use one type of machine such as a computer from the leaf lab or the spoons lab etc or the wagner server or your own computer etc for the whole task. Note: It will not matter if you change machines within the tuba or drum lab etc.

Setting Up

Change into your lab04 directory and run the following command:

cp /home/cs1927/public_html/16x1/labs/lab04/files/* .

Note the "." at the end of the command, meaning "into the current directory".

If you've done the above correctly, you should find the following files now in the directory:

genOrdered.c

main program for generating ordered string data

genRandom.c

main program for generating random string data

lists.c

A list implementation that you will need to implement the insertion sort algorithm in

lists.h

A list interface that does not need to be modified

isMinHeapOrdered.c

A stub file for you to write your function isMinHeapOrdered

isMinHeapOrdered.h

A tree based heap interface that should not be changed

testIsMinHeapOrdered.c

A test file to test your implementation of isMinHeapOrdered. You should add some of your own tests

isMinHeap.c

A stub file for you to write your function isMinHeap

isMinHeap.h

An array based heap interface that should not be changed

testIsMinHeapOrdered.c

A test file to test your implementation of isMinHeap. You should add some of your own tests

Part 1. Sorting detective: Empirical Analysis of unknown sorting algorithms - 2 Marks

Please write all your results for this section in and open office spreadsheet file called timing.ods and write your answers in a file called conclusions.txt

To run open office spreadsheet application type at command line

soffice -calc

This can be done in pairs or groups of 3.

For this part of the lab you are to run experiments to distinguish between three unknown sorting algorithms. These can be run by typing

~cs1927/bin/sortA

~cs1927/bin/sortB

~cs1927/bin/sortC

We know that the sorting algorithms are either insertion sort, naive bubble sort, bubble sort with early exit, selection sort or bogo sort (you may need to research bogo sort to find out what it is and how it behaves).

The programs will read in strings separated by newlines until it reaches the endo of file or Ctrl^d.

For example you could run the program as follows and manually type in input and then press control d.

~cs1927/bin/sortA
zzzzz
aabzz
cccdd
qwere
ZZZZZ
ctrl^d

Now examine the code in genRandom.c.

Notice that the error messages are printed on standard error using fprintf(stderr, ...) so that you can redirect the output to a file without missing the error messages appearing on the screen. The C function fprintf() is similar to printf() except that the output is sent to the specified file. In this case, the output file is the standard error stream which by default is the terminal on which the program is run. This means that the output would normally appear along with the rest of the output of your program. However, note that using Unix redirection (via >) will channel the standard output of your program to a file, the output printed on standard error will still appear on the screen (and is not channelled to the file by redirection). printf(...) is the same as fprintf(stdout, ...).

Test the program on a small input to make sure that it functions correctly.

./genRandom 5
alqbd
mzuag
zuaks
pqxoa
dpsdp


Once you are convinced that the program is working correctly, use it to generate test data. The following example shows you how to generate 1000 samples and have them placed in a file called test1.

% ./genRandom 1000 > test1

% more test1
asdfe
poiua
zzzxe
aaior
fdsdq
... (rest of output deleted)

Let us now use the test file we have created (test1 in the example above). Try the following:

~cs1927/bin/sortA < test1

You will see the data followed by the sorted output scroll quickly past on the screen. If you want to save the output to a file, you can use Unix redirection as follows.

~cs1927/bin/sortA < test1 > out1

The resulting output now appears in the file out1. If you don't care to see the output at all, you can redirect the output to /dev/null

~cs1927/bin/sortA < test1 > /dev/null

Let us now time the sorting of the data file using the Unix time command.

% time ~cs1927/bin/sortA < test1 > /dev/null

real    0m0.107s
user    0m0.036s
sys     0m0.021s

Note: We are only interested in the user time.

Alternatively, for large data sets that are too big to save , you could run your tests as follows:

./genRandom 100000 | time ~cs1927/bin/sortA > /dev/null

Use the genRandom program to generate data sets. As the random data generated will be in different orders each time, it is very important that we run the experiment many times and take the average for each data set size. For example we would need to run say 5 experiments containing 1000 samples each , another 5 containing 10000 samples each and another 5 containing 100000 each etc . Use the genOrdered to create ordered data of sizes 1000, 10000 and 100 000 etc. Note: genOrdered uses math.h so compile like

gcc -Wall -Werror -O -o genOrdered genOrdered.c -lm

Write your own program to generate reverseOrdered data, or use the linux sort command.

man sort
Note:These sizes are just a guide, if the measured times are too small etc you might need bigger data sizes. For example user times of say 0.004s are not very meaningful as they are too small. Use the Unix time command to complete the first three columns of the following table. Create an open office spreadsheet file called timing.ods and enter the values of the user times for the table in this file so that you can show them to your tutor and submit the file at the end of the lab.

If possible do not delete the random test files, you will need them again to use with the other sorting programs.

Run

1 000
Random

10 000
Random

100 000
Random

etc...

1

2

3

4

5

Average

Create similar tables for ordered and reverse ordered data

Create a graph of your averages. The graph should be a scatter graph of size vs time

Repeat these experiments for other sorting algorithms.

Write a short paragraph explaining conclusions can you draw about the run-time complexity of the algorithms from your results.

Devise another test to further identify the algorithms. Please include a description of the data you used and explain how this helped you differentiate between the algorithms. You may assume that the algorithms have been implemented on arrays in the standards ways we looked at in lectures.

Part 2. Insertion Sort on Linked Lists - 1 Mark

This exercise is good practice for the prac exam

In the file lists.c implement the function to perform an insertion sort on a list. You should sort items into ascending order.
link insertionSort(link list);

This can be described as follows:

You may write additional static helper functions

Part 3.Heaps - 1 Mark (0.5 mark each)

On your own, write the function isMinHeapOrderedin the file isMinHeapOrdered.cthat decides whether a tree-based structure is in min heap-order. In other words, whether it is in heap order where smaller items have a higher priority. For this question you do not need to check that the tree is complete. Write some tests to check your function. Note: This is a simple question from a past exam.

On your own, write a function isMinHeap in the file isMinHeap.c that decides whether a given array-based heap is in min heap-order. You must assume we have an array-based implementation where the array at index 0 is not used and that the items in the 'heap' begin at index 1.

Submission

When you are happy with your work, please show it to your tutor to get it marked. Before you leave your lab, remember to submit your lab via the give command

1927 classrun 16x1 give lab04 timing.ods conclusions.txt lists.c isMinHeapOrdered.c isMinHeap.c