## Program Testing

These notes are formulated for Prolog programming, for students taking COMP9414 Artificial Intelligence at the University of New South Wales, Sydney, Australia, but many of the ideas can be applied to programming in functional or procedural languages.

Testing your program is a necessary part of program creation.

Except for trivial programs, testing can never prove that a program is correct. What testing does is to try to find errors in your code. No matter how many errors you find and fix, there may be more. Still, tested code is better than untested code.

### Designing Tests

If you write a Prolog procedure to help some other procedure, test the helper procedure in isolation before you try it out inside the procedure it is helping. For example, suppose you want to be able to determine the minimum of two numbers. Suitable Prolog code is:
```
min(A, B, B) :- A > B.
min(A, B, A) :- A =< B.
```
To ensure that your procedure works, you should try it out on examples of all the likely scenarios. For example, with `min`, the three likely categories of data are where A > B, where A = B, and where A < B. So you should do at least one test case for each of these:
```
min(1, 2, X)?
min(3, 3, X)?
min(5, 4, X)?
```
and make sure it gives the correct answer for each.
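
The three checks can also be bundled into a single goal, so that you do not have to compare the answers by eye. What follows is only a sketch: the predicate name `check_min` is an illustrative assumption, and `min` is written with the standard Prolog comparison operator `=<`.

```prolog
% min(A, B, Min): Min is the smaller of the numbers A and B.
min(A, B, B) :- A > B.
min(A, B, A) :- A =< B.

% check_min (hypothetical name): succeeds only if min gives the
% expected answer in each of the three categories of data.
check_min :-
    min(1, 2, X1), X1 == 1,    % A < B
    min(3, 3, X2), X2 == 3,    % A = B
    min(5, 4, X3), X3 == 4.    % A > B
```

If the query `check_min?` fails, at least one of the three cases gives an unexpected answer, and you can fall back on the individual queries to find out which.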

When you believe that `min` is working correctly, you can start testing the procedure that calls `min`.

When you are dealing with lists (replacing an item in a list, say), you should again look for a range of critical situations. For example:

• What happens if the item to be replaced is not in the list?
• What happens if the item to be replaced is the first item in the list?
• What happens if the item to be replaced is the last item in the list?
• What happens if the item to be replaced is in the middle of the list?
• What happens if the item to be replaced occurs more than once in the list?
• What happens if the list is empty? [If the empty list is an allowable argument.]
• What happens if the list is long? You may have an algorithm that works, in the sense that given enough time it will compute the correct thing. When you run it on a list of 3 or 4 items it seems to be fine, but on a list of 30 items it might take 372 years to complete the computation, when a better algorithm would complete the computation (correctly :-) in less than a microsecond. Don't laugh - somebody handed in an assignment like that in 2004. Solution: try it on longer lists!
A common reason for problems like this is making a recursive call in a rule before you perform a goal or goals that test something that might make the recursive call unnecessary. There is more information on this under "efficiency" in the Topic index of the Prolog dictionary.
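
The situations listed above can be turned into concrete test goals. The following is a minimal sketch, assuming a hypothetical predicate `replace_all(Old, New, List, Result)` that replaces every occurrence of `Old` in `List` by `New`. The name, the argument order and the "replace every occurrence" behaviour are assumptions made for illustration, and the code uses the standard Prolog operator `\==`. Note that the third rule performs its test before making the recursive call.

```prolog
% replace_all(Old, New, List, Result): hypothetical example predicate.
% Result is List with every occurrence of Old replaced by New.
replace_all(_, _, [], []).
replace_all(Old, New, [Old | Rest], [New | NewRest]) :-
    replace_all(Old, New, Rest, NewRest).
replace_all(Old, New, [Item | Rest], [Item | NewRest]) :-
    Item \== Old,                               % test first, then recurse
    replace_all(Old, New, Rest, NewRest).

% test_replace (hypothetical name): one goal per critical situation.
% Succeeds only if every case gives the expected list.
test_replace :-
    replace_all(z, x, [a, b, c], [a, b, c]),    % item not in the list
    replace_all(a, x, [a, b, c], [x, b, c]),    % item is first
    replace_all(c, x, [a, b, c], [a, b, x]),    % item is last
    replace_all(b, x, [a, b, c], [a, x, c]),    % item is in the middle
    replace_all(a, x, [a, b, a], [x, b, x]),    % item occurs more than once
    replace_all(a, x, [], []).                  % empty list
```

The long-list case is not covered by `test_replace`; it still needs a separate query on a list of realistic length.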

If your procedure has more than one rule in it, then you should make sure that your testing covers each rule. In particular, if you write a recursive procedure, then you should make sure that you cover the base case or cases, and the recursive case or cases.
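
As a small illustration of covering each rule, here is a sketch with a hypothetical recursive predicate `total(List, Sum)`; both the predicate and the test name `test_total` are assumptions made for the example.

```prolog
% total(List, Sum): hypothetical example; Sum is the sum of the numbers in List.
total([], 0).                          % base case
total([First | Rest], Sum) :-          % recursive case
    total(Rest, RestSum),
    Sum is First + RestSum.

% test_total (hypothetical name): covers the base case and the recursive case.
test_total :-
    total([], 0),           % uses only the base case
    total([5], 5),          % one pass through the recursive rule, then the base case
    total([1, 2, 3], 6).    % several passes through the recursive rule
```

A test set containing only `total([], 0)` would leave the second rule completely untested, which is the gap this kind of coverage check is meant to expose.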

### Tracing

In response to any `iProlog` prompt (": "), you can type `trace!`. This has the effect of turning on tracing of Prolog execution. This can sometimes be helpful in figuring out what your program is doing that it shouldn't be doing. Traces can, however, produce more output than one can easily work through, so its usefulness is limited.

There is an example of tracing in iProlog here.

To turn tracing off, type `notrace!` in response to an iProlog prompt.

### Tips

As you develop a program, you are likely to have to test it several times, as you find each bug and try to fix it. It can be quite time-consuming to type in the same test queries over and over again. You can automate this to some extent by putting your queries into a file - call it `mytests`, say - and then, if your program is in the file `myprog.pl`, you can run the Unix command

`prolog myprog.pl <mytests`

Then Prolog will load the code in `myprog.pl` and then treat the contents of `mytests` as if you had typed them into Prolog interactively. All that will be printed, however, will be the output of the queries. If you can figure out which query caused which output, then that's OK. If not, then you can add extra code to `mytests` so that Prolog prints a message from time to time so you can tell where it is up to. For example,

```
write('Testing min now'), nl,
write('First test, min(1,2,X)?'), nl,
min(1,2,X)?
```

`nl` means print a "newline" - that is, terminate the current line of output and move to the next line. [If you're wondering how `write` and `nl` fit into a logic language like Prolog, it's like this: `write` and `nl` are built-in Prolog predicates that can be thought of as always succeeding, and which have the side-effect of printing something in the window from which you are running Prolog.]

You can also insert `write`s and `nl`s into your rules in order to keep track of what's going on, though to a large extent the `trace` does this sort of thing for you when turned on.

```
% Version of factorial(N, FactorialN) with a bug in it.
factorial(0, 1).
factorial(N, Result) :-
    write("Entering recursive rule for factorial with N = "), write(N), nl,
    Nminus1 is N - 1,
    factorial(Nminus1, Nminus1Factorial),
    Result is N * Nminus1Factorial.
```

Try copying and pasting this code into Prolog, and then typing the goal `factorial(5, Result)?` and see what happens. Can you figure out the bug? Solution - but have a go yourself, first!

Some people prefer to copy and paste individual tests into the Prolog interpreter rather than repeatedly running all of the tests.

Don't forget to retest your code when/if you move it from your home computer to one of the School of Computer Science and Engineering's computer systems. "It worked at home" is a not uncommon cry of anguish: we test your program on the machines in CSE.

### Sanity Checks

Sometimes a program works for the boundary cases (or is meaningless for the boundary cases) but not for realistic, non-boundary cases. In this case, "sanity checking" can help. Basically, this means looking at the output thoughtfully and trying to work out whether it makes sense. It is hard to pin down what "makes sense" means, though it often involves "back of an envelope" type calculations. The following non-programming example may help clarify the idea: a newspaper reported in 2009, in passing, that 130,000 people die in Australia each year (in total). The population of Australia in 2008 was about 21 million, and the average lifespan in Australia is around 80 years. One version of the sanity check involves using the 21 million and the 80 years figures to work out independently how many people die in Australia each year: 21,000,000 / 80 = 262,500. Oops, that's nowhere near 130,000. Alternatively, you can take the 21 million population figure and the newspaper's claimed 130,000 deaths per year, and calculate the average lifespan: 21,000,000 / 130,000 = about 161.5 years. I don't think so! So the newspaper has it wrong.

If your code passes a sanity check, it doesn't mean the answer it produces is correct. It might still be wrong, just not very wrong. If your code fails a sanity check, then it likely is wrong!
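
The arithmetic in the newspaper example can also be written as a short Prolog sketch. The predicate names and the tolerance factor of 1.5 are arbitrary assumptions chosen for illustration. How close is "close enough" depends on the problem.

```prolog
% expected_deaths(Population, LifespanYears, Deaths):
% rough estimate of deaths per year, assuming a steady population.
expected_deaths(Population, LifespanYears, Deaths) :-
    Deaths is Population / LifespanYears.

% implied_lifespan(Population, DeathsPerYear, Years):
% average lifespan implied by a claimed number of deaths per year.
implied_lifespan(Population, DeathsPerYear, Years) :-
    Years is Population / DeathsPerYear.

% roughly_equal(X, Y, Factor): X and Y are within a factor of Factor of each other.
roughly_equal(X, Y, Factor) :-
    X =< Y * Factor,
    Y =< X * Factor.

% sane_report(Reported): Reported is within a factor of 1.5 (an assumed
% tolerance) of the estimate from 21 million people and an 80-year lifespan.
sane_report(Reported) :-
    expected_deaths(21000000, 80, Estimate),
    roughly_equal(Reported, Estimate, 1.5).
```

With these definitions the query `sane_report(130000)?` fails, in line with the conclusion above that the newspaper's figure does not pass the check.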

### Automating (Re-)Testing

In fact, for repeated testing of code as you modify it and add extra features (or for testing that your code works on more than one platform - e.g. at home and at work/university), the only reasonable thing to do is to use automated testing. A simple approach to this is to write predicates with names like `test1`, `test2`, `test3`, ... and another one called `runtests`.

```
runtests :-
    test1,
    test2,
    test3, ... .
```
The individual tests might look like this (using as an example a test to show that a procedure called `factorial` does something sensible - recall that 4! = 24):
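```
test1 :-
    not(factorial(4, 24)),
    write('Fails factorial(4, 24)'),
    nl.
test1.  % catch-all clause: test1 still succeeds when factorial(4, 24) is correct
```
If `factorial(4, 24)` fails, the `not()` goal succeeds, so the failure message is printed. If `factorial(4, 24)` succeeds, the first clause fails and the second clause, `test1.`, lets `test1` succeed quietly; without it, `runtests` would stop at the first test that passes. And so on for `test2`, `test3`, etc. To run the tests, issue the single query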
```
?- runtests.
```
An error message (as programmed by you in `test1`, `test2`, `test3`, etc.) will be printed for each test that fails.
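
If writing a separate predicate for every test becomes tedious, one possible variation is a small helper that runs a goal and reports only when it fails. This is a sketch, not part of the course material: the helper name `expect` and its arguments are assumptions, it relies on the standard built-in `call/1` and the cut, and it uses the `min` procedure from earlier in these notes as the code under test.

```prolog
% Code under test: min(A, B, Min), Min is the smaller of A and B.
min(A, B, B) :- A > B.
min(A, B, A) :- A =< B.

% expect(Goal, Label): hypothetical helper. It always succeeds,
% and prints a message only when Goal fails.
expect(Goal, _) :-
    call(Goal), !.
expect(_, Label) :-
    write('Fails '), write(Label), nl.

% runtests: runs every test; a failing test never stops the later ones.
runtests :-
    expect(min(1, 2, 1), 'min(1, 2, 1)'),
    expect(min(3, 3, 3), 'min(3, 3, 3)'),
    expect(min(5, 4, 4), 'min(5, 4, 4)').
```

Because `expect` always succeeds, `runtests` reaches every test, and silence means that all of the tests passed.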

Bill Wilson's Contact Info
