Program Testing

These notes are formulated for Prolog programming, for students taking COMP9414 Artificial Intelligence at the University of New South Wales, Sydney, Australia, but many of the ideas can be applied to programming in functional or procedural languages.

Testing your program is a necessary part of program creation.

Except for trivial programs, testing can never prove that a program is correct. What testing does is to try to find errors in your code. No matter how many errors you find and fix, there may be more. Still, tested code is better than untested code.

Designing Tests

If you write a Prolog procedure to help some other procedure, test the helper procedure in isolation before you try it out inside the procedure it is helping. For example, suppose you want to be able to determine the minimum of two numbers. Suitable Prolog code is:
min(A, B, B) :- A > B.
min(A, B, A) :- A =< B.
To ensure that your procedure works, you should try it out on examples of all the likely scenarios. For example, with min, the three likely categories of data are where A > B, where A = B, and where A < B. So you should do at least one test case for each of these:
min(1, 2, X)?
min(3, 3, X)?
min(5, 4, X)?
and make sure it gives the correct answer for each.
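The three checks can also be bundled into a small test predicate so that they are easy to re-run after every change. This is a sketch in standard Prolog syntax (as accepted by SWI-Prolog, for example), not iProlog's `goal?` query syntax; the names check_min/3 and min_tests/0 are hypothetical. Collecting all answers with findall/3 also checks that min gives exactly one answer: if the first rule were written with >= instead of >, min(3, 3, X) would give the answer 3 twice and the check would report it.

```prolog
% min(A, B, Min): Min is the smaller of the numbers A and B.
min(A, B, B) :- A > B.
min(A, B, A) :- A =< B.

% check_min(A, B, Expected): hypothetical helper. Collects every answer that
% min(A, B, X) gives and prints a message unless there is exactly one answer,
% equal to Expected. It always succeeds, so later checks still run.
check_min(A, B, Expected) :-
    findall(X, min(A, B, X), Answers),
    (   Answers == [Expected]
    ->  true
    ;   write('min('), write(A), write(', '), write(B),
        write(', X) gave '), write(Answers),
        write(', expected ['), write(Expected), write(']'), nl
    ).

% One test for each category of data: A < B, A = B, A > B.
min_tests :-
    check_min(1, 2, 1),
    check_min(3, 3, 3),
    check_min(5, 4, 4).
```

Running min_tests prints nothing when all three cases pass, and one line per failing case otherwise.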

When you believe that min is working correctly, you can start testing the procedure that calls min.

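When you are dealing with lists (replacing an item in a list, say), you should again look for a range of critical situations: for example, the list is empty, or the item is at the start, in the middle, or at the end of the list, or does not occur in the list at all.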

If your procedure has more than one rule in it, then you should make sure that your testing covers each rule. In particular, if you write a recursive procedure, then you should make sure that you cover the base case or cases, and the recursive case or cases.
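As a sketch of how this looks for a list procedure, here is a hypothetical replace/4 (the name, argument order and "replace every occurrence" behaviour are assumptions for illustration, not taken from these notes), with one test per critical situation. It has one base-case rule and one recursive rule, and the tests exercise both.

```prolog
% replace(Old, New, List, Result): hypothetical example. Result is List with
% every element identical to Old replaced by New.
replace(_, _, [], []).                      % base case: empty list
replace(Old, New, [X|Rest], [Y|Rest1]) :-   % recursive case
    (   X == Old
    ->  Y = New
    ;   Y = X
    ),
    replace(Old, New, Rest, Rest1).

% check_replace(Old, New, List, Expected): hypothetical helper that prints a
% message if replace/4 does not produce Expected. It always succeeds.
check_replace(Old, New, List, Expected) :-
    (   replace(Old, New, List, Result), Result == Expected
    ->  true
    ;   write('Fails replace('), write(Old), write(', '), write(New),
        write(', '), write(List), write(', _)'), nl
    ).

replace_tests :-
    check_replace(a, z, [], []),                % empty list
    check_replace(a, z, [a, b, c], [z, b, c]),  % item at the start
    check_replace(b, z, [a, b, c], [a, z, c]),  % item in the middle
    check_replace(c, z, [a, b, c], [a, b, z]),  % item at the end
    check_replace(d, z, [a, b, c], [a, b, c]),  % item not present
    check_replace(a, z, [a, b, a], [z, b, z]).  % item occurs twice
```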


In response to any iProlog prompt (": "), you can type trace!. This turns on tracing of Prolog execution, which can sometimes help you figure out what your program is doing that it shouldn't be doing. Traces can, however, produce more output than you can easily work through, so their usefulness is limited.


To turn tracing off, type notrace! in response to an iProlog prompt.


As you develop a program, you are likely to have to test it several times, as you find each bug and try to fix it. It can be quite time-consuming to type in the same test queries over and over again. You can automate this to some extent by putting your queries into a file (call it mytests, say) and then, if your program is in the file, you can run the Unix command

prolog <mytests

Prolog will then load the code and treat the contents of mytests as if you had typed them into Prolog interactively. Only the output of the queries will be printed, however. If you can work out which query caused which output, that's fine. If not, you can add extra goals to mytests so that Prolog prints a message from time to time, telling you where it is up to. For example:

write('Testing min now'), nl?
write('First test, min(1,2,X)?'), nl, min(1, 2, X)?

nl means print a "newline", that is, terminate the current line of output and move to the next line. [If you're wondering how write and nl fit into a logic language like Prolog, it's like this: write and nl are built-in Prolog predicates that can be thought of as always succeeding, and which have the side effect of printing something in the window from which you are running Prolog.]
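Rather than writing out a message by hand before every query, a small helper can print a label, run the goal, and report whether it succeeded. This is a sketch in standard Prolog syntax; run_labelled/2 is a hypothetical name, not a built-in predicate.

```prolog
% run_labelled(Label, Goal): hypothetical helper. Prints Label, then tries
% Goal once and prints Goal as instantiated by the call, or notes that it
% failed. It always succeeds, so a file of such calls keeps going.
run_labelled(Label, Goal) :-
    write(Label), nl,
    (   call(Goal)
    ->  write('  succeeded: '), write(Goal), nl
    ;   write('  failed'), nl
    ).
```

For example, run_labelled('Length of [a, b, c]', length([a, b, c], N)) prints the label and then the goal with N bound to 3, so each line of output can be matched to the query that produced it.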

You can also insert writes and nls into your rules in order to keep track of what's going on, though to a large extent the trace does this sort of thing for you when turned on.

% Version of factorial(N, FactorialN) with a bug in it.
factorial(0, 1).
factorial(N, Result) :-
	write('Entering recursive rule for factorial with N = '), write(N), nl,
	Nminus1 is N - 1,
	factorial(Nminus1, Nminus1Factorial),
	Result is N * Nminus1Factorial.

Try copying and pasting this code into Prolog, then typing the goal factorial(5, Result)? and seeing what happens. Can you figure out the bug? Have a go yourself first!
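Once you have had a go yourself, here is one possible fix for comparison (a sketch; other fixes are possible). Without a guard, the recursive rule also applies when N is 0, so whenever Prolog backtracks into factorial(0, ...) (for example, when you ask for another answer, or when a query such as factorial(4, 25) fails its final arithmetic check) it counts down through -1, -2, ... and never stops. Adding N > 0 makes the two rules mutually exclusive. The debugging write has been left out here.

```prolog
% factorial(N, FactorialN): FactorialN is N! for integers N >= 0.
factorial(0, 1).
factorial(N, Result) :-
    N > 0,                       % guard: the recursive rule must not apply to 0
    Nminus1 is N - 1,
    factorial(Nminus1, Nminus1Factorial),
    Result is N * Nminus1Factorial.
```

With the guard, factorial(5, Result) has exactly one answer, and queries that should fail, such as factorial(4, 25) or factorial(-1, X), now fail instead of running forever.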

Some people prefer to copy and paste individual tests into the Prolog interpreter rather than repeatedly running all of the tests.

Don't forget to retest your code when/if you move it from your home computer to one of the School of Computer Science and Engineering's computer systems. "It worked at home" is a not uncommon cry of anguish: we test your program on the machines in CSE.

Sanity Checks

Sometimes a program works for the boundary cases (or is meaningless for the boundary cases) but not for realistic, non-boundary cases. In this case, "sanity checking" can help. Basically, this means looking at the output thoughtfully and trying to work out whether it makes sense. It is hard to pin down what "makes sense" means, though it often involves "back of an envelope" calculations. The following non-programming example may help clarify the idea: a newspaper reported in 2009, in passing, that 130,000 people die in Australia each year (in total). The population of Australia in 2008 was about 21 million, and the average lifespan in Australia is around 80 years. If the population were roughly stable, about one eightieth of it would die each year, so one version of the sanity check uses the 21 million and 80 year figures to work out independently how many people die in Australia each year: 21,000,000 / 80 = 262,500. Oops, that's nowhere near 130,000. Alternatively, you can take the 21 million population figure and the newspaper's claimed 130,000 deaths per year, and calculate the average lifespan: 21,000,000 / 130,000 = about 161.5 years. I don't think so! So either the newspaper has it wrong, or the stable-population assumption behind these estimates is badly off; either way, the figure deserves a much closer look.

If your code passes a sanity check, it doesn't mean the answer it produces is correct. It might still be wrong, just not very wrong. If your code fails a sanity check, then it likely is wrong!
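The same back-of-the-envelope reasoning can be built into a check on your own program's output. A minimal sketch follows: the predicate names estimated_deaths/3, plausible/3 and sanity_check/1, and the tolerance factor of 1.5, are all assumptions chosen for illustration, and the estimate assumes a roughly stable population.

```prolog
% estimated_deaths(Population, Lifespan, Deaths): back-of-the-envelope
% estimate of deaths per year, assuming a roughly stable population.
estimated_deaths(Population, Lifespan, Deaths) :-
    Deaths is Population / Lifespan.

% plausible(Value, Estimate, Factor): hypothetical helper. Succeeds if Value
% is within a factor of Factor of Estimate, in either direction.
plausible(Value, Estimate, Factor) :-
    Value > 0,
    Estimate > 0,
    Ratio is max(Value / Estimate, Estimate / Value),
    Ratio =< Factor.

% sanity_check(Reported): hypothetical example using the figures in the text.
% It always succeeds, printing whether Reported passes the check.
sanity_check(Reported) :-
    estimated_deaths(21000000, 80, Estimate),
    (   plausible(Reported, Estimate, 1.5)
    ->  write('Passes sanity check'), nl
    ;   write('Fails sanity check: estimate was '), write(Estimate), nl
    ).
```

As the text says, passing such a check does not show that a result is right; it only shows that it is not wildly wrong.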

Automating (Re-)Testing

In fact, for repeated testing of code as you modify it and add extra features (or for testing that your code works on more than one platform - e.g. at home and at work/university), the only reasonable thing to do is to use automated testing. A simple approach to this is to write predicates with names like test1, test2, test3, ... and another one called runtests.

runtests :-
  test1, test2, test3, ... .
The individual tests might look like this (using as an example a test to show that a procedure called factorial does something sensible - recall that 4! = 24):
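test1 :-
   not(factorial(4, 24)),
   write('Fails factorial(4, 24)'), nl.
test1.
If factorial(4, 24) fails, not(factorial(4, 24)) succeeds, so the failure message is printed. If factorial(4, 24) succeeds, the first rule fails, and the second clause (test1.) lets test1 succeed quietly, so that runtests carries on to the next test instead of stopping. And so on for test2, test3, etc. To run the tests, issue the single query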
?- runtests.
An error message (as programmed by you in test1, test2, test3, etc.) will be printed for each test that fails.
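A variation on the same idea, sketched here as an alternative rather than as the method of these notes, keeps each test as a fact and uses a failure-driven loop, so that adding a test means adding one line. The names test_case/2 and run_all_tests/0 are hypothetical, the min procedure from earlier is used as the code under test, and \+ is the standard-Prolog spelling of not.

```prolog
% min(A, B, Min): Min is the smaller of the numbers A and B.
min(A, B, B) :- A > B.
min(A, B, A) :- A =< B.

% test_case(Description, Goal): hypothetical table of tests; each Goal
% should succeed if the code under test is correct.
test_case('min(1, 2, X) gives 1', min(1, 2, 1)).
test_case('min(3, 3, X) gives only 3', findall(X, min(3, 3, X), [3])).
test_case('min(5, 4, X) gives 4', min(5, 4, 4)).

% run_all_tests: failure-driven loop over every test_case/2 fact, printing
% a message for each Goal that fails. The second clause makes run_all_tests
% succeed once every test has been tried.
run_all_tests :-
    test_case(Description, Goal),
    \+ call(Goal),
    write('Fails '), write(Description), nl,
    fail.
run_all_tests.
```

Because the loop forces backtracking through every test_case/2 fact, a failing test never stops the remaining tests from running.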

Bill Wilson's Contact Info

UNSW's CRICOS Provider No. is 00098G