These notes are formulated for Prolog programming, for students taking COMP9414 Artificial Intelligence at the University of New South Wales, Sydney, Australia, but many of the ideas can be applied to programming in functional or procedural languages.
Testing your program is a necessary part of program creation.
Except for trivial programs, testing can never prove that a program is correct. What testing does is to try to find errors in your code. No matter how many errors you find and fix, there may be more. Still, tested code is better than untested code.
min(A, B, B) :- A > B.
min(A, B, A) :- A =< B.
To ensure that your procedure works, you should try it out on examples of all the likely scenarios. For example, with
min, the
three likely categories of data are where A > B, where A = B, and
where A < B. So you should do at least one test case for each of
these:
min(1, 2, X)?
min(3, 3, X)?
min(5, 4, X)?
and make sure it gives the correct answer for each.
When you believe that min is working correctly, you can
start testing the procedure that calls min.
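The three checks above can also be written with the expected answer filled in, so that Prolog itself reports whether every case passes. This is only a sketch in standard Prolog syntax (in iProlog a query is typed with a trailing ? instead), and the name test_min is made up for this illustration.

```prolog
% min(A, B, Min): Min is the smaller of A and B.
% Note that "less than or equal" is written =< in Prolog.
min(A, B, B) :- A > B.
min(A, B, A) :- A =< B.

% test_min (hypothetical name) succeeds only if all three categories pass.
test_min :-
    min(1, 2, 1),    % A < B
    min(3, 3, 3),    % A = B
    min(5, 4, 4).    % A > B
```

If the query test_min fails, at least one of the three cases is giving the wrong answer, and the individual queries can then be tried one at a time.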
When you are dealing with lists (replacing an item in a list, say), you should again look for a range of critical situations.
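As an illustration of what such critical situations might be, here is a sketch for a hypothetical replace predicate. Both the predicate and the particular choice of cases are assumptions made for this example, not something defined in these notes; the cases shown are the empty list, and the item occurring first, last, in the middle, not at all, and more than once.

```prolog
% replace(Old, New, List, NewList): hypothetical predicate, assumed to
% replace every occurrence of Old in List by New, giving NewList.
replace(_, _, [], []).
replace(Old, New, [Old|Rest], [New|NewRest]) :-
    replace(Old, New, Rest, NewRest).
replace(Old, New, [Other|Rest], [Other|NewRest]) :-
    Other \= Old,
    replace(Old, New, Rest, NewRest).

% One test per critical situation.
test_replace :-
    replace(a, x, [], []),                  % empty list
    replace(a, x, [a, b, c], [x, b, c]),    % item is first
    replace(c, x, [a, b, c], [a, b, x]),    % item is last
    replace(b, x, [a, b, c], [a, x, c]),    % item is in the middle
    replace(z, x, [a, b, c], [a, b, c]),    % item does not occur
    replace(a, x, [a, b, a], [x, b, x]).    % item occurs more than once
```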
If your procedure has more than one rule in it, then you should make sure that your testing covers each rule. In particular, if you write a recursive procedure, then you should make sure that you cover the base case or cases, and the recursive case or cases.
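A small sketch of what covering each rule can look like, using a made-up list-length predicate len (a hypothetical example, chosen only because it has one base case and one recursive case):

```prolog
% len(List, N): N is the number of items in List (hypothetical example).
len([], 0).                  % base case
len([_|Rest], N) :-          % recursive case
    len(Rest, RestLength),
    N is RestLength + 1.

test_len :-
    len([], 0),          % exercises only the base case
    len([a], 1),         % one recursive step, then the base case
    len([a, b, c], 3).   % several recursive steps
```

Testing only with len([a, b, c], 3) would exercise both rules too, but if it failed you would not know which rule was at fault; the smaller cases help narrow that down.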
In response to any iProlog prompt (": "), you can type
trace!. This has the effect of turning on tracing of
Prolog execution. This can sometimes be helpful in figuring out
what your program is doing that it shouldn't be doing. Traces can
however produce more output than one can easily work through, so
its usefulness is limited.
There is an example of tracing in iProlog here.
To turn tracing off, type notrace! in response to an
iProlog prompt.
You can put your test queries in a file, called
mytests say, and then, if your program
is in the file myprog.pl, you can do the Unix command
prolog myprog.pl <mytests
Then Prolog will load the code in myprog.pl and then treat
the contents of mytests as if you typed the contents of
mytests into Prolog interactively. All that will be printed,
however, will be the output of the queries. If you can figure out
which query caused which output, then that's OK. If not, then you
can add extra code to mytests so that Prolog prints a message
from time to time so you can tell where it is up to. For example,
write('Testing min now'), nl,
write('First test, min(1,2,X)?'), nl,
min(1,2,X)?
nl means print a "newline" - that is, terminate the current
line of output and move to the next line. [If you're wondering how
write and nl fit into a logic language
like Prolog, it's like this:
write and nl are built-in Prolog predicates
that can be thought of as always succeeding, and which have the
side-effect of printing something in the window from which you
are running Prolog.]
You can also insert writes and nls into your rules
in order to keep track of what's going on, though to a large extent the
trace does this sort of thing for you when turned on.
% Version of factorial(N, FactorialN) with a bug in it.
factorial(0, 1).
factorial(N, Result) :-
    write("Entering recursive rule for factorial with N = "), write(N), nl,
    Nminus1 is N - 1,
    factorial(Nminus1, Nminus1Factorial),
    Result is N * Nminus1Factorial.
Try copying and pasting this code into Prolog, and then typing the
goal factorial(5, Result)? and see what happens. Can you
figure out the bug? Solution - but have a go
yourself, first!
Some people prefer to copy and paste individual tests into the Prolog interpreter rather than repeatedly running all of the tests.
Don't forget to retest your code when/if you move it from your home computer to one of the School of Computer Science and Engineering's computer systems. "It worked at home" is a not uncommon cry of anguish: we test your program on the machines in CSE.
Sometimes a program works for the boundary cases (or is meaningless for the boundary cases) but not for realistic, non-boundary cases. In this case, "sanity checking" can help. Basically, this means looking at the output thoughtfully and trying to work out whether it makes sense. It is hard to pin down what "makes sense" means, though it often involves "back of an envelope" type calculations. The following non-programming example may help clarify the idea: a newspaper reported in 2009, in passing, that 130,000 people die in Australia each year (in total). The population of Australia in 2008 was about 21 million, and the average lifespan in Australia is around 80 years. One version of the sanity check involves using the 21 million and the 80 years figures to work out independently how many people die in Australia each year: 21,000,000 / 80 = 262,500. Oops, that's nowhere near 130,000. Alternatively, you can take the 21 million population figure and the newspaper's claimed 130,000 deaths per year, and calculate the average lifespan: 21,000,000 / 130,000 = about 161.5 years. I don't think so! So the newspaper has it wrong.
If your code passes a sanity check, it doesn't mean the answer it produces is correct. It might still be wrong, just not very wrong. If your code fails a sanity check, then it likely is wrong!
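The two back-of-an-envelope calculations from the newspaper example can be written down in Prolog, which is roughly what a sanity check built into a program might look like. This is only a sketch; the predicate names are invented for the illustration, and the figures are the ones quoted above.

```prolog
% Figures quoted in the text.
population(21000000).
average_lifespan(80).
reported_deaths(130000).

% Deaths per year implied by the population and lifespan figures.
expected_deaths(Deaths) :-
    population(P),
    average_lifespan(L),
    Deaths is P / L.

% Average lifespan implied by the population and the reported deaths.
implied_lifespan(Years) :-
    population(P),
    reported_deaths(D),
    Years is P / D.
```

Asking for expected_deaths(D) gives the 262,500 figure, and implied_lifespan(Y) gives a lifespan of a little over 161 years, which is the result that fails the sanity check.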
In fact, for repeated testing of code as you modify it and
add extra features (or for testing that your code works on more
than one platform - e.g. at home and at work/university), the
only reasonable thing to do is to use automated testing. A
simple approach to this is to write predicates with names like
test1, test2, test3, ...
and another one called runtests.
runtests :- test1, test2, test3, ... .
The individual tests might look like this (using as an example a test to show that a procedure called
factorial
does something sensible - recall that 4! = 24):
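test1 :- factorial(4, 24), !.
test1 :-
    write('Fails factorial(4, 24)'),
    nl.
If factorial(4, 24) succeeds, the first clause succeeds and the
cut (!) stops Prolog from trying the second clause, so nothing is
printed. If factorial(4, 24) fails, the second clause prints the
failure message. Either way test1 itself succeeds, which matters
because runtests is a conjunction and would stop at the first
test that failed. And so on for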
test2, test3, etc. To run the tests, issue
the single query
?- runtests.
An error message (as programmed by you in
test1,
test2, test3, etc.) will be printed
for each test that fails.
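To avoid repeating the same printing code in every test, one option is a small helper that runs a goal and reports if it fails. The following is a minimal sketch in standard Prolog, assuming the built-in call/1 is available in your Prolog system; check is a hypothetical name, not something provided by iProlog or by these notes. The procedure under test here is min, to keep the example self-contained.

```prolog
% Procedure under test.
min(A, B, B) :- A > B.
min(A, B, A) :- A =< B.

% check(Goal): hypothetical helper. Succeeds silently if Goal succeeds;
% otherwise prints a message. Either way check itself succeeds,
% so a failing test does not stop the tests that follow it.
check(Goal) :- call(Goal), !.
check(Goal) :- write('Fails '), write(Goal), nl.

runtests :-
    check(min(1, 2, 1)),
    check(min(3, 3, 3)),
    check(min(5, 4, 4)).
```

With this arrangement, adding a new test is a matter of adding one more check(...) line to runtests.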