In many subareas of AI, interest in empirical evaluation has increased significantly in recent years.
This tendency contrasts with the kind of research paper often found
in earlier AI publications, where ideas and perspectives were
largely presented and demonstrated using a toy domain.
While it seems a healthy development to scrutinise the validity of new ideas and claims more closely, it is also important not to lose sight of what we want to achieve with our research.
For the engineering approach to AI, we can roughly state our research objective as finding ways to develop (more effectively) systems which perform certain intelligent tasks.
Arguably, the ideal thing would be a bootstrap system which learns to do any task it is put to on its own. However, this seems out of the question for the foreseeable future. Consequently, we are interested in techniques which make the life of the human system developer easier.
Whether a given technique serves this purpose depends on at least the following factors (besides the technique itself): the human system developer(s); the nature of the task; the system developer's understanding of the nature of the task.
Unfortunately, the factors mentioned above do not provide very solid grounds for conducting reliable empirical studies of which technique best serves the purpose. The critical factors are too ill-defined to give us a good handle on the question.
Besides the factors mentioned above, there is another difficulty which sets AI apart from most other sciences: the class of tasks for which we want techniques is extremely diverse. Indeed, ideally we want, for instance, a learning technique which is capable of learning anything - which includes the acquisition of all the scientific knowledge mankind has acquired and will acquire in the future. The trouble is that the different scientific disciplines have emerged precisely because they were found to differ from other disciplines in important ways, such as in their ontology, their key concepts, their research methodology, their implicit underlying assumptions, etc. If we do not integrate all this information into the learning technique, it seems extremely unlikely that it could work well across the range of disciplines. As a consequence, the learning technique would need to be tailored to each domain.
What is then left for more generally applicable research is essentially the development of tools and techniques which make the tailoring easier.
If we accept that, what then is the role of empirical studies in AI? At least the following questions can be addressed by empirical studies: (1) how well do our techniques and systems actually perform? (2) what kinds of scenarios do we face in real applications? (3) how well do human system developers cope with a given technique?
The first question can in theory be answered analytically, because we would usually expect our systems to function deterministically. Even if there is an element of chance involved from an outside source, we may in principle be able to model all interesting scenarios. In practice, however, such an analysis is usually not feasible, so in many cases it is much more sensible to run algorithms on data and see how they perform.
However, we are usually not really interested in the numbers we obtain from the experiments which may compare the performance of multiple techniques on some benchmarks. Since in many areas we cannot reasonably hope to find a single master technique which can be applied to all tasks, we are rather interested in an improved understanding of why and under what circumstances certain techniques perform in the way they do.
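To make the idea of running algorithms on data and comparing their performance concrete, the following sketch compares a trivial majority-class baseline against a 1-nearest-neighbour rule on a small synthetic benchmark. The dataset, the two predictors, and the accuracy measure are all illustrative assumptions, not drawn from the text:

```python
import random

def nearest_neighbour_predict(train, x):
    """Predict the label of x as the label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def majority_predict(train, _x):
    """Baseline: always predict the most common training label."""
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)

def accuracy(predict, train, test):
    """Fraction of test points whose predicted label matches the true one."""
    return sum(predict(train, x) == y for x, y in test) / len(test)

# Hypothetical benchmark: 1-D points in [0, 1], labelled by whether they
# exceed 0.5. Real benchmarks would of course be far richer than this.
random.seed(0)
data = [(x, x > 0.5) for x in (random.random() for _ in range(200))]
train, test = data[:150], data[150:]

for name, predict in [("majority", majority_predict),
                      ("1-NN", nearest_neighbour_predict)]:
    print(f"{name}: accuracy {accuracy(predict, train, test):.2f}")
```

The point of such an experiment is not the two numbers it prints but, as argued above, what they suggest about when and why one technique outperforms the other.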
Such an improved understanding requires suitable concepts to characterise our experimental settings and to formulate our conclusions in regard to where a certain technique is most suitably applied, etc. Indeed, empirical research can and should help a great deal in developing a finer and more suitable conceptual framework which will allow us to characterise the applicability of techniques as well as to classify application domains.
In contrast to the first question, the second question, concerning the kinds of scenario
we face in real applications, is truly empirical in nature.
There is no way that we could think up all the conditions of real
applications. Here, field studies are
required which tell us about the typical features of real application domains.
This is a particularly difficult type of research because, again, we largely lack
suitable concepts to describe our findings.
We often do not know whether two application domains are similar
in important respects or not; that is, we do not know whether they can be handled
by the same technique we have in mind.
Our lack of understanding of the differences and commonalities between application domains can be blamed for many of the disappointments in AI. The typical experience of encountering substantial problems when we try to scale up a technique that was successful in a toy domain demonstrates that we do not know enough about the differences between the toy domain and the larger domain.
It may, after all, be a misconception to believe that a toy domain is somehow representative of a large class of domains.
To alleviate this situation, we need to better understand our application domains. This understanding has to rely on empirical studies but must also concentrate on the development of new and more suitable concepts to describe the findings.
Finally, the third question addresses the psychology of the application developer. It resembles, to an extent, questions in software engineering about the suitability of programming languages and the like. It requires us to study human psychology with regard to the task the application developer has to perform. This seems very difficult, as it is very expensive to run a sensible number of controlled studies of how system developers cope with a certain technique. Furthermore, the nature of the applications to be developed changes continuously. As a consequence, we cannot be sure how reliable findings from previous studies are. Can they safely be carried over to new application domains? Perhaps certain aspects of the new domains are more difficult to articulate or to understand in the first place. This may result in unexpectedly poor performance with a technique, e.g. a certain programming language, which worked fine before. For example, the development of graphical user interfaces would indeed be difficult to do in COBOL or Pascal.
After having posed so many questions, let us now turn to the papers of this workshop. The workshop brings together experiences and views from a variety of angles, which will hopefully advance our understanding of how to do better research in AI: in particular, of how and where to do empirical studies.