Learning, Measurement and Representation

Reference: Bratko chapter 18

This section provides a framework and some general terminology for discussing some of the issues in machine learning.
Keywords: connectionism, function approximation algorithms, hypothesis language, learning program, machine learning, noisy data in machine learning, observation language, supervised learning, symbolic learning algorithms, unsupervised learning, classes in classification tasks
  • programs that learn
  • observation and hypothesis languages
  • symbolic learning algorithms
  • function approximation algorithms


A learning program is one that is capable of improving its performance through experience. Given some input, x, an ordinary program, P, yields the same result

P(x)  =  y

after every application. However, a learning program can alter its initial state q so that its performance is modified with each application. Thus, we can say

P(x | q) = y

That is, y is the result of applying program P to input, x, given the initial state, q.
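As a rough sketch of this distinction (the functions and the state are invented for illustration, not taken from the text), an ordinary program always returns the same output for the same input, while a learning program's output depends on a modifiable state q:

```python
def ordinary_program(x):
    return 2 * x          # P(x) = y: identical on every application

def learning_program(x, q):
    return q * x          # P(x | q) = y: the result depends on the state q

# Same input, different states -> different behaviour
assert ordinary_program(3) == 6
assert learning_program(3, q=2) == 6
assert learning_program(3, q=5) == 15
```

Learning then amounts to changing q, rather than rewriting the program itself.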

The goal of learning is to construct a new "initial" state, q', so that the program alters its behaviour to give a more accurate or quicker result. In the case of supervised learning, to construct q', one needs a set of inputs xi and corresponding target outputs zi (i.e. you want P(xi | q') = zi when learning is complete).

L(P, q, ((x1,z1),... ,(xn,zn))) = q'

Thus, one way of thinking about what a learning program does is that it builds an increasingly accurate approximation to a mapping from input to output: P(xi | q') will be closer to zi than P(xi | q) was.
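A hedged sketch of such a learning function L, assuming a made-up program P(x | q) = q * x and a simple error-correction update rule (illustrative only, not a procedure from the text):

```python
def P(x, q):
    return q * x                      # the toy program P(x | q)

def L(q, examples, rate=0.01, sweeps=200):
    # L(P, q, ((x1,z1),...,(xn,zn))) = q': repeated small corrections
    # move q toward a state for which P(xi | q') is close to each zi
    for _ in range(sweeps):
        for x, z in examples:
            error = z - P(x, q)       # how far P(x | q) is from the target
            q += rate * error * x     # nudge the state to reduce the error
    return q                          # the new "initial" state q'

examples = [(1, 3), (2, 6), (3, 9)]   # targets generated by z = 3x
q_prime = L(q=0.0, examples=examples)
# P(xi | q') is now much closer to zi than P(xi | 0.0) was
```

After learning, q' is close to 3, so P(2 | q') is close to the target 6.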

With unsupervised learning, the program tries to find regularities and interesting patterns in the data without ever being told what the "right" answer is.
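One illustrative sketch of unsupervised learning is simple one-dimensional k-means clustering (k = 2); the data and starting centres below are invented. No "right answers" are supplied: the program groups the data by regularities it finds on its own.

```python
def kmeans_1d(data, c1, c2, iterations=10):
    for _ in range(iterations):
        # assign each point to its nearest cluster centre
        group1 = [x for x in data if abs(x - c1) <= abs(x - c2)]
        group2 = [x for x in data if abs(x - c1) > abs(x - c2)]
        if group1:
            c1 = sum(group1) / len(group1)   # move centre to group mean
        if group2:
            c2 = sum(group2) / len(group2)
    return c1, c2

data = [1.0, 1.2, 0.8, 7.9, 8.1, 8.0]        # two obvious clumps
c1, c2 = kmeans_1d(data, c1=0.0, c2=10.0)
# c1 ends near 1.0 and c2 near 8.0: two clusters found without labels
```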

Categorisation problem

The most common learning task is that of acquiring a function that maps objects sharing common properties to the same class value. This is the categorisation problem.
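As a toy illustration of the categorisation problem (the attributes and class values below are invented), objects described by attribute/value pairs are mapped to a class value; objects sharing the relevant properties receive the same class:

```python
def classify(fruit):
    # a hand-written hypothesis mapping shared properties to one class
    if fruit["colour"] == "yellow" and fruit["shape"] == "curved":
        return "banana"
    if fruit["colour"] == "red" and fruit["shape"] == "round":
        return "apple"
    return "unknown"

assert classify({"colour": "yellow", "shape": "curved"}) == "banana"
assert classify({"colour": "red", "shape": "round"}) == "apple"
```

A learning program's job is to construct such a mapping from examples, rather than have it written by hand.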

Observation & Hypothesis Languages

The program must be able to represent its observations of the world, and it must also be able to represent hypotheses about the patterns it may find in those observations. Thus, we may sometimes refer to the observation language and hypothesis language. The observation language describes the inputs and outputs of the program and the hypothesis language describes the internal state of the learning program, which corresponds to its theory of the concepts or patterns that exist in the data.

The input to a learning program consists of descriptions of objects from the universe and, in the case of supervised learning, an output value associated with the example. The universe can be an abstract one, such as the set of all natural numbers, or the universe may be a subset of the real world.


No matter which method of representation we choose, descriptions of objects in the real world must ultimately rely on measurements of some properties or attributes of those objects. These may be physical properties such as size, weight or colour, or they may be properties defined for a particular purpose, e.g. the length of time a person has been employed, measured for the purpose of approving a loan. The accuracy and reliability of a learned concept depend heavily on the accuracy and reliability of the measurements.


A program is limited in the concepts that it can learn by the representational capabilities of both observation and hypothesis languages. For example, if an attribute/value list is used to represent examples for an induction program, the measurement of certain attributes and not others clearly places bounds on the kinds of patterns that the learner can find. The learner is said to be biased by its observation language. The hypothesis language also constrains what may be learned.


We will divide our attention between two different classes of machine learning algorithms that use distinctly different approaches to the problem of representation:

Symbolic learning algorithms
learn concepts by constructing a symbolic expression that describes a class of objects. We consider algorithms that work with representations equivalent to propositional / first-order logic (starting with ID3, an inductive decision tree system).
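The attribute-selection step at the heart of ID3 can be sketched as follows. The entropy-based information gain measure is the standard criterion ID3 uses to choose which attribute to split on; the weather-style data below is invented for illustration:

```python
import math

def entropy(labels):
    # Shannon entropy of a list of class labels, in bits
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def information_gain(examples, attribute):
    # expected reduction in entropy from splitting on the attribute
    labels = [label for _, label in examples]
    base = entropy(labels)
    values = set(ex[attribute] for ex, _ in examples)
    remainder = 0.0
    for v in values:
        subset = [label for ex, label in examples if ex[attribute] == v]
        remainder += (len(subset) / len(examples)) * entropy(subset)
    return base - remainder

examples = [
    ({"outlook": "sunny", "windy": "no"},  "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rain",  "windy": "no"},  "stay"),
    ({"outlook": "rain",  "windy": "yes"}, "stay"),
]
# "outlook" separates the classes perfectly; "windy" not at all
assert information_gain(examples, "outlook") > information_gain(examples, "windy")
```

ID3 splits on the highest-gain attribute and recurses on each subset, so here it would test "outlook" at the root.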

Function approximation algorithms
include connectionist and statistical methods. These algorithms are most closely related to traditional mathematical notions of approximation and interpolation and represent concepts as mathematical formulae. Our main focus here will be on the backpropagation learning algorithm for feedforward neural networks.
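A minimal sketch of backpropagation for a tiny 2-2-1 feedforward network of sigmoid units, written in plain Python. The network size, learning rate and training task (the AND function) are choices made for this illustration, not details from the text; the concept here is represented purely as a mathematical formula parameterised by weights:

```python
import math, random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

random.seed(0)
# weights: 2 hidden units (2 inputs + bias each), 1 output unit (2 hidden + bias)
W = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
V = [random.uniform(-0.5, 0.5) for _ in range(3)]

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # AND targets
rate = 0.5

for epoch in range(5000):
    for (x1, x2), z in data:
        # forward pass
        x = (x1, x2, 1.0)                        # inputs plus bias
        h = [sigmoid(sum(w * xi for w, xi in zip(Wj, x))) for Wj in W]
        hb = h + [1.0]                           # hidden outputs plus bias
        y = sigmoid(sum(v * hi for v, hi in zip(V, hb)))
        # backward pass: error terms from the chain rule
        d_out = (z - y) * y * (1 - y)
        d_hid = [d_out * V[j] * h[j] * (1 - h[j]) for j in range(2)]
        # gradient-descent weight updates
        V = [v + rate * d_out * hi for v, hi in zip(V, hb)]
        W = [[w + rate * d_hid[j] * xi for w, xi in zip(W[j], x)]
             for j in range(2)]

def predict(x1, x2):
    x = (x1, x2, 1.0)
    h = [sigmoid(sum(w * xi for w, xi in zip(Wj, x))) for Wj in W]
    return sigmoid(sum(v * hi for v, hi in zip(V, h + [1.0])))
# after training, predict(1, 1) is near 1 and the other cases near 0
```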


Learning algorithms can be assessed in terms of:

* speed of learning

* speed of classification of the resulting function

* degree of generalization ability (does the learned function give the right answer for cases that were not in the training set?)
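The third criterion can be sketched as follows: hold some labelled examples out of training and check the learned function on them. The 1-nearest-neighbour learner and the data here are invented for illustration:

```python
def nearest_neighbour(train, x):
    # classify x with the label of its closest training example
    return min(train, key=lambda ex: abs(ex[0] - x))[1]

examples = [(1, "small"), (2, "small"), (3, "small"),
            (8, "large"), (9, "large"), (10, "large")]
train, test = examples[::2], examples[1::2]   # simple holdout split

correct = sum(nearest_neighbour(train, x) == z for x, z in test)
accuracy = correct / len(test)
# accuracy on the held-out cases estimates generalization ability
```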

Summary: Learning, Measurement and Representation
We have discussed the characteristics of a program that learns, distinguishing between supervised and unsupervised learning. We introduced the concepts of observation language and hypothesis language, discussed measurement and representation, and made the distinction between symbolic learning algorithms and function approximation algorithms.
Learning algorithms may be evaluated in terms of speed of learning, speed of computation once learning has occurred, and degree of generalization ability.

CRICOS Provider Code No. 00098G
Copyright (C) Bill Wilson, 2002, except where another source is acknowledged. Much of the material on this page is based on an earlier version by Claude Sammut.