William H. Wilson

Jian Guo

Ross W. Gayler

Janet Wiles

and J.E.M. Stewart

Chapter 7: pages 363-415 in K.J. Holyoak and J. Barnden (editors) Advances in Connectionist and Neural Computation Theory: Volume 2: Analogical Connections, Norwood, NJ: Ablex, 1994.


* We are grateful to Murray Maybery for very stimulating discussion of some of the issues addressed in this chapter.

There is now a reasonable amount of consensus that an analogy entails a mapping from one structure, the base or source, to another structure, the target (Gentner, 1983, 1989; Holyoak & Thagard, 1989). Theories of human analogical reasoning have been reviewed by Gentner (1989), who concludes that there is basic agreement on the one-to-one mapping of elements and the carry-over of predicates. Furthermore, as Palmer (1989) points out, some of the theoretical differences represent different levels of description rather than competing models. Despite this consensus about the central role of structure mapping, it really only treats the syntax of analogies; there are also important pragmatic factors, as has been pointed out by Holland, Holyoak, Nisbett, and Thagard (1986) and Holyoak and Thagard (1989). However, in this chapter we are primarily concerned with the problem of how to model the structure mapping, or syntactic, component of analogical reasoning in terms of parallel distributed processing (PDP) architectures.

According to Gentner (1983), attributes are not normally mapped
in analogies, and only certain relations are mapped, the selection
being based on systematicity, or the degree to which relations enter
into a coherent structure. Gentner (1983) defines an attribute as a
predicate taking one argument, whereas a relation is a predicate taking
two arguments. Strictly, this only covers binary relations; in general, a
relation is a predicate taking two or more arguments, so ternary relations
have three arguments, quaternary relations four arguments, and so on. For
our purposes a predicate is essentially an *N*-place relation; it can be
defined as an *N*-place function from the Cartesian product of *N* sets to
the set {T,F}. This includes unary relations, which are predicates with
one argument, and are equivalent to attributes in Gentner's terms. Our
derivations based on relations can be applied to functions.
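The definition above can be sketched in code. In this minimal illustration (all names, such as `make_predicate` and `BIGGER`, are ours, not the chapter's), a relation is modeled extensionally as a set of *N*-tuples, and the predicate is its membership function into {T, F}:

```python
# A predicate as an N-place function into {T, F}: the relation is a set
# of N-tuples, and the predicate tests membership. Names are illustrative.

def make_predicate(relation):
    """Turn a set of N-tuples into an N-place truth function."""
    def predicate(*args):
        return args in relation
    return predicate

# A binary relation (two arguments, hence two dimensions of variation):
BIGGER = make_predicate({("horse", "dog"), ("dog", "mouse"), ("horse", "mouse")})

# A unary relation (one argument), equivalent to an attribute in Gentner's terms:
BIG = make_predicate({("horse",)})

print(BIGGER("horse", "dog"))   # True
print(BIGGER("dog", "horse"))   # False
print(BIG("horse"))             # True
```

The same construction covers any arity, which is the sense in which the chapter treats attributes as the one-argument special case.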

There are a number of process models of human analogical reasoning (Anderson & Thompson, 1989; Bakker & Halford, 1988; Falkenhainer, Forbus, & Gentner, 1990; Holyoak & Thagard, 1989; Rumelhart, 1989). The only one of these models that could be described as connectionist is the Analogical Constraint Mapping Engine (ACME; Holyoak & Thagard, 1989). ACME can solve a wide range of analogical reasoning tasks, including water flow/heat flow, logical union, arithmetic addition, solar system/atomic structure, and a number of convergence analogies used by Gick and Holyoak (1980). The structure of base and target are represented as predicate-argument bindings, coded in predicate calculus. A parallel constraint satisfaction algorithm finds the mapping which maximizes the correspondence between source and target. This is achieved by inhibitory connections between rival mappings (tending to ensure that mappings are one-to-one), and by excitatory connections which tend to ensure that if a predicate is mapped, its arguments are mapped, and vice versa.

ACME uses local rather than distributed representations. Connectionist models based on distributed representations have the potential to yield additional advantages of graceful degradation and graceful saturation, together with the emergent properties that result from superposition of representations. We will also argue that PDP models provide a principled mechanism that has the potential to account for processing capacity limitations, and this is an issue that models of analogical reasoning must address if they are to be psychologically realistic.

The concept of information processing capacity has a complex history in cognitive psychology, and a number of conceptual and methodological difficulties have arisen. We will briefly review the history and status of the concept. The modern concept of capacity originated with Broadbent (1958) who argued for a single information channel of limited capacity, and Miller (1956) who argued that human beings could process a limited number of "chunks," but that variable amounts of information could be contained in each chunk. These early approaches were modified by later workers such as Moray (1967) who postulated a limited-capacity central processor, and Kahneman (1973) who proposed a limited pool of attentional resources that could be allocated to one or more tasks.

Further developments of this notion were undertaken by Norman and Bobrow
(1975), who developed the performance operating characteristic, a curve
which expressed performance on one task as a function of performance
on the other. Capacity (sometimes called resource) limitations were indicated when there was a tradeoff between the tasks, such that increased performance on task 1 was accompanied by decreased performance on task 2.

There has also been growing consensus in recent years that there is more than one resource pool (Baddeley, 1986; Wickens, 1980). It is clear that there are a number of specialized resources, but it is also possible that there is a general purpose processor that has the function of relating other processes to one another, or to the organism's goals. Baddeley (1986) postulated a limited capacity central executive, and Schneider and Detweiler (1987) postulated a limited capacity channel between areas.

Most studies within this tradition have not attempted to place a quantitative value on the amount of information that can be processed. One exception is the work of Miller (1956), who noted that humans seem restricted to short-term memory spans of about seven items (plus or minus two) and also seem restricted to about seven categories in absolute judgments. However, close examination of this intriguing commonality did not lead to a clearly definable factor determining capacity. In short-term memory span the number of items seemed to be approximately constant, irrespective of information value, whereas absolute judgments are a function of information value. The commonality became even less plausible when further investigation showed that the span of absolute judgments varied according to the number of dimensions of the stimuli and the experience of the participants (Attneave, 1959; Garner, 1962). Moreover, memory span was shown to be a function of word length and rehearsal rate (Baddeley, Thomson, & Buchanan, 1975; Case, Kurland, & Goldberg, 1982; Schweickert & Boruff, 1986; Standing, Bond, Smith, & Isely, 1980).

The most enduring contribution from Miller's paper was the concept of a chunk, which was a single unit or module of information that could vary in size. Short-term memory could hold a restricted number of chunks, but the total information stored could be increased by increasing the information content of each chunk. Thus, if binary digits were recoded as octal digits, people could recall approximately three times as many (e.g., if they store 6 octal digits, they can be decoded as 18 binary digits). Simon (1974) proposed that growth in span with age might be due to increasing the amount of information in each chunk. The restriction on short-term memory seemed to be in the number of independent items, rather than on the amount of information.
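Miller's recoding example can be sketched as follows; the helper names are ours, and the digit strings are arbitrary:

```python
# Recoding binary digits into octal chunks: each octal digit packs 3 bits,
# so a span of about 6 chunks holds 18 binary digits instead of 6.

def binary_to_octal_chunks(bits):
    """Group a binary string into 3-bit chunks; recode each as one octal digit."""
    assert len(bits) % 3 == 0
    return "".join(str(int(bits[i:i + 3], 2)) for i in range(0, len(bits), 3))

def octal_chunks_to_binary(octal):
    """Decode each octal digit back into its 3 binary digits."""
    return "".join(format(int(d), "03b") for d in octal)

bits = "101110001011010011"                    # 18 binary digits
octal = binary_to_octal_chunks(bits)
print(octal, len(octal))                       # 6 octal chunks
print(octal_chunks_to_binary(octal) == bits)   # recoding is lossless
```

The number of chunks shrinks by a factor of three while the information content is unchanged, which is the sense in which the limit is on chunks rather than on bits.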

Schneider and Detweiler (1987) argue that because working memory entails numerous cortical regions, each comprising numerous modules, the capacity of working memory is much greater than the "magical number seven" suggested by Miller (1956). There is, however, another type of limitation that arises with the architecture they propose. This is in the number of items, or "messages," that can be transferred from one module to another. Consistent with contemporary PDP models, Schneider and Detweiler (1987) postulate no central executive, but there are central connections which transfer information between modules or between regions. Schneider and Detweiler argue, on the basis of a literature review, that only about four messages can be transferred concurrently.

Evidence for this came originally from work by Broadbent (1975) who analyzed the way items were grouped in recall, and found the largest group tended to consist of three items. He suggested that Miller's "magical number seven plus or minus two" might have resulted from two systems, each with a capacity of three or four items. Broadbent conducted several further experiments which provided converging evidence for his conclusion that we process three or four items concurrently, and Schneider and Detweiler (1987) review further research which points to the same value, including the working memory studies of Baddeley and Hitch (1974).

There is corroborative evidence from studies additional to those reviewed by Schneider and Detweiler (1987). Fisher (1984) found that adults processed four items (range three to five) in parallel in the visual search task, which entails similar representational requirements to the memory scanning task. Halford, Maybery, and Bain (1988) found the capacity of primary or active memory to be four to five items for adults, and two to three items with 8-9-year-olds. These lines of evidence collectively suggest that adults process four independent items of information, or four chunks, concurrently, and that the number is less for children.

Attempts to develop parallel distributed processing models of analogical reasoning raise the question of how much information can be processed in parallel. The number of chunks that can be processed is relevant to memory tasks, but it cannot be related directly to reasoning. As we will see, the complexity of concepts can be quantified in terms of the number of dimensions, or independent units of information, required to define the concept. Therefore, we will consider the relation between chunks and dimensions.

There does not appear to be any empirical evidence concerning the number of dimensions that can be processed in parallel in a cognitive task, comparable to the evidence on limitations in the number of chunks. However, a pointer to the likely capacity for processing dimensions can be obtained by comparing dimensions to chunks. There are several ways in which chunks may be compared with attributes on dimensions.

A chunk is a unit of information which has a certain coherence, in the sense that its components are to a large extent indivisible. The chunk CAT cannot be decomposed into its components "C," "A," "T" and remain the same chunk. Remove one of the letters from CAT and it is no longer the same chunk. Attributes on a dimension are units of information which share this same property of coherence. The attribute "red" on the color dimension is a unit of information in the sense that it cannot be decomposed and remain the same unit. It might be countered that chunks and attributes on dimensions may have fuzzy boundaries, but there would still be limits to the number of features that can be changed without the unit losing its identity.

Each chunk has a unique content; if a chunk has the content CAT, it cannot have an alternative content such as DOG. We can think of chunks as like slots each of which can be filled in one and only one way, at any one time. Attributes within one dimension are similar in that they are in complementary distribution; if the color dimension has the value "red," then it cannot have another value such as "green."

The content of one chunk is independent of the content of other chunks; if the first chunk has the content CAT, the second chunk can have other values, which are logically independent of the content of the first chunk. Similarly, the attributes of different dimensions are independent. If the color dimension has the value "red," the shape dimension can have values such as "square," "round," etc., in a way that is logically independent of the values of the color dimensions.

Chunks are of arbitrary size, and their status does not depend on their
information value. For example, words have a much higher information
content than letters, which have a higher information content than
digits. This is true both intuitively and in terms of the information
theory definition of information, where the information value, *H* =
log_{2}*N*, where *N* is
the number of equally likely alternatives. Yet a word,
a letter, or a digit constitute one chunk. The information value of an
attribute on a dimension varies depending on the number of
alternative values. Therefore, an attribute on a dimension does not
specify the amount of information, so it is a unit of information of
arbitrary size.
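The information-theoretic point can be illustrated numerically. The vocabulary size of 50,000 words is an assumed round figure for illustration, not a value from the chapter:

```python
# Information value H = log2(N) for N equally likely alternatives: a digit,
# a letter, and a word each constitute one chunk, yet carry very different
# amounts of information.

import math

def information_value(n_alternatives):
    """H = log2(N), in bits, for N equally likely alternatives."""
    return math.log2(n_alternatives)

print(information_value(10))      # a digit:  ~3.32 bits
print(information_value(26))      # a letter: ~4.70 bits
print(information_value(50_000))  # a word:   ~15.6 bits, but still one chunk
```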

There are, therefore, a number of parallels between chunks and dimensions. Both are independent, coherent, units of information of arbitrary size. The comparison serves to draw our attention to a paradox in the nature of chunks. Because chunks may be of arbitrary size (alphanumeric characters, words, sentences), there is no severe restriction on the amount of information that they may contain. We can increase the total amount of information processed by increasing the amount of information that is packed into each chunk. Yet, although there is little or no restriction on the total amount of information, there is a severe restriction on the number of units or chunks. The restriction is therefore in the number of separate units, rather than in the actual amount of information processed. It seems then that there is limited independence in the system, so that no matter how much information we process, it can only be divided into a small number of independent units.

Dimensions resemble chunks in that they are independent, because an attribute on any one dimension is independent of attributes on other dimensions as we have seen. A value on a dimension, like a chunk, is an independent item of information of arbitrary size. If the limiting factor is the number of independent units, it should apply to both chunks and dimensions. If we hypothesize that the limitation in the number of dimensions is like the limitation in the number of chunks, we can extrapolate from research on chunks to provide an estimate of the number of dimensions that can be processed in parallel. The best estimate of the number of independent dimensions that can be processed in parallel would be four, the integer central tendency of the empirical estimates in the last section.

Our next problem is to consider why such an apparently paradoxical limitation should exist. An answer is suggested by the nature of the representations in PDP models, as we will see in the next section.

In a PDP representation information is represented as a pattern of activation over a large number of units, normally expressed in theories as a vector. Given the number of regions in working memory architecture, as discussed earlier, it is likely that more than one vector can be activated, but predicate-argument binding requires that the units comprising the vectors be interconnected. The problem is to estimate the number of vectors that can be processed in parallel. It is possible that we can do this by comparing vectors to chunks. In principle there is no upper limit to the size of a vector, so there is no limit to the amount of information that can be represented. A vector of arbitrary precision can represent an arbitrary large amount of information, but because the units are richly interconnected, there is a sense in which the vector represents one item. This suggests that a vector is somewhat like a chunk, in that both comprise independent items of information of arbitrary size. The limitation in the number of chunks implies a restriction on the number of independent vectors that can be represented concurrently. We will attempt to develop this idea in the formulation which follows. If correct it means that PDP models provide a natural basis for capacity limitations applying to chunks and dimensions as discussed in the last section.

Capacity limitations, in a somewhat different sense, have been known to
apply to distributed models for some time. They have been recognized in
distributed memory models (Anderson, 1973; Willshaw, 1981), in cognitive
models (McClelland, 1986), and in Hopfield nets (Amit, 1989; Hopfield,
1982). These limitations occur because, as more items are stored on the
same set of units and weights, the items become less discriminable. Anderson
(1973) shows that, for his model, the
*signal-to-noise ratio* = *N*/*K*,
where *N* is the number of units and *K* is the number of items.
The number of items that can be stored with clarity maintained can be
increased by increasing the number of units. That is, capacity is a
function of the number of units used to represent an item. Capacity has
also been used to refer to the number of alternative representations
over the same set of units (Plate, 1991).
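The degradation with increasing *K* can be illustrated with a simplified superposition memory. This is a sketch in the spirit of these models, not Anderson's exact 1973 formulation; all parameter values are arbitrary:

```python
# Superposition memory sketch: K random +/-1 vectors of length N are stored
# by summing them into a single trace. Probing the trace with a stored vector
# yields a large dot product (signal plus cross-term noise); probing with a
# novel vector yields noise only. Discriminability falls as K grows
# relative to N.

import random

random.seed(0)  # deterministic illustration

def rand_vec(n):
    return [random.choice((-1, 1)) for _ in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

N, K = 1000, 10
items = [rand_vec(N) for _ in range(K)]
trace = [sum(col) for col in zip(*items)]   # superposed memory trace

stored_probe = dot(items[0], trace)   # contains a "signal" term of size N
novel_probe = dot(rand_vec(N), trace) # cross-term noise only
print(stored_probe, novel_probe)
```

Increasing K with N fixed shrinks the gap between the two probes, which is the sense in which capacity is a function of the number of units per item.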

Serial processing models (Newell & Simon, 1972; Anderson, 1983) have not incorporated principled ways of explaining capacity limitations, which is probably one reason why the concept has not received all the attention its importance would seem to deserve. Serial models are based on algorithms in which tasks are segmented into steps that are processed serially, either in a conventional program, or as a production system. More difficult tasks tend to be broken into larger numbers of steps, so difficulty is associated with solution time, but there is no upper limit on capacity to perform.

Applied to the human performer, programs are essentially equivalent to strategies. People can perform a task if and only if they possess an adequate strategy. More difficult tasks simply take longer. Failures are never attributed to overloading capacity, but only to lack of the required strategy. Therefore psychological theories based on the serial processing metaphor tend to downplay the role of capacity.

By contrast with serial processing models, PDP models have a natural interpretation of capacity limitations. We will develop the hypothesis that PDP models provide a natural explanation for empirical data suggesting that humans can process only a limited number of chunks or dimensions in parallel. We will relate this to analogical reasoning through a complexity metric based on the dimensionality of structures. Complexity metrics will be discussed in the next section.

If we are to relate capacity to task performance we require a way of assessing task complexity. Two complexity metrics have been developed in the cognitive psychology literature. One complexity metric for reasoning tasks is based on the number of levels of embedding in a subroutine (or goal) hierarchy. It is sensitive to the number of discs on a Tower-of-Hanoi puzzle (Egan & Greeno, 1974) and has also been used by Case (1985). This metric is effective, but has certain difficulties. One is that subroutine hierarchies are not intrinsically constrained, and different hierarchies, with consequent different complexity values, can be obtained by rewriting the program used to model the performance. It has also been argued by Rumelhart and McClelland (1986) that "subroutines ... are probably not a good way to view the operation of the brain" (p. 135). This metric can lead to anomalies when we try to think of complexity in terms of the amount of processing that is performed at each step. The number of levels of embedding in a subroutine hierarchy can be reduced by processing more information in parallel, at each step, but tasks that require a lot of information to be processed in parallel may be harder. Therefore, a metric is needed that takes account of both parallel and serial processing.

Another complexity metric has been used by Leeuwenberg (1969), Restle (1970), and Simon and Kotovsky (1963) to quantify complexity of a wide range of patterns (including spatial patterns, and auditory patterns such as melodies). Simon (1972) has reviewed theories of pattern complexity based on this metric, and has shown that they share common principles. They all assume that people encode patterns in terms of elements and relations, and that they organize the representation in a hierarchic phrase structure. The complexity of a pattern is related to the number of code symbols in the representation. That is, complexity is related to the number of independent symbols required to generate the pattern.

A related metric, that is specifically concerned with concept complexity, was developed by Halford and Wilson (1980) and Halford (1987), and is based on levels of mathematical structures. The essence of the metric is that the complexity of a concept depends on its dimensionality, or the number of independent units of information required to define the concept. The units are of arbitrary size. This is related to the number of arguments in a predicate. A unary relation has only one argument, and therefore has only one dimension of variation. A binary relation has two arguments, and two dimensions of variation, because the values of the arguments can vary independently. Similarly, a ternary relation has three dimensions, and a quaternary relation four dimensions, and so on.

This idea might seem unusual if we are accustomed to thinking of
arguments as lists, that can be processed sequentially. However, each
argument in a relation defines a dimension of variation. The number of
dimensions depends on the "-arity" of the relation. A unary
relation on a set *S* is a subset of *S*. A binary relation on
*S* is a subset of *S* × *S*, the set of ordered pairs of elements of
*S*. Similarly, a ternary relation on *S* is a subset of
*S* × *S* × *S*, the set of ordered 3-tuples of elements of *S*, and so on. Each
argument can take a number of different values, so the number of arguments
defines the number of dimensions of variation. Therefore the argument sets
are not simply lists, but provide a measure of the degree of complexity
in the structure.
The number of arguments defines the types of relationships that can exist
within a structure. For example, a unary relation, *R*(*x*)
(e.g., BIG(dog)), defines an attribute. Because it has a single dimension of variation,
it is not possible to specify the way an attribute varies as a function
of another attribute. This becomes possible with binary relations,
*R*(*x*,*y*) (e.g., BIGGER(horse,dog)), which can
express the way variations
in the size of a horse or dog would affect the relation. With ternary
relations, *R*(*x*,*y*,*z*),
it is possible to define the way one attribute
varies as a function of one other, and how it varies as a function of
two others, the latter not being possible with binary relations. As the
"-arity" of a relation increases, so do the orders of interaction that
are possible. This is a most important feature of predicates for our
purposes, because in memory (Humphreys, Bain, & Pike, 1989), in cognitive
development (Halford, in press), and in the present context of analogical
reasoning, we find it necessary to model the orders of interaction among
the dimensions of a task.
Functions are a special case of relations; in functions
mappings are unique. A zero-variate function, *f*( ) = *a* (e.g.,
PRIME-MINISTER-IN-1990( ) = Hawke), is equivalent to a symbolic constant. It
has no argument, but it has one dimension in the sense defined above,
because the range is a source of variation, but is not conventionally
counted as an argument. Univariate functions have one argument and
two dimensions. A univariate function *f*(*a*) = *b*,
is a set of ordered
pairs (*a*, *b*) such that
for each *a* there is precisely one *b* such that
(*a*,*b*)∈*f*. Both *a* and *b* contribute sources
of variation, which is why a
univariate function is two-dimensional; in effect, it is equivalent in
complexity to a binary relation. In a similar way, bivariate functions,
*f*(*a*,*b*) = *c*, have two arguments and three dimensions, and so on.
A unary operator has two arguments, and is a special case of a univariate
function (e.g., the unary operator CHANGE SIGN comprises the set of
ordered pairs (*x*,-*x*)).
Binary operations have three arguments, and have
the same dimensionality as ternary relations and bivariate functions. (A
binary operation on a set *A* is a function from the set
*A*×*A* of ordered
pairs of elements of *A* into *A* (i.e., *A*×*A* → *A*). A bivariate function is
defined as *f*:*A*×*B* → *C*.)
The number of arguments in quaternary relations corresponds to the
number of arguments in trivariate functions and in compositions of
binary operations. For example, the composition of binary operations
of the form *a*(*b* + *c*) = *d* is
defined as the set of ordered 4-tuples {(3,2,4,18),...,(2,7,3,20),...}.
The number of arguments is related to the dimensionality of a structure
because it represents the number of independent terms required to define
the structure, and the orders of interaction within the structure. In
the context of analogical reasoning, the complexity of a structure
mapping task can be related to the dimensionality of the structures
being mapped. This will be considered in the next section.
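The composition of binary operations described above can be modeled extensionally as the chapter suggests, as a set of ordered 4-tuples. In this sketch the integer range is an arbitrary choice for illustration:

```python
# The composition a*(b + c) = d as a quaternary relation: a set of ordered
# 4-tuples, hence four arguments and four dimensions of variation.

RANGE = range(1, 8)
COMPOSITION = {(a, b, c, a * (b + c)) for a in RANGE for b in RANGE for c in RANGE}

# Arity of the relation = dimensionality of the structure:
arity = len(next(iter(COMPOSITION)))
print(arity)                          # 4

# The chapter's example tuples are members of this relation:
print((3, 2, 4, 18) in COMPOSITION)   # True: 3*(2+4) = 18
print((2, 7, 3, 20) in COMPOSITION)   # True: 2*(7+3) = 20
```

The first three positions can vary independently while the fourth is determined by them, so the relation captures a three-way interaction among its dimensions.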

We will define a structure as a set of elements, with a set of functions or relations defined on the elements. A structure-mapping is a rule for assigning elements of one structure to elements of another, in such a way that any functions, or relations between elements of the first structure correspond to functions, or relations in the second structure. In an idealized sense, structure mappings in analogies are isomorphisms (Holyoak & Thagard, 1989) though in practice the mappings might be incomplete (e.g., if one structure contains more elements than another).

A structure-mapping based on functions is valid if it is commutative, as Halford and Wilson (1980) and Holland et al. (1986) have pointed out. The concept of a commutative diagram originated with category theory (Mac Lane, 1972), and is different from commutativity in arithmetic. The essence of it is shown in Figure 7.1.

Suppose we have a source structure consisting of elements *S _{i}* on which a function *f _{s}* is defined, and a target structure with a corresponding function *f _{t}*. Let *m _{1}* map the source elements into the target elements, and *m _{2}* map the source function values into the target function values. The mapping is valid if the diagram in Figure 7.1a commutes; that is, if

*m _{2}* ∘ *f _{s}* = *f _{t}* ∘ *m _{1}*    (1)
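For finite structures, Equation (1) can be checked mechanically. The structures and names below are invented for illustration; functions and mappings are represented as dictionaries:

```python
# Commutative-diagram check for a function-based structure mapping:
# valid iff m2(f_s(x)) == f_t(m1(x)) for every source element x.

def is_valid_mapping(source_elems, f_s, f_t, m1, m2):
    """Mapping then transforming must equal transforming then mapping."""
    return all(m2[f_s[x]] == f_t[m1[x]] for x in source_elems)

# Source: numbers under successor; target: letters under alphabetical successor.
f_s = {1: 2, 2: 3}                  # function in the source structure
f_t = {"a": "b", "b": "c"}          # function in the target structure
m1 = {1: "a", 2: "b"}               # maps source elements to target elements
m2 = {2: "b", 3: "c"}               # maps source function values

print(is_valid_mapping([1, 2], f_s, f_t, m1, m2))  # True: the diagram commutes
```

Swapping the assignments in *m _{2}* makes the check fail, illustrating how an invalid mapping violates commutativity.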

Where structure-mappings are based on relations, the corresponding diagram is as shown in Figure 7.1b. The criterion for validity is the same as that provided by Suppes and Zinnes (1963) and Coombs, Dawes, and Tversky (1970) for representations in measurement theory.

A system α = (A,R) is said to be *represented* by another system β = (B,S), if there exists a function *f* from *A* into *B* (which assigns to each *x* in *A* a unique *f*(*x*) in *B*) such that for all *x*, *y* in *A*

*x* *R* *y* implies *f*(*x*) *S* *f*(*y*).    (2)

Thus, α is represented by β if there exists a correspondence *f* that maps *A* into *B* in such a way that if the relation *R* holds between some *x* and *y* in *A* then the relation *S* holds between *f*(*x*) and *f*(*y*) in *B*, where *f*(*x*) and *f*(*y*) are the images of *x* and *y* respectively. (Coombs et al., 1970, p. 11)

**Figure 7.1.
Structure Mappings Based on Functions (Figure 7.1a)
and Relations (Figure 7.1b).**

Coombs et al. (1970) refer to systems, but the same criterion can be applied to structures, where a structure is defined as a set of elements with one or more relations.

In general, valid structure mappings require that predicates in one
structure correspond to predicates in the other. Correspondence is
defined in the following way: a predicate *P* in structure 1 is
in correspondence with predicate *P*' in structure 2 if and only if
the arguments of *P* are mapped into those of *P*' and vice
versa.

As Holyoak and Thagard (1989) point out, process models approximate these requirements, which are treated as constraints which may be violated occasionally in the interests of satisfying other constraints. In the interests of generality, it is worth noting that valid representations are defined in a similar way to structure mappings in analogies (Halford & Wilson, 1980; Holland et al., 1986; Palmer, 1978). A representation is a mapping from a representing to a represented structure. A cognitive representation is a mapping from a cognitive structure to a structure in the world. An analogy is a mapping from one cognitive representation to another.

We can define several levels of mapping, corresponding to the arity of predicates as discussed earlier. The levels also correspond to levels of structure mapping as defined in the cognitive developmental theory of Halford (1987). Furthermore, they can be related to the number of vectors in a PDP representation, as we will see. They are summarized in Figure 7.2.

** 2.1.1. Mappings Based on One-Place Predicates.** These
correspond to element mappings as defined by Halford (1987). We can
express this type of mapping as:

*M*: *R*(*e _{i}*) ↔ *R*'(*e*' _{i})

It is a mapping between two structures each of which is based on a unary relation. The elements, *e _{i}*, of one structure are mapped into the elements, *e*' _{i}, of the other structure, so that the unary relations, *R* and *R*', correspond. Because unary relations are predicates with one argument, they correspond to attributes in Gentner's (1983) terms, so mappings based on one-place predicates are validated by attribute similarity. Similarity can be treated in terms of shared features (Hesse, 1966; Tversky, 1977), but Holyoak and Thagard (1989) have shown that in structure mapping models it is sufficient to assume a numerical index of similarity between two predicates. To the extent that metaphors are based on attributes rather than relations (which appears to be true in some cases, Gentner, 1983), mappings at this level correspond to metaphors.

** 2.1.2. Mappings Based on Two-Place Predicates.** These
correspond to relational mappings as defined by Halford's (1987)
developmental theory. They are defined as:

*M*: *R*(*e _{i}*, *e _{j}*) ↔ *R*'(*e*' _{i}, *e*' _{j})

The elements in structure 1 are mapped into the elements in structure 2 so that the binary relations in the two structures correspond. At this level there need not be any similarity between elements in the structures, and mappings are validated by similarity of binary relations. Simple proportional analogies of the form A:B::C:D belong to this level. For example, the analogy woman:baby::mare:foal is validated by the similar relation, "mother-of" in source and target.
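The validity criterion at this level can be sketched for the woman:baby::mare:foal example. The relation contents and helper names here are ours; only the analogy itself is from the text:

```python
# A relational mapping is valid if the binary relation in the source
# corresponds to the binary relation in the target under the element mapping.

MOTHER_OF_SOURCE = {("woman", "baby")}   # binary relation in the source
MOTHER_OF_TARGET = {("mare", "foal")}    # similar relation in the target

mapping = {"woman": "mare", "baby": "foal"}

def relations_correspond(source_rel, target_rel, m):
    """Every source pair must map onto a target pair, and nothing else."""
    mapped = {(m[a], m[b]) for a, b in source_rel}
    return mapped == target_rel

print(relations_correspond(MOTHER_OF_SOURCE, MOTHER_OF_TARGET, mapping))  # True
```

Reversing the argument order in the target relation makes the check fail, since "mother-of" would no longer correspond under the mapping.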

**Figure 7.2.
Levels of Structure Mappings, with Tensor Product Representation.
**

** 2.1.3. Mappings Based on Three-Place Predicates.**
These correspond to system mappings in Halford's (1987) developmental theory.

**Figure 7.3.
System Mappings.
**

System mappings are validated by structural correspondence. The
mapping in Figure 7.3B is invalid because it violates the correspondence
property. Notice that *R* is mapped into *R*'
on two occasions, and into
the inverse of *R*' on the third occasion. Similarity is not logically
necessary, although it may facilitate mapping, as in the case of high
transparency (Gentner & Gentner, 1983), or may make it easier to recognize
the possibility of mapping (Holyoak & Koh, 1987). Both the correspondence
and consistency criteria apply to system mappings. System mappings can
be defined as:

*M*: *R*(*e _{i}*, *e _{j}*, *e _{k}*) ↔ *R*'(*e*' _{i}, *e*' _{j}, *e*' _{k})

The most general expression of the structures in system mappings
is as a ternary relation, but they may also be expressed as a binary
operation, *e _{i}* * *e _{j}* = *e _{k}*.

** 2.1.4. Mappings Based on Four-Place Predicates.**
These correspond to multiple system mappings in Halford's (1987) theory.
They are defined as:

*M*: *R*(*e _{i}*, *e _{j}*, *e _{k}*, *e _{l}*) ↔ *R*'(*e*' _{i}, *e*' _{j}, *e*' _{k}, *e*' _{l})

Elements are mapped so that quaternary relations correspond. The quaternary relations may be interpreted as compositions of ternary relations, binary operations, or bivariate functions. Like system mappings, multiple system mappings are validated by structural correspondence, and are independent of element or relational similarity. They differ only in the complexity of the structures represented.

In theory mappings based on predicates with more than four arguments are possible, but it is difficult to attach any psychological meaning to such structures, because human processing capacity limitations appear to prevent structures of higher dimensionality being mapped.

The complexity metric for structures can be used to investigate human ability to process structure. The argument was made earlier that humans would be able to process a maximum of four dimensions in parallel. A prediction that can be derived from the theory of complexity is that tasks which require more information to be processed in parallel impose higher demands on processing resources (Kahneman, 1973). Higher level structure mappings require more dimensions to be processed in parallel, and therefore should use more of the available resources, which should produce more interference with a competing secondary task.

To test the prediction that humans process only four dimensions
in parallel, we need a task in which information from four dimensions
must be considered jointly. This can be achieved by asking participants
to interpret statistical interactions. A three-way interaction (with 3
independent variables and one dependent variable) is a set of ordered
4-tuples {(*A*,*B*,*C*,*N*), ... }.
If the dimensions are processed in parallel,
and the number of dimensions is limited to 4, it should be possible to
map three-way but not four-way interactions.

Consider a statistical interaction such as that shown in Figure 7.4. Interpreting the interaction entails mapping a mental representation of the figure into a verbal or semantic representation, and therefore includes a structure mapping task. Accordingly, we surveyed academic staff and graduate students in psychology, education, statistics, and economics who were experienced in interpreting statistical interactions. They were asked, assuming clear presentation in an appropriate figure, and not being concerned with scaling effects or nonlinearity, what was the highest level of interaction they could interpret unambiguously and with confidence. Ten answered two-way, 14 answered three-way, and 6 answered four-way. The integer central tendency is three-way, consistent with the prediction that up to four dimensions can be mapped.

In a further series of experiments, as yet unpublished, we presented participants with graphical representations and verbal descriptions of 3- and 4-way interactions. Bar graphs were used with bars that consisted of movable strips, so participants could restructure the figure to optimize presentation of the interaction for their own purposes. The graphical and verbal representations were isomorphic, but used different labels. Participants' task was to map one into the other (i.e., decide which verbal label mapped to each graphical label).*

*In the actual experiment bar graphs were used because they facilitated rearranging the presentation to suit individual subjects, but the principle of the task is the same as shown in Figure 7.4. Participants were required to map the labels (e.g., A1/A2; B1/B2; ...) into the verbal labels in the description which follows (e.g., decide which of A1, A2, ..., D2 should be mapped to A-grade, which to B-grade, which to G-rated, etc.). A sample verbal description was: The judged attractiveness of films depends on four factors: their grade, studio-origin, genre, and rating. In the description below, the influence that any one of these variables has on the judged attractiveness of films is referred to as "an effect" or "the effect." When A-grade, Paramount films are G-rated films, they are judged as more attractive when they are adventure than when they are comedy. When A-grade, MGM films are G-rated films, they are judged as more attractive when they are adventure than when they are comedy, and the reverse is true when A-grade, MGM films are R-rated films. On the other hand, when B-grade, Paramount films are G-rated films, they are judged as more attractive when they are adventure than when they are comedy, and this is also true when B-grade, Paramount films are R-rated films; overall they are judged as more attractive when they are G-rated. However, when B-grade, MGM films are G-rated films, they are judged as more attractive when they are adventure than when they are comedy, and there is a larger difference than when B-grade, MGM films are R-rated films. The solution is at the end of this chapter.

**Figure 7.4.
Four-Way Statistical Interaction, as Used in Problem Described in 3.0.
**
Attributes A1, A2, ..., D2 are factors which determine an effect,
measured by numerical values on the ordinate.

Everything was done to constrain participants to process the interactions in parallel (e.g. they were told to try and see the problem as a whole, not to test hypotheses serially, etc.). Results so far indicate that the mapping can usually be made with three-way, but rarely with four-way interactions. Given that a three-way interaction is a four-dimensional structure, and four-way interactions are five-dimensional structures, this is consistent with the hypothesis that humans can map four- but not five-dimensional structures. Although presentation was two-dimensional, four-dimensional structures were mapped, so mapping was not linked to presentation in any direct or simple fashion. The number of dimensions mapped also exceeds the number that can be visualized, suggesting that the limitation is not directly related to our ability to visualize in three dimensions.

The prediction that higher level mappings entail higher processing loads has been tested by assessing loads in tasks in which level of mapping was varied, holding other factors constant. Maybery, Bain, and Halford (1986) compared N-term series (transitive inference) tasks which require premise integration with closely matched tasks where premises could be processed separately. Premise integration entails mapping two relational premises (e.g., John is taller than Tom, Mike is taller than John) into an ordered set, and is a system mapping. By contrast, when premises are processed singly (e.g., when verifying the truth of the separate premises, John is taller than Tom, Mike is taller than John), only one binary relation is processed at a time, and the task is equivalent to a relational mapping. Probe reaction time was used to assess information processing loads, and was significantly lengthened when the probe coincided with premise integration. Comparison with the condition where premises were processed singly showed that the effect was due to the amount of information processed in parallel.

Halford, Bain, and Maybery (1984) tested the prediction that the multiple system mapping task imposed a higher processing load than the system mapping task. They presented adult participants with arithmetic problems such as:

(7 [ ] 3) / 4 = 1 | (system-mapping) |

(7 [ ] 3) [ ] 4 = 1 | (multiple system-mapping) |

It is assumed that these problems are solved by mapping them into actual arithmetic expressions, such as (7 - 3) / 4 = 1. In the system-mapping task only one binary operation (indicated by square brackets) must be found, because the other operation is supplied, and the expression can be converted to (7 [ ] 3) = 4. In the multiple-system problem, two binary operations (subtraction and division) must be found. Because two binary operations must be mapped, the task is equivalent to a multiple system mapping. It is not possible to solve for one first then decide the other, because there is no way of knowing you are right about one operation until you know what both of them are. Because the conditions of the task tended to prevent extensive serial search, there was a press to process the operations in parallel. A concurrent task, requiring both retention and rehearsal of a short-term memory load, was used as a secondary task to assess processing load. Interference with the secondary task was greater for the multiple system-mapping task, indicating that it imposed a higher processing load, as predicted by structure-mapping theory.

We want to try and develop an approach to structure mapping that takes account of the capacity limitations of human processors, and can explain the higher processing loads entailed by structures of higher dimensionality. As suggested before, PDP architecture is probably more appropriate for this purpose than serial processing models, or models which use local representations. However, first we must solve the problem of how to treat structures in PDP representations. This essentially entails representing predicate-argument bindings.

In both ACME and COPYCAT, representations are local in Smolensky's (1990) sense. That is, each element of the representation is coded as a single unit, rather than as a set of features which are distributed over a number of units. In COPYCAT codelets build structures representing such things as relations between elements. In the ACME model predicates and arguments are represented using predicate calculus. However, a genuine PDP model really requires distributed representations, otherwise the advantages of stability, graceful degradation, and graceful saturation of PDP models are lost. Another reason for favoring PDP models is that there are now successful distributed memory models of human memory (e.g., Humphreys et al., 1989), and it may be possible to integrate models of human analogical reasoning with human memory models to some extent. The problem then is how to represent predicate-argument bindings as distributed representations. This problem, and the related problem of representing variable-constant bindings, has received considerable attention in the recent literature.

Hinton (1981) represents a predicate and its argument(s) by a set of vectors, and the predicate-argument binding is represented by a "PROP" vector, which represents the conjunction of the predicate and argument vectors. A predicate-argument binding can be created by adjusting the weights between predicate-argument vectors and the PROP unit. Pattern completion, or recovery of a missing predicate or argument, can be performed using backward links from the PROP unit, but these links have to be learned.

Murdock (1983) suggests that bindings between
relationships and arguments can be represented as convolutions
of vectors. Thus, if *R* is a vector representing the
relationship, and *A1* and *A2* are vectors representing the
arguments, then the relationship-argument binding can be represented
by *R**(*A1***A2*) or *A1**(*R***A2*),
where "*" represents the convolution operation. A vector stored in the
convolution can be recognized by a statistical decision procedure.
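This style of binding and decoding can be sketched in a few lines of numpy. The sketch below uses circular convolution, which is convenient because it can be (approximately) inverted by circular correlation; Murdock's model uses ordinary (aperiodic) convolution, so the details differ, and the vector names and dimensionality here are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512  # high dimensionality keeps decoding noise small

def cconv(a, b):
    """Circular convolution: the '*' binding operation."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def ccorr(a, b):
    """Circular correlation: the approximate inverse of cconv."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

# Random vectors standing in for the relation R and arguments A1, A2.
R, A1, A2 = (rng.normal(0, 1 / np.sqrt(n), n) for _ in range(3))

trace = cconv(R, cconv(A1, A2))   # the binding R * (A1 * A2)

# Statistical decision: probing the trace with R yields a vector far
# closer to A1 * A2 than to an unrelated vector.
decoded = ccorr(R, trace)
target, foil = cconv(A1, A2), rng.normal(0, 1 / np.sqrt(n), n)
print(decoded @ target > decoded @ foil)   # True for this seed
```

The statistical decision procedure mentioned above corresponds to the final comparison: the decoded vector is noisy, so recognition is a matter of which candidate it matches best, not an exact lookup.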

Smolensky (1990) has proposed that the variable-binding
problem can be handled in terms of tensor products. The essence of
this approach is shown in Figure 7.5. In a distributed representation, a
variable is represented as a vector *x* on a vector space
*V _{R}*. A constant that
is bound to the variable is also represented as a vector *c*, and the binding is represented by the tensor product *x*⊗*c*.

**Figure 7.5.
**

We will extend Smolensky's approach to variable binding to handle the predicate-argument binding problem. We shall describe our applications of these ideas in terms of the tensor product terminology, noting that it can all be transformed into explicit neural networks. In the exposition which follows, we will assume familiarity with Smolensky's (1990) approach.
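For readers less familiar with that approach, variable binding by tensor (outer) product, and exact unbinding, can be sketched as follows; the particular example vectors are our own, with *x* chosen to have unit length so that exact unbinding reduces to a single contraction:

```python
import numpy as np

x = np.array([0.6, 0.8, 0.0])        # unit-length pattern for variable x
c = np.array([0.2, 0.5, 0.1, 0.7])   # pattern for the constant c

binding = np.outer(x, c)             # the tensor product x ⊗ c

# Exact unbinding: contracting the rank 2 tensor with x recovers c,
# because x has unit length (x·x = 1).
recovered = x @ binding
print(np.allclose(recovered, c))     # prints: True
```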

The tensor product encoding of predicate-argument bindings will be used in a model of analogical reasoning, which incorporates the levels of structure discussed earlier. We call this the Structural Tensor Analogical Reasoning (STAR) model.

The tensor product represents the predicate that a
particular variable has a particular value. The tensor
*x*⊗*c* represents the
predicate HAS-VALUE(*x*,*c*), where *x* is a
variable and *c* is a constant. As
Smolensky's paper is directed at variable binding, it is natural
that the tensor product is interpreted in this way. However, there is
nothing canonical about this interpretation, and so tensor products
can be validly interpreted as representing the binding of a predicate
to an argument. Therefore, a rank 2 tensor product may be interpreted
as the binding of a predicate to a single argument. We will generalize
the approach to deal with predicates of one or more arguments.

A predicate with a single argument can be interpreted as a zero-variate function or unary relation. It can be represented as a tensor product of two vectors, one representing the predicate and one representing the argument. A predicate with two arguments can be interpreted as a univariate function, or a binary relation. It can be represented as a rank 3 tensor product, with one vector representing the predicate and two representing the arguments. Similarly, a predicate with three arguments, which can be interpreted as a bivariate function, a binary operation, or a ternary relation, can be represented as a rank 4 tensor product, with one vector representing the predicate and three vectors representing the arguments. Similarly, a predicate with four arguments can be represented by a rank 5 tensor product. These representations correspond to the levels of structure mapping defined earlier, and are summarized in Figure 7.2.

It should be noted that if the predicate can be taken as fixed or constant, then it can be represented by a tensor product of rank 1 lower than specified above. For example, if a predicate with two arguments is constant, then it is sufficient to represent it as the tensor product of two vectors, representing the two arguments. The additional vector representing the predicate becomes unnecessary because it would always have the same value, and is not required to constrain the activations in the other vectors. This represents a degenerate case which we simply note in passing, and it plays no part in further developments in this chapter. The cases with which we are concerned all entail multiple predicates being represented on the same structure (e.g., the relations "larger than", "mother of," etc., can all be superimposed on the same set of units representing the tensor product of three vectors).

The representation of multiple binary relations
is shown in Figure 7.6. It comprises a tensor product of rank 3:
*V _{arg1}*⊗*V _{arg2}*⊗*V _{pred}*.

**Figure 7.6.
Visualising a rank 3 tensor product net
which also has three nodes in each component
space.**

At any time, one of the vectors *u*, *v*, and *w* will act as an output and the other two as inputs for the net. In this figure *u* is shown as output, and *v* and *w* as inputs. Parts (a)-(c) show connections: e.g., the middle component of *u*, *u _{2}*, is connected to all the nodes in the central horizontal plane in part (a), and its output value is the sum of all their outputs. Part (d) is a synthesis of parts (a)-(c), but omits connections, and shows activations in the tensor product nodes. Part (e) shows rank 2, 3, and 4 nets in a symbolic representation used in this chapter.

**Figure 7.7.
Internal organisation of a node in a tensor product network
**

Tensor product nets operate in two modes, teaching mode and retrieval mode. This figure illustrates these modes in a rank 3 tensor product net. In teaching mode, the net is presented with 3 input vectors (call them *u*, *v*, and *w*); the (*i*,*j*,*k*)-th tensor product node will take as inputs the *i*-th component of *u*, the *j*-th component of *v*, and the *k*-th component of *w*. These inputs will be multiplied together and added to the activation value in the tensor product node. In retrieval mode, the net is presented with 2 input vectors (say *u* and *v*; *w* will then serve as the output vector). The (*i*,*j*,*k*)-th tensor product node will multiply together its activation value and its *u* and *v* inputs (namely the *i*-th component of *u* and the *j*-th component of *v*) and output the result as a contribution to the *k*-th component of *w*. The value of the *k*-th component of *w* will thus be the sum of the contributions from the (*i*,*j*,*k*)-th tensor product nodes for all values of *i* and *j*.
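The two modes just described can be sketched with `einsum` contractions. This is a minimal sketch: the function names are ours, and one-hot patterns stand in for the distributed patterns discussed in the chapter (the contractions are identical in either case):

```python
import numpy as np

def teach(T, u, v, w):
    """Teaching mode: add the rank 3 tensor product u⊗v⊗w to the net."""
    return T + np.einsum('i,j,k->ijk', u, v, w)

def retrieve_w(T, u, v):
    """Retrieval mode: each node (i,j,k) multiplies its activation by
    its u- and v-inputs and contributes to the k-th component of w."""
    return np.einsum('ijk,i,j->k', T, u, v)

# One-hot patterns standing in for distributed representations.
u, v, w = np.eye(3), np.eye(3), np.eye(3)

T = np.zeros((3, 3, 3))
T = teach(T, u[0], v[1], w[2])   # store one binding
T = teach(T, u[1], v[1], w[0])   # superimpose a second binding

print(retrieve_w(T, u[0], v[1]))   # prints: [0. 0. 1.], i.e. w[2]
```

Because the stored bindings are simply summed into the same array of nodes, multiple predicates superimpose gracefully, which is the property exploited throughout this section.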

This approach can be generalized to higher level
structures. A tensor product of rank 4,
*V _{arg1}*⊗*V _{arg2}*⊗*V _{arg3}*⊗*V _{pred}*, can represent predicates with three arguments, such as ternary relations, binary operations, or bivariate functions.

A specific example of a rank 4 tensor product would be
arithmetic addition. We would have one vector representing the addition
operation, and three vectors representing the arguments, which in this
case are the addends and the sum. For example, the representation that
3,5 → 8 under the operation of arithmetic addition would comprise a vector
representing addition, and vectors representing 3, 5, 8. The advantage
of this representation is that all combinations can be inferred (e.g.,
given that the operation is addition, and that 3 and an unspecified
number *x* gives 8, the solution to *x* can be found by the
pattern completion process).
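The addition example can be sketched as a rank 4 net in which any vector is recovered from the other three by contracting over the known ranks. One-hot digit patterns and the encoding details below are our own illustrative choices:

```python
import numpy as np

D = 10                       # one-hot patterns for the digits 0..9
num = np.eye(D)
op = np.eye(2); ADD = op[0]  # a pattern for the addition operation

# Store a+b=c facts as rank 4 tensor products ADD ⊗ a ⊗ b ⊗ c.
T = np.zeros((2, D, D, D))
for a in range(D):
    for b in range(D - a):
        T += np.einsum('p,i,j,k->pijk', ADD, num[a], num[b], num[a + b])

# Pattern completion: given addition, the addend 3, and the result 8,
# contract over the known ranks to recover the missing addend x.
x = np.einsum('pijk,p,i,k->j', T, ADD, num[3], num[8])
print(np.argmax(x))   # prints: 5, since 3 + 5 = 8
```

The same tensor, contracted over different ranks, answers "3 + 5 = ?" or "? + 5 = 8" equally well, which is the advantage claimed in the text.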

**Figure 7.8.
Network Connectivity for a 3x3x3 Tensor Product Network**

Transitivity would be represented as a rank 4 tensor
product. Humans make a transitive inference based on the ternary relation,
which we will call
TRANSITIVELY-ORDERED-by-*R*(*a*,*b*,*c*).
To represent this
we need a vector representing the predicate TRANSITIVELY-ORDERED-by-*R*,
and
vectors representing each of the arguments, *a*, *b*, *c*.
This representation can
be decomposed into three rank 3 tensor products, representing *aRb*,
*bRc*, and *aRc*.
This can be done by entering a random vector, or a fixed vector,
into one rank of the tensor product, in a manner that is mathematically
equivalent to the procedure of Humphreys et al. (1989). For example,
if we enter a fixed vector in place of *c*, this effectively collapses
over *c*, leaving a representation of *aRb*.
Each rank 3 tensor product will
comprise a vector representing the relation *R*, with a pair of
arguments, such as *a* and *b*.
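The collapse of the rank 4 representation into rank 3 components can be sketched as a contraction with a fixed vector. In this sketch the patterns are one-hot for brevity, and the uniform vector stands in for the fixed vector of the Humphreys et al. (1989) procedure:

```python
import numpy as np

# One-hot patterns (for brevity) for the predicate and the elements.
pred = np.eye(2); R = pred[0]
elem = np.eye(8); a, b, c = elem[0], elem[1], elem[2]

# Rank 4 representation of TRANSITIVELY-ORDERED-by-R(a, b, c).
T4 = np.einsum('p,i,j,k->pijk', R, a, b, c)

# Entering a fixed vector in place of c collapses over c, leaving a
# rank 3 representation of aRb.
fixed = np.ones(8)
T3 = np.einsum('pijk,k->pij', T4, fixed)

# The collapsed tensor supports ordinary rank 3 retrieval: given the
# predicate and a, the output is the pattern for b.
out = np.einsum('pij,p,i->j', T3, R, a)
print(np.argmax(out))   # prints: 1, the index of b
```

Collapsing over *b* or *a* instead yields the *aRc* and *bRc* components in the same way.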

Smolensky's application of tensor product
representations for variable binding utilizes the stored
information in a function-like way: that is, the two-place predicate
*HAS*-*VALUE*(*x*, *y*) which he implicitly studies
can be viewed as a unary function *VALUE*(*x*) = *y*.
Smolensky's method of unbinding a variable *x* (that is, retrieving
its stored value *c*) depends on this functional property. In effect,
retrieval is done by presenting a pattern representing *x* to the
network, which responds with the pattern representing *c*. We need to
consider how patterns corresponding to specific predicates or arguments
can be retrieved from the representations we propose. For example, if
the relations of interest include one like *TALLER*-*THAN*,
then it is possible that

TALLER-THAN(Janet,Bill)

TALLER-THAN(Janet,Graeme)

might both be true. Presenting the pattern
representing *Janet* to the network would result in the output
of the sum of the patterns for *Bill* and for *Graeme*
(plus the patterns for any other individuals *P* of which
*TALLER*-*THAN*(*Janet*, *P*) is true). It might
be difficult to extract *Bill*, say, from this output pattern. In
fact, provided the patterns representing *Bill*, *Graeme*,
and other such individuals *P* are orthogonal (in the sense of
the usual vector inner product), it is simple, in principle, to extract
*Bill* from the output of what Smolensky (1990, Section 3.1) calls
the exact unbinding procedure. This is done by computing the dot product
of the pattern for *Bill* with the output pattern. If the result
is nonzero, then *TALLER*-*THAN*(*Janet*, *Bill*)
is true, according to the information stored in the network. This process
is illustrated in Figure 7.9. The possible solutions are tested serially.
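This exact unbinding and dot-product test can be sketched as follows, with one-hot patterns (the names are ours) standing in for mutually orthogonal distributed patterns:

```python
import numpy as np

# One-hot patterns for predicates and individuals.
preds = np.eye(2); TALLER_THAN = preds[0]
folk = dict(zip(['Janet', 'Bill', 'Graeme', 'Ross'], np.eye(4)))

T = np.zeros((2, 4, 4))
for x, y in [('Janet', 'Bill'), ('Janet', 'Graeme')]:
    T += np.einsum('p,i,j->pij', TALLER_THAN, folk[x], folk[y])

# Present TALLER-THAN and Janet: the output s is the sum of the
# patterns for Bill and Graeme.
s = np.einsum('pij,p,i->j', T, TALLER_THAN, folk['Janet'])

# Dot-product test (Figure 7.9): nonzero means the fact is stored.
print(s @ folk['Bill'])   # prints: 1.0  (TALLER-THAN(Janet, Bill) holds)
print(s @ folk['Ross'])   # prints: 0.0  (not stored)
```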

**Figure 7.9.
**

Predicate Retrieval from a Tensor Product Network. To show that *P*(*x*,*y*) is true, present *P* and *x* to the network. The output will be the sum *s* of all concepts *p* such that *P*(*x*,*p*) holds. Connect this output *s* to a network which computes the inner product *z* of *s* with *y*. If *z* is non-zero (non-white, in the diagram), then *P*(*x*,*y*) is true.

It might seem natural to represent an
*N*-dimensional structure as a binding of *N* fillers and
*N* roles in the manner suggested by Smolensky (1990), as shown in
Figure 7.10. For example, the binary operation of arithmetic addition,
with mappings of the form *a* + *b* = *c*, could be
represented as a set of bindings to particular values for *a*,
*b*, and *c*. For example, if *a* = 3, *b* = 2, and
*c* = 5, as shown in Figure 7.10, then the tensor product binds the
values of the variables to the constants. However, this representation
does not permit the recovery of any vector, given the *N*−1
remaining vectors (e.g., it does not permit the recovery of the vector
representing *b* given the vectors representing *a* and *c*). We
regard this ability as essential to the most creative and dynamic use
of representations in reasoning. As we will see, it is essential to
analogical reasoning which will be considered in the next section.

**Figure 7.10.
Multiple Bindings as an Alternative Method for Storing Predicates**

Method: The bindings a=3, b=2 and c=5 are stored in the tensor product network in a structure in the sense of Smolensky (1990). Further bindings, such as a=2, b=4, c=6 would be stored as (distinct) structures in the same network. To store more than one type of predicate, one would add to each structure a binding such as pred=sum, where sum(a,b,c) signifies a+b=c. [In this figure, roles and fillers are shown as represented by nodes, for greater clarity of presentation. In a distributed version, roles and fillers would be represented by patterns.]

The representation we have chosen is consistent with
the distributed memory model of Humphreys et al. (1989). Their aim was
to model the three-way relations between context, cue, and target in
memory retrieval. For example, if asked "what did you eat for breakfast
on Sunday?" "What did you eat for breakfast?" is the cue, "on Sunday"
is the context, and (say) "bacon and eggs" is the target. Humphreys et
al. (1989) have shown that computation of three-way relations between
context, cue, and target requires a representation based on the tensor
product of three vectors. The same is true in the present context. For
example, given *a* + *b* = *c*, if we want to compute
*c*, given *a* and *b*, or *b*, given *a* and
*c*, etc., then each of *a*, *b*, and *c* must have
a separate representation, and the three-way relations between them must
be represented. A rank 3 tensor product achieves this, whereas some
alternatives, such as the role-filler representation, do not. Where
the operation is also a variable, an additional vector is required,
so arithmetic operations would normally be represented by a rank 4
tensor product.

The model of storing predicate/argument information described above provides a PDP framework for performing certain analogical reasoning tasks. We will begin with simple analogical reasoning tasks of the form a:b::c:d because these have been very common in the psychological literature, and they contain clear examples of the processes we wish to consider.

The particular analogy we will use for illustration
is *woman*:*baby*::*mare*:*?* (i.e., *woman*
is to *baby* as *mare* is to *what* - the approved
answer is *foal*). To solve such problems, it is necessary to
be able to represent multi-argument predicates (in this case, two
arguments) and to access the stored information in a number of different
manners. For example, given the predicate *P* and one argument
*a*_{1}, one needs to be able to
retrieve the/an entity *a*_{2},
such that *P*(*a*_{1},
*a*_{2}), and given *a*_{1}, *a*_{2},
it is necessary to be able to find a predicate *P* such that
*P*(*a*_{1}, *a*_{2}). The tensor product formalism allows both
these operations.

What follows is a basic symbolic version of our
model of simple analogical reasoning. We will illustrate the model by
solving the problem *woman*:*baby*::*mare*:*?*.
Subsequently we shall elaborate some parts of the model where
simplicity of exposition requires that gaps be left. On seeing
*woman*:*baby*, the reasoner associates with this pair the
binary predicate *MOTHER*-*OF*(—). On seeing the cue mare
for the second part of the problem, the reasoner places *mare*
in the first argument position for *MOTHER*-*OF*, and then
tries to retrieve the filler for the missing argument position in
*MOTHER*-*OF*(*mare*,—).

In our PDP framework, we have a rank 3 tensor product
*V _{arg1}*⊗*V _{arg2}*⊗*V _{pred}*, in which binary predicate instances such as *MOTHER*-*OF*(*woman*, *baby*) have been stored. The steps in this reasoning process are shown in Figure 7.11.

**Figure 7.11.
Steps in Simple Analogical Reasoning**

As foreshadowed above, there is a gap in this
description. There are more solutions than just MOTHER-OF(—,—) to the
problem of finding a relation ρ such that ρ(woman, baby). For
example, ρ could be LOVES, FEEDS, or LARGER-THAN. A related
problem is that the *arg2* output in the final step could be *colt*,
for example. However, the solution to ρ(woman, baby) need not be a single
predicate. At least in our society, all of LOVES, FEEDS, or LARGER-THAN
are consistent with motherhood, and even in other societies more than one
predicate would be consistent with motherhood. Therefore the solution
to ρ(woman, baby) can be a vector representing a predicate bundle,
BUNDLE-MOTHER-OF comprising the predicates that are consistent with
the mother-baby relation. In a similar way, foal would be a solution to
*X* in BUNDLE-MOTHER-OF(mare,*X*). The solution to *X*
might be an argument bundle representing *X* such that
BUNDLE-MOTHER-OF(mare, *X*) holds. It is also possible that,
at certain levels of description, concepts like motherhood could
be represented as the weighted sum of vectors representing *N*-place
predicates like "loves", "feeds", "larger than", "female", etc. Some
natural categories are based on prototypes, which are weighted averages of
attributes (one-place predicates), and there seems no reason in principle
why they should not be weighted averages of *N*-place predicates. However,
this issue goes beyond our present argument, and requires further
exploration.

A certain amount of ambiguity is inherent
in the syntax of analogies. For example, the analogy
*woman*:*baby*::*mare*:*?* could have
"rabbit" as a logically correct solution, because the relation
"larger-than" holds between *woman*:*baby* and also between
*mare*:*rabbit*. The fact that "foal" is considered a better
solution has more to do with pragmatics than with syntax. Foal might
be considered a better solution because "mother of" is a relation
that is more unique to the triplet woman-baby-mare than is "larger
than." Alternatively the favored solution might be considered more
interesting or novel. The fact that these additional criteria are
necessary highlights the importance of pragmatic factors, which are
a feature of the model of Holyoak and Thagard (1989) as mentioned
earlier. Such ambiguities cannot be resolved by purely syntactic criteria,
and given that it is our aim to model the syntactic aspects of analogies,
it is appropriate that our model should incorporate the ambiguities that
are inherent in the syntax of analogies.

It is possible however to remove the ambiguity by
the procedure mentioned previously for retrieving a predicate from a
network. If the output from the process above consists of a vector
which represents the sum of the possible solutions to the analogy
*woman*:*baby*::*mare*:*foal*, then the answer
*foal* can be recognized by computing the inner product of the
output and the vector representing foal. If the inner product, *k*, is
nonzero, then *foal* is a solution. Furthermore, the larger *k*
is, the more relations are satisfied by the pair *mare*:*foal*
in common with the pair *woman*:*baby*. This might be taken
as a measure of salience of the solution. It is possible that the most
favored solution is the most salient in the sense that it satisfies
the most relations; we might prefer *foal* as a solution because
it fits not only the relation *mother of*, but also *loves*,
*feeds*, and *larger than*. As the model presently exists,
potential solutions must be processed serially.

The tensor product net simulation that we built
was of rank 3 and dimensions 6×16×16: 6 different
binary predicate symbols and 16 different arguments for each
of Positions 1 and 2. Let us denote the tensor product space as
**U**⊗**V**⊗**W**.

Orthonormal bases for **R**^{6}
and **R**^{16} were found, and
checked to ensure that no basis vector contained a large number of zero
components (as such vectors would have violated our goal of a distributed
representation).

A set *S* of predicate instances of the form
-(*x*,*y*), such as:

loves(woman, baby)

mother-of(woman, baby)

feeds(woman, baby)

larger-than(woman, baby)

mother-of(mare, foal)

larger-than(mare, foal)

larger-than(mare, rabbit)

and so on, was chosen. The predicate symbols
*loves*, *mother of*, *feeds*, *larger than*, and
so on were mapped arbitrarily to basis vectors for **R**^{6}, and the argument symbols *woman*,
*baby*, *mare*, *foal*, *rabbit*, and so on
were mapped arbitrarily to the basis vectors for **R**^{16}. Let **u**(*P*) denote the vector
representing the predicate symbol *P* in **U**, and
let **v**(*x*) and **w**(*y*) denote the vectors
representing the argument symbols *x* and *y* in **V**
and **W**, respectively, so that, for example, **v** _{j}(*x*)
denotes the *j*^{th} component of
the vector representing the symbol *x* in **V**. Each node
*T _{ijk}* of the tensor product
array was preset to the sum, over the chosen predicate instances in *S*:

*T _{ijk}* = Σ _{*P*(*x*,*y*) ∈ *S*} **u** _{i}(*P*) **v** _{j}(*x*) **w** _{k}(*y*)

In other words, the predicate instances in *S* were
stored in the network.

Any of **u**, **v**, and **w**
can be recovered from the other two and the tensor
*T _{ijk}*; for example:

**u** _{i} = Σ _{*j*,*k*} *T _{ijk}* **v** _{j}(*x*) **w** _{k}(*y*)

This representation can handle a number of
analogical problem formats. To solve our reference analogy problem,
*woman*:*baby*::*mare*:*?,* the patterns *v*(*woman*) and *w*(*baby*) were presented,
in Phase 1, to the **V**- and **W**- inputs of the network.
The **U**-output,
which would in this case be the sum of the patterns representing
*loves*, *mother of*, *feeds*, and *larger than*,
was saved, and re-presented as **U**-input
in Phase 2, along with *mare* as **V**-input.
The **W**-output, say **w**, was then
interpreted. The interpretation phase is necessary because humans looking
at the W-output use a different representation of concepts like foal,
*woman* and *rabbit* from the 16-tuples used in this simulation;
this phase corresponds to *articulating* the answer
to the analogy problem. The
interpretation phase was implemented by forming the inner product of **w**
with each of the patterns **w**(*woman*), **w**(*baby*),
**w**(*mare*), **w**(*foal*), **w**(*rabbit*),
and so on, which form the
basis of the 16-dimensional space **W**. This process yields "salience"
scores for each of the possible arguments as solutions to the analogy
problem. The argument with the highest salience score wins. In the present
state of the model, the potential solutions must be processed serially.
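The simulation just described can be sketched at reduced scale: 4 predicates and 6 arguments instead of 6 and 16, with random orthonormal bases generated by QR decomposition standing in for the bases used in the original runs:

```python
import numpy as np

rng = np.random.default_rng(42)

def ortho(n):
    """Rows of a random orthogonal matrix: distributed, orthonormal patterns."""
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q

preds = ['loves', 'mother-of', 'feeds', 'larger-than']
args = ['woman', 'baby', 'mare', 'foal', 'rabbit', 'colt']
u = dict(zip(preds, ortho(len(preds))))
v = dict(zip(args, ortho(len(args))))   # argument 1 patterns
w = dict(zip(args, ortho(len(args))))   # argument 2 patterns

facts = [('loves', 'woman', 'baby'), ('mother-of', 'woman', 'baby'),
         ('feeds', 'woman', 'baby'), ('larger-than', 'woman', 'baby'),
         ('mother-of', 'mare', 'foal'), ('larger-than', 'mare', 'foal'),
         ('larger-than', 'mare', 'rabbit')]
T = sum(np.einsum('i,j,k->ijk', u[p], v[x], w[y]) for p, x, y in facts)

# Phase 1: woman:baby yields the bundle of predicates linking them.
bundle = np.einsum('ijk,j,k->i', T, v['woman'], w['baby'])
# Phase 2: re-present the bundle with mare as argument 1.
out = np.einsum('ijk,i,j->k', T, bundle, v['mare'])
# Interpretation: salience score for each candidate answer.
salience = {y: float(out @ w[y]) for y in args}
print(max(salience, key=salience.get))   # prints: foal
```

In this run *foal* scores 2 (it satisfies both "mother of" and "larger than" in common with woman:baby) while *rabbit* scores 1, illustrating the salience ordering discussed above.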

There does not appear to be any agreed-upon taxonomy of analogical reasoning tasks, but Figure 7.12 presents a simple classification which is useful to the argument. It divides analogies into six basic types; this number could be doubled by reversing base and target, but the resulting types differ only trivially from those presented.

** 5.2.1. Complete Structure**. Both target and base
structures are complete at presentation, and the task is to find the
correct map between them. This case includes explanatory analogies, and
the SME model (Falkenhainer et al., 1990) and the ACME model (Holyoak &
Thagard, 1989) have been applied to some analogies of this type. Many of
the tasks that have been used to test these models, such as the water
flow/heat flow and Rutherford analogy, entail complex structures, with
many possible mappings between base and target, so much of the interest
has centered around devising psychologically realistic algorithms for
finding the best match.

** 5.2.2. Missing Object.** All target
elements are supplied, and all except one base element. The task is
to find the missing base element.

** 5.2.3. Missing Relation.** In this case
a missing relation is inferred, given the objects. It is similar to,
but simpler than, the case in Section 5.2.4.

** 5.2.4. Missing Object and Relation.**
The target structure is complete, some base elements are supplied,
and the task is to find the missing object(s) and relation(s) in the
base. This case entails more creativity than the first two cases, because
new information is generated. An example would be if someone knew the
basic structure of the solar system, knew that an atom consisted of
a nucleus and electron(s), but did not know the relation between them
(did not know that electrons orbit around the nucleus). The person could
map nucleus into sun, electron into planet, and then recognize that the
relation "orbits around" applied to electron and nucleus. Some of the
tasks to which the Structure Mapping Engine (Falkenhainer et al., 1990),
ACME (Holyoak & Thagard, 1989), and COPYCAT (Mitchell & Hofstadter,
1990) have been applied are of this type, because they entail inferring
missing relations.

**Figure 7.12.
Taxonomy of Structure-Mapping Tasks.**

There is a subvariety in which the critical relation
is missing from both base and target. In this case, a relation has
to be found which applies to arguments in both base and target. For
example, given *person*:*house*::*dog*:*kennel*,
the relation "lives in" can
be selected. Two structures impose a tighter constraint than a single
structure. Given *person*:*house*, a number of predicates
are possible (e.g., "pays for," "repairs"), but these do not apply to
*dog*:*kennel*. Similarly,
"is chained to," or "gnaws bones in" might apply to
*dog*:*kennel*, but (hopefully) not to *person*:*house*.

** 5.2.5. Missing Base.** The target
structure is complete, but no base is supplied. The task requires that
memory be searched for a structure that matches the target. Creativity
may be required because the base might not bear any obvious resemblance
to the target. Unfamiliar problems, for which the solver must find an
analogous situation, are of this kind.

** 5.2.6. Incomplete Target, Missing Base.**
Target elements are supplied, but the relation(s) are not, and the base
is missing. A base must be found in order to supply the missing relations
in the target. The Rutherford analogy tends to be treated nowadays as
an explanatory analogy, in which complete base and target structures are
supplied. However, when it was originally discovered the target structure
would presumably have been incomplete, and it would have been necessary to
find an appropriate base in order to complete the target. Therefore, let
us consider a hypothetical "Rutherford" who did not know that the electron
orbited around the nucleus, but knew the atom consisted of nucleus
and electron (or some similar particles), and knew certain relations
between them, such as the size difference, the electrostatic attraction,
etc. The problem would have been to find an analogy or, in contemporary
terms, a "mental model" of atomic structure. In these circumstances, the
solar system might be adopted because it comprised objects which had size
relations and mutual attraction. Then the relation "orbits around" in the
solar system would be applied to the atom as an hypothesis. It is possible
that much of the creativity of the Rutherford analogy would have consisted
of finding the base, comprising the solar system, then in recognizing
that the ORBITS-AROUND relation in the solar system might apply to the
atom. The incomplete target, missing base analogy is highly creative,
because not only is it necessary to find an appropriate, and probably
non-obvious, base, but the base is used to supply information about the
target that was not previously known. There are possible sub-varieties of
this case, where target relations are supplied but an element is missing.

We will now consider how the STAR model can in principle handle the other cases of analogy shown in Figure 7.12.

** 5.3.1. Complete Structure.** In this
case both base and target are completely represented in the tensor product,
and forming the analogy is a matter of assessing the similarity of
base and target predicates. This can be done using the inner product,
as described earlier. If the inner product of the vectors representing
base and target predicates is nonzero, then the base-target mapping
achieves the minimum criterion of acceptability.

This implies that there must be a common predicate in
base and target for an analogy to be formed. This is not a theoretical
requirement for analogical mapping, because it is not contained in the
definition of structure mapping given earlier, and the ACME model can
find matches based solely on structural correspondence, with no shared
predicates. However, it may be a psychological requirement. Gick and
Holyoak (1983) found that common coding of base and target facilitated
solution. The possibility of a common higher order predicate in base and
target is virtually impossible to rule out in any
actual case of analogical reasoning. For example, in a case such as that
in Figure 7.3A, *R* and *R*' might not share common features,
but there might be a common higher order relation (e.g., if *R*
and *R*' are both asymmetric and transitive, then both structures
represent ordered sets, and a higher order relation like "monotonic"
would apply).

** 5.3.2. Missing Objects.** This case was
considered earlier.

** 5.3.3. Missing Relations.** This case
can be handled by entering base arguments into the tensor product
representation, and obtaining the vector representing the sum of all
predicates that apply to the argument set. The inner product of this
vector with the vector representing the target predicate provides a
measure of the success of the match.

In the case where the target predicate (relation) is missing, target arguments are entered into the tensor product in a separate step, and the vector representing the sum of all predicates applying to the target arguments is obtained. The inner product of vectors representing base and target predicate bundles provides a measure of the degree of match obtained. Individual predicates can be identified by obtaining the inner product of the predicate bundle with vectors representing specific predicates.
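The missing-relation computation can be sketched with a rank-3 tensor product memory. The predicate and argument names follow the mother-of example used earlier; the orthonormal encodings, the dimensionality, and the "feeds" predicate are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8

# Made-up orthonormal vectors for two predicates and four arguments.
Q, _ = np.linalg.qr(rng.standard_normal((d, 6)))
vec = dict(zip(["mother_of", "feeds", "woman", "baby", "mare", "foal"], Q.T))

# Rank-3 tensor product representation: a superposition of
# predicate-argument-argument bindings.
T = (np.einsum('i,j,k->ijk', vec["mother_of"], vec["woman"], vec["baby"])
     + np.einsum('i,j,k->ijk', vec["mother_of"], vec["mare"], vec["foal"])
     + np.einsum('i,j,k->ijk', vec["feeds"], vec["woman"], vec["baby"]))

# Entering the base arguments yields the bundle (sum) of all predicates
# that apply to that argument pair.
bundle = np.einsum('ijk,j,k->i', T, vec["mare"], vec["foal"])

# Inner products with specific predicate vectors identify the members
# of the bundle and measure the degree of match.
score_mother = float(np.dot(bundle, vec["mother_of"]))  # approx. 1.0
score_feeds = float(np.dot(bundle, vec["feeds"]))       # approx. 0.0
```

Matching a target predicate then amounts to one more inner product, between this base predicate bundle and the bundle obtained from the target arguments in a separate step.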

** 5.3.4. Missing Object and Relation.**

Where the target predicate is not known, it will be obtained in a separate step, as for the missing relation case.

** 5.3.5. Missing Base.** The complete
target subvariety of this case can be handled by entering the target
predicate into the tensor product representation. This will yield a tensor
product of all vectors representing arguments with that predicate. The
set of arguments so obtained will include the target arguments, as
well as all other arguments in the representation consistent with the
predicate. Specific arguments can be identified by obtaining the inner
product of the vectors representing them with the tensor product obtained
as above.

Sets of vectors representing selected arguments are then entered into the tensor product representation, and the vector representing the sum of all predicates consistent with the selected set of arguments is obtained. The inner product of this vector with the vector representing the target predicate represents a measure of the degree of match. In this way a base that can be mapped into the target can be found. This process essentially entails using the target predicate as a retrieval cue to find possible bases.
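The retrieval-cue step can be sketched in the same style: entering the target predicate into the rank-3 representation yields a rank-2 tensor holding all argument pairs bound to that predicate, which can then be probed for specific candidates. The encodings and extra "feeds" binding are again illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
Q, _ = np.linalg.qr(rng.standard_normal((d, 6)))
vec = dict(zip(["mother_of", "feeds", "woman", "baby", "mare", "foal"], Q.T))

# Tensor product memory: superimposed predicate-argument bindings.
T = (np.einsum('i,j,k->ijk', vec["mother_of"], vec["woman"], vec["baby"])
     + np.einsum('i,j,k->ijk', vec["mother_of"], vec["mare"], vec["foal"])
     + np.einsum('i,j,k->ijk', vec["feeds"], vec["woman"], vec["baby"]))

# Use the target predicate as a retrieval cue: entering "mother_of"
# yields a rank-2 tensor, the superposition of all argument pairs
# bound to that predicate (woman-baby and mare-foal here).
pairs = np.einsum('ijk,i->jk', T, vec["mother_of"])

# Probe for specific argument pairs among the retrieved candidates.
horse_pair = float(np.einsum('jk,j,k->', pairs, vec["mare"], vec["foal"]))  # approx. 1.0
wrong_pair = float(np.einsum('jk,j,k->', pairs, vec["mare"], vec["baby"]))  # approx. 0.0
```

Each retrieved argument set can then be entered back into the representation to obtain its predicate bundle and score it against the target, as the text describes.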

** 5.3.6. Incomplete Target, Missing Base.**
For this subvariety of the missing base case, the target arguments must
first be entered into the tensor product, thereby obtaining the vector
representing the sum of all predicates consistent with target arguments
(as in the missing relations case, considered earlier). Then this vector
is entered into the tensor product representation, and the remainder of
the process is as for the complete target subvariety.

The proposed model is capable of solving a variety of analogical reasoning tasks. The cases we have considered so far entailed two-place predicates, but the model is capable in principle of solving analogies that entail three- or four-place predicates. It becomes computationally more costly as we move to higher place predicates, and consequently to tensor products of higher rank. However, this feature provides a natural explanation for the higher processing loads associated with tasks which require more dimensions to be processed in parallel. The limitation to four dimensions, for which evidence was presented earlier, is presumably a reflection of this processing load.

The model captures some of the important insights of the ACME model of Holyoak and Thagard (1989). ACME provides an elegant solution to the problem of how to implement structure mapping in a connectionist, parallel constraint satisfaction architecture. It does this by using inhibitory connections which ensure mappings are unique, and by excitatory and inhibitory connections which ensure that predicates in one structure correspond to predicates in the other. The present model achieves the correspondence property by superimposing predicate-argument bindings of base and target on the same representation. This provides a constraint between mapping of the predicate and the mapping of its arguments. This constraint is built into the representation, rather than resulting from computation.

The way structure mapping is implemented in the model differs from the way it is implemented in models such as the Structure Mapping Engine (Falkenhainer, Forbus, & Gentner, 1990), ACME (Holyoak & Thagard, 1989), or COPYCAT (Mitchell & Hofstadter, 1990). In all these models two separate structures, representing source and target, are created and then mappings are explicitly created between them. In our model the mappings are implicit in the structural representation, because base and target are both superimposed on the same set of units. Instead of base and target each with its own predicate(s), our model has a common (set of) predicate(s) which operate in two different contexts, base and target. Thus, in our example earlier, "mother of" occurs in two contexts, a human context and a horse context.

In its present form, the model is restricted to analogies in which base and target have a common predicate. There may be no common attributes or first-order relations, but in such cases there must be a common higher order relation.

Although common predicates are not a theoretical requirement for analogies, they may be a psychological requirement, as mentioned earlier. However, there are potential ways of removing this restriction from the model. Base and target predicates could be associated, or belong to the same category, rather than having a literal similarity. That is, instead of using the predicate obtained from the base as input to the target, a similar predicate, or one from the same category, could be used as input, without altering the logic of the process.

The model can also be related to distributed memory models. The tensor product representation has been used as the basic data structure in the memory model of Humphreys et al. (1989), and as the underlying representation for a direct access intersection process by Wiles, Humphreys, Bain, and Dennis (1990). In these models, the triple that forms the tensor is created from the outer product of context, memory cue, and target vectors (see Humphreys et al., 1989, Equation 1, p. 215). Their representation of context, cue and target is analogous to the representation of predicate and two arguments which forms the basis of our account of simple analogical reasoning. The tensor product formalism, which represents all interactions of context, cue and target, or in the case of analogies, of predicate and arguments, has considerable potential to provide a computational model of creativity, as Wiles, Halford, Stewart, Humphreys, Bain, and Wilson (in press) have shown.

One difference between the memory and analogical
reasoning models is that because Humphreys et al. (1989) are concerned
with modeling memory, the values stored in their tensor products
are interpreted as weights, whereas we are concerned with modeling
dynamically created representations, so we interpret the values
as activations. Nevertheless, our model is consistent with a memory
retrieval view of analogies. Where arguments are entered into the tensor
product representation, they are equivalent to a retrieval cue for a set
of arguments. Where a predicate and *N*−1 arguments are entered, they
serve as a cue for retrieval of the
*N*^{th} argument.

The processes of analogical reasoning are therefore closely related to memory retrieval processes. This does not necessarily imply, however, that analogical reasoning operates directly on semantic memory. It is possible that it takes place in a temporary representation that is a copy of information in semantic memory. The amount of such information that can be represented at any one time is probably restricted, as we noted before. Therefore, there is a serial aspect to analogical reasoning insofar as it is necessary to represent components of a structure one at a time.

As noted earlier, the validity of a structure mapping can be defined in terms of consistency, which can be determined by checking whether mappings are commutative, as defined in Equation 1 and Figure 7.1. Halford and Wilson (1980) described a family of models of consistency checking tasks of varying levels of complexity, as illustrated in Figure 7.13. Levels 1, 2, and 3 as defined by Halford and Wilson (1980) correspond to relational, system, and multiple system mappings as discussed earlier.

To check consistency of a Level 1 system, one shows
that for any symbol *s* in *S*_{1},

ι_{2}(μ_{S}(*s*)) = μ_{E}(ι_{1}(*s*)),

that is, that it does not matter whether you transform *s* first, and then map the result
into the environment, or map *s* first, and then transform the result. In
terms of composition of functions, this can be expressed by the equation

ι_{2}μ_{S} = μ_{E}ι_{1}.

The corresponding equations for Level 2 and 3 systems are, respectively,

ι_{3}μ_{S} = μ_{E}(ι_{1}×ι_{2})

and

ι_{4}μ_{S} = μ_{E}(ι_{1}×ι_{2}×ι_{3}).

In all cases,
the message is that if you start in the top-left corner of a diagram and
transform a member of the symbol system that you find there, then you
will get the same result whether you follow the clockwise transformation
path or the anti-clockwise transformation path.
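A toy Level 1 check can be written directly from the commutativity condition. The symbol system below is invented for illustration, and ι_{1} = ι_{2} = ι is assumed for simplicity.

```python
# Toy Level 1 system: a symbol set S with transformation mu_S, an
# environment with transformation mu_E, and a mapping iota from
# symbols to environment elements (iota_1 = iota_2 = iota here).
S = ["small", "medium", "large"]
mu_S = {"small": "medium", "medium": "large", "large": "large"}  # "next size up"
iota = {"small": "foal", "medium": "pony", "large": "horse"}
mu_E = {"foal": "pony", "pony": "horse", "horse": "horse"}       # "next stage of growth"

# iota(mu_S(s)) == mu_E(iota(s)) for every symbol s: the clockwise and
# anti-clockwise paths around the diagram must agree.
consistent = all(iota[mu_S[s]] == mu_E[iota[s]] for s in S)
print(consistent)  # True
```

Changing any single entry of `mu_E` or `iota` breaks the commutativity and the check returns False, which is exactly the inconsistency the Halford and Wilson (1980) procedure is meant to detect.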

**Figure 7.13.
Symbol Systems for Consistency Checking.**

Can the consistency checking embodied in the Halford
and Wilson (1980) model be performed by a tensor product network of the
type we have been discussing? As the Halford and Wilson (1980) model is
couched in terms of symbol systems, we need to describe first how that
model is to be interpreted in a PDP modeling environment. Basically,
a symbol *s* in *S*_{i} is represented by a vector of unit activations.

**Figure 7.14.
Consistency Checking with Tensor Product Nets: (a) Level 1 Systems.**

In accordance with Halford and Wilson (1980), the
computations performed by the tensor product nets would be performed
sequentially, presumably reflecting that the supply of tensor product
nets as a computational resource is limited. Figure 7.15 shows a similar
architecture for a Level 2 consistency checking calculation. Notice that
the transformations
μ_{S} and μ_{E}
correspond to rank 3 tensor products in the
level 2 diagram. Figure 7.16 indicates how a Level 3 consistency checking
calculation would be done; the transformations
μ_{S} and μ_{E}
correspond to
rank 4 tensor products in this case. For all three levels, the equality
check (in the bottom-right corner of Figures 7.14-7.16) is shown as though
it were performed by another tensor product net, though there might be
better neural sub-assemblies to test for equality.

**Figure 7.15.
Consistency Checking with Tensor Product Nets: (b) Level 2 Systems.**

It is not clear that this consistency checking process would be performed whenever a structure mapping is constructed because, as we have seen, there are algorithms which can produce valid structure mappings without explicitly performing this consistency check. Rather, it might be thought of as a kind of meta-cognitive process, or higher order checking process, which serves to check that a mapping that has been made is logically consistent.

It is probably already apparent that the dimensionality
of concepts as defined earlier can be related to the rank of the tensors
in the representation of the structures. A unidimensional concept
corresponds to the tensor product of a vector representing a predicate
and a vector representing its argument. A two
dimensional concept as defined earlier corresponds to a tensor
product of a predicate and two arguments. A three-dimensional concept
corresponds to the tensor product of a predicate and three arguments,
and so on. A concept of dimensionality *N* as defined earlier corresponds
to a representation which entails a tensor product of Rank *N* + 1.
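The dimensionality-to-rank correspondence is directly visible in the outer product. A small sketch (vector contents are arbitrary; only the shape matters here):

```python
import numpy as np

d = 4
predicate = np.ones(d)
args = [np.ones(d) for _ in range(3)]  # a three-place predicate

# A concept of dimensionality N (a predicate with N arguments)
# corresponds to the outer (tensor) product of the predicate vector
# with the N argument vectors: a tensor of rank N + 1.
rep = np.einsum('i,j,k,l->ijkl', predicate, *args)
print(rep.ndim)  # 4: three arguments give a rank-4 tensor
```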

**Figure 7.16.
Consistency Checking with Tensor Product Nets: (c) Level 3 Systems.**

Note that dimensionality is defined in terms of the concept, rather than in terms of the rank of the tensor product. This is partly because conceptual complexity can be defined in other contexts, where tensor products might not have apparent relevance. More importantly, it is appropriate to define dimensionality in terms of the number of arguments in a relation. Consider, for example, the binary operation of addition, which is a ternary relation. It is three-dimensional because three items of information are sufficient to constrain the fourth (e.g., given 2, 3, 5 we can infer the operation is addition; given 2 × ? = 8 we can infer the missing number is 4).

This means that as we progress to higher order
structures, the computational demands of the representations escalate
rapidly. Given *N* vectors each of *M* elements, the number
of elements in the tensor product is
*M*^{N}. There are ways of
reducing the number of elements (Smolensky, 1990) because sufficiently
clear representations can be achieved even with some elements
deleted. Nevertheless, the number of elements escalates rapidly.
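The escalation is easy to make concrete. Assuming vectors of *M* = 16 elements, as in the 16-tuples of the simulation described earlier, a rank-*N* tensor product has *M*^{N} units:

```python
M = 16  # elements per vector, as in the 16-tuples of the simulation
sizes = {rank: M ** rank for rank in range(2, 6)}
for rank, n_units in sizes.items():
    print(rank, n_units)
# rank 2 -> 256, rank 3 -> 4096, rank 4 -> 65536, rank 5 -> 1048576
```

A rank-5 tensor product, the level corresponding to four-dimensional concepts, already requires over a million units at this vector size.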

The fact that the computational demands of representations escalate rapidly as the dimensionality of the structures increases corresponds to the psychological evidence mentioned earlier that the processing loads are higher for higher-dimensional structures. Empirical evidence reviewed earlier suggests that four-dimensional structures can be represented, which would correspond to tensor products of Rank 5. On the evidence currently available this might be the most complex structure that humans can process.

There are two possible reasons for this. One would be the sheer size, in terms of number of units, of the tensor product vector. Another would be the number of connections to each unit in the tensor product. Each of these units must have a connection to, and receive input from, at least one unit in each of the vectors representing the predicate and argument(s). In the representation of a one-dimensional concept, with a tensor product of Rank 2, each unit has a connection to (at least) one unit in each of two vectors. With a Rank 3 tensor product, each tensor product unit has connections to units in three vectors, with a Rank 4 tensor product each unit has connections to units in four vectors, etc. Cortical cells are very richly interconnected, so the sheer number of connections is unlikely to be a problem. However, there may be difficulties in ensuring that each tensor product unit is connected to the right cells in each of the input vectors. That is, even though each tensor product unit might have thousands of connections, it might not have the appropriate connections. The higher the rank of the tensor product, the more difficult it will be to ensure this. This aspect of representations requires further exploration.

This is likely to be a soft rather than a hard limitation, and there will be occasions when the specified limit can be exceeded, just as there will be occasions when it will not be reached. This is quite consistent with the literature on capacity limitations mentioned earlier, which suggests a capacity to process 3-5 chunks or dimensions in parallel.

It appears humans have limited ability to process structure, and can probably only process four dimensions in parallel. However, it is obvious that many concepts are more complex than four dimensions, so how can we deal with more complex concepts? Two processes, conceptual chunking and segmentation, handle this problem. Conceptual chunking entails recoding a multidimensional structure as a unidimensional structure, or more generally, as a structure with fewer dimensions. This can only occur under certain conditions, and is not a universal method for reducing all processing loads.

An everyday example of a conceptual chunk would be
velocity. Velocity is a three-dimensional concept, defined as: velocity
= distance/time (*V* = *s*/*t*). However, it can be
recoded as a single
dimension. We commonly do this by thinking of speed as position of a
pointer on a dial, as the rush of air in the face, blurring of close
objects passing by, and so on.

Once multiple dimensions are recoded as a single
dimension, that dimension occupies only one chunk, and it can then be
combined with up to three other chunks. Velocity can now be combined again
with time to give acceleration, normally defined as distance divided by
time, divided by time (*a* = 2*st*^{−2}). Acceleration can now be chunked into
a single dimension, and can then be combined with mass to produce the
concept of force, *F* = *ma*, and so we can bootstrap our
way up to more and more complex concepts.

Conceptual chunks are similar to mnemonic chunks in
that a number of formerly separate items of information are recoded
as a single item, but there is more emphasis on structure (i.e.,
relations, function, or transformations) between elements, in conceptual
chunks. The chunk corresponding to velocity does not consist merely
of the concatenation of distance and time, but actually embodies the
function *V* = *s*/*t*.

In our formulation, velocity would be represented as a
Rank 4 tensor product, with vectors representing velocity (*v*), distance
(*s*), time (*t*), and multiplication.
If velocity is recoded as a single
vector, then acceleration can be represented as a rank 4 tensor product,
with vectors representing acceleration (*a*), velocity (*v*),
time (*t*), and
multiplication. If we now represent acceleration as a single vector,
we can represent force (*F*) as a Rank 4 tensor product, with vectors
representing *F*, mass (*M*), acceleration (*a*)
and multiplication.
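The chunking sequence can be sketched as follows; the vectors are made-up stand-ins for the quantities and the multiplication relation, and only the tensor shapes carry the point.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6

def unit_vec():
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

# Hypothetical vectors for the relation and the physical quantities.
mult, v, s, t, a, m, F = (unit_vec() for _ in range(7))

# Velocity as a rank-4 tensor product: "multiplication" binding
# v, s, and t (v = s/t, i.e., s = v * t).
velocity_rep = np.einsum('i,j,k,l->ijkl', mult, v, s, t)

# Conceptual chunking: the whole velocity structure is recoded as the
# single vector v, which can then serve as one argument of a new
# rank-4 representation, and likewise for acceleration.
accel_rep = np.einsum('i,j,k,l->ijkl', mult, a, v, t)  # a = v/t
force_rep = np.einsum('i,j,k,l->ijkl', mult, F, m, a)  # F = m * a

# The rank never exceeds 4, however complex the chunked concept;
# the cost is that s and t no longer appear in force_rep at all.
print(velocity_rep.ndim, accel_rep.ndim, force_rep.ndim)  # 4 4 4
```

This is the bootstrapping described in the text: each recoding keeps the representation within the rank limit, at the price of making the constituent structure (here, the effects of distance and time on force) inaccessible until the chunk is unpacked.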

At first it seems that we must be getting something
for nothing when the more complex concept, Force, is represented by
a tensor product with no more vectors than the less-complex concept,
velocity. Actually, however, there is a cost to the more efficient
representation. This is that some of the constituent structure is
no longer immediately accessible. If we write force as
*F*=2*mst*^{−2}, its
representation corresponds to a rank 5 tensor product, because all
dimensions (mass, distance, and both time dimensions, as well as force
itself) are expressed as separate vectors. The effect of distance and time
on force are expressed in the representation (the partial derivatives
∂F/∂t and ∂F/∂s
are expressed). However, when force is expressed as
*F* = *ma*, the effect of variations in distance and time
(the partial derivatives mentioned above) cannot be expressed.

It only makes sense to fold *s* and *t* into
velocity, *v*, if the actual values of *s* and *t* are irrelevant. For
an example of a situation where the actual values are relevant, consider
foot races; 100 meters in 10 seconds is reasonably fast, but 1,500 meters
in 150 seconds, although the same velocity, is implausibly fast. On
the other hand, when someone is booked for exceeding the speed limit,
actual time and
distance are irrelevant. In the latter case, but not in the former,
it is appropriate to recode the three vector representation into
one vector. This is reflected in the way people actually encode these
events. We do not say someone ran at 30 kilometers per hour, we say they
ran 100 meters in 12 seconds, but when someone is booked for speeding,
we do not say they drove 2.5 kilometers in one minute, we say they drove
at 150 kilometers per hour. In the first case, two parameters, distance
and time, are relevant and both are mentioned, but in the second case
only one parameter is relevant, and the language reflects this.

Thus, conceptual chunking, or recoding multiple vectors into a single vector, does not dispose of capacity limitations so much as it allows us to operate within them. Given that we can only represent concepts up to four dimensions in parallel, we must have ways of representing complex concepts that do not exceed this limit. We do this by combining multiple vectors into single vectors. However, when we want access to the components of a complex concept, we must revert to the more specific representation again; to gain access to the components, we must sacrifice the efficiency gained by chunking. There tends, therefore, to be a tradeoff between efficiency and flexibility.

Another implication is that representation of complex concepts is heavily dependent on availability of efficient codes. Because we are stretching our capacity for representations to process all the interrelations between distance, time, and mass that occur in the concept of force, we need concepts that provide efficient codes. We can only represent up to about four independent sources of variation (4 degrees of freedom) in parallel, so we require codes that enable us to represent complex concepts without exceeding this limit. This is why expertise is so important to representation, because without it we are restricted to very simple systems. Thus, when an expert examines a situation in simple mechanics, they represent it in terms of the most efficient concepts. In the case of force, they represent it in terms of mass and acceleration. The novice tries to represent too many dimensions (such as distance and time) because she/he does not have access to the important higher order concepts such as acceleration, and also tries to represent irrelevant dimensions.

Another limitation of conceptual chunking is that, like mnemonic chunking, it is only possible with constant mappings of components into chunks. The mnemonic chunk "CAT" is possible because "C," "A," and "T" map into the word CAT in constant fashion. In the conceptual chunk speed, the dimensions distance and time map into speed in constant fashion, defined by the ternary relation of division.

The Rutherford analogy between the solar system and the atom, analyzed by Gentner (1982), and modeled by Falkenhainer et al. (1990) and Holyoak and Thagard (1989), also provides an instructive example of a conceptual chunk. The analogy entails mapping the solar body into the nucleus, and the planetary body into the electron, on the basis that the relational structures correspond; that is, the formula linking radius of orbit, the gravitation constant, masses of the two bodies, and orbital velocity, would correspond to the equivalent formulas in the atom. It would constitute a multiple system mapping, and recognition of the mapping without benefit of chunking would impose a high processing load. However, as Halford (1993) points out, we can create the conceptual chunk "orbital motion" which would consist of only two nodes, the solar body and the orbiting body, with the relation "orbits around" between them. Structure mapping would then entail mapping solar body into the nucleus, and orbiting body into electron. This can be done on the basis that the relation "orbits around" exists between solar and orbiting bodies, and also between nucleus and electron. This only requires a relational mapping (i.e., a mapping based on two-place predicates).

The structure mapping model of Holyoak and Thagard (1989) includes "presumed" mappings (i.e., presumptions that particular items in one structure should be mapped into certain items in the other structure). The conceptual chunk "orbital motion" would entail the presumption that, if solar body and planetary body are mapped into nucleus and electron, respectively, then all the predicates attached to these two chunks would be correspondingly mapped (i.e., radius of planetary orbit would be mapped into radius of electron orbit), and so on. The processing load is then equivalent to that required to map the nodes solar body and planetary body, with a single relation between them. It is equivalent to a relational mapping. The remaining mappings are presumed, having been learned in past experience, and do not impose a processing load.

One characteristic of an expert, as compared with a novice, is that the expert can recognize examples of concepts on the basis of their deep structure (Chi & Ceci, 1987; Holyoak, 1990). An expert who has constructed the conceptual chunk "orbital motion" recognizes an example of it quickly, and with little effort. All that is necessary is to recognize that there is a solar body and an orbiting body, with the relation "orbits around" between them. Once "planetary body orbits around solar body" has been mapped to "electron orbits around nucleus," the remaining features, such as size relations, attraction relations, etc., are presumed to be mapped.

Chunking can, therefore, produce massive gains in efficiency, but at the expense of rigidity. While a concept is encoded as a chunk in this way, its constituent structure is less accessible, and alternative combinations cannot be generated. Hence, novices, although they are less likely to see the conventional structures quickly and efficiently because they have not coded the information into efficient chunks, might, for that very reason, more readily see novel structures.

Segmentation entails decomposing structures into smaller components that can be processed serially. This also can be illustrated with the Rutherford analogy. Once the core chunks "planetary body orbits around solar body" and "electron orbits around nucleus" are mapped, the remaining components can be mapped serially. Notice, however, that some knowledge of the two structures is required for this serial processing strategy to be organized. When the analogy was first discovered, it is unlikely that such strategies for reducing processing load would have been available. Consequently the processing load must have been very high indeed.

Given that only a limited amount of structure can be processed in one step, the chance of finding a good mapping depends on choosing the right representation of the problem. A complex problem might comprise many predicates and many arguments. The key to the mapping will depend on choosing the right subset of predicates or arguments. For example, in the Rutherford analogy, success depends on representing the solar system by the predicate "revolves around" with arguments "sun" and "planet," and the atom by "nucleus" and "electron." There are many other representations that would conflict with a good match (e.g., representing the solar system by the predicate "hotter than"). An expert will tend to select an appropriate match relatively quickly. A realistic model of human analogical reasoning, however, needs ways of searching for a good representation. The present model could achieve this in principle by adding a sampling process, whereby attempted matches are based on successive samples of each structure. Success in sampling would be defined in terms of the match achieved; when the appropriate predicate-argument combinations are found, a match will be readily obtained.

Notice that the requirement that predicates in the two structures must correspond can be achieved by such a process. When an appropriate representation is found, the predicates will correspond automatically, as noted earlier. The reason is that the predicate-argument combinations of base and target are superimposed on the same representation. If this is not true, a good match cannot be obtained. Therefore, the sampling process will be constrained to find a predicate-argument combination that puts the two structures in correspondence. With such a structure sampling procedure, the present model could achieve essentially the same result as the ACME model, without assuming appropriate representations at the outset.
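The structure-sampling idea can be sketched symbolically. The propositions below are a toy encoding of the Rutherford example, and exhaustive scoring of candidate pairs stands in for successive random sampling.

```python
from itertools import product

# Each structure is a list of (predicate, arguments) propositions.
solar_system = [("revolves_around", ("planet", "sun")),
                ("hotter_than", ("sun", "planet")),
                ("more_massive_than", ("sun", "planet"))]
atom = [("revolves_around", ("electron", "nucleus")),
        ("more_massive_than", ("nucleus", "electron"))]

def match_score(base_prop, target_prop):
    # Minimum criterion: base and target predicates correspond.
    return 1.0 if base_prop[0] == target_prop[0] else 0.0

# Stand-in for the sampling process: score predicate-argument samples
# from each structure and keep the best-matching pair.
best = max(product(solar_system, atom), key=lambda bt: match_score(*bt))
print(best[0][0])  # revolves_around
```

The winning sample pairs "planet revolves around sun" with "electron revolves around nucleus," putting the two structures in correspondence; samples such as "hotter than" score zero and are discarded, as the text anticipates.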

The theory proposes that there are limits to the dimensionality of structure that can be represented and mapped in parallel. The problem with testing this proposition is that structures can be recoded into lower dimensional representations, and structures can sometimes be processed serially. However, the factors which govern these processes are known to some extent, and can be controlled in experimental paradigms. Our work in the past has shown that it is possible to constrain participants to map particular levels of structure in parallel under certain conditions. We will briefly summarize the requirements for doing this.

To test the theory, five requirements must be met: the task must be assigned objectively to a level of structure mapping, conceptual chunking and segmentation must each be controlled, the mappings must not have been pre-learned, and there must be no non-structural cues to the correct mappings.

The most objective way to assign a task to a level of structure mapping is to write the specifications for the task in a machine-readable form such as predicate calculus, then use a structure mapping simulation program to determine the level. This has been done using the simulation program by Bakker and Halford (1988), which has two structure mapping procedures, corresponding to relational and system mappings. The task is attempted by each level of mapping procedure, and the lowest successful level is the one to which the task is assigned. The program has been used with transitive inference and the Rutherford analogy, and shows that the former is a system mapping and the latter is a relational mapping if coded as described in the last section. Where no simulation program is available, the mathematical structure of the task can be examined. Transitive inference has been used in our research because it has a clearly defined mathematical structure that can be assigned objectively to a level of mapping.

Conceptual chunking is avoided if the task requires participants to generate a new structure. For example, each transitive inference task requires participants to generate an ordered set from the premises. The elements (e.g., elements in Figure 7.2) will have been assigned to slots in the premises so that the correct ordering cannot be retrieved from semantic memory. Because the elements have not occurred in that combination before, at least not with any regularity, they will not have been formed into a chunk.

Segmentation can be avoided by ensuring that the various components of the task interact, because it is not then possible to process one component without taking account of the others. This is true for the interpretation of interactions task, discussed earlier. The premises of a transitive inference task also interact in the sense that neither can be interpreted without the other. When we try to assign a premise element to an ordinal position on the basis of one premise, the assignment is modified by the other premise.

It is also necessary to ensure that the mappings have not been pre-learned. This can normally be done by using arbitrary assignments. The requirement that there be no non-structural cues, such as linguistic cues, is similar to saying that the mappings must have low transparency (Gentner & Gentner, 1983).

Although there is agreement that ability to process information increases through childhood (Case, 1985; Halford, 1982) it is not clear whether this growth represents increased capacity, or more efficient use of available capacity, or both. It has been shown that speed of processing increases with age (Case, Kurland, & Goldberg, 1982; see also Halford, 1982; Siegler, 1990, for reviews), that strategies become more sophisticated (Flavell & Wellman, 1977), that knowledge becomes more sophisticated and better organized (Carey, 1985) and that content familiarity has a powerful effect on memory (Chi, Glaser, & Rees, 1982). However, none of these approaches assesses capacity per se. They show that capacity is not the only thing that changes with age, but do not eliminate the possibility that capacity increases might accompany these other changes.

Ability to map structures does increase with age. Halford and Wilson (1980) examined children's ability to map a structure into an isomorph. They varied the number of dimensions required to validate the mapping, and found that structures defined on two dimensions (two-place predicates) could be mapped by 3-4-year-olds, structures based on three dimensions (three-place predicates) at 5 years, and structures based on four dimensions (four-place predicates) at 11 years. This suggests that the number of dimensions that can be processed in parallel increases with age, and reaches the adult level at age 11. The prediction that ability to process structure increases with age is also supported by studies of transitive inference (Halford, 1984, 1989; Halford & Kelly, 1984; Halford, Maybery, & Bain, 1986) and class inclusion reasoning (Halford & Leitch, 1989). However these gains might reflect more efficient use of available capacity, and do not necessarily entail a maturational increase in capacity.

We reviewed evidence earlier that whereas adults process four chunks in primary memory in parallel, 8-9-year-old children process two or three chunks. There is other evidence also of an age-related increase in information processing capacity, as distinct from the efficiency with which capacity is used (Thatcher, Walker, & Giudice, 1987). The evidence is not sufficient to resolve the issue, but we would like to suggest an alternative formulation of the problem. This is that increased ability to process structure might result from differentiation of some regions of the brain, thereby increasing the independence of representations. This would enable more chunks or dimensions to be processed in parallel. Increased differentiation with age would be consistent with the developmental theory of Werner (1948). It would not increase overall capacity, but it would mean that structures of higher dimensionality could be mapped in parallel.

This in turn would permit new conceptual achievements. Concepts such as inclusion and transitivity are defined on compositions of binary relations, and cannot be decomposed, without loss, into single relations. Therefore, if children are limited to processing individual relations, they will not be able to comprehend these concepts. A similar argument applies to binary operations, such as arithmetic addition and multiplication, and the binary connectives of logic (conjunction and disjunction). These concepts also cannot be decomposed without loss into simpler concepts. A certain amount of information must be processed in parallel to make concepts at the system-mapping level accessible. An extra dimension must be processed in parallel to make concepts at the next level accessible. Therefore increased differentiation, with consequent increases in the dimensionality of structures that can be processed in parallel, opens up new conceptual possibilities.

From our account of the way cognitive structures are represented in PDP architectures, a new way of formulating the question of how processing capacity develops emerges. We have seen that representation of high-dimensional structures depends on the ability to compute tensor products of equivalent rank. Therefore the question becomes: do children acquire the ability to compute tensor products of higher rank as they grow older? If the answer is yes, is this ability due to maturation or to learning? To address these questions, we need more studies which control, in a precise way, the representations that participants use. Our empirical work has shown how this can be done in certain contexts, but there is considerable scope to widen it.

Humans perform analogical reasoning very easily and naturally, but realistic models require specification of our capacity to map structures in parallel. The complexity of concepts can be quantified in terms of their dimensionality, and we propose that there are limits to the number of dimensions that humans can represent and map in parallel. Our evidence suggests that adults can map up to four dimensions in parallel, and that children can map fewer dimensions in parallel.

The complexity metric that we propose quantifies concepts in terms of the number of independent dimensions required to define them. Four levels of mapping are defined, which increase in abstractness and processing load. These mappings are based on concepts of increasing dimensionality. We have examined the way these concepts would be represented in PDP architectures. We have suggested that one-dimensional concepts are represented as the tensor product of a predicate and a single argument vector, two-dimensional concepts as the tensor product of two argument vectors and a predicate vector, and three- (four-) dimensional concepts as the tensor product of three (four) argument vectors and a predicate vector. We have reviewed empirical evidence suggesting that four dimensions might represent an upper limit to the complexity of concepts that humans can process in parallel.
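The tensor product construction summarized above can be sketched numerically. The sketch below is our illustration, not the authors' code: the vectors `revolves_around`, `sun`, and `planet` are hypothetical fillers, and orthonormal vectors are assumed so that later retrieval from the tensor is exact.

```python
import numpy as np

# Orthonormal filler vectors via QR decomposition (an assumption made
# here so that unbinding from the tensor product is exact).
d = 8
basis, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((d, d)))
revolves_around, sun, planet = basis[0], basis[1], basis[2]

# One-dimensional concept: rank-2 tensor, predicate (x) argument.
rank2 = np.einsum('i,j->ij', revolves_around, sun)

# Two-dimensional concept: rank-3 tensor, predicate (x) arg1 (x) arg2.
# Each extra dimension adds one axis (and one rank) to the product.
rank3 = np.einsum('i,j,k->ijk', revolves_around, sun, planet)

print(rank2.shape, rank3.shape)
```

The rank of the tensor, and hence the amount of structure bound in parallel, grows with the dimensionality of the concept, which is what the proposed complexity metric counts.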

The tensor product representation of predicate-argument bindings is used to model analogical reasoning. The main processes entail entering arguments into the representation to obtain unknown predicates, or entering predicates (sometimes accompanied by a subset of arguments) to obtain missing arguments. We have shown in principle how five basic types of analogical reasoning can be performed in terms of the model.
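These two retrieval operations can be illustrated in the same hedged spirit (our sketch, with hypothetical filler vectors and orthonormality assumed): entering both arguments into a rank-3 binding recovers the predicate, and entering the predicate with one argument recovers the missing argument.

```python
import numpy as np

# Orthonormal fillers, as before; names are illustrative only.
d = 8
basis, _ = np.linalg.qr(np.random.default_rng(2).standard_normal((d, d)))
orbits, electron, nucleus = basis[0], basis[1], basis[2]

# Stored binding: orbits(electron, nucleus).
T = np.einsum('i,j,k->ijk', orbits, electron, nucleus)

# Enter both arguments to obtain the unknown predicate.
predicate_out = np.einsum('ijk,j,k->i', T, electron, nucleus)

# Enter the predicate and the first argument to obtain the missing argument.
argument_out = np.einsum('ijk,i,j->k', T, orbits, electron)

assert np.allclose(predicate_out, orbits)
assert np.allclose(argument_out, nucleus)
```

With orthonormal fillers the contractions are exact; with merely random vectors the outputs would approximate the stored fillers, which is closer to a distributed-memory reading of the model.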

Concepts that entail more than four dimensions are processed by conceptual chunking. This entails recoding multiple vector representations into fewer vectors, or one vector in the limiting case. This reduces processing load, but also makes the constituent information unavailable for restructuring, unless a return is made to the original, multi-vector representation. The primary developmental implication is that ability to process structure might depend on differentiation or other processes which lead to representations of higher dimensionality. These in turn would be based on the ability to compute tensor products from more vectors.[Footnote 2]
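Conceptual chunking can be given a rough numerical reading. The recoding below, a fixed random projection, is purely our illustrative assumption, not a mechanism proposed in the chapter: it compresses the rank-3 binding into a single vector, reducing load, but the unbinding operations no longer apply to the chunked form, so the constituents become unavailable unless the original tensor is restored.

```python
import numpy as np

d = 8
rng = np.random.default_rng(3)
basis, _ = np.linalg.qr(rng.standard_normal((d, d)))
orbits, electron, nucleus = basis[0], basis[1], basis[2]
T = np.einsum('i,j,k->ijk', orbits, electron, nucleus)

# Chunking: recode the d**3-element tensor into a single d-element vector
# via a fixed random projection (an illustrative choice of recoding).
projection = rng.standard_normal((d, d**3)) / np.sqrt(d**3)
chunk = projection @ T.ravel()

# Load drops from d**3 numbers to d, but chunk is rank 1: the rank-3
# retrieval contractions cannot be applied to it, so the constituent
# bindings are hidden until a return is made to T itself.
print(T.size, '->', chunk.size)
```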

Footnote 2: Solution to problem in Figure 7.4:

A1 = Paramount; A2 = MGM; B1 = A-grade; B2 = B-grade;

C1 = G-rated; C2 = R-rated; D1 = adventure; D2 = comedy.

Amit, D.J. (1989). Modelling brain function: The world of attractor neural networks. Cambridge, UK: Cambridge University Press.

Anderson, J.A. (1973). A theory for the recognition of items from short memorized lists. Psychological Review, 80, 417-438.

Anderson, J.R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.

Anderson, J.R., & Thompson, R. (1989). Use of analogy in a production system architecture. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 267-297). Cambridge: Cambridge University Press.

Attneave, F. (1959). Applications of information theory to psychology: A summary of basic concepts, methods and results. New York: Henry Holt & Company.

Baddeley, A.D. (1986). Working memory. Oxford: Clarendon Press.

Baddeley, A., & Hitch, G. (1974). Working memory. In G.H. Bower (Ed.), The psychology of learning and motivation: Advances in research and theory (pp. 47-89). New York: Academic Press.

Baddeley, A.D., Thomson, N., & Buchanan, M. (1975). Word length and the structure of short-term memory. Journal of Verbal Learning and Verbal Behavior, 14, 575-589.

Bakker, P.E., & Halford, G. S. (1988). A basic computational theory of structure-mapping in analogy and transitive inference. Unpublished Tech. Rep. 88/1, Centre for Human Information Processing and Problem Solving, University of Queensland, Brisbane, Australia.

Broadbent, D.E. (1958). Perception and communication. London: Pergamon.

Broadbent, D.E. (1975). The magic number seven after fifteen years. In A. Kennedy & A. Wilkes (Eds.), Studies in long term memory (pp. 3-18). London: Wiley.

Carey, S. (1985). Conceptual change in childhood. Cambridge, MA: MIT Press.

Case, R. (1985). Intellectual development: Birth to adulthood. New York: Academic Press.

Case, R., Kurland, M., & Goldberg J. (1982). Operational efficiency and the growth of short-term memory span. Journal of Experimental Child Psychology, 33, 386-404.

Chi, M.T.H., & Ceci, S.J. (1987). Content knowledge: Its role, representation and restructuring in memory development. Advances in Child Development and Behavior, 20, 91-142.

Chi, M.T.H., Glaser, R., & Rees, E. (1982). Expertise in problem solving. In R. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 1, pp. 7-75). Hillsdale, NJ: Erlbaum.

Coombs, C.H., Dawes, R.M., & Tversky, A. (1970). Mathematical psychology: An elementary introduction. Englewood Cliffs, NJ: Prentice-Hall.

Duncan, J. (1980). The demonstration of capacity limitation. Cognitive Psychology, 12, 75-96.

Egan, D.E., & Greeno, J.G. (1974). Theory of rule induction: Knowledge acquired in concept learning, serial pattern learning, and problem solving. In L.W. Gregg (Ed.), Knowledge and cognition. Potomac, MD: Erlbaum.

Falkenhainer, B., Forbus, K.D., & Gentner, D. (1990). The structure-mapping engine: algorithm and examples. Artificial Intelligence, 41, 1-63.

Fisher, D.L. (1984). Central capacity limits in consistent mapping, visual search tasks: four channels or more? Cognitive Psychology, 16(4), 449-484.

Fisk, A.D., Derrick, W.L., & Schneider, W. (1986). A methodological assessment and evaluation of dual-task paradigms. Current Psychological Research and Reviews, 5, 315-327.

Flavell, J.H., & Wellman, H.M. (1977). Metamemory. In R.V. Kail & J.W. Hagen (Eds.), Memory in cognitive development (pp. 3-33). Hillsdale, NJ: Erlbaum.

Garner, W.R. (1962). Uncertainty and structure as psychological concepts. New York: Wiley.

Gentner, D. (1982). Are scientific analogies metaphors? In D.S. Miall (Ed.). Metaphor: Problems and perspectives (pp. 106-132). Brighton, England: Harvester Press.

Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7, 155-170.

Gentner, D. (1989). The mechanisms of analogical reasoning. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 199-241). Cambridge: Cambridge University Press.

Gentner, D., & Gentner, D.R. (1983). Flowing waters or teeming crowds: Mental models of electricity. In D. Gentner & A.L. Stevens (Eds.), Mental models (pp. 99-129). Hillsdale, NJ: Lawrence Erlbaum.

Gick, M.L., & Holyoak, K.J. (1980). Analogical problem solving. Cognitive Psychology, 12, 306-355.

Gick, M.L., & Holyoak, K.J. (1983). Schema induction and analogical transfer. Cognitive Psychology, 15, 1-38.

Halford, G.S. (1982). The development of thought. Hillsdale, NJ: Erlbaum.

Halford, G.S. (1984). Can young children integrate premises in transitivity and serial order tasks? Cognitive Psychology, 16, 65-93.

Halford, G.S. (1987). A structure-mapping approach to cognitive development. International Journal of Psychology, 22, 609-642.

Halford, G.S. (1989). Reflections on 25 years of Piagetian cognitive developmental psychology, 1963-1988. Human Development, 32, 325-387.

Halford, G.S. (1993). Children's understanding: The development of mental models. Hillsdale, NJ: Erlbaum.

Halford, G.S., Bain, J.D., & Maybery, M.T. (1984). Does a concurrent memory load interfere with reasoning? Current Psychological Research and Reviews, 3, 14-23.

Halford, G.S., & Kelly, M.E. (1984). On the basis of early transitivity judgements. Journal of Experimental Child Psychology, 38, 42-63.

Halford, G.S., & Leitch, E. (1989). Processing load constraints: a structure-mapping approach. In M.A. Luszcz & T. Nettelbeck (Eds.), Psychological development: Perspectives across the life-span (pp. 151-159). Amsterdam: Elsevier.

Halford, G.S., Maybery, M.T., & Bain, J.D. (1986). Capacity limitations in children's reasoning: A dual task approach. Child Development, 57, 616-627.

Halford, G.S., Maybery, M.T., & Bain, J.D. (1988). Set-size effects in primary memory: An age-related capacity limitation? Memory and Cognition, 16(5), 480-487.

Halford, G.S., & Wilson, W.H. (1980). A category theory approach to cognitive development. Cognitive Psychology, 12, 356-411.

Hesse, M.B. (1966). Models and analogies in science. Notre Dame, IN: Notre Dame University Press.

Hinton, G.E. (1981). Implementing semantic networks in parallel hardware. In G.E. Hinton & J.A. Anderson (Eds.), Parallel models of associative memory (pp. 161-187). Hillsdale, NJ: Erlbaum.

Holland, J.H., Holyoak, K.J., Nisbett, R.E., & Thagard, P.R. (1986). Induction: Processes of inference, learning and discovery. Cambridge, MA: Bradford Books/MIT Press.

Holyoak, K. (1990). Symbolic connectionism: Toward third-generation theories of expertise. In K.A. Ericsson & J. Smith (Eds.), Toward a general theory of expertise: Prospects and limits. Cambridge: Cambridge University Press.

Holyoak, K.J., & Koh, K. (1987). Surface and structural similarity in analogical transfer. Memory and Cognition, 15, 332-340.

Holyoak, K.J., & Thagard, P. (1989). Analogical mapping by constraint satisfaction. Cognitive Science, 13(3), 295-355.

Hopfield, J.J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Science, 79, 2554-2558.

Humphreys, M.S., Bain, J.D., & Pike, R. (1989). Different ways to cue a coherent memory system: A theory for episodic, semantic and procedural tasks. Psychological Review, 96 (2), 208-233.

Hunt, E., & Lansman, M. (1982). Individual differences in attention. In R. Sternberg (Ed.). Advances in the psychology of human intelligence (Vol. 1, pp. 207-254). Hillsdale, NJ: Erlbaum.

Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice-Hall.

Leeuwenberg, E.L.L. (1969). Quantitative specification of information in sequential patterns. Psychological Review, 76, 216-220.

Mac Lane, S. (1972). Categories for the working mathematician. New York: Springer-Verlag.

Maybery, M.T., Bain, J.D., & Halford, G.S. (1986). Information processing demands of transitive inference. Journal of Experimental Psychology: Learning, Memory and Cognition, 12, 600-613.

McClelland, J.L. (1986). Resource requirements of standard and programmable nets. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the micro-structure of cognition (pp. 460-487). Cambridge, MA: MIT Press.

Miller, G.A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97.

Mitchell, M., & Hofstadter, D.R. (1990). The emergence of understanding in a computer model of concepts and analogy-making. Physica D, 42, 322-334.

Moray, N. (1967). Where is capacity limited? A survey and model. Acta Psychologica, 27, 84-92.

Murdock, B.B., Jr. (1983). A distributed memory model for serial-order information. Psychological Review, 90, 316-338.

Navon, D. (1984). Resources-a theoretical soup stone? Psychological Review, 91, 216-234.

Newell, A., & Simon, H.A. (1972). Human problem solving. New York: Prentice-Hall.

Norman, D. A., & Bobrow, D. G. (1975). On data-limited and resource-limited processes. Cognitive Psychology, 7, 44-64.

Palmer, S.E. (1978). Fundamental aspects of cognitive representation. In E. Rosch & B.B. Lloyd (Eds.), Cognition and categorization (pp. 259-303). Hillsdale, NJ: Erlbaum.

Palmer, S.E. (1989). Levels of description in information-processing theories of analogy. In S. Vosniadou. & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 332-345). Cambridge: Cambridge University Press.

Plate, T. (1991, August). Holographic reduced representations: convolution algebra for compositional distributed representations. 12th International Joint Conference on Artificial Intelligence (pp. 30-35). Sydney, Australia.

Restle, F. (1970). Theory of serial pattern learning: Structural trees. Psychological Review, 77, 481-495.

Rumelhart, D.E. (1989). Toward a micro-structural account of human reasoning. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasonings (pp. 298-312). Cambridge: Cambridge University Press.

Rumelhart, D.E., & McClelland, J.L. (Eds.). (1986). Parallel distributed processing: Explorations in the micro-structure of cognition. Cambridge, MA: MIT Press.

Schneider, W., & Detweiler, M. (1987). A connectionist/control architecture for working memory. The Psychology of Learning and Motivation, 21, 53-119.

Schweickert, R., & Boruff, B. (1986). Short-term memory capacity: Magic number or magic spell? Journal of Experimental Psychology: Learning, Memory and Cognition, 12, 419-425.

Siegler, R.S. (1990). Children's thinking (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.

Simon, H.A. (1972). Complexity and the representation of patterned sequences of symbols. Psychological Review, 79, 369-382.

Simon, H.A. (1974). How big is a chunk? Science, 183, 482-488.

Simon, H.A., & Kotovsky, K. (1963). Human acquisition of concepts for sequential patterns. Psychological Review, 70, 534-546.

Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46(1-2), 159-216.

Standing, L., Bond, B., Smith, R., & Isely, C. (1980). Is the immediate memory span determined by subvocalization rate? British Journal of Psychology, 71, 525-539.

Suppes, P., & Zinnes, J.L. (1963). Basic measurement theory. In R.D. Luce, R.R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology. New York: Wiley.

Werner, H. (1948). Comparative psychology of mental development. Chicago: Follet.

Wickens, C.D. (1980). The structure of attentional resources. In R.S. Nickerson (Ed.). Attention and performance (Vol. VIII, pp. 239-257). Hillsdale, NJ: Erlbaum.

Wiles, J., Halford, G.S., Stewart, J.E.M., Humphreys, M.S., Bain, J.D., & Wilson, W.H. (1994). Tensor models: A creative basis for memory. In T. Dartnall (Ed.), Artificial intelligence and creativity: An interdisciplinary approach (pp. 147-161). Dordrecht: Kluwer Academic.

Wiles, J., Humphreys, M. S., Bain, J.D., & Dennis, S. (1990). Control processes and cue combinations in a connectionist model of human memory (Tech. Report, No. 186). Brisbane, Australia: Key Centre for Software Technology, University of Queensland.

Willshaw, D. (1981). Holography, associative memory, and inductive generalization. In G.E. Hinton & J.A. Anderson (Eds.), Parallel models of associative memory (pp. 83-104). Hillsdale, NJ: Erlbaum.