Connectionist Implications for Processing Capacity Limitations in Analogies

Graeme S. Halford
William H. Wilson
Jian Guo
Ross W. Gayler
Janet Wiles
and J.E.M. Stewart

Chapter 7: pages 363-415 in K.J. Holyoak and J. Barnden (editors) Advances in Connectionist and Neural Computation Theory: Volume 2: Analogical Connections, Norwood, NJ: Ablex, 1994.


* We are grateful to Murray Maybery for very stimulating discussion of some of the issues addressed in this chapter.

1. CONNECTIONIST IMPLICATIONS FOR PROCESSING CAPACITY LIMITATIONS IN ANALOGIES

There is now a reasonable amount of consensus that an analogy entails a mapping from one structure, the base or source, to another structure, the target (Gentner, 1983, 1989; Holyoak & Thagard, 1989). Theories of human analogical reasoning have been reviewed by Gentner (1989), who concludes that there is basic agreement on the one-to-one mapping of elements and the carry-over of predicates. Furthermore, as Palmer (1989) points out, some of the theoretical differences represent different levels of description rather than competing models. Despite this consensus about the central role of structure mapping, it really only treats the syntax of analogies, and there are also important pragmatic factors, as has been pointed out by Holland, Holyoak, Nisbett, and Thagard (1986) and Holyoak and Thagard (1989). However, in this chapter we are primarily concerned with the problem of how to model the structure mapping or syntactic component of analogical reasoning in terms of parallel distributed processing (PDP) architectures.

According to Gentner (1983), attributes are not normally mapped in analogies, and only certain relations are mapped, the selection being based on systematicity, or the degree to which relations enter into a coherent structure. Gentner (1983) defines an attribute as a predicate taking one argument, whereas a relation is a predicate taking two arguments. Strictly, this only covers binary relations; in general, a relation is a predicate taking two or more arguments, so ternary relations have three arguments, quaternary relations four arguments, and so on. For our purposes a predicate is essentially an N-place relation; it can be defined as an N-place function from the Cartesian product of the N argument sets to the set {T,F}. This includes unary relations, which are predicates with one argument, and are equivalent to attributes in Gentner's terms. Our derivations based on relations can be applied to functions.
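The definition of a predicate as an N-place function into {T,F} can be sketched directly. In the hypothetical fragment below, a relation is stored as a set of tuples, and the predicate returns True exactly when its argument tuple belongs to that set; the relation names are illustrative, not taken from the chapter.

```python
# Sketch: a predicate as an N-place function into {T, F}.
# A relation is stored as a set of N-tuples; the predicate tests membership.

def make_predicate(tuples):
    """Return an N-place predicate from a collection of N-tuples."""
    relation = set(tuples)
    return lambda *args: args in relation

# Unary relation (an attribute in Gentner's terms):
big = make_predicate({("horse",), ("elephant",)})
# Binary relation:
bigger = make_predicate({("horse", "dog"), ("dog", "mouse")})

assert big("horse") is True
assert big("mouse") is False
assert bigger("horse", "dog") is True
```

On this view an attribute is simply the one-argument special case of the same construction, which is why the derivations that follow apply uniformly across arities.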

1.1. Structure Mapping Processes

There are a number of process models of human analogical reasoning (Anderson & Thompson, 1989; Bakker & Halford, 1988; Falkenhainer, Forbus, & Gentner, 1990; Holyoak & Thagard, 1989; Rumelhart, 1989). The only one of these models that could be described as connectionist is the Analogical Constraint Mapping Engine (Holyoak & Thagard, 1989). ACME can solve a wide range of analogical reasoning tasks, including water flow/heat flow, logical union, arithmetic addition, solar system/atomic structure, and a number of convergence analogies used by Gick and Holyoak (1980). The structure of base and target are represented as predicate-argument bindings, coded in predicate calculus. A parallel constraint satisfaction algorithm finds the mapping which maximizes the correspondence between source and target. This is achieved by inhibitory connections between rival mappings (tending to ensure that mappings are one-to-one), and by excitatory connections which tend to ensure that if a predicate is mapped, its arguments are mapped, and vice versa.

ACME uses local rather than distributed representations. Connectionist models based on distributed representations have the potential to yield additional advantages of graceful degradation and graceful saturation, together with the emergent properties that result from superposition of representations. We will also argue that PDP models provide a principled mechanism that has the potential to account for processing capacity limitations, and this is an issue that models of analogical reasoning must address if they are to be psychologically realistic.

1.2. Capacity

The concept of information processing capacity has a complex history in cognitive psychology, and a number of conceptual and methodological difficulties have arisen. We will briefly review the history and status of the concept. The modern concept of capacity originated with Broadbent (1958) who argued for a single information channel of limited capacity, and Miller (1956) who argued that human beings could process a limited number of "chunks," but that variable amounts of information could be contained in each chunk. These early approaches were modified by later workers such as Moray (1967) who postulated a limited-capacity central processor, and Kahneman (1973) who proposed a limited pool of attentional resources that could be allocated to one or more tasks.

Further developments of this notion were undertaken by Norman and Bobrow (1975), who developed the performance operating characteristic, a curve which expressed performance on one task as a function of performance on the other. Capacity (sometimes called resource) limitations were indicated when there was a tradeoff between the tasks, such that increased performance on task 1 was accompanied by decreased performance on task 2. Navon and Gopher (1979) developed the performance operating characteristic into an elaborate resource-allocation model, based on an economic analogy, but the resulting paradigm was complex and difficult to implement. There have been a number of cautionary notes concerning the capacity concept by Duncan (1980) and by Fisk, Derrick, and Schneider (1986). Navon (1984) argued that deficits caused by two tasks competing for a limited resource cannot be distinguished from deficits caused by the output or side effects of one task interfering with the other. The claim remains debatable, but either way there is an interesting paradigm by Hunt and Lansman (1982) which is not subject to this difficulty, as Halford, Maybery, and Bain (1986) have pointed out.

There has also been growing consensus in recent years that there is more than one resource pool (Baddeley, 1986; Wickens, 1980). It is clear that there are a number of specialized resources, but it is also possible that there is a general purpose processor that has the function of relating other processes to one another, or to the organism's goals. Baddeley (1986) postulated a limited capacity central executive, and Schneider and Detweiler (1987) postulated a limited capacity channel between areas.

Most studies within this tradition have not attempted to place a quantitative value on the amount of information that can be processed. One exception is the work of Miller (1956), who noted that humans seem restricted to short-term memory spans of about seven items (plus or minus two) and also seem restricted to about seven categories in absolute judgments. However, close examination of this intriguing commonality did not lead to a clearly definable factor determining capacity. In short-term memory span the number of items seemed to be approximately constant, irrespective of information value, whereas absolute judgments are a function of information value. The commonality became even less plausible when further investigation showed span of absolute judgments varied according to the number of dimensions of stimuli and the experience of the participants (Attneave, 1959; Garner, 1962). Moreover, memory span was shown to be a function of word length and rehearsal rate (Baddeley, Thomson, & Buchanan, 1975; Case, Kurland, & Goldberg, 1982; Schweickert & Boruff, 1986; Standing, Bond, Smith, & Isely, 1980).

The most enduring contribution from Miller's paper was the concept of a chunk, which was a single unit or module of information that could vary in size. Short-term memory could hold a restricted number of chunks, but the total information stored could be increased by increasing the information content of each chunk. Thus, if binary digits were recoded as octal digits, people could recall approximately three times as many (e.g., if they store 6 octal digits, they can be decoded as 18 binary digits). Simon (1974) proposed that growth in span with age might be due to increasing the amount of information in each chunk. The restriction on short-term memory seemed to be in the number of independent items, rather than on the amount of information.

1.3. Contemporary Estimates of Capacity

Schneider and Detweiler (1987) argue that because working memory entails numerous cortical regions, each comprising numerous modules, the capacity of working memory is much greater than the "magical number seven" suggested by Miller (1956). There is, however, another type of limitation that arises with the architecture they propose. This is in the number of items, or "messages," that can be transferred from one module to another. Consistent with contemporary PDP models, Schneider and Detweiler (1987) postulate no central executive, but there are central connections which transfer information between modules or between regions. Schneider and Detweiler argue, on the basis of a literature review, that only about four messages can be transferred between modules concurrently.

Evidence for this came originally from work by Broadbent (1975) who analyzed the way items were grouped in recall, and found the largest group tended to consist of three items. He suggested that Miller's "magical number seven plus or minus two" might have resulted from two systems, each with a capacity of three or four items. Broadbent conducted several further experiments which provided converging evidence for his conclusion that we process three or four items concurrently, and Schneider and Detweiler (1987) review further research which points to the same value, including the working memory studies of Baddeley and Hitch (1974).

There is corroborative evidence from studies additional to those reviewed by Schneider and Detweiler (1987). Fisher (1984) found that adults processed four items (range three to five) in parallel in the visual search task, which entails similar representational requirements to the memory scanning task. Halford, Maybery, and Bain (1988) found the capacity of primary or active memory to be four to five items for adults, and two to three items with 8-9-year-olds. These lines of evidence collectively suggest that adults process four independent items of information, or four chunks, concurrently, and that the number is less for children.

Attempts to develop parallel distributed processing models of analogical reasoning raise the question of how much information can be processed in parallel. The number of chunks that can be processed is relevant to memory tasks, but it cannot be related directly to reasoning. As we will see, the complexity of concepts can be quantified in terms of the number of dimensions, or independent units of information, required to define the concept. Therefore, we will consider the relation between chunks and dimensions.

1.4. Chunks and Dimensions

There does not appear to be any empirical evidence concerning the number of dimensions that can be processed in parallel in a cognitive task, comparable to the evidence on limitations in the number of chunks. However, a pointer to the likely capacity for processing dimensions can be obtained by comparing dimensions to chunks. There are several ways in which chunks may be compared with attributes on dimensions.

A chunk is a unit of information which has a certain coherence, in the sense that its components are to a large extent indivisible. The chunk CAT cannot be decomposed into its components "C," "A," "T" and remain the same chunk. Remove one of the letters from CAT and it is no longer the same chunk. Attributes on a dimension are units of information which share this same property of coherence. The attribute "red" on the color dimension is a unit of information in the sense that it cannot be decomposed and remain the same unit. It might be countered that chunks and attributes on dimensions may have fuzzy boundaries, but there would still be limits to the number of features that can be changed without the unit losing its identity.

Each chunk has a unique content; if a chunk has the content CAT, it cannot have an alternative content such as DOG. We can think of chunks as like slots each of which can be filled in one and only one way, at any one time. Attributes within one dimension are similar in that they are in complementary distribution; if the color dimension has the value "red," then it cannot have another value such as "green."

The content of one chunk is independent of the content of other chunks; if the first chunk has the content CAT, the second chunk can have other values, which are logically independent of the content of the first chunk. Similarly, the attributes of different dimensions are independent. If the color dimension has the value "red," the shape dimension can have values such as "square," "round," etc., in a way that is logically independent of the values of the color dimensions.

Chunks are of arbitrary size, and their status does not depend on their information value. For example, words have a much higher information content than letters, which have a higher information content than digits. This is true both intuitively and in terms of the information theory definition of information, where the information value, H = log2N, where N is the number of equally likely alternatives. Yet a word, a letter, or a digit constitute one chunk. The information value of an attribute on a dimension varies depending on the number of alternative values. Therefore, an attribute on a dimension does not specify the amount of information, so it is a unit of information of arbitrary size.
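The point that information value varies while chunk count does not can be illustrated with the formula H = log2 N. The vocabulary size used for the word case below is an assumed round figure, chosen only for illustration.

```python
import math

# Information value H = log2(N) for a unit drawn from N equally likely
# alternatives. A digit, a letter, and a word each count as one chunk,
# despite carrying very different amounts of information.
def information_value(n_alternatives):
    return math.log2(n_alternatives)

h_digit = information_value(10)        # ~3.32 bits
h_letter = information_value(26)       # ~4.70 bits
h_word = information_value(10_000)     # ~13.29 bits (assumed vocabulary size)

assert h_digit < h_letter < h_word     # information varies; chunk count does not
```

A span of seven words therefore carries roughly four times the information of a span of seven digits, yet both are at the same chunk limit.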

There are, therefore, a number of parallels between chunks and dimensions. Both are independent, coherent, units of information of arbitrary size. The comparison serves to draw our attention to a paradox in the nature of chunks. Because chunks may be of arbitrary size (alphanumeric characters, words, sentences), there is no severe restriction on the amount of information that they may contain. We can increase the total amount of information processed by increasing the amount of information that is packed into each chunk. Yet, although there is little or no restriction on the total amount of information, there is a severe restriction on the number of units or chunks. The restriction is therefore in the number of separate units, rather than in the actual amount of information processed. It seems then that there is limited independence in the system, so that no matter how much information we process, it can only be divided into a small number of independent units.

Dimensions resemble chunks in that they are independent, because an attribute on any one dimension is independent of attributes on other dimensions as we have seen. A value on a dimension, like a chunk, is an independent item of information of arbitrary size. If the limiting factor is the number of independent units, it should apply to both chunks and dimensions. If we hypothesize that the limitation in the number of dimensions is like the limitation in the number of chunks, we can extrapolate from research on chunks to provide an estimate of the number of dimensions that can be processed in parallel. The best estimate of the number of independent dimensions that can be processed in parallel would be four, the integer central tendency of the empirical estimates in the last section.

Our next problem is to consider why such an apparently paradoxical limitation should exist. An answer is suggested by the nature of the representations in PDP models, as we will see in the next section.

1.5. Chunks, Vectors, and Capacity

In a PDP representation information is represented as a pattern of activation over a large number of units, normally expressed in theories as a vector. Given the number of regions in the working memory architecture, as discussed earlier, it is likely that more than one vector can be activated, but predicate-argument binding requires that the units comprising the vectors be interconnected. The problem is to estimate the number of vectors that can be processed in parallel. It is possible that we can do this by comparing vectors to chunks. In principle there is no upper limit to the size of a vector, so there is no limit to the amount of information that can be represented. A vector of arbitrary precision can represent an arbitrarily large amount of information, but because the units are richly interconnected, there is a sense in which the vector represents one item. This suggests that a vector is somewhat like a chunk, in that both comprise independent items of information of arbitrary size. The limitation in the number of chunks implies a restriction on the number of independent vectors that can be represented concurrently. We will attempt to develop this idea in the formulation which follows. If correct, it means that PDP models provide a natural basis for capacity limitations applying to chunks and dimensions as discussed in the last section.

Capacity limitations, in a somewhat different sense, have been known to apply to distributed models for some time. They have been recognized in distributed memory models (Anderson, 1973; Willshaw, 1981), in cognitive models (McClelland, 1986), and in Hopfield nets (Amit, 1989; Hopfield, 1982). These limitations occur because, as more items are stored on the same set of units and weights, the items become less discriminable. Anderson (1973) shows that, for his model, the signal-to-noise ratio = N/K, where N is the number of units and K is the number of items. The number of items that can be stored with clarity maintained can be increased by increasing the number of units. That is, capacity is a function of the number of units used to represent an item. Capacity has also been used to refer to the number of alternative representations over the same set of units (Plate, 1991).
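This kind of graceful saturation can be sketched with a toy superposition memory. The parameters and the retrieval-by-dot-product scheme below are illustrative assumptions in the spirit of Anderson's (1973) analysis, not his actual model.

```python
import random

# Sketch of saturation in a superposed distributed memory: K random
# +/-1 vectors of length N are summed into a single trace. The dot
# product of a stored item with the trace is roughly N (the signal)
# plus interference contributed by the other K-1 items (the noise).
random.seed(0)
N, K = 1024, 8
items = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(K)]
trace = [sum(v[i] for v in items) for i in range(N)]

def match(item):
    """Dot product of a probe item with the superposed trace."""
    return sum(a * b for a, b in zip(item, trace))

stored = match(items[0])                                   # ~N plus noise
novel = match([random.choice((-1, 1)) for _ in range(N)])  # noise only

assert stored > novel  # stored items stay discriminable while K << N
```

Increasing K while holding N fixed shrinks the gap between stored and novel probes, which is the sense in which capacity grows with the number of units per item.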

Serial processing models (Newell & Simon, 1972; Anderson, 1983) have not incorporated principled ways of explaining capacity limitations, which is probably one reason why the concept has not received all the attention its importance would seem to deserve. Serial models are based on algorithms in which tasks are segmented into steps that are processed serially, either in a conventional program, or as a production system. More difficult tasks tend to be broken into larger numbers of steps, so difficulty is associated with solution time, but there is no upper limit on capacity to perform.

Applied to the human performer, programs are essentially equivalent to strategies. People can perform a task if and only if they possess an adequate strategy. More difficult tasks simply take longer. Failures are never attributed to overloading capacity, but only to lack of the required strategy. Therefore psychological theories based on the serial processing metaphor tend to downplay the role of capacity.

By contrast with serial processing models, PDP models have a natural interpretation of capacity limitations. We will develop the hypothesis that PDP models provide a natural explanation for empirical data suggesting that humans can process only a limited number of chunks or dimensions in parallel. We will relate this to analogical reasoning through a complexity metric based on dimensionality of structures. Complexity metrics will be discussed in the next section.

1.6. Complexity

If we are to relate capacity to task performance we require a way of assessing task complexity. Two complexity metrics have been developed in the cognitive psychology literature. One complexity metric for reasoning tasks is based on the number of levels of embedding in a subroutine (or goal) hierarchy. It is sensitive to the number of discs on a Tower-of-Hanoi puzzle (Egan & Greeno, 1974) and has also been used by Case (1985). This metric is effective, but has certain difficulties. One is that subroutine hierarchies are not intrinsically constrained, and different hierarchies, with consequent different complexity values, can be obtained by rewriting the program used to model the performance. It has also been argued by Rumelhart and McClelland (1986) that "subroutines ... are probably not a good way to view the operation of the brain" (p. 135). This metric can lead to anomalies when we try to think of complexity in terms of the amount of processing that is performed at each step. The number of levels of embedding in a subroutine hierarchy can be reduced by processing more information in parallel, at each step, but tasks that require a lot of information to be processed in parallel may be harder. Therefore, a metric is needed that takes account of both parallel and serial processing.

Another complexity metric has been used by Leeuwenberg (1969), Restle (1970), and Simon and Kotovsky (1963) to quantify complexity of a wide range of patterns (including spatial patterns, and auditory patterns such as melodies). Simon (1972) has reviewed theories of pattern complexity based on this metric, and has shown that they share common principles. They all assume that people encode patterns in terms of elements and relations, and that they organize the representation in a hierarchic phrase structure. The complexity of a pattern is related to the number of code symbols in the representation. That is, complexity is related to the number of independent symbols required to generate the pattern.

A related metric, one specifically concerned with concept complexity, was developed by Halford and Wilson (1980) and Halford (1987), and is based on levels of mathematical structures. The essence of the metric is that the complexity of a concept depends on its dimensionality, or the number of independent units of information required to define the concept. The units are of arbitrary size. This is related to the number of arguments in a predicate. A unary relation has only one argument, and therefore has only one dimension of variation. A binary relation has two arguments, and two dimensions of variation, because the values of the arguments can vary independently. Similarly, a ternary relation has three dimensions, and a quaternary relation four dimensions, and so on.

This idea might seem unusual if we are accustomed to thinking of arguments as lists that can be processed sequentially. However, each argument in a relation defines a dimension of variation. The number of dimensions depends on the "-arity" of the relation. A unary relation on a set S is a subset of S. A binary relation on S is a subset of S × S, the set of ordered pairs of elements of S. Similarly, a ternary relation on S is a subset of S × S × S, the set of ordered 3-tuples of elements of S, and so on. Each argument can take a number of different values, so the number of arguments defines the number of dimensions of variation. Therefore the argument sets are not simply lists, but provide a measure of the degree of complexity in the structure. The number of arguments defines the types of relationships that can exist within a structure. For example, a unary relation, R(x) (e.g., BIG(dog)), defines an attribute. Because it has a single dimension of variation, it is not possible to specify the way an attribute varies as a function of another attribute. This becomes possible with binary relations, R(x,y) (e.g., BIGGER(horse,dog)), which can express the way variations in the size of a horse or dog would affect the relation. With ternary relations, R(x,y,z), it is possible to define the way one attribute varies as a function of one other, and how it varies as a function of two others, the latter not being possible with binary relations. As the "-arity" of a relation increases, so do the orders of interaction that are possible. This is a most important feature of predicates for our purposes, because in memory (Humphreys, Bain, & Pike, 1989), in cognitive development (Halford, in press), and in the present context of analogical reasoning, we find it necessary to model the orders of interaction among the dimensions of a task. Functions are a special case of relations; in functions the mappings are unique. A zero-variate function, f( ) = a (e.g., PRIME-MINISTER-IN-1990( ) = Hawke), is equivalent to a symbolic constant.
It has no argument, but it has one dimension in the sense defined above, because the range is a source of variation, though it is not conventionally counted as an argument. Univariate functions have one argument and two dimensions. A univariate function f(a) = b is a set of ordered pairs (a, b) such that for each a there is precisely one b such that (a,b) ∈ f. Both a and b contribute sources of variation, which is why a univariate function is two-dimensional; in effect, it is equivalent in complexity to a binary relation. In a similar way, bivariate functions, f(a,b) = c, have two arguments and three dimensions, and so on. A unary operator has two arguments, and is a special case of a univariate function (e.g., the unary operator CHANGE SIGN comprises the set of ordered pairs (x, -x)). Binary operations have three arguments, and have the same dimensionality as ternary relations and bivariate functions. (A binary operation on a set A is a function from the set A × A of ordered pairs of elements of A into A (i.e., A × A → A). A bivariate function is defined as f: A × B → C.) The number of arguments in quaternary relations corresponds to the number of arguments in trivariate functions and in compositions of binary operations. For example, the composition of binary operations of the form a(b + c) = d is defined as the set of ordered 4-tuples {(3,2,4,18),...,(2,7,3,20),...}. The number of arguments is related to the dimensionality of a structure because it represents the number of independent terms required to define the structure, and the orders of interaction within the structure. In the context of analogical reasoning, the complexity of a structure mapping task can be related to the dimensionality of the structures being mapped. This will be considered in the next section.
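The correspondence between arity and dimensionality can be made concrete by building relations of increasing arity as subsets of Cartesian powers of a small set, as a sketch; the particular set and relations are illustrative choices, not taken from the chapter.

```python
from itertools import product

# Sketch: relations of increasing arity as subsets of Cartesian powers
# of a set S. The arity fixes the number of dimensions of variation.
S = {1, 2, 3}

# Binary relation (2 dimensions): a subset of S x S.
smaller_than = {(x, y) for x, y in product(S, S) if x < y}

# Ternary relation (3 dimensions): a subset of S x S x S,
# here the graph of the binary operation x + y = z.
addition = {(x, y, z) for x, y, z in product(S, S, S) if x + y == z}

# Composition of binary operations a(b + c) = d: a quaternary relation,
# a set of ordered 4-tuples, as in the text.
composition = {(a, b, c, a * (b + c)) for a, b, c in product(S, S, S)}

assert (1, 2) in smaller_than
assert (1, 2, 3) in addition
assert (2, 1, 2, 6) in composition           # 2 * (1 + 2) = 6
assert all(len(t) == 4 for t in composition)  # four independent terms
```

Each additional argument position adds an independent dimension along which the tuples can vary, which is precisely why arity serves as a complexity measure.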

2. A STRUCTURE-MAPPING METRIC FOR CONCEPTUAL COMPLEXITY

We will define a structure as a set of elements, with a set of functions or relations defined on the elements. A structure-mapping is a rule for assigning elements of one structure to elements of another, in such a way that any functions or relations between elements of the first structure correspond to functions or relations in the second structure. In an idealized sense, structure mappings in analogies are isomorphisms (Holyoak & Thagard, 1989), though in practice the mappings might be incomplete (e.g., if one structure contains more elements than another).

A structure-mapping based on functions is valid if it is commutative, as Halford and Wilson (1980) and Holland et al. (1986) have pointed out. The concept of a commutative diagram originated with category theory (Maclane, 1972), and is different from commutativity in arithmetic. The essence of it is shown in Figure 7.1.

Suppose we have a source structure consisting of elements S1 and S2, and a function fs between them, as shown in Figure 7.1. We also have a target structure consisting of elements T1 and T2, with a function ft between them. Then we have mappings m1 and m2 from the source structure to the target structure. Commutativity corresponds to the fact that the diagram is closed, and we arrive at the same result irrespective of the path taken. That is, if we apply fs and then m2, we arrive at T2; if we first apply m1 and then ft, we also arrive at T2. Formally,
m2 ∘ fs = ft ∘ m1 (1)
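Equation (1) can be checked mechanically for any finite structure. In the toy sketch below, the structures, functions, and mappings are hypothetical two-element illustrations, encoded as dictionaries.

```python
# Toy check of the commutativity criterion, equation (1):
# the diagram commutes when m2(fs(x)) == ft(m1(x)) for every
# source element x. All structures here are illustrative.
fs = {"S1": "S2"}   # function within the source structure
ft = {"T1": "T2"}   # function within the target structure
m1 = {"S1": "T1"}   # mapping of first elements, source -> target
m2 = {"S2": "T2"}   # mapping of second elements, source -> target

def commutes(fs, ft, m1, m2):
    """True if both paths around the diagram reach the same target."""
    return all(m2[fs[x]] == ft[m1[x]] for x in fs)

assert commutes(fs, ft, m1, m2)
```

If, say, m2 instead sent S2 to T1, the two paths would disagree and the mapping would fail the criterion.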

Where structure-mappings are based on relations, the corresponding diagram is as shown in Figure 7.1. The criterion for validity is the same as that provided by Suppes and Zinnes (1963) and Coombs, Dawes, and Tversky (1970) for representations in measurement theory.

A system α = (A,R) is said to be represented by another system β = (B,S) if there exists a function f from A into B (which assigns to each x in A a unique f(x) in B) such that for all x,y in A
x R y implies f(x) S f(y). (2)
Thus, α is represented by β if there exists a correspondence f that maps A into B in such a way that if the relation R holds between some x and y in A then the relation S holds between f(x) and f(y) in B, where f(x) and f(y) are the images of x and y respectively. (Coombs et al., 1970, p. 11)
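Criterion (2) also admits a direct mechanical check for finite relations. The relations and mapping below are hypothetical three-element illustrations; the element names are arbitrary.

```python
# Sketch of the Coombs et al. (1970) validity criterion (2): a mapping
# f from A into B is a valid representation when x R y implies
# f(x) S f(y) for all pairs in R.
R = {("a", "b"), ("b", "c"), ("a", "c")}   # relation on A
S = {("#", "*"), ("*", "$"), ("#", "$")}   # relation on B
f = {"a": "#", "b": "*", "c": "$"}         # candidate mapping A -> B

def is_valid(R, S, f):
    """True if every R-pair is carried onto an S-pair by f."""
    return all((f[x], f[y]) in S for x, y in R)

assert is_valid(R, S, f)

# Swapping two images destroys the correspondence:
f_bad = {"a": "#", "b": "$", "c": "*"}
assert not is_valid(R, S, f_bad)
```

The invalid case fails because some R-pairs are carried onto S while others are carried onto its inverse, so the relations no longer correspond consistently.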

Figure 7.1.
Structure Mappings Based on Functions (Figure 7.1a) and Relations (Figure 7.1b).

Figure illustrating equations (1) and (2)


Coombs et al. (1970) refer to systems, but the same criterion can be applied to structures, where a structure is defined as a set of elements with one or more relations.

In general, valid structure mappings require that predicates in one structure correspond to predicates in the other. Correspondence is defined in the following way: a predicate P in structure 1 is in correspondence with a predicate P' in structure 2 if and only if the arguments of P are mapped into those of P', and vice versa.

As Holyoak and Thagard (1989) point out, process models approximate these requirements, which are treated as constraints which may be violated occasionally in the interests of satisfying other constraints. In the interests of generality, it is worth noting that valid representations are defined in a similar way to structure mappings in analogies (Halford & Wilson, 1980; Holland et al., 1986; Palmer, 1978). A representation is a mapping from a representing to a represented structure. A cognitive representation is a mapping from a cognitive structure to a structure in the world. An analogy is a mapping from one cognitive representation to another.

2.1. Levels of Mapping

We can define several levels of mapping, corresponding to the arity of predicates as discussed earlier. The levels also correspond to levels of structure mapping as defined in the cognitive developmental theory of Halford (1987). Furthermore, they can be related to the number of vectors in a PDP representation, as we will see. They are summarized in Figure 7.2.

2.1.1. Mappings Based on One-Place Predicates. These correspond to element mappings as defined by Halford (1987). We can express this type of mapping as:

M: R(ei) ↔ R'(e'i)

It is a mapping between two structures, each of which is based on a unary relation. The elements, ei, of one structure are mapped into the elements, e'i, of the other structure, so that the unary relations, R and R', correspond. Because unary relations are predicates with one argument, they correspond to attributes in Gentner's (1983) terms, so mappings based on one-place predicates are validated by attribute similarity. Similarity can be treated in terms of shared features (Hesse, 1966; Tversky, 1977), but Holyoak and Thagard (1989) have shown that in structure mapping models it is sufficient to assume a numerical index of similarity between two predicates. To the extent that metaphors are based on attributes rather than relations (which appears to be true in some cases, Gentner, 1983), mappings at this level correspond to metaphors.

2.1.2. Mappings Based on Two-Place Predicates. These correspond to relational mappings as defined by Halford's (1987) developmental theory. They are defined as:

M: R(ei,ej) ↔ R'(e'i,e'j)

The elements in structure 1 are mapped into the elements in structure 2 so that the binary relations in the two structures correspond. At this level there need not be any similarity between elements in the structures, and mappings are validated by similarity of binary relations. Simple proportional analogies of the form A:B::C:D belong to this level. For example, the analogy woman:baby::mare:foal is validated by the similar relation, "mother-of" in source and target.


Figure 7.2.
Levels of Structure Mappings, with Tensor Product Representation.

Figure 7.2


2.1.3. Mappings Based on Three-Place Predicates. These correspond to system mappings in terms of Halford's (1987) theory. Elements are mapped so that ternary relations correspond. At this level there need not be any resemblance even between the binary relations in the two structures. An example of a system mapping is shown in Figure 7.3. The elements a,b,c in structure 1 are mapped uniquely into the elements #,*,$ in structure 2, but there need not be any resemblance between the individual relations in the two structures (i.e., R need not resemble R'). However, there may be higher order relations that are similar in both structures. For example, if the relations R and R' in Figure 7.3A are asymmetric and transitive, then both structures form ordered sets, so there is a higher order similarity between the structures in this respect. N-term series (transitive inference) problems are examples of system mappings (Halford, 1987, 1989). In Figure 7.3A, if R is interpreted as "larger than," then we have A>B, B>C, and A>C, and ABC form a set which is ordered for size.


Figure 7.3.
System Mappings.

Figure 7.3


System mappings are validated by structural correspondence. The mapping in Figure 7.3B is invalid because it violates the correspondence property. Notice that R is mapped into R' on two occasions, and into the inverse of R' on the third occasion. Similarity is not logically necessary, although it may facilitate mapping, as in the case of high transparency (Gentner & Gentner, 1983), or may make it easier to recognize the possibility of mapping (Holyoak & Koh, 1987). Both the correspondence and consistency criteria apply to system mappings. System mappings can be defined as:

M: R(ei,ej,ek) ↔ R'(e'i,e'j,e'k)

The most general expression of the structures in system mappings is as a ternary relation, but they may also be expressed as a binary operation, ei * ej = ek, or as a composition of two or more binary relations, ei R ej, ej R ek, ei R ek. System mappings permit a high degree of flexibility and abstraction, because they can be used to establish correspondences between structures that have only formal similarities. They also permit analogies to be recognized between superficially dissimilar situations (Halford & Leitch, 1989). However, this flexibility and generality is obtained at the cost of higher information processing loads.

2.1.4. Mappings Based on Four-Place Predicates. These correspond to multiple system mappings in Halford's (1987) theory. They are defined as:

M: R(ei, ej, ek, el) ↔ R'(e'i, e'j, e'k, e'l)

Elements are mapped so that quaternary relations correspond. The quaternary relations may be interpreted as compositions of ternary relations, binary operations, or bivariate functions. Like system mappings, multiple system mappings are validated by structural correspondence, and are independent of element or relational similarity. They differ only in the complexity of the structures represented.

In theory, mappings based on predicates with more than four arguments are possible, but it is difficult to attach any psychological meaning to such structures, because human processing capacity limitations appear to prevent structures of higher dimensionality from being mapped.

2.2. Dimensionality of Mappings

The four levels of structure mapping can be related to the complexity metric discussed earlier through the dimensionality of the structures they entail. Element mappings entail one-place predicates, and are one-dimensional. Relational mappings entail two-place predicates, and are two-dimensional. System mappings entail three-place predicates, and are three-dimensional. Multiple system mappings entail four-place predicates, and are four-dimensional.

3. HUMAN ABILITY TO PROCESS STRUCTURE

The complexity metric for structures can be used to investigate human ability to process structure. The argument was made earlier that humans would be able to process a maximum of four dimensions in parallel. A prediction that can be derived from the theory of complexity is that tasks which require more information to be processed in parallel impose higher demands on processing resources (Kahneman, 1973). Higher level structure mappings require more dimensions to be processed in parallel, and therefore should use more of the available resources, which should produce more interference with a competing secondary task.

To test the prediction that humans process only four dimensions in parallel, we need a task in which information from four dimensions must be considered jointly. This can be achieved by asking participants to interpret statistical interactions. A three-way interaction (with three independent variables and one dependent variable) is a set of ordered 4-tuples {(A,B,C,N), ... }. If the dimensions are processed in parallel, and the number of dimensions is limited to 4, it should be possible to map three-way but not four-way interactions.

Consider a statistical interaction such as that shown in Figure 7.4. Interpreting the interaction entails mapping a mental representation of the figure into a verbal or semantic representation, and therefore includes a structure mapping task. Accordingly, we surveyed academic staff and graduate students in psychology, education, statistics, and economics who were experienced in interpreting statistical interactions. They were asked what was the highest level of interaction they could interpret unambiguously and with confidence, assuming clear presentation in an appropriate figure and disregarding scaling effects and nonlinearity. Ten answered two-way, 14 answered three-way, and 6 answered four-way. The integer central tendency is three-way, consistent with the prediction that up to four dimensions can be mapped.

In a further series of experiments, as yet unpublished, we presented participants with graphical representations and verbal descriptions of 3- and 4-way interactions. Bar graphs were used with bars that consisted of movable strips, so participants could restructure the figure to optimize presentation of the interaction for their own purposes. The graphical and verbal representations were isomorphic, but used different labels. Participants' task was to map one into the other (i.e., decide which verbal label mapped to each graphical label).*


*In the actual experiment bar graphs were used because they facilitated rearranging the presentation to suit individual subjects, but the principle of the task is the same as shown in Figure 7.4. Participants were required to map the graphical labels (e.g., A1/A2; B1/B2; ...) into the verbal labels in the description which follows (e.g., decide which of A1, A2, ..., D2 should be mapped to A-grade, which to B-grade, which to G-rated, etc.). A sample verbal description was: The judged attractiveness of films depends on four factors: their grade, studio-origin, genre, and rating. In the description below, the influence that any one of these variables has on the judged attractiveness of films is referred to as "an effect" or "the effect." When A-grade, Paramount films are G-rated films, they are judged as more attractive when they are adventure than when they are comedy, and the reverse is true when A-grade, Paramount films are R-rated films. When A-grade, MGM films are G-rated films, they are judged as more attractive when they are adventure than when they are comedy, and the reverse is true when A-grade, MGM films are R-rated films. On the other hand, when B-grade, Paramount films are G-rated films, they are judged as more attractive when they are adventure than when they are comedy, and this is also true when B-grade, Paramount films are R-rated films; overall they are judged as more attractive when they are G-rated. However, when B-grade, MGM films are G-rated films, they are judged as more attractive when they are adventure than when they are comedy, and there is a larger difference than when B-grade, MGM films are R-rated films. The solution is at the end of this chapter.

Figure 7.4.
Four-Way Statistical Interaction, as Used in Problem Described in 3.0.
Attributes A1, A2, ..., D2 are factors which determine an effect, measured by numerical values on the ordinate.

Figure 7.4


Everything was done to constrain participants to process the interactions in parallel (e.g., they were told to try to see the problem as a whole and not to test hypotheses serially). Results so far indicate that the mapping can usually be made with three-way, but rarely with four-way, interactions. Given that a three-way interaction is a four-dimensional structure, and a four-way interaction is a five-dimensional structure, this is consistent with the hypothesis that humans can map four- but not five-dimensional structures. Although presentation was two-dimensional, four-dimensional structures were mapped, so mapping was not linked to presentation in any direct or simple fashion. The number of dimensions mapped also exceeds the number that can be visualized, suggesting that the limitation is not directly related to our ability to visualize in three dimensions.

The prediction that higher level mappings entail higher processing loads has been tested by assessing loads in tasks in which level of mapping was varied, holding other factors constant. Maybery, Bain, and Halford (1986) compared N-term series (transitive inference) tasks which require premise integration with closely matched tasks where premises could be processed separately. Premise integration entails mapping two relational premises (e.g., John is taller than Tom, Mike is taller than John) into an ordered set, and is a system mapping. By contrast, when premises are processed singly (e.g., when verifying the truth of the separate premises, John is taller than Tom, Mike is taller than John), only one binary relation is processed at a time, and the task is equivalent to a relational mapping. Probe reaction time was used to assess information processing loads, and was significantly lengthened when the probe coincided with premise integration. Comparison with the condition where premises were processed singly showed that the effect was due to the amount of information processed in parallel.

Halford, Bain, and Maybery (1984) tested the prediction that the multiple system mapping task imposed a higher processing load than the system mapping task. They presented adult participants with arithmetic problems such as:

(7 [ ] 3) / 4 = 1     (system-mapping)
(7 [ ] 3) [ ] 4 = 1     (multiple system-mapping)

It is assumed that these problems are solved by mapping them into actual arithmetic expressions, such as (7 - 3) / 4 = 1. In the system-mapping task only one binary operation (indicated by square brackets) must be found, because the other operation is supplied, and the expression can be converted to (7 [ ] 3) = 4. In the multiple-system problem, two binary operations (subtraction and division) must be found. Because two binary operations must be mapped, the task is equivalent to a multiple system mapping. It is not possible to solve for one operation first and then decide the other, because there is no way of knowing you are right about one operation until you know what both of them are. Because the conditions of the task tended to prevent extensive serial search, there was a press to process the operations in parallel. A concurrent task, requiring both retention and rehearsal of a short-term memory load, was used as a secondary task to assess processing load. Interference with the secondary task was greater for the multiple system-mapping task, indicating that it imposed a higher processing load, as predicted by structure-mapping theory.
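The joint constraint between the two unknown operations can be illustrated with a small brute-force search. This is our illustration only, not the procedure participants were assumed to use (the task was designed to press for parallel processing, not exhaustive search):

```python
# Candidate binary operations for the bracketed slots.
ops = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b,
       '/': lambda a, b: a / b}

# System mapping: one unknown operation, (7 [ ] 3) / 4 = 1.
single = [name for name, f in ops.items() if f(7, 3) / 4 == 1]

# Multiple system mapping: two unknowns, (7 [ ] 3) [ ] 4 = 1.
# Neither operation can be verified alone; pairs must be considered jointly.
double = [(n1, n2) for n1, f1 in ops.items()
          for n2, f2 in ops.items() if f2(f1(7, 3), 4) == 1]

print(single)   # → ['-']
print(double)   # → [('-', '/')]
```

The single-unknown problem reduces to checking four candidates against (7 [ ] 3) = 4, whereas the two-unknown problem requires checking sixteen operation pairs, none of which can be confirmed one operation at a time.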

We want to try and develop an approach to structure mapping that takes account of the capacity limitations of human processors, and can explain the higher processing loads entailed by structures of higher dimensionality. As suggested before, PDP architecture is probably more appropriate for this purpose than serial processing models, or models which use local representations. However, first we must solve the problem of how to treat structures in PDP representations. This essentially entails representing predicate-argument bindings.

4. REPRESENTATION OF PREDICATES AND ARGUMENTS

In both ACME and COPYCAT, representations are local in Smolensky's (1990) sense. That is, each element of the representation is coded as a single unit, rather than as a set of features which are distributed over a number of units. In COPYCAT codelets build structures representing such things as relations between elements. In the ACME model predicates and arguments are represented using predicate calculus. However, a genuine PDP model really requires distributed representations, otherwise the advantages of stability, graceful degradation, and graceful saturation of PDP models are lost. Another reason for favoring PDP models is that there are now successful distributed memory models of human memory (e.g., Humphreys et al., 1989), and it may be possible to integrate models of human analogical reasoning with human memory models to some extent. The problem then is how to represent predicate-argument bindings as distributed representations. This problem, and the related problem of representing variable-constant bindings, has received considerable attention in the recent literature.

Hinton (1981) represents a predicate and its argument(s) by a set of vectors, and the predicate-argument binding is represented by a "PROP" vector, which represents the conjunction of the predicate and argument vectors. A predicate-argument binding can be created by adjusting the weights between predicate-argument vectors and the PROP unit. Pattern completion, or recovery of a missing predicate or argument, can be performed using backward links from the PROP unit, but these links have to be learned.

Murdock (1983) suggests that bindings between relationships and arguments can be represented as convolutions of vectors. Thus, if R is a vector representing the relationship, and A1 and A2 are vectors representing the arguments, then the relationship-argument binding can be represented by R*(A1*A2) or A1*(R*A2), where "*" represents the convolution operation. A vector stored in the convolution can be recognized by a statistical decision procedure.
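A rough sketch of convolution-based binding follows, using circular convolution and circular correlation. This is a simplification: Murdock's TODAM model uses aperiodic convolution and differs in detail, and the dimensionality and random vectors here are our assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
# Random vectors for the relation and its two arguments, scaled so that
# each has expected norm near 1.
R, A1, A2 = (rng.normal(0, 1 / np.sqrt(n), n) for _ in range(3))

def cconv(x, y):
    """Circular convolution via FFT."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def ccorr(x, y):
    """Circular correlation: an approximate inverse of cconv."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(y)))

trace = cconv(R, cconv(A1, A2))           # the binding R*(A1*A2)

# Statistical recognition: correlate out R and A1, then compare the noisy
# retrieved vector with a stored argument and with an unrelated foil.
retrieved = ccorr(A1, ccorr(R, trace))
match = np.dot(retrieved, A2)             # large for the stored argument
foil = np.dot(retrieved, rng.normal(0, 1 / np.sqrt(n), n))
print(match > foil)
```

Retrieval from a convolution trace is approximate, so recognition requires a threshold decision on the inner product, in contrast to the exact retrieval possible with orthonormal vectors in a tensor product net.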

Smolensky (1990) has proposed that the variable-binding problem can be handled in terms of tensor products. The essence of this approach is shown in Figure 7.5. In a distributed representation, a variable is represented as a vector x in a vector space VR. A constant that is bound to the variable is also represented as a vector c, in a vector space VF. The binding of the constant to the variable is represented by the tensor product x ⊗ c: different variable-constant bindings correspond to different patterns of activation in the tensor product units. Smolensky describes how the tensor product formalism may be faithfully represented as a (local or distributed) neural network (Section 3.4 of Smolensky, 1990). Activations in tensor product units can model rapid dynamic bindings of variables and constants, and assuming that the concepts in each component space of the tensor product are represented by an orthonormal set of vectors, perfect retrieval of a missing component is possible. This property is important to the formulation that we will develop.
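The binding and exact unbinding operations can be sketched in a few lines, assuming orthonormal variable vectors (the basis vectors below are the simplest possible choice, ours for illustration):

```python
import numpy as np

# Orthonormal "variable" vectors and arbitrary "constant" vectors.
x1, x2 = np.eye(4)[0], np.eye(4)[1]
c1, c2 = np.array([1., 2., 0., 0.]), np.array([0., 0., 3., 1.])

# Each binding x ⊗ c is an outer product; several bindings are
# superimposed on the same units by addition.
T = np.outer(x1, c1) + np.outer(x2, c2)

# Exact unbinding: project the superimposed tensor onto a variable vector.
print(x1 @ T)   # → [1. 2. 0. 0.]  (recovers c1 exactly)
print(x2 @ T)   # → [0. 0. 3. 1.]  (recovers c2 exactly)
```

Because x1 and x2 are orthonormal, the cross-terms vanish and each constant is recovered without noise, which is the property exploited throughout this chapter.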


Figure 7.5.

Figure 7.5


We will extend Smolensky's approach to variable binding to handle the predicate-argument binding problem. We shall describe our applications of these ideas in terms of the tensor product terminology, noting that it can all be transformed into explicit neural networks. In the exposition which follows, we will assume familiarity with Smolensky's (1990) approach.

The tensor product encoding of predicate-argument bindings will be used in a model of analogical reasoning, which incorporates the levels of structure discussed earlier. We call this the Structural Tensor Analogical Reasoning (STAR) model.

4.1. Tensor Product Encodings of Predicates

The tensor product represents the predicate that a particular variable has a particular value. The tensor product x ⊗ c represents the predicate HAS-VALUE(x,c), where x is a variable and c is a constant. As Smolensky's paper is directed at variable binding, it is natural that the tensor product is interpreted in this way. However, there is nothing canonical about this interpretation, and so tensor products can be validly interpreted as representing the binding of a predicate to an argument. Therefore, a rank 2 tensor product may be interpreted as the binding of a predicate to a single argument. We will generalize the approach to deal with predicates of one or more arguments.

A predicate with a single argument can be interpreted as a zero-variate function or unary relation. It can be represented as a tensor product of two vectors, one representing the predicate and one representing the argument. A predicate with two arguments can be interpreted as a univariate function, or a binary relation. It can be represented as a rank 3 tensor product, with one vector representing the predicate and two representing the arguments. Similarly, a predicate with three arguments, which can be interpreted as a bivariate function, a binary operation, or a ternary relation, can be represented as a rank 4 tensor product, with one vector representing the predicate and three vectors representing the arguments. Similarly, a predicate with four arguments can be represented by a rank 5 tensor product. These representations correspond to the levels of structure mapping defined earlier, and are summarized in Figure 7.2.

It should be noted that if the predicate can be taken as fixed or constant, then it can be represented by a tensor product of rank 1 lower than specified above. For example, if a predicate with two arguments is constant, then it is sufficient to represent it as the tensor product of two vectors, representing the two arguments. The additional vector representing the predicate becomes unnecessary because it would always have the same value, and is not required to constrain the activations in the other vectors. This represents a degenerate case which we simply note in passing, and it plays no part in further developments in this chapter. The cases with which we are concerned all entail multiple predicates being represented on the same structure (e.g., the relations "larger than," "mother of," etc., can all be superimposed on the same set of units representing the tensor product of three vectors).

4.2. Tensor Product Encodings of Multiple N-Place Predicates

The representation of multiple binary relations is shown in Figure 7.6. It comprises a tensor product of rank 3: Varg1 ⊗ Varg2 ⊗ Vpred. That is, one vector represents the predicate, and the other two represent the arguments. These are shown as the vectors u, v, w. The predicate-argument binding is represented as the activations of the tensor product units, each of which is connected to one unit in each of u, v, w. These connections are shown in Figures 7.6a to 7.6c one vector at a time to make the connections clear. The units are shown as a whole in Figure 7.6d, but without the connections. Figure 7.6e shows the symbol we will normally use to represent a tensor product of given rank. It comprises the tensor product symbol in a polygon, the number of sides of which represents the number of vectors. Figure 7.7 provides details of the tensor product units. An alternative representation of a rank 3 tensor product is shown in Figure 7.8.


Figure 7.6.
Visualising a rank 3 tensor product net which also has three nodes in each component space.

At any time, one of the vectors u, v, and w will act as an output and the other two as inputs for the net. In this figure u is shown as output, and v and w as inputs. Parts (a)-(c) show connections: e.g., the middle component of u, u2, is connected to all the nodes in the central horizontal plane in part (a), and its output value is the sum of all their outputs. Part (d) is a synthesis of parts (a)-(c), but omits connections, and shows activations in the tensor product nodes. Part (e) shows rank 2, 3, and 4 nets in a symbolic representation used in this chapter.

Figure 7.6


Figure 7.7.
Internal organisation of a node in a tensor product network

Tensor product nets operate in two modes, teaching mode and retrieval mode. This figure illustrates these modes in a rank 3 tensor product net. In teaching mode, the net is presented with 3 input vectors (call them u, v, and w); the (i,j,k)-th tensor product node will take as inputs the i-th component of u, the j-th component of v, and the k-th component of w. These inputs will be multiplied together and added to the activation value in the tensor product node. In retrieval mode, the net is presented with 2 input vectors (say u and v; w will then serve as the output vector). The (i,j,k)-th tensor product node will multiply together its activation value and its u and v inputs (namely the i-th component of u and the j-th component of v) and output the result as a contribution to the k-th component of w. The value of the k-th component of w will thus be the sum of the contributions from the (i,j,k)-th tensor product nodes for all values of i and j.

Figure 7.7


This approach can be generalized to higher level structures. A tensor product Varg1 ⊗ Varg2 ⊗ Varg3 ⊗ Vpred of rank 4 can represent a collection of three-place predicates. A tensor product Varg1 ⊗ Varg2 ⊗ ... ⊗ VargN ⊗ Vpred of rank N + 1 can represent a collection of N-place predicates. One vector represents the predicate, and the remaining vectors represent the arguments.

A specific example of a rank 4 tensor product would be arithmetic addition. We would have one vector representing the addition operation, and three vectors representing the arguments, which in this case are the addends and the sum. For example, the representation that 3,5 → 8 under the operation of arithmetic addition would comprise a vector representing addition, and vectors representing 3, 5, 8. The advantage of this representation is that all combinations can be inferred (e.g., given that the operation is addition, and that 3 and an unspecified number x gives 8, the solution to x can be found by the pattern completion process).
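As a sketch, the addition example can be implemented as a rank 4 tensor product net operating in the teaching and retrieval modes described in Figure 7.7. The one-hot encodings of numbers and operations are our choice, made so that the component vectors are orthonormal and retrieval is exact:

```python
import numpy as np

N = 10                                    # numbers 0..9 as one-hot vectors
num = np.eye(N)
add = np.array([1., 0.])                  # one-hot over {addition, subtraction}

# Rank 4 tensor: operation ⊗ addend1 ⊗ addend2 ⊗ result.
# Teaching mode: superimpose the outer product of each fact's vectors.
T = np.zeros((2, N, N, N))
for a in range(N):
    for b in range(N - a):                # all facts with a + b <= 9
        T += np.einsum('p,i,j,k->pijk', add, num[a], num[b], num[a + b])

# Retrieval mode (pattern completion): operation = addition, addend1 = 3,
# result = 8; the net outputs the missing addend x such that 3 + x = 8.
x = np.einsum('pijk,p,i,k->j', T, add, num[3], num[8])
print(int(np.argmax(x)))                  # → 5
```

Because any three of the four vectors can be presented as inputs, the same stored tensor answers 3 + 5 = ?, 3 + ? = 8, or ? + 5 = 8, which is the flexibility claimed for this representation.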


Figure 7.8.
Network Connectivity for a 3x3x3 Tensor Product Network

Figure 7.8


Transitivity would be represented as a rank 4 tensor product. Humans make a transitive inference based on the ternary relation, which we will call TRANSITIVELY-ORDERED-by-R(a,b,c). To represent this we need a vector representing the predicate TRANSITIVELY-ORDERED-by-R, and vectors representing each of the arguments, a, b, c. This representation can be decomposed into three rank 3 tensor products, representing aRb, bRc, and aRc. This can be done by entering a random vector, or a fixed vector, into one rank of the tensor product, in a manner that is mathematically equivalent to the procedure of Humphreys et al. (1989). For example, if we enter a fixed vector in place of c, this effectively collapses over c, leaving a representation of aRb. Each rank 3 tensor product will comprise a vector representing the relation R, with a pair of arguments, such as a and b.
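The collapsing operation just described can be sketched as a tensor contraction. The dimensionality, random concept vectors, and the particular fixed vector below are our illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
pred, a, b, c = (rng.normal(size=d) for _ in range(4))
fixed = np.ones(d) / np.sqrt(d)           # fixed "collapsing" vector

# Rank 4 representation of TRANSITIVELY-ORDERED-by-R(a, b, c).
T4 = np.einsum('p,i,j,k->pijk', pred, a, b, c)

# Collapse over the third argument by contracting with the fixed vector.
T3 = np.einsum('pijk,k->pij', T4, fixed)

# The result is the rank 3 tensor pred ⊗ a ⊗ b, scaled by (fixed · c),
# i.e., a representation involving only a and b remains.
print(np.allclose(T3, np.dot(fixed, c) * np.einsum('p,i,j->pij', pred, a, b)))
```

Contracting with a fixed vector over the b or a rank in the same way yields the remaining pairwise representations, corresponding to aRc and bRc.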

4.3. Retrieving Stored Predicates from a Tensor Product Representation

Smolensky's application of tensor product representations for variable binding utilizes the stored information in a function-like way: that is, the two-place predicate HAS-VALUE(x, y) which he implicitly studies can be viewed as a unary function VALUE(x) = y. Smolensky's method of unbinding a variable x (that is, retrieving its stored value c) depends on this functional property. In effect, retrieval is done by presenting a pattern representing x to the network, which responds with the pattern representing c. We need to consider how patterns corresponding to specific predicates or arguments can be retrieved from the representations we propose. For example, if the relations of interest include one like TALLER-THAN, then it is possible that

TALLER-THAN(Janet, Bill)
TALLER-THAN(Janet, Graeme)

might both be true. Presenting the pattern representing Janet to the network would result in the output of the sum of the patterns for Bill and for Graeme (plus the patterns for any other individuals P of which TALLER-THAN(Janet, P) is true). It might be difficult to extract Bill, say, from this output pattern. In fact, provided the patterns representing Bill, Graeme, and other such individuals P are orthogonal (in the sense of the usual vector inner product), it is simple, in principle, to extract Bill from the output of what Smolensky (1990, Section 3.1) calls the exact unbinding procedure. This is done by computing the dot product of the pattern for Bill with the output pattern. If the result is nonzero, then TALLER-THAN(Janet, Bill) is true, according to the information stored in the network. This process is illustrated in Figure 7.9. The possible solutions are tested serially.
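This retrieval-and-test procedure can be sketched as follows. Orthogonal one-hot patterns are assumed so that retrieval is exact; the vectors themselves are our illustrative choice:

```python
import numpy as np

d = 6
Janet, Bill, Graeme, Ross = np.eye(d)[:4]     # orthogonal person patterns
TALLER = np.eye(d)[4]                         # predicate pattern

# Superimpose TALLER-THAN(Janet, Bill) and TALLER-THAN(Janet, Graeme)
# on a rank 3 net arg1 ⊗ arg2 ⊗ pred.
T = (np.einsum('i,j,p->ijp', Janet, Bill, TALLER)
     + np.einsum('i,j,p->ijp', Janet, Graeme, TALLER))

# Present TALLER and Janet; the output is the sum of the patterns for
# all P such that TALLER-THAN(Janet, P) is stored.
out = np.einsum('ijp,i,p->j', T, Janet, TALLER)

# Serial test of candidates: a nonzero inner product means the fact holds.
print(out @ Bill, out @ Graeme, out @ Ross)   # → 1.0 1.0 0.0
```

Bill and Graeme each give a nonzero inner product, while Ross, for whom no fact was stored, gives zero.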


Figure 7.9.

Predicate Retrieval from a Tensor Product Network. To show that P(x,y) is true, present P and x to the network. The output will be the sum s of all concepts p such that P(x,p) holds. Connect this output s to a network which computes the inner product z of s with y. If z is non-zero (non-white, in the diagram), then P(x,y) is true.

Figure 7.9


4.4. Alternative Role-Filler Representation of Predicates and Arguments

It might seem natural to represent an N-dimensional structure as a binding of N fillers and N roles in the manner suggested by Smolensky (1990), as shown in Figure 7.10. For example, the binary operation of arithmetic addition, with mappings of the form a + b = c, could be represented as a set of bindings to particular values for a, b, and c. For example, if a = 3, b = 2, and c = 5, as shown in Figure 7.10, then the tensor product binds the values of the variables to the constants. However, this representation does not permit the recovery of any vector, given the N−1 remaining vectors (e.g., it does not permit the recovery of the vector representing b given the vectors representing a and c). We regard this ability as essential to the most creative and dynamic use of representations in reasoning. As we will see, it is essential to analogical reasoning, which will be considered in the next section.


Figure 7.10.
Multiple Bindings as an Alternative Method for Storing Predicates

Method: The bindings a=3, b=2 and c=5 are stored in the tensor product network in a structure in the sense of Smolensky (1990). Further bindings, such as a=2, b=4, c=6 would be stored as (distinct) structures in the same network. To store more than one type of predicate, one would add to each structure a binding such as pred=sum, where sum(a,b,c) signifies a+b=c. [In this figure, roles and fillers are shown as represented by nodes, for greater clarity of presentation. In a distributed version, roles and fillers would be represented by patterns.]

Figure 7.10


The representation we have chosen is consistent with the distributed memory model of Humphreys et al. (1989). Their aim was to model the three-way relations between context, cue, and target in memory retrieval. For example, if asked "What did you eat for breakfast on Sunday?", "What did you eat for breakfast?" is the cue, "on Sunday" is the context, and (say) "bacon and eggs" is the target. Humphreys et al. (1989) have shown that computation of three-way relations between context, cue, and target requires a representation based on the tensor product of three vectors. The same is true in the present context. For example, given a + b = c, if we want to compute c, given a and b, or b, given a and c, etc., then each of a, b, and c must have a separate representation, and the three-way relations between them must be represented. A rank 3 tensor product achieves this, whereas some alternatives, such as the role-filler representation, do not. Where the operation is also a variable, an additional vector is required, so arithmetic operations would normally be represented by a rank 4 tensor product.

5. APPLYING THE MODEL TO ANALOGICAL REASONING

The model of storing predicate/argument information described above provides a PDP framework for performing certain analogical reasoning tasks. We will begin with simple analogical reasoning tasks of the form a:b::c:d because these have been very common in the psychological literature, and they contain clear examples of the processes we wish to consider.

The particular analogy we will use for illustration is woman:baby::mare:? (i.e., woman is to baby as mare is to what; the approved answer is foal). To solve such problems, it is necessary to be able to represent multi-argument predicates (in this case, two arguments) and to access the stored information in a number of different ways. For example, given the predicate P and one argument a1, one needs to be able to retrieve the/an entity a2 such that P(a1, a2), and given a1, a2, it is necessary to be able to find a predicate P such that P(a1, a2). The tensor product formalism allows both these operations.

What follows is a basic symbolic version of our model of simple analogical reasoning. We will illustrate the model by solving the problem woman:baby::mare:?. Subsequently we shall elaborate some parts of the model where simplicity of exposition requires that gaps be left. On seeing woman:baby, the reasoner associates with this pair the binary predicate MOTHER-OF(—,—). On seeing the cue mare for the second part of the problem, the reasoner places mare in the first argument position for MOTHER-OF, and then tries to retrieve the filler for the missing argument position in MOTHER-OF(mare,—).

In our PDP framework, we have a rank 3 tensor product Varg1 ⊗ Varg2 ⊗ Vpred, as in our earlier treatment of binary predicates. The situation is illustrated in Figure 7.11, in which the triangle with the ⊗-sign in it (in general, the symbol will be a polygon with as many sides as the rank of the tensor product) is used to denote the whole of the rank 3 tensor product net Varg1 ⊗ Varg2 ⊗ Vpred. In Figure 7.11, arg1 = woman, arg2 = baby, and pred is inferred to be MOTHER-OF(—,—). The patterns for woman and baby become inputs to the net, and the pattern for the predicate MOTHER-OF(—,—) appears as output on the appropriate "side" of the tensor product net. In the final phase of the reasoning process, mare is substituted for woman, and MOTHER-OF(—,—) becomes an input pattern rather than an output. The Argument 2 slot now produces the output pattern, which should represent foal (see Figure 7.11).
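These steps can be sketched on a rank 3 net as follows. The one-hot concept patterns and the minimal two-fact knowledge base are our assumptions for illustration; in a realistic model the patterns would be distributed and many facts would be superimposed:

```python
import numpy as np

d = 8
# Orthonormal patterns for concepts (one-hot for transparency).
woman, baby, mare, foal, rabbit = np.eye(d)[:5]
MOTHER_OF, LARGER_THAN = np.eye(d)[5:7]

# Store known relational facts in a rank 3 net arg1 ⊗ arg2 ⊗ pred.
facts = [(woman, baby, MOTHER_OF), (mare, foal, MOTHER_OF)]
T = sum(np.einsum('i,j,p->ijp', a1, a2, p) for a1, a2, p in facts)

# Step 1: present woman and baby; the pred side outputs the relating predicate.
pred = np.einsum('ijp,i,j->p', T, woman, baby)

# Step 2: substitute mare for woman and feed the retrieved predicate back
# as an input; the Argument 2 side outputs the solution.
answer = np.einsum('ijp,i,p->j', T, mare, pred)
print(np.allclose(answer, foal))          # → True
```

The same stored tensor serves both directions of access: finding a predicate given two arguments, and finding an argument given a predicate and one argument.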


Figure 7.11.
Steps in Simple Analogical Reasoning

Figure 7.11


As foreshadowed above, there is a gap in this description. There are more solutions than just MOTHER-OF(—,—) to the problem of finding a relation ρ such that ρ(woman, baby). For example, ρ could be LOVES, FEEDS, or LARGER-THAN. A related problem is that the arg2 output in the final step could be colt, for example. However, the solution to ρ(woman, baby) need not be a single predicate. At least in our society, all of LOVES, FEEDS, and LARGER-THAN are consistent with motherhood, and even in other societies more than one predicate would be consistent with motherhood. Therefore the solution to ρ(woman, baby) can be a vector representing a predicate bundle, BUNDLE-MOTHER-OF, comprising the predicates that are consistent with the mother-baby relation. In a similar way, foal would be a solution to X in BUNDLE-MOTHER-OF(mare,X). The solution to X might be an argument bundle representing X such that BUNDLE-MOTHER-OF(mare, X) holds. It is also possible that, at certain levels of description, concepts like motherhood could be represented as the weighted sum of vectors representing N-place predicates like "loves," "feeds," "larger than," "female," etc. Some natural categories are based on prototypes, which are weighted averages of attributes (one-place predicates), and there seems no reason in principle why they should not be weighted averages of N-place predicates. However, this issue goes beyond our present argument, and requires further exploration.

A certain amount of ambiguity is inherent in the syntax of analogies. For example, the analogy woman:baby::mare:? could have "rabbit" as a logically correct solution, because the relation "larger-than" holds between woman:baby and also between mare:rabbit. The fact that "foal" is considered a better solution has more to do with pragmatics than with syntax. Foal might be considered a better solution because "mother of" is a relation that is more unique to the triplet woman-baby-mare than is "larger than." Alternatively the favored solution might be considered more interesting or novel. The fact that these additional criteria are necessary highlights the importance of pragmatic factors, which are a feature of the model of Holyoak and Thagard (1989) as mentioned earlier. Such ambiguities cannot be resolved by purely syntactic criteria, and given that it is our aim to model the syntactic aspects of analogies, it is appropriate that our model should incorporate the ambiguities that are inherent in the syntax of analogies.

It is possible, however, to remove the ambiguity by the procedure mentioned previously for retrieving a predicate from a network. If the output from the process above consists of a vector which represents the sum of the possible solutions to the analogy woman:baby::mare:foal, then the answer foal can be recognized by computing the inner product of the output and the vector representing foal. If the inner product, k, is nonzero, then foal is a solution. Furthermore, the larger k is, the more relations are satisfied by the pair mare:foal in common with the pair woman:baby. This might be taken as a measure of salience of the solution. It is possible that the most favored solution is the most salient in the sense that it satisfies the most relations; we might prefer foal as a solution because it fits not only the relation mother of, but also loves, feeds, and larger than. As the model presently exists, potential solutions must be processed serially.

5.1. Simulation of Tensor Product Net

The tensor product net simulation that we built was of rank 3 and dimensions 6×16×16: 6 different binary predicate symbols, and 16 different arguments for each of Positions 1 and 2. Let us denote the tensor product space as U⊗V⊗W.

Orthonormal bases for R6 and R16 were found, and checked to ensure that no basis vector contained a large number of zero components (as such vectors would have violated our goal of a distributed representation).

A set S of predicate instances of the form P(x,y), such as:

loves(woman, baby)
mother-of(woman, baby)
feeds(woman, baby)
larger-than(woman, baby)
mother-of(mare, foal)
larger-than(mare, foal)
larger-than(mare, rabbit)

and so on, was chosen. The predicate symbols loves, mother of, feeds, larger than, and so on were mapped arbitrarily to the basis vectors for R6, and the argument symbols woman, baby, mare, foal, rabbit, and so on were mapped arbitrarily to the basis vectors for R16. Let u(P) denote the vector representing the predicate symbol P in U, and let v(x) and w(y) denote the vectors representing the argument symbols x and y in V and W, respectively, so that, for example, v(x)j denotes the jth component of the vector representing the symbol x in V. Each node Tijk of the tensor product array was preset to the sum, over the chosen predicate instances P(x,y) in S, of u(P)iv(x)jw(y)k:

Tijk = Σ u(P)i v(x)j w(y)k, where the sum is over the predicate instances P(x,y) in S.

In other words, the predicate instances in S were stored in the network.

Any of u, v, and w can be recovered from the other two and the tensor Tijk; for example:

u(P)i = Σj,k Tijk v(x)j w(y)k.
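The storage and recovery equations can be sketched directly. This is a scaled-down sketch with small one-hot bases (the simulation in the text used arbitrary orthonormal bases of dimensions 6 and 16; the algebra is identical), and the stored instances are illustrative.

```python
PREDS = ["loves", "mother-of", "larger-than"]
ARGS = ["woman", "baby", "mare", "foal"]

def basis(name, names):
    """One-hot vector for `name` over the ordered symbol list `names`."""
    return [1.0 if n == name else 0.0 for n in names]

def store(instances):
    """T_ijk = sum over P(x,y) in S of u(P)_i * v(x)_j * w(y)_k."""
    nP, nA = len(PREDS), len(ARGS)
    T = [[[0.0] * nA for _ in range(nA)] for _ in range(nP)]
    for P, x, y in instances:
        u, v, w = basis(P, PREDS), basis(x, ARGS), basis(y, ARGS)
        for i in range(nP):
            for j in range(nA):
                for k in range(nA):
                    T[i][j][k] += u[i] * v[j] * w[k]
    return T

def recover_u(T, v, w):
    """u(P)_i = sum over j,k of T_ijk * v(x)_j * w(y)_k."""
    return [sum(T[i][j][k] * v[j] * w[k]
                for j in range(len(v)) for k in range(len(w)))
            for i in range(len(T))]

T = store([("loves", "woman", "baby"),
           ("mother-of", "woman", "baby"),
           ("mother-of", "mare", "foal")])

# Probing with (woman, baby) recovers the bundle loves + mother-of.
u_out = recover_u(T, basis("woman", ARGS), basis("baby", ARGS))
print(u_out)  # [1.0, 1.0, 0.0]
```

The symmetric recovery of v or w from the remaining two factors follows the same pattern, summing over the other two indices.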

This representation can handle a number of analogical problem formats. To solve our reference analogy problem, woman:baby::mare:?, the patterns v(woman) and w(baby) were presented, in Phase 1, to the V- and W-inputs of the network. The U-output, which would in this case be the sum of the patterns representing loves, mother of, feeds, and larger than, was saved, and re-presented as U-input in Phase 2, along with mare as V-input. The W-output, say w, was then interpreted. The interpretation phase is necessary because humans looking at the W-output use a different representation of concepts like foal, woman, and rabbit from the 16-tuples used in this simulation; this phase corresponds to articulating the answer to the analogy problem. The interpretation phase was implemented by forming the inner product of w with each of the patterns w(woman), w(baby), w(mare), w(foal), w(rabbit), and so on, which form the basis of the 16-dimensional space W. This process yields "salience" scores for each of the possible arguments as solutions to the analogy problem. The argument with the highest salience score wins. In the present state of the model, the potential solutions must be processed serially.
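The two phases and the interpretation step can be sketched end to end. This is not the 6×16×16 simulation itself but a scaled-down sketch: it assumes one-hot basis vectors and a small illustrative instance set mirroring the examples in the text.

```python
PREDS = ["loves", "mother-of", "feeds", "larger-than"]
ARGS = ["woman", "baby", "mare", "foal", "rabbit"]

def basis(name, names):
    return [1.0 if n == name else 0.0 for n in names]

S = [("loves", "woman", "baby"), ("mother-of", "woman", "baby"),
     ("feeds", "woman", "baby"), ("larger-than", "woman", "baby"),
     ("mother-of", "mare", "foal"), ("larger-than", "mare", "foal"),
     ("larger-than", "mare", "rabbit")]

# Preset T_ijk to the sum of outer products of the stored instances.
T = [[[0.0] * len(ARGS) for _ in ARGS] for _ in PREDS]
for P, x, y in S:
    u, v, w = basis(P, PREDS), basis(x, ARGS), basis(y, ARGS)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            for k, wk in enumerate(w):
                T[i][j][k] += ui * vj * wk

# Phase 1: present (woman, baby); read the predicate bundle off U.
v1, w1 = basis("woman", ARGS), basis("baby", ARGS)
u_bundle = [sum(T[i][j][k] * v1[j] * w1[k]
                for j in range(len(ARGS)) for k in range(len(ARGS)))
            for i in range(len(PREDS))]

# Phase 2: re-present the bundle as U-input with mare on V; read off W.
v2 = basis("mare", ARGS)
w_out = [sum(T[i][j][k] * u_bundle[i] * v2[j]
             for i in range(len(PREDS)) for j in range(len(ARGS)))
         for k in range(len(ARGS))]

# Interpretation: salience score = inner product with each candidate.
salience = {a: sum(w_out[k] * basis(a, ARGS)[k] for k in range(len(ARGS)))
            for a in ARGS}
best = max(salience, key=salience.get)
print(best)  # foal
```

Here foal wins with a salience of 2 (it satisfies both mother-of and larger-than with mare), while rabbit scores 1 (larger-than only), matching the discussion of salience above.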

5.2. Other Cases of Analogy

There does not appear to be any agreed-upon taxonomy of analogical reasoning tasks, but Figure 7.12 presents a simple classification which is useful to the argument. It divides analogies into six basic types; this number could be doubled by reversing base and target, but the reversed cases differ only trivially from those presented.

5.2.1. Complete Structure. Both target and base structures are complete at presentation, and the task is to find the correct map between them. This case includes explanatory analogies, and the SME model (Falkenhainer et al., 1990) and the ACME model (Holyoak & Thagard, 1989) have been applied to some analogies of this type. Many of the tasks that have been used to test these models, such as the water flow/heat flow and Rutherford analogy, entail complex structures, with many possible mappings between base and target, so much of the interest has centered around devising psychologically realistic algorithms for finding the best match.

5.2.2. Missing Object. All target elements are supplied, and all except one base element. The task is to find the missing base element. The A:B::C:? analogy, which commonly occurs in psychometric tests and was used for illustrative purposes earlier, is of this type. Given that n-ary relations can be used, this case can include some of the problems to which COPYCAT is devoted. These are of the form a,b,c:a,b,d::x,y,z:?. The example problem woman:baby::mare:? is of this type.

5.2.3. Missing Relation. In this case a missing relation is inferred, given the objects. It is similar to, but simpler than, the case in Section 5.2.4.

5.2.4. Missing Object and Relation. The target structure is complete, some base elements are supplied, and the task is to find the missing object(s) and relation(s) in the base. This case entails more creativity than the first two cases, because new information is generated. An example would be if someone knew the basic structure of the solar system, knew that an atom consisted of a nucleus and electron(s), but did not know the relation between them (did not know that electrons orbit around the nucleus). The person could map nucleus into sun, electron into planet, and then recognize that the relation "orbits around" applied to electron and nucleus. Some of the tasks to which the Structure Mapping Engine (Falkenhainer et al., 1990), ACME (Holyoak & Thagard, 1989), and COPYCAT (Mitchell & Hofstadter, 1990) have been applied are of this type, because they entail inferring missing relations.


Figure 7.12.
Taxonomy of Structure-Mapping Tasks.

Figure 7.12


There is a subvariety in which the critical relation is missing from both base and target. In this case, a relation has to be found which applies to arguments in both base and target. For example, given person:house::dog:kennel, the relation "lives in" can be selected. Two structures impose a tighter constraint than a single structure. Given person:house, a number of predicates are possible (e.g., "pays for," "repairs"), but these do not apply to dog:kennel. Similarly, "is chained to" or "gnaws bones in" might apply to dog:kennel, but (hopefully) not to person:house.

5.2.5. Missing Base. The target structure is complete, but no base is supplied. The task requires that memory be searched for a structure that matches the target. Creativity may be required because the base might not bear any obvious resemblance to the target. Unfamiliar problems, for which the solver must find an analogous situation, are of this kind.

5.2.6. Incomplete Target, Missing Base. Target elements are supplied, but the relation(s) are not, and the base is missing. A base must be found in order to supply the missing relations in the target. The Rutherford analogy tends to be treated nowadays as an explanatory analogy, in which complete base and target structures are supplied. However, when it was originally discovered the target structure would presumably have been incomplete, and it would have been necessary to find an appropriate base in order to complete the target. Therefore, let us consider a hypothetical "Rutherford" who did not know that the electron orbited around the nucleus, but knew the atom consisted of nucleus and electron (or some similar particles), and knew certain relations between them, such as the size difference, the electrostatic attraction, etc. The problem would have been to find an analogy or, in contemporary terms, a "mental model" of atomic structure. In these circumstances, the solar system might be adopted because it comprised objects which had size relations and mutual attraction. Then the relation "orbits around" in the solar system would be applied to the atom as a hypothesis. It is possible that much of the creativity of the Rutherford analogy consisted first of finding the base, comprising the solar system, then of recognizing that the ORBITS-AROUND relation in the solar system might apply to the atom. The incomplete target, missing base analogy is highly creative, because not only is it necessary to find an appropriate, and probably non-obvious, base, but the base is used to supply information about the target that was not previously known. There are possible subvarieties of this case, where target relations are supplied but an element is missing.

5.3. Further Applications of the Model

We will now consider how the STAR model can in principle handle the other cases of analogy shown in Figure 7.12.

5.3.1. Complete Structure. In this case both base and target are completely represented in the tensor product, and forming the analogy is a matter of assessing the similarity of base and target predicates. This can be done using the inner product, as described earlier. If the inner product of the vectors representing base and target predicates is nonzero, then the base-target mapping achieves the minimum criterion of acceptability.

This implies that there must be a common predicate in base and target for an analogy to be formed. This is not a theoretical requirement for analogical mapping, because it is not contained in the definition of structure mapping given earlier, and the ACME model can find matches based solely on structural correspondence, with no shared predicates. However, it may be a psychological requirement. Gick and Holyoak (1983) found that common coding of base and target facilitated solution. The possibility of a common higher order predicate in base and target is virtually impossible to rule out in any actual case of analogical reasoning. For example, in a case such as that in Figure 7.3A, R and R' might not share common features, but there might be a common higher order relation (e.g., if R and R' are both asymmetric and transitive, then both structures represent ordered sets, and a higher order relation like "monotonic" would apply).

5.3.2. Missing Objects. This case was considered earlier.

5.3.3. Missing Relations. This case can be handled by entering base arguments into the tensor product representation, and obtaining the vector representing the sum of all predicates that apply to the argument set. The inner product of this vector with the vector representing the target predicate provides a measure of the success of the match.

In the case where the target predicate (relation) is missing, target arguments are entered into the tensor product in a separate step, and the vector representing the sum of all predicates applying to the target arguments is obtained. The inner product of vectors representing base and target predicate bundles provides a measure of the degree of match obtained. Individual predicates can be identified by obtaining the inner product of the predicate bundle with vectors representing specific predicates.
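This matching procedure can be sketched briefly. With one-hot orthonormal bases, probing the tensor with an argument pair recovers one count per stored instance of that pair, so the predicate bundle can be computed directly from the instance set; the instances and names below are illustrative assumptions.

```python
PREDS = ["loves", "mother-of", "feeds", "larger-than"]

S = [("loves", "woman", "baby"), ("mother-of", "woman", "baby"),
     ("larger-than", "woman", "baby"),
     ("mother-of", "mare", "foal"), ("larger-than", "mare", "foal")]

def bundle(instances, x, y, preds):
    """Predicate bundle recovered by probing the tensor with (x, y)."""
    b = [0.0] * len(preds)
    for P, a1, a2 in instances:
        if (a1, a2) == (x, y):
            b[preds.index(P)] += 1.0
    return b

def inner(a, b):
    return sum(p * q for p, q in zip(a, b))

base_bundle = bundle(S, "woman", "baby", PREDS)
target_bundle = bundle(S, "mare", "foal", PREDS)

# Degree of match between base and target relations:
match = inner(base_bundle, target_bundle)
print(match)  # 2.0: mother-of and larger-than are shared

# Identify an individual predicate within the target bundle:
mother_of = [1.0 if p == "mother-of" else 0.0 for p in PREDS]
print(inner(target_bundle, mother_of))  # 1.0
```

The match score counts the predicates shared by the two argument pairs, which is exactly the inner product of the two bundles under an orthonormal basis.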

5.3.4. Missing Object and Relation. If N−1 base arguments are entered, a tensor product of the predicate and remaining argument will be obtained. The inner product of this tensor product with the target predicate will provide a measure of the degree of match. If the match is nonzero, the target predicate and known base arguments can be entered, and the remaining base argument obtained.

Where the target predicate is not known, this will be obtained in a separate step, as for the missing relation case.

5.3.5. Missing Base. The complete target subvariety of this case can be handled by entering the target predicate into the tensor product representation. This will yield a tensor product of all vectors representing arguments with that predicate. The set of arguments so obtained will include the target arguments, as well as all other arguments in the representation consistent with the predicate. Specific arguments can be identified by obtaining the inner product of the vectors representing them with the tensor product obtained as above.

Sets of vectors representing selected arguments are then entered into the tensor product representation, and the vector representing the sum of all predicates consistent with the selected set of arguments is obtained. The inner product of this vector with the vector representing the target predicate represents a measure of the degree of match. In this way a base that can be mapped into the target can be found. This process essentially entails using the target predicate as a retrieval cue to find possible bases.

5.3.6. Incomplete Target, Missing Base. For this subvariety of the missing base case, the target arguments must first be entered into the tensor product, thereby obtaining the vector representing the sum of all predicates consistent with target arguments (as in the missing relations case, considered earlier). Then this vector is entered into the tensor product representation, and the remainder of the process is as for the complete target subvariety.

6. DISCUSSION OF THE MODEL

The proposed model is capable of solving a variety of analogical reasoning tasks. The cases we have considered so far entailed two-place predicates, but the model is capable in principle of solving analogies that entail three- or four-place predicates. It becomes computationally more costly as we move to higher place predicates, and consequently to tensor products of higher rank. However, this feature provides a natural explanation for the higher processing loads associated with tasks which require more dimensions to be processed in parallel. The limitation to four dimensions, for which evidence was presented earlier, is presumably a reflection of this processing load.

The model captures some of the important insights of the ACME model of Holyoak and Thagard (1989). ACME provides an elegant solution to the problem of how to implement structure mapping in a connectionist, parallel constraint satisfaction architecture. It does this by using inhibitory connections which ensure mappings are unique, and by excitatory and inhibitory connections which ensure that predicates in one structure correspond to predicates in the other. The present model achieves the correspondence property by superimposing predicate-argument bindings of base and target on the same representation. This provides a constraint between mapping of the predicate and the mapping of its arguments. This constraint is built into the representation, rather than resulting from computation.

The way structure mapping is implemented in the model differs from the way it is implemented in models such as the Structure Mapping Engine (Falkenhainer, Forbus, & Gentner, 1990), ACME (Holyoak & Thagard, 1989), or COPYCAT (Mitchell & Hofstadter, 1990). In all these models two separate structures, representing source and target, are created and then mappings are explicitly created between them. In our model the mappings are implicit in the structural representation, because base and target are both superimposed on the same set of units. Instead of base and target each with its own predicate(s), our model has a common (set of) predicate(s) which operate in two different contexts, base and target. Thus, in our example earlier, "mother of" occurs in two contexts, a human context and a horse context.

In its present form, the model is restricted to analogies in which base and target have a common predicate. There may be no common attributes or first-order relations, but in such cases there must be a common higher order relation.

Although common predicates are not a theoretical requirement for analogies, they may be a psychological requirement, as mentioned earlier. However, there are potential ways of removing this restriction from the model. Base and target predicates could be associated, or belong to the same category, rather than having a literal similarity. That is, instead of using the predicate obtained from the base as input to the target, a similar predicate, or one from the same category, could be used as input, without altering the logic of the process.

The model can also be related to distributed memory models. The tensor product representation has been used as the basic data structure in the memory model of Humphreys et al. (1989), and as the underlying representation for a direct access intersection process by Wiles, Humphreys, Bain, and Dennis (1990). In these models, the triple that forms the tensor is created from the outer product of context, memory cue, and target vectors (see Humphreys et al., 1989, Equation 1, p. 215). Their representation of context, cue and target is analogous to the representation of predicate and two arguments which forms the basis of our account of simple analogical reasoning. The tensor product formalism, which represents all interactions of context, cue and target, or in the case of analogies, of predicate and arguments, has considerable potential to provide a computational model of creativity, as Wiles, Halford, Stewart, Humphreys, Bain, and Wilson (in press) have shown.

One difference between the memory and analogical reasoning models is that because Humphreys et al. (1989) are concerned with modeling memory, the values stored in their tensor products are interpreted as weights, whereas we are concerned with modeling dynamically created representations, so we interpret the values as activations. Nevertheless, our model is consistent with a memory retrieval view of analogies. Where arguments are entered into the tensor product representation, they are equivalent to a retrieval cue for a set of predicates. Where a predicate and N−1 arguments are entered, they serve as a cue for retrieval of the Nth argument.

The processes of analogical reasoning are therefore closely related to memory retrieval processes. This does not necessarily imply, however, that analogical reasoning operates directly on semantic memory. It is possible that it takes place in a temporary representation that is a copy of information in semantic memory. The amount of such information that can be represented at any one time is probably restricted, as we noted before. Therefore, there is a serial aspect to analogical reasoning insofar as it is necessary to represent components of a structure one at a time.

7. APPLYING THE MODEL TO CONSISTENCY CHECKING TASKS

As noted earlier, the validity of a structure mapping can be defined in terms of consistency, which can be determined by checking whether mappings are commutative, as defined in Equation 1 and Figure 7.1. Halford and Wilson (1980) described a family of models of consistency checking tasks of varying levels of complexity, as illustrated in Figure 7.13. Levels 1, 2, and 3 as defined by Halford and Wilson (1980) correspond to relational, system, and multiple system mappings as discussed earlier.

To check consistency of a Level 1 system, one shows that for any symbol s in S1, ι2(μS(s)) = μE(ι1(s)), that is, that it does not matter whether you transform s first, and then map the result into the environment, or map s first, and then transform the result. In terms of composition of functions, this can be expressed by the equation ι2∘μS = μE∘ι1. The corresponding equations for Level 2 and 3 systems are, respectively, ι3∘μS = μE∘(ι1×ι2) and ι4∘μS = μE∘(ι1×ι2×ι3). In all cases, the message is that if you start in the top-left corner of a diagram and transform a member of the symbol system that you find there, then you will get the same result whether you follow the clockwise transformation path or the anti-clockwise transformation path.
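The Level 1 commutativity condition can be sketched with the mappings represented as finite functions (dictionaries). The symbol systems below are hypothetical examples chosen only to make the check concrete.

```python
# mu_S: the transformation within the symbol system (S1 -> S2).
mu_S = {"woman": "baby", "mare": "foal"}
# mu_E: the corresponding transformation in the environment (E1 -> E2).
mu_E = {"WOMAN": "BABY", "MARE": "FOAL"}
# iota1: S1 -> E1 and iota2: S2 -> E2, the symbol-to-environment mappings.
iota1 = {"woman": "WOMAN", "mare": "MARE"}
iota2 = {"baby": "BABY", "foal": "FOAL"}

def consistent(mu_S, mu_E, iota1, iota2):
    """Check iota2(mu_S(s)) == mu_E(iota1(s)) for every symbol s in S1."""
    return all(iota2[mu_S[s]] == mu_E[iota1[s]] for s in mu_S)

print(consistent(mu_S, mu_E, iota1, iota2))  # True

# Corrupting one arrow breaks commutativity of the diagram:
bad_iota2 = dict(iota2, baby="FOAL")
print(consistent(mu_S, mu_E, iota1, bad_iota2))  # False
```

Following the clockwise path (transform, then map out) and the anti-clockwise path (map out, then transform) must give the same result for every starting symbol; the check fails as soon as any one symbol disagrees.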


Figure 7.13.
Symbol Systems for Consistency Checking.

Figure 7.13


Can the consistency checking embodied in the Halford and Wilson (1980) model be performed by a tensor product network of the type we have been discussing? As the Halford and Wilson (1980) model is couched in terms of symbol systems, we need to describe first how that model is to be interpreted in a PDP modeling environment. Basically, a symbol s in Si (i ∈ {1,2,3,4}) becomes a pattern in a vector space which will become one factor in the mathematical formulation of a tensor product network. In a Level 1 system, as illustrated in Figure 7.14, the functions μS, μE, ι1, and ι2 are particular binary relations, and as such could be encoded as a rank 3 tensor product net of the form Varg1⊗Varg2⊗Vpred, or as four rank 2 tensor product nets, one for each function. In this latter formulation, the consistency checking calculation corresponds to a conceptual composition of tensor product networks of the form shown in Figure 7.14.


Figure 7.14.
Consistency Checking with Tensor Product Nets: (a) Level 1 Systems.

Figure 7.14


In accordance with Halford and Wilson (1980), the computations performed by the tensor product nets would be performed sequentially, presumably reflecting that the supply of tensor product nets as a computational resource is limited. Figure 7.15 shows a similar architecture for a Level 2 consistency checking calculation. Notice that the transformations μS and μE correspond to rank 3 tensor products in the Level 2 diagram. Figure 7.16 indicates how a Level 3 consistency checking calculation would be done: the transformations μS and μE correspond to rank 4 tensor products in this case. For all three levels, the equality check (in the bottom-right corner of Figures 7.14-7.16) is shown as though it were performed by another tensor product net, though there might be better neural sub-assemblies to test for equality.


Figure 7.15.
Consistency Checking with Tensor Product Nets: (b) Level 2 Systems.

Figure 7.15


It is not clear that this consistency checking process would be performed whenever a structure mapping is constructed because, as we have seen, there are algorithms which can produce valid structure mappings without explicitly performing this consistency check. Rather, it might be thought of as a kind of meta-cognitive process, or higher order checking process, which serves to check that a mapping that has been made is logically consistent.

8. Limitations of Representations

It is probably already apparent that the dimensionality of concepts as defined earlier can be related to the rank of the tensors in the representation of the structures. A unidimensional concept corresponds to the tensor product of a vector representing a predicate and a vector representing its argument. A two-dimensional concept as defined earlier corresponds to a tensor product of a predicate and two arguments. A three-dimensional concept corresponds to the tensor product of a predicate and three arguments, and so on. A concept of dimensionality N as defined earlier corresponds to a representation which entails a tensor product of Rank N + 1.


Figure 7.16.
Consistency Checking with Tensor Product Nets: (c) Level 3 Systems.

Figure 7.16


Note that dimensionality is defined in terms of the concept, rather than in terms of the rank of the tensor product. This is partly because conceptual complexity can be defined in other contexts, where tensor products might not have apparent relevance. More importantly, it is appropriate to define dimensionality in terms of the number of arguments in a relation. Consider, for example, the binary operation of addition, which is a ternary relation. It is three-dimensional because three items of information are sufficient to constrain the fourth (e.g., given 2, 3, 5 we can infer the operation is addition; given 2 × ? = 8 we can infer the missing number is 4).

This means that as we progress to higher order structures, the computational demands of the representations escalate rapidly. Given N vectors each of M elements, the number of elements in the tensor product is M^N. There are ways of reducing the number of elements (Smolensky, 1990) because sufficiently clear representations can be achieved even with some elements deleted. Nevertheless, the number of elements escalates rapidly.
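The escalation is easy to make concrete. Using the 16-dimensional argument vectors of the simulation as an illustrative size for every factor (the predicate space was in fact 6-dimensional, so real counts would be somewhat smaller), each added rank multiplies the unit count by 16:

```python
# Unit count of a rank N tensor product of M-dimensional vectors: M**N.
M = 16
units = {rank: M ** rank for rank in range(2, 6)}
print(units)  # {2: 256, 3: 4096, 4: 65536, 5: 1048576}
```

A rank 5 tensor product, the putative upper bound of human processing discussed below the figure, already requires over a million units at this vector size.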

The fact that the computational demands of representations escalates rapidly as the dimensionality of the structures increases corresponds to the psychological evidence mentioned earlier that the processing loads are higher for higher-dimensional structures. Empirical evidence reviewed earlier suggests that four-dimensional structures can be represented, which would correspond to tensor products of Rank 5. On the evidence currently available this might be the most complex structure that humans can process.

There are two possible reasons for this. One would be the sheer size, in terms of number of units, of the tensor product vector. Another would be the number of connections to each unit in the tensor product. Each of these units must have a connection to, and receive input from, at least one unit in each of the vectors representing the predicate and argument(s). In the representation of a one-dimensional concept, with a tensor product of Rank 2, each unit has a connection to (at least) one unit in each of two vectors. With a Rank 3 tensor product, each tensor product unit has connections to units in three vectors, with a Rank 4 tensor product each unit has connections to units in four vectors, etc. Cortical cells are very richly interconnected, so the sheer number of connections is unlikely to be a problem. However, there may be difficulties in ensuring that each tensor product unit is connected to the right cells in each of the input vectors. That is, even though each tensor product unit might have thousands of connections, it might not have the appropriate connections. The higher the rank of the tensor product, the more difficult it will be to ensure this. This aspect of representations requires further exploration.

This is likely to be a soft rather than a hard limitation, and there will be occasions when the specified limit can be exceeded, just as there will be occasions when it will not be reached. This is quite consistent with the literature on capacity limitations mentioned earlier, which suggests a capacity to process 3-5 chunks or dimensions in parallel.

8.1. Conceptual Chunks

It appears humans have limited ability to process structure, and can probably only process four dimensions in parallel. However, it is obvious that many concepts are more complex than four dimensions, so how can we deal with more complex concepts? Two processes, conceptual chunking and segmentation, handle this problem. Conceptual chunking entails recoding a multidimensional structure as a unidimensional structure or, more generally, as a structure with fewer dimensions. This can only occur under certain conditions, and is not a universal method for reducing all processing loads.

An everyday example of a conceptual chunk would be velocity. Velocity is a three-dimensional concept, defined as: velocity = distance/time (V = s/t). However, it can be recoded as a single dimension. We commonly do this by thinking of speed as position of a pointer on a dial, as the rush of air in the face, blurring of close objects passing by, and so on.

Once multiple dimensions are recoded as a single dimension, that dimension occupies only one chunk, and it can then be combined with up to three other chunks. Velocity can now be combined again with time to give acceleration, normally defined as distance divided by time, divided by time (a = 2st−2). Acceleration can now be chunked into a single dimension, and can then be combined with mass to produce the concept of force, F = ma, and so we can bootstrap our way up to more and more complex concepts.

Conceptual chunks are similar to mnemonic chunks in that a number of formerly separate items of information are recoded as a single item, but there is more emphasis on structure (i.e., relations, function, or transformations) between elements, in conceptual chunks. The chunk corresponding to velocity does not consist merely of the concatenation of distance and time, but actually embodies the function V = s/t.

In our formulation, velocity would be represented as a Rank 4 tensor product, with vectors representing velocity (v), distance (s), time (t), and multiplication. If velocity is recoded as a single vector, then acceleration can be represented as a rank 4 tensor product, with vectors representing acceleration (a), velocity (v), time (t), and multiplication. If we now represent acceleration as a single vector, we can represent force (F) as a Rank 4 tensor product, with vectors representing F, mass (M), acceleration (a) and multiplication.
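The recoding step can be sketched by contrasting a rank 4 binding with a single chunk vector. The dimension names and the 7-element one-hot basis below are illustrative assumptions, not the representation proposed in the text.

```python
DIMS = ["velocity", "distance", "time", "mult", "accel", "force", "mass"]

def basis(name):
    """One-hot vector for `name` over the illustrative dimension list."""
    return [1.0 if d == name else 0.0 for d in DIMS]

def outer(vectors):
    """N-way outer (tensor) product, flattened to a single list."""
    result = [1.0]
    for vec in vectors:
        result = [r * c for r in result for c in vec]
    return result

# Unchunked velocity: a rank 4 tensor product of four separate vectors
# (velocity, distance, time, and the multiplication operator).
velocity_rank4 = outer([basis("velocity"), basis("distance"),
                        basis("time"), basis("mult")])

# Chunked velocity: a single vector. The constituents distance and time
# are no longer separately addressable -- the cost of the recoding.
velocity_chunk = basis("velocity")

# The chunk lets acceleration stay at rank 4 instead of rank 5:
accel_rank4 = outer([basis("accel"), velocity_chunk,
                     basis("time"), basis("mult")])

print(len(velocity_rank4), len(velocity_chunk), len(accel_rank4))  # 2401 7 2401
```

Each level of the bootstrap (velocity, then acceleration, then force) stays within the rank 4 budget because the previous level has been collapsed into one factor.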

At first it seems that we must be getting something for nothing when the more complex concept, force, is represented by a tensor product with no more vectors than the less complex concept, velocity. Actually, however, there is a cost to the more efficient representation: some of the constituent structure is no longer immediately accessible. If we write force as F = 2mst−2, its representation corresponds to a rank 5 tensor product, because all dimensions (mass, distance, and both time dimensions, as well as force itself) are expressed as separate vectors. The effects of distance and time on force are expressed in the representation (the partial derivatives ∂F/∂t and ∂F/∂s are expressed). However, when force is expressed as F = ma, the effect of variations in distance and time (the partial derivatives mentioned above) cannot be expressed.

It only makes sense to fold s and t into velocity, v, if the actual values of s and t are irrelevant. For an example of a situation where the actual values are relevant, consider foot races; 100 meters in 10 seconds is reasonably fast, but 1,500 meters in 150 seconds, although the same velocity, is implausibly fast. On the other hand, when someone is booked for exceeding the speed limit, actual time and distance are irrelevant. In the latter case, but not in the former, it is appropriate to recode the three vector representation into one vector. This is reflected in the way people actually encode these events. We do not say someone ran at 30 kilometers per hour, we say they ran 100 meters in 12 seconds, but when someone is booked for speeding, we do not say they drove 2.5 kilometers in one minute, we say they drove at 150 kilometers per hour. In the first case, two parameters, distance and time, are relevant and both are mentioned, but in the second case only one parameter is relevant, and the language reflects this.

Thus, conceptual chunking, or recoding multiple vectors into a single vector, does not dispose of capacity limitations so much as it allows us to operate within them. Given that we can only represent concepts up to four dimensions in parallel, we must have ways of representing complex concepts that do not exceed this limit. We do this by combining multiple vectors into single vectors. However, when we want access to the components of the complex concepts, we must revert to the more specific representation again. Thus, to gain access to the components, we must sacrifice the efficiency gained by chunking. Thus, there tends to be a tradeoff between efficiency and flexibility.

Another implication is that representation of complex concepts is heavily dependent on the availability of efficient codes. Because representing all the interrelations between distance, time, and mass that occur in the concept of force stretches our representational capacity, we need concepts that provide efficient codes. We can only represent up to about four independent sources of variation (four degrees of freedom) in parallel, so we require codes that enable us to represent complex concepts without exceeding this limit. This is why expertise is so important to representation; without it we are restricted to very simple systems. Thus, when experts examine a situation in simple mechanics, they represent it in terms of the most efficient concepts: in the case of force, in terms of mass and acceleration. Novices try to represent too many dimensions (such as distance and time) because they do not have access to the important higher order concepts such as acceleration, and they also try to represent irrelevant dimensions.

Another limitation of conceptual chunking is that, like mnemonic chunking, it is only possible with constant mappings of components into chunks. The mnemonic chunk "CAT" is possible because "C," "A," and "T" map into the word CAT in constant fashion. In the conceptual chunk speed, the dimensions distance and time map into speed in constant fashion, defined by the ternary relation of division.

The Rutherford analogy between the solar system and the atom, analyzed by Gentner (1982), and modeled by Falkenhainer, Forbus, and Gentner (1990) and Holyoak and Thagard (1989), also provides an instructive example of a conceptual chunk. The analogy entails mapping the solar body into the nucleus, and the planetary body into the electron, on the basis that the relational structures correspond; that is, the formula linking radius of orbit, the gravitation constant, masses of the two bodies, and orbital velocity would correspond to the equivalent formulas in the atom. It would constitute a multiple system mapping, and recognition of the mapping without benefit of chunking would impose a high processing load. However, as Halford (1993) points out, we can create the conceptual chunk "orbital motion," which would consist of only two nodes, the solar body and the orbiting body, with the relation "orbits around" between them. Structure mapping would then entail mapping solar body into nucleus, and orbiting body into electron. This can be done on the basis that the relation "orbits around" exists between solar and orbiting bodies, and also between nucleus and electron. This only requires a relational mapping (i.e., a mapping based on two-place predicates).
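The chunked version of the mapping can be sketched as a check that a one-to-one assignment of elements preserves the single two-place predicate. The proposition tuples and names below are our stand-ins for the chunked structures, not a reproduction of any of the cited models.

```python
# Base and target each reduced to the single chunked relation "orbits around".
base = {('orbits_around', 'planetary_body', 'solar_body')}
target = {('orbits_around', 'electron', 'nucleus')}
mapping = {'planetary_body': 'electron', 'solar_body': 'nucleus'}

def valid_relational_mapping(base, target, mapping):
    """A mapping is valid if every base proposition, with its arguments
    translated, appears in the target (so predicates correspond)."""
    return all((pred, mapping[x], mapping[y]) in target
               for pred, x, y in base)

print(valid_relational_mapping(base, target, mapping))  # True
```

Validating this mapping requires considering only two elements and one relation at a time, which is the sense in which chunking reduces the processing load to that of a relational mapping.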

The structure mapping model of Holyoak and Thagard (1989) includes "presumed" mappings (i.e., presumptions that particular items in one structure should be mapped into certain items in the other structure). The conceptual chunk "orbital motion" would entail the presumption that, if solar body and planetary body are mapped into nucleus and electron, respectively, then all the predicates attached to these two chunks would be correspondingly mapped (i.e., radius of planetary orbit would be mapped into radius of electron orbit), and so on. The processing load is then equivalent to that required to map the nodes solar body and planetary body, with a single relation between them. It is equivalent to a relational mapping. The remaining mappings are presumed, having been learned in past experience, and do not impose a processing load.

One characteristic of an expert, as compared with a novice, is that the expert can recognize examples of concepts on the basis of their deep structure (Chi & Ceci, 1987; Holyoak, 1990). An expert who has constructed the conceptual chunk "orbital motion" recognizes an example of it quickly, and with little effort. All that is necessary is to recognize that there is a solar body and an orbiting body, with the relation "orbits around" between them. Once "planetary body orbits around solar body" has been mapped to "electron orbits around nucleus," the remaining features, such as size relations, attraction relations, etc., are presumed to be mapped.

Chunking can, therefore, produce massive gains in efficiency, but at the expense of rigidity. While a concept is encoded as a chunk in this way, its constituent structure is less accessible, and alternative combinations cannot be generated. Hence novices, who are less likely to see the conventional structures quickly and efficiently because they have not coded the information into efficient chunks, might for that very reason more readily see novel structures.

Segmentation entails decomposing structures into smaller components that can be processed serially. This also can be illustrated with the Rutherford analogy. Once the core chunks "planetary body orbits around solar body" and "electron orbits around nucleus" are mapped, the remaining components can be mapped serially. Notice, however, that some knowledge of the two structures is required for this serial processing strategy to be organized. When the analogy was first discovered, it is unlikely that such strategies for reducing processing load would have been available. Consequently, the processing load must have been very high indeed.

Given that only a limited amount of structure can be processed in one step, the chance of finding a good mapping depends on choosing the right representation of the problem. A complex problem might comprise many predicates and many arguments, and the key to the mapping will be choosing the right subset of them. For example, in the Rutherford analogy, success depends on representing the solar system by the predicate "revolves around" with arguments "sun" and "planet," and the atom by the same predicate with arguments "nucleus" and "electron." There are many other representations that would conflict with a good match (e.g., representing the solar system by the predicate "hotter than"). An expert will tend to select an appropriate match relatively quickly. A realistic model of human analogical reasoning, however, needs ways of searching for a good representation. The present model could achieve this in principle by adding a sampling process, whereby attempted matches are based on successive samples of each structure. Success in sampling would be defined in terms of the match achieved; when the appropriate predicate-argument combinations are found, a match will be readily obtained.

Notice that the requirement that predicates in the two structures must correspond can be achieved by such a process. When an appropriate representation is found, the predicates will correspond automatically, as noted earlier. The reason is that the predicate-argument combinations of base and target are superimposed on the same representation. If this is not true, a good match cannot be obtained. Therefore, the sampling process will be constrained to find a predicate-argument combination that puts the two structures in correspondence. With such a structure sampling procedure, the present model could achieve essentially the same result as the ACME model, without assuming appropriate representations at the outset.
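A structure sampling process of this kind can be sketched as follows. The propositions are our own toy examples, and matching on predicate name here stands in for the superposition check described in the text; this is an illustration of the search idea, not the chapter's model.

```python
import random

# Toy base and target structures; only one base proposition has a
# counterpart in the target.
base = [('revolves_around', 'planet', 'sun'),
        ('hotter_than', 'sun', 'planet')]
target = [('revolves_around', 'electron', 'nucleus')]

def sample_match(base, target, tries=100, seed=0):
    """Repeatedly sample one proposition from each structure; succeed on
    the first pair whose predicates correspond, and return the argument
    mapping that the pair induces."""
    rng = random.Random(seed)
    for _ in range(tries):
        b, t = rng.choice(base), rng.choice(target)
        if b[0] == t[0]:  # predicates correspond
            return dict(zip(b[1:], t[1:]))
    return None

print(sample_match(base, target))  # {'planet': 'electron', 'sun': 'nucleus'}
```

When the sampled predicate-argument combinations do not correspond, no mapping is returned, so the search is constrained toward representations that put the two structures in correspondence.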

9. REQUIREMENTS FOR TESTING THE THEORY

The theory proposes that there are limits to the dimensionality of structure that can be represented and mapped in parallel. The problem with testing this proposition is that structures can be recoded into lower dimensional representations, and structures can sometimes be processed serially. However, the factors which govern these processes are known to some extent, and can be controlled in experimental paradigms. Our past work has shown that it is possible to constrain participants to map particular levels of structure in parallel under certain conditions. We will briefly summarize the requirements for doing this.

To test the theory, five requirements must be met: the task must be assigned objectively to a level of structure mapping, conceptual chunking must be controlled, segmentation must be controlled, the mappings must not have been pre-learned, and there must be no non-structural cues to the correct mappings.

The most objective way to assign a task to a level of structure mapping is to write the specifications for the task in a machine-readable form such as predicate calculus, then use a structure mapping simulation program to determine the level. This has been done using the simulation program by Bakker and Halford (1988), which has two structure mapping procedures, corresponding to relational and system mappings. The task is attempted by each level of mapping procedure, and the lowest successful level is the one to which the task is assigned. The program has been used with transitive inference and the Rutherford analogy, and shows that the former is a system mapping and the latter is a relational mapping if coded as described in the last section. Where no simulation program is available, the mathematical structure of the task can be examined. Transitive inference has been used in our research because it has a clearly defined mathematical structure that can be assigned objectively to a level of mapping.
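The selection rule described here, attempting the task with each mapping procedure from the lowest level up and assigning it to the lowest successful level, can be sketched as follows. The level names and dummy tests are hypothetical; the internals of the Bakker and Halford (1988) program are not reproduced here.

```python
def assign_level(task, procedures):
    """procedures: (level_name, attempt_fn) pairs ordered from lowest to
    highest level. Return the lowest level whose procedure succeeds on
    the task, or None if all fail."""
    for level, attempt in procedures:
        if attempt(task):
            return level
    return None

# Dummy stand-ins: a task that defeats the lower-level procedures but
# yields to system mapping (as the text says transitive inference does).
procedures = [
    ('element', lambda task: False),
    ('relational', lambda task: False),
    ('system', lambda task: True),
]
print(assign_level('transitive inference', procedures))  # system
```

The same rule generalizes to any ordered set of mapping procedures; only the attempt functions differ between simulation programs.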

Conceptual chunking is avoided if the task requires participants to generate a new structure. For example, each transitive inference task requires participants to generate an ordered set from the premises. The elements (e.g., elements in Figure 7.2) will have been assigned to slots in the premises so that the correct ordering cannot be retrieved from semantic memory. Because the elements have not occurred in that combination before, at least not with any regularity, they will not have been formed into a chunk.

Segmentation can be avoided by ensuring that the various components of the task interact, because it is not then possible to process one component without taking account of the others. This is true for the interpretation of interactions task, discussed earlier. The premises of a transitive inference task also interact in the sense that neither can be interpreted without the other. When we try to assign a premise element to an ordinal position on the basis of one premise, the assignment is modified by the other premise.
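The sense in which the premises interact can be illustrated with a toy integration procedure: the full ordering is only determined when both premises are considered together, because each premise constrains how the other's elements are placed. This sketch is ours and is not a model of participants' actual strategies.

```python
def order_from_premises(premises):
    """premises: (larger, smaller) pairs forming a chain, e.g. a > b, b > c.
    Repeatedly extract the element that no remaining premise places below
    another. Note that neither premise alone determines the full order."""
    elements = {e for pair in premises for e in pair}
    order = []
    while elements:
        below = {s for l, s in premises if l in elements and s in elements}
        top = (elements - below).pop()
        order.append(top)
        elements.remove(top)
    return order

print(order_from_premises([('a', 'b'), ('b', 'c')]))  # ['a', 'b', 'c']
```

With only the premise a > b, for instance, the position of c is undetermined; assigning any element an ordinal position requires taking both premises into account, which is what defeats segmentation.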

It is also necessary to ensure that the mappings have not been pre-learned. This can normally be done by using arbitrary assignments. The requirement that there be no non-structural cues, such as linguistic cues, is similar to saying that the mappings must have low transparency (Gentner & Gentner, 1983).

10. AGE AND PROCESSING CAPACITY

Although there is agreement that ability to process information increases through childhood (Case, 1985; Halford, 1982) it is not clear whether this growth represents increased capacity, or more efficient use of available capacity, or both. It has been shown that speed of processing increases with age (Case, Kurland, & Goldberg, 1982; see also Halford, 1982; Siegler, 1990, for reviews), that strategies become more sophisticated (Flavell & Wellman, 1977), that knowledge becomes more sophisticated and better organized (Carey, 1985) and that content familiarity has a powerful effect on memory (Chi, Glaser, & Rees, 1982). However, none of these approaches assesses capacity per se. They show that capacity is not the only thing that changes with age, but do not eliminate the possibility that capacity increases might accompany these other changes.

Ability to map structures does increase with age. Halford and Wilson (1980) examined children's ability to map a structure into an isomorph. They varied the number of dimensions required to validate the mapping, and found that structures defined on two dimensions (two-place predicates) could be mapped by 3-4-year-olds, structures based on three dimensions (three-place predicates) at 5 years, and structures based on four dimensions (four-place predicates) at 11 years. This suggests that the number of dimensions that can be processed in parallel increases with age, and reaches the adult level at age 11. The prediction that ability to process structure increases with age is also supported by studies of transitive inference (Halford, 1984, 1989; Halford & Kelly, 1984; Halford, Maybery, & Bain, 1986) and class inclusion reasoning (Halford & Leitch, 1989). However, these gains might reflect more efficient use of available capacity, and do not necessarily entail a maturational increase in capacity.

We reviewed evidence earlier that whereas adults process four chunks in primary memory in parallel, 8-9-year-old children process two or three chunks. There is other evidence also of an age-related increase in information processing capacity, as distinct from the efficiency with which capacity is used (Thatcher, Walker, & Giudice, 1987). The evidence is not sufficient to resolve the issue, but we would like to suggest an alternative formulation of the problem. This is that increased ability to process structure might result from differentiation of some regions of the brain, thereby increasing the independence of representations. This would enable more chunks or dimensions to be processed in parallel. Increased differentiation with age would be consistent with the developmental theory of Werner (1948). It would not increase overall capacity, but it would mean that structures of higher dimensionality could be mapped in parallel.

This in turn would permit new conceptual achievements. Concepts such as inclusion and transitivity are defined on compositions of binary relations, and cannot be decomposed, without loss, into single relations. Therefore, if children are limited to processing individual relations, they will not be able to comprehend these concepts. A similar argument applies to binary operations, such as arithmetic addition and multiplication, and the binary connectives of logic (conjunction and disjunction). These concepts also cannot be decomposed without loss into simpler concepts. A certain amount of information must be processed in parallel to make concepts at the system-mapping level accessible. An extra dimension must be processed in parallel to make concepts at the next level accessible. Therefore increased differentiation, with consequent increases in the dimensionality of structures that can be processed in parallel, opens up new conceptual possibilities.

From our account of the way cognitive structures are represented in PDP architectures, a new way of formulating the question of the development of processing capacity emerges. We have seen that representation of high-dimensional structures depends on the ability to compute tensor products of equivalent rank. Therefore the question becomes: do children acquire the ability to compute tensor products of higher rank as they grow older? And if so, is this ability due to maturation or learning? To address these questions, we need more studies which control, in a precise way, the representations that participants use. Our empirical work has shown how this can be done in certain contexts, but there is considerable scope to widen it.

11. CONCLUSIONS

Humans perform analogical reasoning very easily and naturally, but realistic models require specification of our capacity to map structures in parallel. The complexity of concepts can be quantified in terms of their dimensionality, and we propose that there are limits to the number of dimensions that humans can represent and map in parallel. Our evidence suggests that adults can map up to four dimensions in parallel, and that children can map fewer.

The complexity metric that we propose quantifies concepts in terms of the number of independent dimensions required to define them. Four levels of mapping are defined, which increase in abstractness and processing load. These mappings are based on concepts of increasing dimensionality. We have examined the way these concepts would be represented in PDP architectures. We have suggested that one-dimensional concepts are represented as the tensor product of a predicate vector and a single argument vector, two-dimensional concepts as the tensor product of a predicate vector and two argument vectors, and three- (four-) dimensional concepts as the tensor product of a predicate vector and three (four) argument vectors. We review empirical evidence suggesting that four dimensions might represent an upper limit to the complexity of concepts that humans can process in parallel.
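The proposed representations can be sketched directly: an n-dimensional concept is bound as the tensor product of a predicate vector with n argument vectors, giving a tensor of rank n + 1. The vector length and contents below are arbitrary stand-ins.

```python
from functools import reduce
import numpy as np

rng = np.random.default_rng(1)
pred, *args = (rng.standard_normal(6) for _ in range(5))

def bind(predicate, arguments):
    """Tensor (outer) product of a predicate vector with argument vectors."""
    return reduce(np.multiply.outer, arguments, predicate)

for n in range(1, 5):
    # An n-dimensional concept requires a rank n + 1 tensor product.
    print(n, bind(pred, args[:n]).ndim)
```

On this scheme, the proposed four-dimensional limit corresponds to an upper bound of rank 5 on the tensor products that can be computed in parallel.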

The tensor product representation of predicate-argument bindings is used to model analogical reasoning. The main processes entail entering arguments into the representation to obtain unknown predicates, or entering predicates (sometimes accompanied by a subset of arguments) to obtain missing arguments. We have shown in principle how five basic types of analogical reasoning can be performed in terms of the model.
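Both retrieval directions can be sketched with orthonormal (one-hot) codes, which make the tensor contractions exact. The vectors and the toy proposition below are our stand-ins, not the chapter's simulations.

```python
import numpy as np

def unit(i, n=4):
    """One-hot code: orthonormal vectors make retrieval exact."""
    v = np.zeros(n)
    v[i] = 1.0
    return v

loves, john, mary = unit(0), unit(1), unit(2)

# Bind the toy proposition loves(john, mary) as a rank 3 tensor product.
T = np.einsum('i,j,k->ijk', loves, john, mary)

# Enter both arguments to obtain the unknown predicate ...
pred_out = np.einsum('ijk,j,k->i', T, john, mary)
# ... or enter the predicate and one argument to obtain the missing argument.
arg_out = np.einsum('ijk,i,j->k', T, loves, john)

print(np.allclose(pred_out, loves), np.allclose(arg_out, mary))  # True True
```

With distributed rather than one-hot codes the same contractions yield noisy approximations of the stored vectors, which is why superimposing many bindings on one tensor degrades gracefully rather than failing outright.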

Concepts that entail more than four dimensions are processed by conceptual chunking. This entails recoding multiple vector representations into fewer vectors, or one vector in the limiting case. This reduces processing load, but also makes the constituent information unavailable for restructuring, unless a return is made to the original, multi-vector representation. The primary developmental implication is that ability to process structure might depend on differentiation or other processes which lead to representations of higher dimensionality. These in turn would be based on the ability to compute tensor products from more vectors.[Footnote 2]


Footnote 2: Solution to problem in Figure 7.4:

A1 = Paramount; A2 = MGM; B1 = A-grade; B2 = B-grade;
C1 = G-rated; C2 = R-rated; D1 = adventure; D2 = comedy.


REFERENCES

Amit, D.J. (1989). Modelling brain function: The world of attractor neural networks. Cambridge, UK: Cambridge University Press.

Anderson, J.A. (1973). A theory for the recognition of items from short memorized lists. Psychological Review, 80, 417-438.

Anderson, J.R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.

Anderson, J.R., & Thompson, R. (1989). Use of analogy in a production system architecture. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 267-297). Cambridge: Cambridge University Press.

Attneave, F. (1959). Applications of information theory to psychology: A summary of basic concepts, methods and results. New York: Henry Holt & Company.

Baddeley, A.D. (1986). Working memory. Oxford: Clarendon Press.

Baddeley, A., & Hitch, G. (1974). Working memory. In G.H. Bower (Ed.), The psychology of learning and motivation: Advances in research and theory (pp. 47-89). New York: Academic Press.

Baddeley, A.D., Thomson, N., & Buchanan, M. (1975). Word length and the structure of short-term memory. Journal of Verbal Learning and Verbal Behavior, 14, 575-589.

Bakker, P.E., & Halford, G.S. (1988). A basic computational theory of structure-mapping in analogy and transitive inference. Unpublished Tech. Rep. 88/1, Centre for Human Information Processing and Problem Solving, University of Queensland, Brisbane, Australia.

Broadbent, D.E. (1958). Perception and communication. London: Pergamon.

Broadbent, D.E. (1975). The magic number seven after fifteen years. In A. Kennedy & A. Wilkes (Eds.), Studies in long term memory (pp. 3-18). London: Wiley.

Carey, S. (1985). Conceptual change in childhood. Cambridge, MA: MIT Press.

Case, R. (1985). Intellectual development: Birth to adulthood. New York: Academic Press.

Case, R., Kurland, M., & Goldberg J. (1982). Operational efficiency and the growth of short-term memory span. Journal of Experimental Child Psychology, 33, 386-404.

Chi, M.T.H., & Ceci, S.J. (1987). Content knowledge: Its role, representation and restructuring in memory development. Advances in Child Development and Behaviour, 20, 91-142.

Chi, M.T.H., Glaser, R., & Rees, E. (1982). Expertise in problem solving. In R. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 1, pp. 7-75). Hillsdale, NJ: Erlbaum.

Coombs, C.H., Dawes, R.M., & Tversky, A. (1970). Mathematical psychology: An elementary introduction. Englewood Cliffs, NJ: Prentice-Hall.

Duncan, J. (1980). The demonstration of capacity limitation. Cognitive Psychology, 12, 75-96.

Egan, D.E., & Greeno, J.G. (1974). Theory of rule induction: Knowledge acquired in concept learning, serial pattern learning, and problem solving. In L.W. Gregg (Ed.), Knowledge and cognition. Potomac, MD: Erlbaum.

Falkenhainer, B., Forbus, K.D., & Gentner, D. (1990). The structure-mapping engine: algorithm and examples. Artificial Intelligence, 41, 1-63.

Fisher, D.L. (1984). Central capacity limits in consistent mapping, visual search tasks: four channels or more? Cognitive Psychology, 16(4), 449-484.

Fisk, A.D., Derrick, W.L., & Schneider, W. (1986). A methodological assessment and evaluation of dual-task paradigms. Current Psychological Research and Reviews, 5, 315-327.

Flavell, J.H., & Wellman, H.M. (1977). Metamemory. In R.V. Kail & J.W. Hagen (Eds.), Memory in cognitive development (pp. 3-33). Hillsdale, NJ: Erlbaum.

Garner, W.R. (1962). Uncertainty and structure as psychological concepts. New York: Wiley.

Gentner, D. (1982). Are scientific analogies metaphors? In D.S. Miall (Ed.). Metaphor: Problems and perspectives (pp. 106-132). Brighton, England: Harvester Press.

Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7, 155-170.

Gentner, D. (1989). The mechanisms of analogical reasoning. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 199-241). Cambridge: Cambridge University Press.

Gentner, D., & Gentner, D.R. (1983). Flowing waters or teeming crowds: Mental models of electricity. In D. Gentner & A.L. Stevens (Eds.), Mental models (pp. 99-129). Hillsdale, NJ: Lawrence Erlbaum.

Gick, M.L., & Holyoak, K.J. (1980). Analogical problem solving. Cognitive Psychology, 12, 306-355.

Gick, M.L., & Holyoak, K.J. (1983). Schema induction and analogical transfer. Cognitive Psychology, 15, 1-38.

Halford, G.S. (1982). The development of thought. Hillsdale, NJ: Erlbaum.

Halford, G.S. (1984). Can young children integrate premises in transitivity and serial order tasks? Cognitive Psychology, 16, 65-93.

Halford, G.S. (1987). A structure-mapping approach to cognitive development. International Journal of Psychology, 22, 609-642.

Halford, G.S. (1989). Reflections on 25 years of Piagetian cognitive developmental psychology, 1963-1988. Human Development, 32, 325-387.

Halford, G.S. (1993). Children's understanding: The development of mental models. Hillsdale, NJ: Erlbaum.

Halford, G.S., Bain, J.D., & Maybery, M.T. (1984). Does a concurrent memory load interfere with reasoning? Current Psychological Research and Reviews, 3, 14-23.

Halford, G.S., & Kelly, M.E. (1984). On the basis of early transitivity judgements. Journal of Experimental Child Psychology, 38, 42-63.

Halford, G.S., & Leitch, E. (1989). Processing load constraints: a structure-mapping approach. In M.A. Luszcz & T. Nettelbeck (Eds.), Psychological development: Perspectives across the life-span (pp. 151-159). Amsterdam: Elsevier.

Halford, G.S., Maybery, M.T., & Bain, J.D. (1986). Capacity limitations in children's reasoning: A dual task approach. Child Development, 57, 616-627.

Halford, G.S., Maybery, M.T., & Bain, J.D. (1988). Set-size effects in primary memory: An age-related capacity limitation? Memory and Cognition, 16(5), 480-487.

Halford, G.S., & Wilson, W.H. (1980). A category theory approach to cognitive development. Cognitive Psychology, 12, 356-411.

Hesse, M.B. (1966). Models and analogies in science. Notre Dame, IN: Notre Dame University Press.

Hinton, G.E. (1981). Implementing semantic networks in parallel hardware. In G.E. Hinton & J.A. Anderson (Eds.), Parallel models of associative memory (pp. 161-187). Hillsdale, NJ: Erlbaum.

Holland, J.H., Holyoak, K.J., Nisbett, R.E., & Thagard, P.R. (1986). Induction: Processes of inference, learning and discovery. Cambridge, MA: Bradford Books/MIT Press.

Holyoak, K. (1990). Symbolic connectionism: Toward third-generation theories of expertise. In K.A. Ericsson & J. Smith (Eds.), Toward a general theory of expertise: Prospects and limits. Cambridge: Cambridge University Press.

Holyoak, K.A., & Koh, K. (1987). Surface and structural similarity in analogical transfer. Memory and Cognition, 15, 332-340.

Holyoak, K.A., & Thagard, P. (1989). Analogical mapping by constraint satisfactions. Cognitive Science, 13(3), 295-355.

Hopfield, J.J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Science, 79, 2554-2558.

Humphreys, M.S., Bain, J.D., & Pike, R. (1989). Different ways to cue a coherent memory system: A theory for episodic, semantic and procedural tasks. Psychological Review, 96 (2), 208-233.

Hunt, E., & Lansman, M. (1982). Individual differences in attention. In R. Sternberg (Ed.). Advances in the psychology of human intelligence (Vol. 1, pp. 207-254). Hillsdale, NJ: Erlbaum.

Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice-Hall.

Leeuwenberg, E.L.L. (1969). Quantitative specification of information in sequential patterns. Psychological Review, 76, 216-220.

Mac Lane, S. (1972). Categories for the working mathematician. New York: Springer-Verlag.

Maybery, M.T., Bain, J.D., & Halford, G.S. (1986). Information processing demands of transitive inference. Journal of Experimental Psychology: Learning, Memory and Cognition, 12, 600-613.

McClelland, J.L. (1986). Resource requirements of standard and programmable nets. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the micro-structure of cognition (pp. 460-487). Cambridge, MA: MIT Press.

Miller, G.A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97.

Mitchell, M., & Hofstadter, D.R. (1990). The emergence of understanding in a computer model of concepts and analogy-making. Physica D, 42, 322-334.

Moray, N. (1967). Where is capacity limited? A survey and model. Acta Psychologica, 27, 84-92.

Murdock, B.B., Jr. (1983). A distributed memory model for serial-order information. Psychological Review, 90, 316-338.

Navon, D. (1984). Resources-a theoretical soup stone? Psychological Review, 91, 216-234.

Newell, A., & Simon, H.A. (1972). Human problem solving. New York: Prentice-Hall.

Norman, D. A., & Bobrow, D. G. (1975). On data-limited and resource-limited processes. Cognitive Psychology, 7, 44-64.

Palmer, S.E. (1978). Fundamental aspects of cognitive representation. In E. Rosch & B.B. Lloyd (Eds.), Cognition and categorization (pp. 259-303). Hillsdale, NJ: Erlbaum.

Palmer, S.E. (1989). Levels of description in information-processing theories of analogy. In S. Vosniadou. & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 332-345). Cambridge: Cambridge University Press.

Plate, T. (1991, August). Holographic reduced representations: convolution algebra for compositional distributed representations. 12th International Joint Conference on Artificial Intelligence (pp. 30-35). Sydney, Australia.

Restle, F. (1970). Theory of serial pattern learning: Structural trees. Psychological Review, 77, 481-495.

Rumelhart, D.E. (1989). Toward a micro-structural account of human reasoning. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasonings (pp. 298-312). Cambridge: Cambridge University Press.

Rumelhart, D.E., & McClelland, J.L. (Eds.). (1986). Parallel distributed processing: Explorations in the micro-structure of cognition. Cambridge, MA: MIT Press.

Schneider, W., & Detweiler, M. (1987). A connectionist/control architecture for working memory. The Psychology of Learning and Motivation, 21, 53-119.

Schweickert, R., & Boruff, B. (1986). Short-term memory capacity: Magic number or magic spell? Journal of Experimental Psychology: Learning, Memory and Cognition, 12, 419-425.

Siegler, R.S. (1990). Children's thinking (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.

Simon, H.A. (1972). Complexity and the representation of patterned sequences of symbols. Psychological Review, 79, 369-382.

Simon, H.A. (1974). How big is a chunk? Science, 183, 482-488.

Simon, H.A., & Kotovsky, K. (1963). Human acquisition of concepts for sequential patterns. Psychological Review, 70, 534-546.

Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46(1-2), 159-216.

Standing, L., Bond, B., Smith, R., & Isely, C. (1980). Is the immediate memory span determined by subvocalization rate? British Journal of Psychology, 71, 525-539.

Suppes, P., & Zinnes, J.L. (1963). Basic measurement theory. In R.D. Luce, R.R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology. New York: Wiley.

Werner, H. (1948). Comparative psychology of mental development. Chicago: Follet.

Wickens, C.D. (1980). The structure of attentional resources. In R.S. Nickerson (Ed.). Attention and performance (Vol. VIII, pp. 239-257). Hillsdale, NJ: Erlbaum.

Wiles, J., Halford, G.S., Stewart, J.E.M., Humphreys, M.S., Bain, J.D., & Wilson, W.H. (1994). Tensor models: A creative basis for memory. In T. Dartnall (Ed.), Artificial intelligence and creativity: An interdisciplinary approach (pp. 147-161). Dordrecht: Kluwer Academic.

Wiles, J., Humphreys, M. S., Bain, J.D., & Dennis, S. (1990). Control processes and cue combinations in a connectionist model of human memory (Tech. Report, No. 186). Brisbane, Australia: Key Centre for Software Technology, University of Queensland.

Willshaw, D. (1981). Holography, associative memory, and inductive generalization. In G.E. Hinton & J.A. Anderson (Eds.), Parallel models of associative memory (pp. 83-104). Hillsdale, NJ: Erlbaum.