This is where the graphotactics problem came from. This paper is not about neural networks; it describes a letter-trigram-based method for deciding whether a word that is not in your on-line lexicon is a novel but valid word (i.e. a neologism) or a misspelling or typo. For example, in the typo "patns" (for "pants"), the letter-trigram t-n-s does not occur in English words, so "patns" is unlikely to be a neologism. A problem with this method is that the actual constraint in English is on phoneme-sequences (phonotactics), not letter-sequences (graphotactics). Thus, in "hrose" (for "horse"), the phoneme-trigram corresponding to h-r-o doesn't occur in English, but the letter-trigram does, as part of sequences like ...chro..., as in "chromium" and "synchrotron". One could go to tetragrams or beyond, but then the size of the table (26^4 = 456976 entries for tetragrams) becomes unworkably large. The following papers describe an attempt to address the problem by considering longer-range dependencies in letter-sequences using generalisations of simple recurrent networks. In the end, the generalised SRNs may be of more interest than the problem for which they were devised.
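The trigram check described above can be sketched in a few lines of Python. This is only an illustration, not the paper's implementation: the function names, the toy lexicon, and the choice to ignore word boundaries are all assumptions made here for the example.

```python
def letter_trigrams(word):
    """Return the set of consecutive letter triples in a word."""
    word = word.lower()
    return {word[i:i + 3] for i in range(len(word) - 2)}


def build_trigram_table(lexicon):
    """Collect every letter trigram attested in the lexicon."""
    table = set()
    for entry in lexicon:
        table |= letter_trigrams(entry)
    return table


def unattested_trigrams(word, table):
    """Return the trigrams of `word` that never occur in the lexicon, sorted."""
    return sorted(letter_trigrams(word) - table)


# Toy lexicon, chosen only for illustration (an assumption); a real table
# would be built from a full on-line lexicon.
TOY_LEXICON = ["pants", "pat", "horse", "rose", "chromium", "synchrotron"]
TABLE = build_trigram_table(TOY_LEXICON)

# Size of a full table indexed by every possible letter n-gram.
TRIGRAM_TABLE_SIZE = 26 ** 3    # 17576
TETRAGRAM_TABLE_SIZE = 26 ** 4  # 456976
```

With this toy lexicon, `unattested_trigrams("patns", TABLE)` reports unattested trigrams, so "patns" is flagged as a probable typo, while `unattested_trigrams("hrose", TABLE)` comes back empty because h-r-o is attested in "chromium". That is exactly the weakness of the graphotactic approach noted above.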
Initial experiments in learning letter sequences as a next-letter-prediction task, using simple recurrent networks, found that these networks did not learn the task well enough or fast enough. This paper considers the possibility of using networks, later named "Elman tower networks", which are like simple recurrent networks but have more than one state vector. The contents of state vector number n+1 are those of state vector number n in the previous time step, and all state vectors are fed back to the hidden layer through trainable weighted connections. See diagram below. This method significantly improves learning performance, as demonstrated in this paper using Elman towers with 2, 4, and 7 state vectors.
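As a rough sketch of the architecture just described, the forward pass of a tower network might look like the following. Only the forward step is shown; training is omitted. The class name `ElmanTower`, the tanh hidden units, the linear output layer, the absence of bias terms, and the weight initialisation range are all assumptions made for this example, not details taken from the papers.

```python
import math
import random


class ElmanTower:
    """Forward pass of a sketch Elman tower with `n_states` state vectors."""

    def __init__(self, n_in, n_hidden, n_out, n_states, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            # Small random weights; the range is an arbitrary choice.
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.n_hidden = n_hidden
        self.w_in = matrix(n_hidden, n_in)
        # One trainable weight matrix per state vector, all feeding the
        # hidden layer.
        self.w_state = [matrix(n_hidden, n_hidden) for _ in range(n_states)]
        self.w_out = matrix(n_out, n_hidden)
        self.states = [[0.0] * n_hidden for _ in range(n_states)]

    def step(self, x):
        """Process one input vector and return the output activations."""
        hidden = []
        for j in range(self.n_hidden):
            # Net input: current input plus a contribution from every
            # state vector in the tower.
            total = sum(w * xi for w, xi in zip(self.w_in[j], x))
            for w_k, state in zip(self.w_state, self.states):
                total += sum(w * s for w, s in zip(w_k[j], state))
            hidden.append(math.tanh(total))
        output = [sum(w * h for w, h in zip(row, hidden))
                  for row in self.w_out]
        # Shift the tower: vector n+1 takes the old contents of vector n,
        # and the first vector takes a copy of the new hidden activations.
        self.states = [list(hidden)] + self.states[:-1]
        return output
```

With `n_states=1` this reduces to an ordinary simple recurrent network; `ElmanTower(26, 20, 26, n_states=4)` would give a four-vector tower over one-hot letter inputs (the layer sizes here are likewise only illustrative).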
This paper adds to the previous one by showing that the large oscillations in total sum-squared error reported there can be eliminated by zeroing state-vector activations between training examples. It also shows that the same modification (adding extra state vectors) improves learning performance in Jordan networks.
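The zeroing step can be sketched as follows. The helper names and the `hidden_fn` callback (a stand-in for whatever computes the hidden layer from the input and the current state vectors) are hypothetical, introduced only for this illustration. The function records the state vectors seen at the start of each training example, so the effect of the reset is easy to inspect.

```python
def zero_states(n_states, size):
    """Return a fresh tower of all-zero state vectors."""
    return [[0.0] * size for _ in range(n_states)]


def shift_states(states, hidden):
    """Push new hidden activations in; the oldest state vector drops out."""
    return [list(hidden)] + states[:-1]


def run_sequences(sequences, hidden_fn, n_states, size, reset=True):
    """Present each sequence in turn.

    Returns the state vectors as they stood at the start of each sequence.
    `hidden_fn(x, states)` is a hypothetical stand-in for the network's
    hidden-layer computation.
    """
    states = zero_states(n_states, size)
    starts = []
    for seq in sequences:
        if reset:
            # Zero the state-vector activations between training examples,
            # so nothing carries over from the previous example.
            states = zero_states(n_states, size)
        starts.append([list(v) for v in states])
        for x in seq:
            hidden = hidden_fn(x, states)
            states = shift_states(states, hidden)
    return starts
```

With `reset=False`, the second example starts with state vectors left over from the end of the first, which is the kind of carry-over the zeroing step removes.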
An integrated and updated presentation of the two previous papers.
Still interested? Here are some notes on how to do Elman tower nets in tlearn.
You can also read my PDF lecture notes on recurrent networks, including material on these generalised Elman networks (and generalised Jordan nets). Start at about page 55. The material on pages 59 and 60 shows how the connectivity of Elman tower networks can be simulated by regular Elman nets (simple recurrent networks) and is not published elsewhere. In practice, however, tower networks learn appropriate tasks faster and better than simple recurrent networks.
Networks related to tower networks have also been studied by a range of authors (see p. 783 in Haykin for details), where they are referred to as NARX networks (Non-linear AutoRegressive with eXogenous inputs).
Other Neural Network research by Bill Wilson
Page Maintained by: Bill Wilson
Bill Wilson's Contact Details