One problem that I worked on was an attempt to capture the graphotactic patterns of English words: e.g. that a word can end but not begin with "nd". It developed into a comparison of different recurrent network archictectures for studying problems of this type. The graphotactic project in turn developed out of a project that used letter trigrams to try to distinguish, among novel words in text, those that were typographical or similar errors from those that were real words not covered by one's lexicon. (Obviously, some typographical errors result in words with graphotactically legal structure (like the mis-spelling of "architecture" that I just noticed, a couple of sentences back), and some real unknown words would be borrowings from languages with different graphotactic structures, but the aim was to get at least some idea of whether the novel word was OK.)
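The letter-trigram idea can be illustrated with a minimal sketch. This is not the code from the project. It assumes a toy lexicon, a "#" word-boundary marker, and a simple all-or-nothing test (a word is suspect if any of its trigrams is unattested in the lexicon); the names `letter_trigrams`, `build_trigram_set`, `unattested_trigrams` and `TOY_LEXICON` are hypothetical.

```python
def letter_trigrams(word, boundary="#"):
    """Return the letter trigrams of a word, padded with a boundary marker."""
    padded = boundary + word.lower() + boundary
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def build_trigram_set(lexicon):
    """Collect every trigram attested in the known words."""
    attested = set()
    for known_word in lexicon:
        attested.update(letter_trigrams(known_word))
    return attested


def unattested_trigrams(word, attested):
    """List the trigrams of a novel word that never occur in the lexicon."""
    return [t for t in letter_trigrams(word) if t not in attested]


# Toy lexicon, chosen only for illustration (an assumption, not real data).
TOY_LEXICON = ["hand", "end", "sent", "den", "net"]
ATTESTED = build_trigram_set(TOY_LEXICON)

# "send" is not in the lexicon, but all of its trigrams are attested,
# so it looks like a plausible unknown word.
print(unattested_trigrams("send", ATTESTED))

# "ndes" begins with "nd", so the word-initial trigram "#nd" is flagged.
print(unattested_trigrams("ndes", ATTESTED))
```

With a realistic lexicon one would more likely score words by trigram frequency than by a strict attested/unattested test, since rare but legal trigrams would otherwise be flagged.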
Two former students in this area are: