TITLE: From Markov Decision Processes to Artificial Intelligence
PRESENTER: Rich Sutton, Professor (iCORE chair)
AFFILIATION: Department of Computing Science, University of Alberta
DATE: Friday, 23rd April 2004
TIME: 12:00 noon - 1:00 pm
PLACE: CSE K17 1st Floor Seminar Room
The path to general, human-level intelligence may go through Markov decision processes (MDPs), a discrete-time, probabilistic formulation of sequential decision problems in terms of states, actions, and rewards. Developed in the 1950s, MDPs were extensively explored and applied in operations research and engineering before coming to the attention of artificial intelligence researchers about 15 years ago. Much of the new interest has come from the field of reinforcement learning, where novel twists on classical dynamic programming methods have enabled the solution of many more, and vastly larger, problems, such as backgammon (Tesauro, 1995) and elevator control (Crites and Barto, 1996). Despite remaining technical issues, real progress seems to have been made toward general learning and planning methods relevant to artificial intelligence. We suggest that the MDP framework can be extended further, to the threshold of human-level intelligence, by abstracting and generalizing each of its three components - actions, states, and rewards. We briefly survey recent work on temporally abstract actions (Precup, 2000; Parr, 1998), predictive representations of state (Littman et al., 2002), and non-reward subgoals (Sutton, Precup & Singh, 1998) in support of this suggestion.
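To make the abstract's terms concrete, here is a minimal sketch of value iteration, one of the classical dynamic programming methods mentioned above, applied to a tiny made-up two-state, two-action MDP (the transition probabilities, rewards, and discount factor are illustrative assumptions, not from the talk):

```python
import numpy as np

# A hypothetical MDP with states {0, 1} and actions {0, 1}.
# P[s, a, s'] = probability of moving to s' after taking a in s.
# R[s, a]     = expected immediate reward for taking a in s.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9  # discount factor

# Value iteration: repeatedly apply the Bellman optimality backup
# V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s,a,s') V(s') ].
V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * (P @ V)      # Q[s, a], action values under current V
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-10:
        break
    V = V_new

policy = Q.argmax(axis=1)        # greedy policy with respect to V
```

Under these numbers the backup converges to V(1) = 2 / (1 - 0.9) = 20, since action 1 in state 1 earns reward 2 and stays put; reinforcement learning methods approximate the same backup from sampled experience rather than a known model.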
BIOGRAPHY OF SPEAKER:
Richard S. Sutton was born in Toledo, Ohio, and grew up in Oak Brook, Illinois, a suburb of Chicago. He received the B.A. degree in psychology from Stanford University in 1978, and the M.S. and Ph.D. degrees in computer science from the University of Massachusetts in 1980 and 1984. He worked for nine years at GTE Laboratories in Waltham as principal investigator of their connectionist machine learning project, and for three years at the University of Massachusetts in Amherst as a research scientist in the computer science department. From 1998 to 2002 he worked at AT&T Labs in Florham Park, New Jersey, and since August 2003 he has been a professor of Computing Science at the University of Alberta. He is a fellow of the American Association for Artificial Intelligence. Since September 1999 he has been fighting stage IV melanoma. Richard Sutton's research interests center on the learning problems facing a decision-maker interacting with its environment, which he sees as central to artificial intelligence. He is the author of the original paper on temporal-difference learning and, with Andrew Barto, of the textbook Reinforcement Learning: An Introduction. He is also interested in animal learning psychology, in connectionist networks, and generally in systems that continually improve their representations and models of the world.
Professor Claude Sammut
School of Computer Science and Engineering
University of New South Wales
Sydney Australia 2052