TITLE: Reinforcement Learning with Behavioural Cloning: a way to Fly
PRESENTER: Visiting academic Eduardo F. Morales
AFFILIATION: Tec de Monterrey - Cuernavaca
DATE: Friday 27th June 2003
TIME: 12:00pm to 1:00pm
PLACE: CSE K17 1st Floor Seminar Room
ABSTRACT:
Reinforcement learning deals with learning optimal or near-optimal policies while interacting with the environment. Application domains with several continuous variables are difficult to solve with existing reinforcement learning methods because of the large search space. One way to simplify the task is to abstract the search space. In this talk, we will use a relational representation in which it is easy to: (i) define powerful and useful abstractions, (ii) incorporate domain knowledge, and (iii) re-use previously learned policies on other similar problems. These properties are relevant to a domain like a flight simulator, where it is important to represent the relative position of the aircraft and to learn how to fly through several points in space regardless of their position. A relational abstraction requires the definition of relational actions. We will describe how to learn such relational actions from traces of flights using a behavioural cloning approach. Due to the nature of the domain and our particular behavioural cloning algorithm, several conflicting relational actions may be induced for the same relational state. Behavioural cloning, however, induces only a small subset of relational actions, which provides guidance to the reinforcement learning task. Reinforcement learning is then used, over this reduced space, to define an optimal policy. It will be shown experimentally that a combination of behavioural cloning and reinforcement learning using a relational representation is powerful enough to learn how to fly through different points in space and under different turbulence conditions.
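As a rough illustration of the two-stage idea in the abstract (and not the speaker's actual system), the sketch below replaces the flight simulator with a one-dimensional toy domain. Everything in it is an assumption made for illustration: the relational states `target_left` and `target_right`, the three actions, the hand-written `TRACES`, the shaped reward, and the choice of tabular Q-learning as the reinforcement learning method. Behavioural cloning is reduced here to collecting the set of actions observed in each relational state, so conflicting actions can coexist for one state; Q-learning then chooses among only those actions.

```python
import random
from collections import defaultdict

# Hypothetical action set for the toy domain (not from the talk).
MOVES = {"move_left": -1, "stay": 0, "move_right": 1}

# Hypothetical traces as (position, target, action) triples.
# They deliberately contain conflicting actions for the same relational state.
TRACES = [
    (2, 7, "move_right"), (3, 7, "move_right"), (4, 7, "stay"),
    (9, 1, "move_left"), (8, 1, "stay"), (8, 1, "move_left"),
]


def relational_state(pos, target):
    """Abstract absolute positions into the target's position relative to the agent."""
    if target > pos:
        return "target_right"
    if target < pos:
        return "target_left"
    return "at_target"


def clone_actions(traces):
    """Cloning step: collect every action observed in each relational state."""
    induced = defaultdict(set)
    for pos, target, action in traces:
        induced[relational_state(pos, target)].add(action)
    return {state: sorted(actions) for state, actions in induced.items()}


def q_learning(induced, episodes=300, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning restricted to the actions induced from the traces."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s, acts in induced.items() for a in acts}
    for _ in range(episodes):
        pos, target = rng.randint(0, 10), rng.randint(0, 10)
        for _ in range(50):  # cap on episode length
            state = relational_state(pos, target)
            if state == "at_target":
                break
            actions = induced[state]
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            new_pos = pos + MOVES[action]
            # Assumed shaped reward: +1 for getting closer, -1 otherwise.
            reward = 1.0 if abs(target - new_pos) < abs(target - pos) else -1.0
            next_state = relational_state(new_pos, target)
            if next_state == "at_target":
                future = 0.0
            else:
                future = max(q[(next_state, a)] for a in induced[next_state])
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            pos = new_pos
    return q


def greedy_policy(q, induced):
    """Pick the highest-valued induced action for each relational state."""
    return {s: max(acts, key=lambda a: q[(s, a)]) for s, acts in induced.items()}


def fly(policy, pos, target, max_steps=100):
    """Follow the policy and return the number of steps taken to reach the target."""
    steps = 0
    while pos != target and steps < max_steps:
        pos += MOVES[policy[relational_state(pos, target)]]
        steps += 1
    return steps


induced = clone_actions(TRACES)
policy = greedy_policy(q_learning(induced), induced)
```

Because the policy is expressed over relative position only, a call such as `fly(policy, 0, 25)` can be used with a target outside the range seen during training, which is the kind of re-use the relational representation is meant to allow. Actions that never appear in the traces for a given state (for example `move_left` when the target is to the right) are never explored, which is how the cloning step shrinks the search space in this sketch.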
BIOGRAPHY OF SPEAKER:
Eduardo Morales received his BSc degree in Physics Engineering from Universidad Autonoma Metropolitana, Mexico (1984), his MSc in Information Technology from the University of Edinburgh, UK (1985), and his PhD in Computer Science from the Turing Institute - University of Strathclyde, UK (1992). He has worked at the Electric Power Research Institute, Palo Alto, Calif., at the Institute of Electrical Research (Instituto de Investigaciones Electricas), Mexico, and, since 1994, at the Tec de Monterrey, Campus Cuernavaca, Mexico, from which he is currently on sabbatical leave.
School of Computer Science & Engineering, UNSW.