In this paper we present a hybrid system combining techniques from symbolic planning and reinforcement learning. Planning is used to automatically construct task hierarchies for hierarchical reinforcement learning based on abstract models of the behaviours' purpose, and to perform intelligent termination improvement, interrupting an executing behaviour when it is no longer appropriate. Reinforcement learning is used to produce concrete implementations of abstractly defined behaviours and to learn the best choice of behaviour when plans are ambiguous. Two new hierarchical reinforcement learning algorithms are presented: Planned Hierarchical Semi-Markov Q-Learning (P-HSMQ), a variant of the HSMQ algorithm (Dietterich 2000) that uses plan-built task hierarchies, and Teleo-Reactive Q-Learning, a more complex algorithm that implements hierarchical reinforcement learning with teleo-reactive execution semantics (Nilsson 1994). Each algorithm is demonstrated in a simple grid-world domain.
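HSMQ-style learners rest on the semi-Markov (SMDP) Q-learning update. A behaviour that runs for N primitive steps is credited with its discounted reward, plus gamma^N times the best value available at the state where it terminated. The following is a minimal tabular sketch of that update in Python. It is not the paper's P-HSMQ or Teleo-Reactive Q-Learning. The class name, the `behaviours` lists (standing in for the choices a plan would permit), and all parameter defaults are hypothetical.

```python
import random
from collections import defaultdict


class SMDPQLearner:
    """Sketch of a tabular SMDP Q-learning update, as used at each level of HSMQ.

    Assumption: the caller supplies the list of behaviours permitted in each
    state (hypothetically, those a plan deems applicable); plan construction
    itself is not modelled here.
    """

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)  # (state, behaviour) -> estimated value
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon

    def choose(self, state, behaviours):
        # Epsilon-greedy choice among the permitted behaviours.
        if random.random() < self.epsilon:
            return random.choice(behaviours)
        return max(behaviours, key=lambda b: self.q[(state, b)])

    def update(self, state, behaviour, rewards, next_state, next_behaviours):
        # rewards: primitive rewards received while the behaviour ran for
        # N = len(rewards) steps, discounted back to the state where it began.
        n = len(rewards)
        discounted = sum(self.gamma ** k * r for k, r in enumerate(rewards))
        # An empty next_behaviours list is treated as a terminal state (value 0).
        best_next = max((self.q[(next_state, b)] for b in next_behaviours),
                        default=0.0)
        target = discounted + self.gamma ** n * best_next
        key = (state, behaviour)
        self.q[key] += self.alpha * (target - self.q[key])
        return self.q[key]


learner = SMDPQLearner(alpha=0.5, gamma=0.9, epsilon=0.0)
value = learner.update("s0", "go", [1.0, 1.0], "s1", [])
print(value)
```

Because the whole multi-step execution is collapsed into one update, each level of a hierarchy can learn over behaviours in the same way a flat learner treats primitive actions. Restricting `behaviours` to plan-permitted choices is where, under this sketch's assumptions, a planner would prune the search.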