There has been much recent interest in the potential of using reinforcement learning techniques for control in autonomous robotic agents. However, how to implement effective reinforcement learning in a real-world robotic environment still involves many open questions. Are standard reinforcement learning algorithms like Watkins' Q-Learning appropriate, or are other approaches more suitable? Some specific issues to be considered are noise/disturbance and the possibly non-Markovian aspects of the control problem. These are the particular issues we focus upon in this paper.
The test-bed for the experiments described in this paper is a real six-legged insectoid walking robot; the task set is to learn an effectively co-ordinated walking gait. The performance of a new algorithm we call C-Trace is compared to that of Watkins' well-known 1-step Q-Learning algorithm. We discuss the markedly superior performance of this new algorithm in the context of both theoretical and existing empirical results regarding learning in noisy and non-Markovian domains.
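For readers unfamiliar with the baseline, the following is a minimal sketch of the standard 1-step Q-Learning update used in Watkins' algorithm. The tabular representation, state/action encoding, and parameter values here are illustrative assumptions, not details taken from the paper's experimental setup:

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One-step Q-Learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    Q is a dict mapping (state, action) pairs to estimated values;
    unseen pairs default to 0.0. alpha is the learning rate, gamma
    the discount factor.
    """
    old = Q.get((s, a), 0.0)
    # Bootstrap from the greedy value of the successor state.
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)


# Example: one update from scratch after receiving reward 1.0.
Q = {}
q_update(Q, s=0, a="step", r=1.0, s_next=1, actions=["step", "hold"])
```

Because this 1-step backup bootstraps from a single successor state, its value estimates are sensitive to noisy transitions and hidden state, which is precisely the setting the paper investigates.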
Download full paper (compressed postscript)