Actual return reinforcement learning versus Temporal Differences: Some theoretical and experimental results, Pendrith, M.D. and Ryan, M.R.K., Proceedings of the 13th International Conference on Machine Learning, Bari, Italy, 3-6 July 1996.

This paper argues that, for many domains, credit-assignment methods that use actual returns can be expected to be more effective for reinforcement learning than the more commonly used temporal difference methods. We present analysis and empirical evidence from three sets of experiments in different domains to support this claim. A new algorithm we call C-Trace, a variant of the P-Trace RL algorithm, is introduced, and some possible advantages of using algorithms of this type are discussed.
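To make the distinction the abstract draws concrete, here is a minimal Python sketch contrasting actual-return (Monte Carlo) value updates with one-step temporal-difference (TD(0)) updates on a toy episodic task. This illustrates only the general difference in how credit is assigned; it is not the paper's C-Trace or P-Trace algorithm, whose details are not given here, and the discount factor, learning rate, and toy chain environment are all assumptions made for illustration.

```python
# Sketch: actual-return (Monte Carlo) vs one-step TD(0) credit assignment.
# Not the paper's C-Trace/P-Trace algorithms; hyperparameters and the toy
# environment below are illustrative assumptions only.

GAMMA = 0.9   # discount factor (assumed)
ALPHA = 0.1   # learning rate (assumed)

def run_episode(policy, env_step, start_state):
    """Roll out one episode; returns a list of (state, reward) pairs."""
    trajectory = []
    state = start_state
    while state is not None:          # None marks the terminal state
        action = policy(state)
        next_state, reward = env_step(state, action)
        trajectory.append((state, reward))
        state = next_state
    return trajectory

def monte_carlo_update(V, trajectory):
    """Actual-return update: each state is credited with the discounted
    return actually observed from that point to the end of the episode."""
    G = 0.0
    for state, reward in reversed(trajectory):
        G = reward + GAMMA * G
        V[state] = V.get(state, 0.0) + ALPHA * (G - V.get(state, 0.0))

def td0_update(V, state, reward, next_state):
    """One-step TD update: the target bootstraps from the current
    estimate of the successor state instead of the actual return."""
    target = reward + GAMMA * V.get(next_state, 0.0)
    V[state] = V.get(state, 0.0) + ALPHA * (target - V.get(state, 0.0))

if __name__ == "__main__":
    # Tiny 3-state chain: s0 -> s1 -> s2 -> terminal, reward 1 on exit.
    transitions = {"s0": ("s1", 0.0), "s1": ("s2", 0.0), "s2": (None, 1.0)}
    env_step = lambda s, a: transitions[s]
    policy = lambda s: None           # single action; policy is trivial here

    V_mc, V_td = {}, {}
    for _ in range(100):
        traj = run_episode(policy, env_step, "s0")
        monte_carlo_update(V_mc, traj)
        # Replay the same transitions as one-step TD updates.
        successors = [t[0] for t in traj[1:]] + [None]
        for (s, r), nxt in zip(traj, successors):
            td0_update(V_td, s, r, nxt)
    print("Monte Carlo:", V_mc)
    print("TD(0):     ", V_td)
```

On this deterministic chain both estimators converge to the same values; the difference the paper examines is how the two forms of update behave more generally, where TD targets lean on possibly biased bootstrap estimates while actual returns do not.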

Download full paper (compressed postscript)


Mark Pendrith - pendrith@cse.unsw.edu.au
Malcolm Ryan - malcolmr@cse.unsw.edu.au