This paper argues that, in many domains, credit-assignment methods that use actual returns can be expected to be more effective for reinforcement learning than the more commonly used temporal difference methods. We present analysis and empirical evidence from three sets of experiments in different domains to support this claim. We introduce a new algorithm, C-Trace, a variant of the P-Trace RL algorithm, and discuss some possible advantages of algorithms of this type.
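To make the distinction concrete, the following minimal sketch contrasts the two kinds of learning target on a single episode. It is not an implementation of C-Trace or P-Trace; the function names, the discount factor `gamma`, and the toy reward sequence are all assumptions chosen for illustration.

```python
from typing import List


def actual_returns(rewards: List[float], gamma: float) -> List[float]:
    """Discounted return actually observed from each step to the episode's end."""
    returns = [0.0] * len(rewards)
    g = 0.0
    # Work backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def td0_targets(rewards: List[float], next_values: List[float], gamma: float) -> List[float]:
    """One-step bootstrapped targets: r_t + gamma * V(s_{t+1})."""
    return [r + gamma * v for r, v in zip(rewards, next_values)]


# Hypothetical episode: a single reward arrives only at the final step.
rewards = [0.0, 0.0, 1.0]
# Hypothetical untrained value estimates for the successor states
# (the last entry is the terminal state, whose value is 0 by convention).
next_values = [0.0, 0.0, 0.0]

mc = actual_returns(rewards, gamma=0.5)
td = td0_targets(rewards, next_values, gamma=0.5)
```

In this toy episode the actual-return targets assign credit to every step after a single pass, whereas the one-step temporal difference targets are nonzero only at the final step until the value estimates have been updated over further episodes.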
Download full paper (compressed postscript)