RUDDER: Return Decomposition for Delayed Rewards
Jose A. Arjona-Medina, Michael Gillhofer, Michael Widrich, Thomas Unterthiner, Johannes Brandstetter, Sepp Hochreiter

TL;DR
RUDDER handles delayed rewards in reinforcement learning by redistributing reward through return decomposition, pushing the expected future rewards toward zero. Q-value estimation then reduces to estimating the mean of the immediate reward, which speeds up learning.
Contribution
The paper combines two new concepts, reward redistribution and return decomposition via contribution analysis, to address delayed rewards in finite Markov decision processes.
Findings
RUDDER significantly outperforms traditional methods on artificial delayed reward tasks.
It improves Atari game scores, especially in games with delayed rewards.
RUDDER is exponentially faster than MCTS and TD(λ) on artificial tasks.
Abstract
We propose RUDDER, a novel reinforcement learning approach for delayed rewards in finite Markov decision processes (MDPs). In MDPs the Q-values are equal to the expected immediate reward plus the expected future rewards. The latter are related to bias problems in temporal difference (TD) learning and to high variance problems in Monte Carlo (MC) learning. Both problems are even more severe when rewards are delayed. RUDDER aims at making the expected future rewards zero, which simplifies Q-value estimation to computing the mean of the immediate reward. We propose the following two new concepts to push the expected future rewards toward zero. (i) Reward redistribution that leads to return-equivalent decision processes with the same optimal policies and, when optimal, zero expected future rewards. (ii) Return decomposition via contribution analysis which transforms the reinforcement…
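The reward redistribution idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function redistribute_return, the stand-in predictor count_pickups, and the toy episode are all hypothetical names introduced here. The sketch assumes a return predictor that maps a sequence prefix to a predicted final return, assigns each step the difference between consecutive predictions, and places any remaining prediction error on the last step so that the redistributed rewards sum to the original delayed return.

```python
from typing import Callable, List, Sequence


def redistribute_return(
    episode: Sequence[str],
    predict_return: Callable[[Sequence[str]], float],
    actual_return: float,
) -> List[float]:
    """Spread a delayed return over the steps of an episode.

    Each step receives the change in the predicted return caused by
    observing that step (difference of consecutive predictions).
    """
    rewards: List[float] = []
    previous = predict_return(episode[:0])  # prediction for the empty prefix
    for t in range(1, len(episode) + 1):
        current = predict_return(episode[:t])
        rewards.append(current - previous)
        previous = current
    if rewards:
        # Assumption of this sketch: put the residual prediction error on the
        # final step so the sum equals the actual return (return equivalence).
        rewards[-1] += actual_return - sum(rewards)
    return rewards


def count_pickups(prefix: Sequence[str]) -> float:
    """Hypothetical stand-in for a learned return predictor."""
    return float(sum(1 for step in prefix if step == "pick"))


# Toy episode: the environment pays 2.0 only at the very end,
# one unit for each earlier "pick" step.
episode = ["move", "pick", "move", "move", "pick"]
delayed_return = 2.0
redistributed = redistribute_return(episode, count_pickups, delayed_return)
```

With this toy predictor, redistributed is [0.0, 1.0, 0.0, 0.0, 1.0]: the reward moves from the end of the episode to the steps that caused it, while the total return stays 2.0. In RUDDER the predictor is learned; the counting function here only stands in for it.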
Taxonomy
Topics: Reinforcement Learning in Robotics · Parallel Computing and Optimization Techniques · Topic Modeling
