RUDDER: Return Decomposition for Delayed Rewards

Jose A. Arjona-Medina; Michael Gillhofer; Michael Widrich; Thomas Unterthiner; Johannes Brandstetter; Sepp Hochreiter

arXiv:1806.07857 · cs.LG · September 11, 2019 · 59 citations

TL;DR

RUDDER introduces reward redistribution and return decomposition for reinforcement learning with delayed rewards. By pushing the expected future rewards toward zero, it simplifies Q-value estimation and, in the reported experiments, speeds up learning and improves performance.
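
The return decomposition idea can be illustrated with a minimal sketch. It assumes a hypothetical list of per-step return predictions standing in for the output of a trained return-prediction model (the paper uses an LSTM for this). The function name `redistribute` and the rule that moves leftover prediction error onto the last step are illustrative assumptions, not the authors' code. Each step's redistributed reward is the change that step causes in the predicted return, so the rewards telescope back to the observed return.

```python
from typing import List


def redistribute(predictions: List[float], observed_return: float) -> List[float]:
    """Turn per-step return predictions into redistributed rewards.

    predictions[t] is a (hypothetical) model's estimate of the episode's
    return after seeing the state-action prefix up to and including step t.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    rewards: List[float] = []
    previous = 0.0  # predicted return before any step has been observed
    for g in predictions:
        # Reward for this step = how much it changed the predicted return.
        rewards.append(g - previous)
        previous = g
    # Assumed correction: put any remaining prediction error on the last
    # step so the redistributed rewards sum to the observed return.
    rewards[-1] += observed_return - predictions[-1]
    return rewards
```

For example, predictions of `[0.0, 0.9, 0.9, 0.9, 1.0]` with an observed return of 1.0 assign 0.9 of the credit to step 1 and roughly 0.1 to the last step, while the episode's total reward stays 1.0.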

Contribution

RUDDER combines reward redistribution with return decomposition via contribution analysis to push expected future rewards toward zero, targeting the bias problems of TD learning and the high variance of MC learning that delayed rewards make worse.
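
A toy episode makes "reward redistribution" concrete. The setup is hypothetical and not one of the paper's tasks: the action at step 0 alone decides whether a reward of 1.0 arrives, but the original process pays it only at the final step. Paying the same reward at step 0 leaves the return of every episode unchanged, so the optimal policy is the same, while the future reward seen from step 0 drops to zero. All function names here are illustrative.

```python
from typing import List


def delayed_rewards(action: int, horizon: int) -> List[float]:
    """Original process: 1.0 for action 1 (else 0.0), paid only at the last step."""
    rewards = [0.0] * horizon
    rewards[-1] = 1.0 if action == 1 else 0.0
    return rewards


def redistributed_rewards(action: int, horizon: int) -> List[float]:
    """Redistributed process: the same reward, paid at step 0 where the decision is made."""
    rewards = [0.0] * horizon
    rewards[0] = 1.0 if action == 1 else 0.0
    return rewards


def episode_return(rewards: List[float]) -> float:
    # Undiscounted return of a finite episode.
    return sum(rewards)


def future_reward(rewards: List[float], t: int) -> float:
    # Rewards strictly after the immediate reward at step t.
    # The toy is deterministic, so this equals the expected future reward.
    return sum(rewards[t + 1:])
```

With `horizon = 100` and action 1, `future_reward(delayed_rewards(1, 100), 0)` is 1.0 while the redistributed version gives 0.0; in the redistributed process the value of the step-0 action can be read directly off its immediate reward.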

Findings

1. RUDDER significantly outperforms baseline methods on artificial tasks with delayed rewards.
2. RUDDER improves scores on Atari games, most prominently in games with delayed rewards.
3. On artificial tasks, RUDDER is exponentially faster than MCTS and TD(λ).

Abstract

We propose RUDDER, a novel reinforcement learning approach for delayed rewards in finite Markov decision processes (MDPs). In MDPs, the Q-values are equal to the expected immediate reward plus the expected future rewards. The latter are related to bias problems in temporal difference (TD) learning and to high variance problems in Monte Carlo (MC) learning. Both problems are even more severe when rewards are delayed. RUDDER aims to make the expected future rewards zero, which simplifies Q-value estimation to computing the mean of the immediate reward. We propose the following two new concepts to push the expected future rewards toward zero. (i) Reward redistribution that leads to return-equivalent decision processes with the same optimal policies and, when optimal, zero expected future rewards. (ii) Return decomposition via contribution analysis, which transforms the reinforcement…
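
The abstract's split of the Q-value, Q(s, a) = E[r(t+1) | s, a] + E[r(t+2) + r(t+3) + ... | s, a], suggests why MC estimates suffer on delayed rewards: the MC target sums many noisy future rewards. The simulation below is a hypothetical toy, not the paper's experiment: every step carries zero-mean Gaussian noise, action 1 earns an extra 1.0 that the original process pays at the last step, and an idealized redistribution pays it at step 0 instead. Both estimators target Q = 1.0, but the MC target has a variance of about horizon times noise squared, while the immediate-reward target has a variance of about noise squared. The names `sample_episode` and `estimate_q` are illustrative.

```python
import random
import statistics
from typing import List, Tuple


def sample_episode(rng: random.Random, horizon: int, noise: float) -> Tuple[List[float], List[float]]:
    """Return (original, redistributed) rewards for one episode after taking action 1.

    Both processes share the same zero-mean noise; the original pays the
    +1.0 at the last step, the redistributed one pays it at step 0.
    """
    noise_terms = [rng.gauss(0.0, noise) for _ in range(horizon)]
    original = list(noise_terms)
    original[-1] += 1.0
    redistributed = list(noise_terms)
    redistributed[0] += 1.0
    return original, redistributed


def estimate_q(n_episodes: int, horizon: int = 50, noise: float = 1.0,
               seed: int = 0) -> Tuple[float, float, float, float]:
    """Return (MC mean, MC variance, immediate mean, immediate variance)."""
    if n_episodes < 2:
        raise ValueError("need at least two episodes to estimate a variance")
    rng = random.Random(seed)
    mc_targets: List[float] = []
    immediate_targets: List[float] = []
    for _ in range(n_episodes):
        original, redistributed = sample_episode(rng, horizon, noise)
        mc_targets.append(sum(original))              # MC target: full return
        immediate_targets.append(redistributed[0])    # after redistribution: immediate reward only
    return (statistics.mean(mc_targets), statistics.variance(mc_targets),
            statistics.mean(immediate_targets), statistics.variance(immediate_targets))
```

Calling `estimate_q(2000)` with the defaults should give two means near 1.0 and a variance ratio of roughly 50 to 1, matching the horizon length. Only the MC variance side is simulated here; the TD bias problem the abstract mentions is not.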

Taxonomy

Topics: Reinforcement Learning in Robotics · Parallel Computing and Optimization Techniques · Topic Modeling