Learning to Mix n-Step Returns: Generalizing lambda-Returns for Deep Reinforcement Learning

Sahil Sharma; Girish Raguvir J; Srivatsan Ramesh; Balaraman Ravindran

arXiv:1705.07445·cs.LG·November 7, 2017·1 cites

Open Access

TL;DR

This paper introduces Confidence-based Autodidactic Returns (CAR), a method that lets a Deep Reinforcement Learning agent learn how to weight its n-step returns, and reports that it outperforms traditional lambda-returns on Atari 2600 benchmarks.

Contribution

The paper provides an exhaustive benchmark of lambda-returns and proposes CAR, which lets RL agents learn the weighting of n-step returns end-to-end instead of relying on a fixed exponential decay scheme.
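
For orientation, the contrast between a fixed exponential decay and a learned weighting can be written out in standard textbook notation. The symbols below ($G_t^{(n)}$, $V$, $\gamma$, $w_n$, $N$) are assumptions chosen for illustration and are not taken from the paper; in particular, how CAR parameterizes its weights is not stated on this page, so the last line is only a generic weighted mixture.

```latex
\begin{align}
  % n-step return: n discounted rewards plus a bootstrapped value estimate
  G_t^{(n)} &= \sum_{k=0}^{n-1} \gamma^{k}\, r_{t+k+1} + \gamma^{n}\, V(s_{t+n}) \\
  % lambda-return: fixed, exponentially decaying weights over n-step returns
  G_t^{\lambda} &= (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1}\, G_t^{(n)} \\
  % generic mixture over a finite horizon N with arbitrary normalized weights
  G_t^{w} &= \sum_{n=1}^{N} w_n\, G_t^{(n)},
  \qquad w_n \ge 0, \quad \sum_{n=1}^{N} w_n = 1
\end{align}
```

Truncating the lambda-return at horizon $N$ corresponds to the special case $w_n = (1-\lambda)\lambda^{n-1}$ for $n < N$ and $w_N = \lambda^{N-1}$, so any scheme that learns the $w_n$ contains lambda-returns as one particular choice.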

Findings

01

CAR outperforms lambda-returns and fixed n-step returns in Atari 2600 experiments.

02

End-to-end learned weighting schemes improve RL performance.

03

Benchmark results highlight the advantages of adaptive return weighting.

Abstract

Reinforcement Learning (RL) can model complex behavior policies for goal-directed sequential decision-making tasks. A hallmark of RL algorithms is Temporal Difference (TD) learning: the value function for the current state is moved towards a bootstrapped target that is estimated using the next state's value function. $λ$-returns generalize beyond 1-step returns and strike a balance between Monte Carlo and TD learning methods. While $λ$-returns have been extensively studied in RL, they have not been explored much in Deep RL. This paper's first contribution is an exhaustive benchmarking of $λ$-returns. Although mathematically tractable, the exponentially decaying weighting of n-step-return-based targets in $λ$-returns is a rather ad hoc design choice. Our second major contribution is that we propose a generalization of $λ$-returns called Confidence-based Autodidactic…
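
The abstract's description of bootstrapped targets and exponentially weighted n-step returns can be made concrete with a small sketch. This is plain Python under stated assumptions, not the paper's implementation: the function names (`n_step_returns`, `lambda_weights`, `mix_returns`), the finite-rollout truncation, and the sample numbers are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def n_step_returns(rewards: Sequence[float],
                   next_values: Sequence[float],
                   gamma: float) -> List[float]:
    """Return [G^(1), ..., G^(N)] for the first state of a length-N rollout.

    rewards[k] is the reward received after step k, and next_values[k] is the
    estimated value of the state reached after step k (the bootstrap term).
    """
    if len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must have the same length")
    returns = []
    discounted_sum = 0.0  # running sum of gamma^k * r_k
    discount = 1.0        # gamma^k before the update, gamma^(k+1) after it
    for reward, value in zip(rewards, next_values):
        discounted_sum += discount * reward
        discount *= gamma
        returns.append(discounted_sum + discount * value)
    return returns


def lambda_weights(n: int, lam: float) -> List[float]:
    """Weights of a lambda-return truncated at horizon n (they sum to 1)."""
    weights = [(1.0 - lam) * lam ** k for k in range(n - 1)]
    weights.append(lam ** (n - 1))  # leftover mass goes to the longest return
    return weights


def mix_returns(returns: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mixture of n-step returns; weights may be fixed or learned."""
    return sum(w * g for w, g in zip(weights, returns))


# Hypothetical 3-step rollout, for illustration only.
returns = n_step_returns([1.0, 0.0, 2.0], [0.5, 1.0, 0.0], gamma=0.9)
print(mix_returns(returns, lambda_weights(len(returns), lam=0.7)))
```

Setting `lam=0.0` recovers the 1-step TD target and `lam=1.0` recovers the longest available return, which is the Monte Carlo versus TD trade-off the abstract refers to. Passing any other normalized weight vector to `mix_returns` gives a non-exponential mixture of the kind the abstract describes as a generalization.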

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Sports Analytics and Performance

Methods: N-step Returns