Learning to Mix n-Step Returns: Generalizing lambda-Returns for Deep Reinforcement Learning
Sahil Sharma, Girish Raguvir J, Srivatsan Ramesh, Balaraman Ravindran

TL;DR
This paper introduces Confidence-based Autodidactic Returns (CAR), a method that lets a Deep Reinforcement Learning agent learn the weights used to mix n-step returns, and reports that it outperforms standard lambda-returns on Atari 2600 games.
Contribution
The paper provides an exhaustive benchmark of lambda-returns in Deep RL and proposes CAR, which lets agents learn the weighting of n-step returns end-to-end instead of relying on a fixed exponential decay.
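The difference between a fixed exponential decay and a learned weighting can be illustrated with a minimal NumPy sketch. Both schemes below produce convex weights over the same set of n-step returns: `lambda_weights` fixes them through the decay (1 - lambda) * lambda^(n-1), while `softmax_weights` turns arbitrary real-valued scores into weights. The function names and the softmax parameterization are assumptions chosen for illustration; as its name suggests, CAR derives its weights from confidence values the network learns, but the paper's exact formulation is not reproduced here.

```python
import numpy as np


def lambda_weights(num_returns, lam):
    """Fixed lambda-return weights (1 - lam) * lam**(n - 1) for n = 1..N.

    The leftover tail mass lam**(N - 1) is assigned to the longest return
    so that the weights sum to 1 (truncated lambda-return).
    """
    n = np.arange(1, num_returns + 1)
    weights = (1.0 - lam) * lam ** (n - 1)
    weights[-1] = lam ** (num_returns - 1)
    return weights


def softmax_weights(scores):
    """Hypothetical learned scheme: normalize real-valued scores into weights.

    In an end-to-end setting the scores would be network outputs trained
    with the agent; here they are plain inputs.
    """
    scores = np.asarray(scores, dtype=float)
    z = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return z / z.sum()


def mix_returns(n_step_returns, weights):
    """Convex combination of the n-step returns G^(1), ..., G^(N)."""
    return float(np.dot(weights, n_step_returns))
```

With three returns and `lam=0.5`, `lambda_weights` gives [0.5, 0.25, 0.25]. Any scores passed to `softmax_weights` still yield non-negative weights that sum to 1, which is what makes the weighting a quantity that can be learned rather than hand-set.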
Findings
CAR outperforms lambda-returns and fixed n-step returns in Atari 2600 experiments.
End-to-end learned weighting schemes improve RL performance.
Benchmark results highlight the advantages of adaptive return weighting.
Abstract
Reinforcement Learning (RL) can model complex behavior policies for goal-directed sequential decision-making tasks. A hallmark of RL algorithms is Temporal Difference (TD) learning: the value function for the current state is moved towards a bootstrapped target that is estimated using the next state's value function. Lambda-returns generalize beyond 1-step returns and strike a balance between Monte Carlo and TD learning methods. While lambda-returns have been extensively studied in RL, they have not been explored much in Deep RL. This paper's first contribution is an exhaustive benchmarking of lambda-returns. Although mathematically tractable, the exponentially decaying weighting of n-step-return-based targets used in lambda-returns is a rather ad-hoc design choice. Our second major contribution is that we propose a generalization of lambda-returns called Confidence-based Autodidactic…
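To make the balance between TD and Monte Carlo targets concrete, here is a minimal NumPy sketch that computes all n-step returns from a finite rollout and mixes them into a truncated lambda-return. The rollout layout (`rewards`, `values`), the function names, and the toy numbers are assumptions for illustration. The truncation rule (leftover weight assigned to the longest return) is a standard choice for finite rollouts and is assumed here, not taken from the paper's code.

```python
import numpy as np


def n_step_returns(rewards, values, gamma):
    """All n-step returns G^(n), n = 1..T, from the first state of a rollout.

    rewards[k] is the reward received after step k (k = 0..T-1) and
    values[k] is the critic's estimate V(s_k) for k = 0..T, so values[n]
    bootstraps the n-step return.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    T = len(rewards)
    assert len(values) == T + 1, "need one value estimate per visited state"
    returns = np.empty(T)
    discounted_sum = 0.0
    for n in range(1, T + 1):
        discounted_sum += gamma ** (n - 1) * rewards[n - 1]
        returns[n - 1] = discounted_sum + gamma ** n * values[n]
    return returns


def truncated_lambda_return(rewards, values, gamma, lam):
    """Exponentially weighted mix of the n-step returns, truncated at T steps."""
    returns = n_step_returns(rewards, values, gamma)
    T = len(returns)
    weights = (1.0 - lam) * lam ** np.arange(T)
    weights[-1] = lam ** (T - 1)  # leftover mass goes to the longest return
    return float(np.dot(weights, returns))


rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 0.3, 0.2]  # V(s_0), ..., V(s_3)
for lam in (0.0, 0.5, 1.0):
    print(lam, truncated_lambda_return(rewards, values, gamma=0.9, lam=lam))
```

On this toy rollout, `lam=0` reproduces the 1-step TD target (1.36), `lam=1` gives the longest available return (2.7658), and intermediate values fall in between. The `lam=1` case equals a true Monte Carlo return only if the rollout ends in a terminal state whose value is 0; otherwise it is still bootstrapped from V(s_T).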
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Sports Analytics and Performance
Methods: N-step Returns
