Horizon Reduction Makes RL Scalable
Seohong Park, Kevin Frans, Deepinder Mann, Benjamin Eysenbach, Aviral Kumar, Sergey Levine

TL;DR
This paper shows that reducing the horizon in offline reinforcement learning (RL) substantially improves its scalability, letting algorithms perform better on complex, long-horizon tasks. It also introduces a new method, SHARSA.
Contribution
The paper identifies horizon length as a key barrier to offline RL scalability and proposes horizon reduction techniques, including the novel SHARSA method, to overcome this challenge.
Findings
Horizon length limits offline RL scalability.
Horizon reduction techniques improve performance on challenging tasks.
SHARSA outperforms existing methods in scalability and asymptotic performance.
Abstract
In this work, we study the scalability of offline reinforcement learning (RL) algorithms. In principle, a truly scalable offline RL algorithm should be able to solve any given problem, regardless of its complexity, given sufficient data, compute, and model capacity. We investigate if and how current offline RL algorithms match up to this promise on diverse, challenging, previously unsolved tasks, using datasets up to 1000x larger than typical offline RL datasets. We observe that despite scaling up data, many existing offline RL algorithms exhibit poor scaling behavior, saturating well below the maximum performance. We hypothesize that the horizon is the main cause behind the poor scaling of offline RL. We empirically verify this hypothesis through several analysis experiments, showing that long horizons indeed present a fundamental barrier to scaling up offline RL. We then show that…
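The abstract argues that long horizons are a fundamental barrier. A toy calculation makes part of that intuition concrete. With 1-step temporal-difference backups, reward information moves back only one state per backup. A reward H steps away therefore needs about H rounds of bootstrapping to reach the start state, and each round can compound approximation error. One standard way to shorten the value horizon is to use n-step returns, which cut that count to roughly ceil(H/n). The sketch below is a hypothetical illustration under simple assumptions: a deterministic tabular chain with a single terminal reward, and exact synchronous backups. It is not the paper's SHARSA method, and the function name and setup are invented for this example.

```python
def sweeps_until_start_value_positive(horizon: int, n: int, gamma: float = 0.99) -> int:
    """Count synchronous n-step backups until the start state's value becomes nonzero.

    Hypothetical toy setup (not from the paper): states 0..horizon, deterministic
    transitions s -> s + 1, reward 1.0 only on the last transition into the terminal
    state `horizon`. The dataset is this single trajectory, and values start at zero.
    """
    rewards = [0.0] * horizon
    rewards[horizon - 1] = 1.0
    values = [0.0] * (horizon + 1)  # values[horizon] is the terminal state, fixed at 0

    # At most `horizon` sweeps are needed, because even 1-step backups move reward
    # information back one state per sweep.
    for sweep in range(1, horizon + 1):
        new_values = values[:]
        for s in range(horizon):
            m = min(n, horizon - s)  # truncate the n-step window at the terminal state
            target = sum(gamma**k * rewards[s + k] for k in range(m))
            target += gamma**m * values[s + m]  # bootstrap from the state m steps ahead
            new_values[s] = target
        values = new_values
        if values[0] > 0.0:
            return sweep
    # Only reachable if gamma**horizon underflows to 0.0 for very long chains.
    raise RuntimeError("start value stayed at zero; gamma**horizon may have underflowed")


if __name__ == "__main__":
    for n in (1, 10, 50):
        # Prints 1000, 100 and 20 sweeps respectively: ceil(horizon / n).
        print(n, sweeps_until_start_value_positive(horizon=1000, n=n))
```

In this idealized setting, larger n means fewer bootstrapping rounds. With function approximation and offline data, uncorrected n-step returns instead estimate the value of the data-collecting behavior, so n trades fewer backups against that bias.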
Taxonomy
Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
