Is Value Learning Really the Main Bottleneck in Offline RL?
Seohong Park, Kevin Frans, Sergey Levine, Aviral Kumar

TL;DR
This paper investigates what limits performance in offline reinforcement learning (RL) and finds that the choice of policy extraction algorithm and imperfect policy generalization, rather than value function learning, are often the main constraints on performance.
Contribution
The study systematically analyzes three components of offline RL (value learning, policy extraction, and policy generalization). It shows how strongly performance depends on the policy extraction method and on test-time generalization, and it proposes simple methods to improve performance.
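The sketch below makes "policy extraction" concrete. Once a value function has been learned, a policy can be extracted from it in different ways. It contrasts two widely used objective families on a hypothetical one-state, one-dimensional problem: advantage-weighted regression (AWR) and a behavior-regularized deterministic policy gradient in the style of DDPG+BC. The critic `q_value`, the dataset `DATA`, and all hyperparameters are illustrative assumptions. They are not the paper's experimental setup.

```python
import math

# Hypothetical "already learned" critic for a single state:
# Q(a) = -(a - 1)^2, maximized at a = 1.0 (outside the dataset below).
def q_value(a: float) -> float:
    return -(a - 1.0) ** 2

def q_grad(a: float) -> float:
    """dQ/da for the toy critic."""
    return -2.0 * (a - 1.0)

# Hypothetical offline dataset: actions the behavior policy took in that state.
DATA = [-1.0, 0.0, 0.5]

def awr_extract(actions, alpha=3.0):
    """AWR for a Gaussian policy with a learnable mean and fixed variance.
    Maximizing sum_i w_i * log pi(a_i) over the mean gives the weighted
    mean of dataset actions, with w_i = exp(alpha * advantage_i)."""
    v = sum(q_value(a) for a in actions) / len(actions)  # crude baseline V
    weights = [math.exp(alpha * (q_value(a) - v)) for a in actions]
    return sum(w * a for w, a in zip(weights, actions)) / sum(weights)

def ddpg_bc_extract(actions, bc_coef=0.5, lr=0.1, steps=500):
    """Behavior-regularized policy gradient for a deterministic policy mu:
    gradient ascent on Q(mu) - bc_coef * mean_i (mu - a_i)^2."""
    mu = sum(actions) / len(actions)  # initialize at the behavior-cloning solution
    for _ in range(steps):
        bc_grad = sum(-2.0 * (mu - a) for a in actions) / len(actions)
        mu += lr * (q_grad(mu) + bc_coef * bc_grad)
    return mu

print(awr_extract(DATA), ddpg_bc_extract(DATA))
```

With these settings, `awr_extract(DATA)` returns about 0.45. As a weighted average, it can never leave the range of dataset actions. `ddpg_bc_extract(DATA)` returns about 0.61, which moves past the best dataset action toward the critic's maximizer at 1.0. The toy only shows that two extraction objectives can use the same value function very differently. It says nothing about which one performs better on real benchmarks.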
Findings
Policy extraction algorithms significantly influence offline RL performance.
Imperfect policy generalization on out-of-distribution states is a major bottleneck.
Test-time policy training techniques can improve offline RL performance.
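The following is a minimal sketch, under loose assumptions, of one simple way a value function can be used at test time. For each state actually encountered during evaluation, it starts from the extracted policy's action and takes a few gradient-ascent steps on the critic before acting. The toy critic, `refine_action`, and its step settings are hypothetical. The paper's own test-time techniques may take a different form.

```python
# Hypothetical learned critic for one test-time state: Q(a) = -(a - 1)^2.
def q_grad(a: float) -> float:
    """Gradient of the toy critic with respect to the action."""
    return -2.0 * (a - 1.0)

def refine_action(policy_action: float, step_size: float = 0.1, num_steps: int = 1) -> float:
    """Test-time refinement (illustrative): starting from the policy's action,
    take gradient-ascent steps on the critic for the current state only.
    No policy parameters are changed; only this one action is adjusted."""
    a = policy_action
    for _ in range(num_steps):
        a += step_size * q_grad(a)
    return a

print(refine_action(0.45))
```

Starting from an action of 0.45, one step moves it to 0.56, closer to the critic's maximizer at 1.0. This design spends a little extra computation at each test-time step so that the value function is used directly at states where the policy may not generalize well. With an inaccurate critic, larger steps or more iterations could just as easily push the action somewhere worse.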
Abstract
While imitation learning requires access to high-quality data, offline reinforcement learning (RL) should, in principle, perform similarly or better with substantially lower data quality by using a value function. However, current results indicate that offline RL often performs worse than imitation learning, and it is often unclear what holds back the performance of offline RL. Motivated by this observation, we aim to understand the bottlenecks in current offline RL algorithms. While poor performance of offline RL is typically attributed to an imperfect value function, we ask: is the main bottleneck of offline RL indeed in learning the value function, or something else? To answer this question, we perform a systematic empirical study of (1) value learning, (2) policy extraction, and (3) policy generalization in offline RL problems, analyzing how these components affect performance. We…
