How to Leverage Unlabeled Data in Offline Reinforcement Learning
Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Karol Hausman, Chelsea Finn, Sergey Levine

TL;DR
This paper shows that in offline reinforcement learning, assigning zero rewards to unlabeled data can be surprisingly effective. This avoids learning a separate reward model and lets cheap unlabeled data be shared with the labeled dataset.
Contribution
The paper introduces a simple zero-reward approach for unlabeled data in offline RL, supported by theoretical analysis and empirical validation across multiple tasks.
Findings
Zero reward assignment to unlabeled data often yields good performance.
Reweighting can reduce bias from incorrect reward labels.
The approach is effective in robotic control tasks.
Abstract
Offline reinforcement learning (RL) can learn control policies from static datasets but, like standard RL methods, it requires reward annotations for every transition. In many cases, labeling large datasets with rewards may be costly, especially if those rewards must be provided by human labelers, while collecting diverse unlabeled data might be comparatively inexpensive. How can we best leverage such unlabeled data in offline RL? One natural solution is to learn a reward function from the labeled data and use it to label the unlabeled data. In this paper, we find that, perhaps surprisingly, a much simpler method that simply applies zero rewards to unlabeled data leads to effective data sharing both in theory and in practice, without learning any reward model at all. While this approach might seem strange (and incorrect) at first, we provide extensive theoretical and empirical analysis…
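The zero-reward idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary layout, the key names (observations, actions, next_observations, rewards), the helper name merge_with_zero_rewards, and the is_labeled flag are all assumptions made for illustration. The sketch only shows the data-preparation step, in which unlabeled transitions receive a reward of zero and are concatenated with the labeled ones before being passed to any offline RL algorithm.

```python
import numpy as np


def merge_with_zero_rewards(labeled, unlabeled):
    """Merge two transition datasets, assigning reward 0 to unlabeled data.

    Both arguments are dicts of NumPy arrays (a hypothetical layout).
    `labeled` must contain a "rewards" array; `unlabeled` has none.
    """
    keys = ("observations", "actions", "next_observations")
    n_labeled = len(labeled["observations"])
    n_unlabeled = len(unlabeled["observations"])

    # Stack the reward-independent fields of both datasets.
    merged = {k: np.concatenate([labeled[k], unlabeled[k]], axis=0) for k in keys}

    # Unlabeled transitions simply get a reward of zero (no reward model).
    zero_rewards = np.zeros(n_unlabeled, dtype=labeled["rewards"].dtype)
    merged["rewards"] = np.concatenate([labeled["rewards"], zero_rewards], axis=0)

    # Keep track of the origin of each transition (assumed to be useful
    # if the unlabeled data is later reweighted).
    merged["is_labeled"] = np.concatenate(
        [np.ones(n_labeled, dtype=bool), np.zeros(n_unlabeled, dtype=bool)]
    )
    return merged


# Tiny synthetic example: 3 labeled and 2 unlabeled transitions.
labeled = {
    "observations": np.ones((3, 4)),
    "actions": np.ones((3, 2)),
    "next_observations": np.ones((3, 4)),
    "rewards": np.array([1.0, 0.5, 2.0]),
}
unlabeled = {
    "observations": np.zeros((2, 4)),
    "actions": np.zeros((2, 2)),
    "next_observations": np.zeros((2, 4)),
}
merged = merge_with_zero_rewards(labeled, unlabeled)
```

The merged dictionary can then be fed to an offline RL learner as a single dataset. The is_labeled flag is kept only so that a reweighting scheme, such as the one mentioned in the findings, could treat the two sources differently.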
Taxonomy
Topics: Reinforcement Learning in Robotics
