Real-World Offline Reinforcement Learning from Vision Language Model Feedback
Sreyas Venkataraman, Yufei Wang, Ziyu Wang, Navin Sriram Ravie, Zackory Erickson, David Held

TL;DR
This paper introduces a system that automatically generates reward labels for offline reinforcement learning from vision-language model preference feedback, enabling effective policy learning in real-world robotic tasks without manual reward annotation.
Contribution
It presents a novel approach combining vision-language models with offline RL to automatically label rewards, facilitating policy learning from unlabeled, sub-optimal datasets.
Findings
Successfully applied to a real-world robot dressing task
Outperforms behavior cloning and inverse RL baselines
Effective in simulation with rigid and deformable objects
Abstract
Offline reinforcement learning can enable policy learning from pre-collected, sub-optimal datasets without online interactions. This makes it ideal for real-world robots and safety-critical scenarios, where collecting online data or expert demonstrations is slow, costly, and risky. However, most existing offline RL works assume the dataset is already labeled with the task rewards, a process that often requires significant human effort, especially when ground-truth states are hard to ascertain (e.g., in the real world). In this paper, we build on prior work, specifically RL-VLM-F, and propose a novel system that automatically generates reward labels for offline datasets using preference feedback from a vision-language model and a text description of the task. Our method then learns a policy using offline RL with the reward-labeled dataset. We demonstrate the system's applicability to a…
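The reward-labeling step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a linear reward over precomputed observation features, fits it with a standard Bradley-Terry preference loss, and replaces the vision-language model with a toy oracle. The names `fit_reward_from_preferences`, `label_rewards`, and `true_w` are hypothetical and exist only for this illustration.

```python
import numpy as np


def fit_reward_from_preferences(feats_a, feats_b, prefs, lr=0.5, steps=500):
    """Fit a linear reward r(x) = w @ x with the Bradley-Terry model.

    prefs[i] is 1.0 if item a of pair i was preferred, 0.0 if item b was.
    """
    w = np.zeros(feats_a.shape[1])
    diff = feats_a - feats_b
    for _ in range(steps):
        # P(a preferred over b) = sigmoid(r(a) - r(b))
        p = 1.0 / (1.0 + np.exp(-(diff @ w)))
        # Gradient of the mean cross-entropy loss with respect to w
        grad = diff.T @ (p - prefs) / len(prefs)
        w -= lr * grad
    return w


def label_rewards(features, w):
    """Assign a scalar reward to every observation in the offline dataset."""
    return features @ w


rng = np.random.default_rng(0)

# Assumption: 200 observations already encoded as 4-dimensional features.
features = rng.normal(size=(200, 4))

# Assumption: a hidden direction that stands in for "task progress".
# It is used only by the toy oracle below, never by the learner.
true_w = np.array([1.0, -2.0, 0.5, 0.0])

# Sample random observation pairs to query for preferences.
idx_a = rng.integers(0, 200, size=1000)
idx_b = rng.integers(0, 200, size=1000)

# Toy stand-in for the vision-language model: prefer the observation
# with more hidden task progress. A real system would query a VLM with
# two images and the text description of the task.
prefs = (features[idx_a] @ true_w > features[idx_b] @ true_w).astype(float)

w = fit_reward_from_preferences(features[idx_a], features[idx_b], prefs)
rewards = label_rewards(features, w)
```

The resulting `rewards` array would then be attached to the dataset transitions and passed to an offline RL algorithm of choice. The learned reward is only identifiable up to a positive scale, since preferences constrain the ordering of rewards rather than their absolute values.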
Taxonomy
Topics: Robotics and Automated Systems
