Real-World Offline Reinforcement Learning from Vision Language Model Feedback

Sreyas Venkataraman; Yufei Wang; Ziyu Wang; Navin Sriram Ravie; Zackory Erickson; David Held

arXiv:2411.05273 · cs.RO · August 7, 2025


Open Access

TL;DR

This paper introduces a system that automatically generates reward labels for offline reinforcement learning from vision-language model feedback, enabling effective policy learning in real-world robotic tasks without manual reward annotation.

Contribution

The paper presents a novel approach that combines vision-language models with offline RL to label rewards automatically, enabling policy learning from unlabeled, sub-optimal datasets.

Findings

01 Successfully applied to a real-world robot dressing task

02 Outperforms behavior cloning and inverse RL baselines

03 Effective in simulation with rigid and deformable objects

Abstract

Offline reinforcement learning can enable policy learning from pre-collected, sub-optimal datasets without online interactions. This makes it ideal for real-world robots and safety-critical scenarios, where collecting online data or expert demonstrations is slow, costly, and risky. However, most existing offline RL works assume the dataset is already labeled with the task rewards, a process that often requires significant human effort, especially when ground-truth states are hard to ascertain (e.g., in the real world). In this paper, we build on prior work, specifically RL-VLM-F, and propose a novel system that automatically generates reward labels for offline datasets using preference feedback from a vision-language model and a text description of the task. Our method then learns a policy using offline RL with the reward-labeled dataset. We demonstrate the system's applicability to a…
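
The reward-labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear reward model fitted with a Bradley-Terry style logistic loss on pairwise preferences, and it replaces the vision-language model with a hypothetical stub (`stub_preference_oracle`) that prefers whichever observation has the larger first feature. All function names, feature dimensions, and hyperparameters here are illustrative assumptions.

```python
import numpy as np

def fit_reward_from_preferences(feats_a, feats_b, prefs, lr=0.5, steps=500):
    """Fit a linear reward r(x) = w @ x from pairwise preferences.

    feats_a, feats_b: (N, D) features of the two observations in each pair.
    prefs: (N,) array, 1.0 if observation b is preferred over a, else 0.0.
    Assumption: a Bradley-Terry model, P(b > a) = sigmoid(r(b) - r(a)).
    """
    w = np.zeros(feats_a.shape[1])
    diff = feats_b - feats_a
    for _ in range(steps):
        p_b = 1.0 / (1.0 + np.exp(-(diff @ w)))     # predicted P(b preferred)
        grad = diff.T @ (p_b - prefs) / len(prefs)  # gradient of the mean logistic loss
        w -= lr * grad
    return w

def stub_preference_oracle(feats_a, feats_b):
    # Hypothetical stand-in for the vision-language model: it prefers the
    # observation with the larger first feature (a proxy for task progress).
    return (feats_b[:, 0] > feats_a[:, 0]).astype(float)

rng = np.random.default_rng(0)
observations = rng.normal(size=(200, 3))   # stand-in for image features
idx_a = rng.integers(0, 200, size=400)     # randomly sampled comparison pairs
idx_b = rng.integers(0, 200, size=400)
prefs = stub_preference_oracle(observations[idx_a], observations[idx_b])

w = fit_reward_from_preferences(observations[idx_a], observations[idx_b], prefs)
reward_labels = observations @ w           # one reward label per dataset entry
```

In this sketch, `reward_labels` plays the role of the automatically generated rewards: each transition in the offline dataset would be paired with its label and then passed to an offline RL algorithm of choice for policy learning.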


Taxonomy

Topics: Robotics and Automated Systems