Policy Learning from Large Vision-Language Model Feedback without Reward Modeling
Tung M. Luu, Donghoon Lee, Younghwan Lee, and Chang D. Yoo

TL;DR
PLARE introduces a reward-free offline reinforcement learning method that uses vision-language models to generate preference labels on pairs of visual trajectory segments, so that robotic manipulation policies can be trained without explicit reward functions.
Contribution
It presents a novel approach in which a large vision-language model supplies preference labels and the policy is learned directly from them, removing the need to design a reward function or fit a reward model.
Findings
PLARE matches or outperforms existing VLM-based reward methods on MetaWorld tasks.
It successfully trains policies for real-world robotic manipulation without explicit reward functions.
The approach demonstrates practical applicability in real robot experiments.
Abstract
Offline reinforcement learning (RL) provides a powerful framework for training robotic agents using pre-collected, suboptimal datasets, eliminating the need for costly, time-consuming, and potentially hazardous online interactions. This is particularly useful in safety-critical real-world applications, where online data collection is expensive and impractical. However, existing offline RL algorithms typically require reward-labeled data, which introduces an additional bottleneck: reward function design is itself costly, labor-intensive, and requires significant domain expertise. In this paper, we introduce PLARE, a novel approach that leverages large vision-language models (VLMs) to provide guidance signals for agent training. Instead of relying on manually designed reward functions, PLARE queries a VLM for preference labels on pairs of visual trajectory segments based on a language…
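The summary does not state which objective PLARE optimizes once the VLM preference labels are collected. As a hedged illustration of what learning a policy directly from preference labels, without a reward model, can look like, the sketch below uses the Contrastive Preference Learning (CPL) loss, a known reward-free preference objective, as a stand-in; it is not necessarily PLARE's loss. Each segment is scored by a discounted, temperature-scaled sum of the policy's log-likelihoods, and the loss is the Bradley-Terry negative log-likelihood that the preferred segment scores higher. The function names, the label convention (0 means the first segment of a pair is preferred), and the default `alpha` and `gamma` values are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

Segment = Sequence[float]  # per-step values of log pi(a_t | s_t) along one segment


def segment_score(log_probs: Segment, alpha: float = 0.1, gamma: float = 1.0) -> float:
    """Discounted, temperature-scaled sum of policy log-likelihoods (CPL-style score)."""
    return sum(alpha * (gamma ** t) * lp for t, lp in enumerate(log_probs))


def cpl_loss(preferred: Segment, rejected: Segment,
             alpha: float = 0.1, gamma: float = 1.0) -> float:
    """Negative log-probability that `preferred` beats `rejected` under Bradley-Terry."""
    s_pos = segment_score(preferred, alpha, gamma)
    s_neg = segment_score(rejected, alpha, gamma)
    # -log(exp(s_pos) / (exp(s_pos) + exp(s_neg))) = softplus(s_neg - s_pos),
    # evaluated in a numerically stable form.
    diff = s_neg - s_pos
    return max(diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def batch_loss(pairs: Sequence[Tuple[Segment, Segment]], labels: Sequence[int],
               alpha: float = 0.1, gamma: float = 1.0) -> float:
    """Mean loss over segment pairs; labels are hypothetical VLM outputs (0 = first preferred)."""
    total = 0.0
    for (seg_a, seg_b), y in zip(pairs, labels):
        preferred, rejected = (seg_a, seg_b) if y == 0 else (seg_b, seg_a)
        total += cpl_loss(preferred, rejected, alpha, gamma)
    return total / len(pairs)
```

Minimizing this loss raises the policy's likelihood of actions in segments the labeler preferred relative to the rejected ones, which is how a CPL-style method avoids fitting a separate reward model.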