Policy Learning from Large Vision-Language Model Feedback without Reward Modeling

Tung M. Luu, Donghoon Lee, Younghwan Lee, and Chang D. Yoo

arXiv:2507.23391 · cs.LG · August 1, 2025

Open Access

TL;DR

PLARE is an offline reinforcement learning method that needs no explicit reward function: it uses a vision-language model to generate preference labels over visual trajectory segments and trains robotic manipulation policies from those labels.

Contribution

The paper presents a novel approach in which a large vision-language model supplies preference labels that guide policy learning directly, removing the need for reward function design.

Findings

01

PLARE matches or outperforms existing VLM-based reward methods on MetaWorld tasks.

02

PLARE successfully trains policies for real-world robotic manipulation without explicit reward functions.

03

The approach demonstrates practical applicability in real robot experiments.

Abstract

Offline reinforcement learning (RL) provides a powerful framework for training robotic agents using pre-collected, suboptimal datasets, eliminating the need for costly, time-consuming, and potentially hazardous online interactions. This is particularly useful in safety-critical real-world applications, where online data collection is expensive and impractical. However, existing offline RL algorithms typically require reward-labeled data, which introduces an additional bottleneck: reward function design is itself costly, labor-intensive, and requires significant domain expertise. In this paper, we introduce PLARE, a novel approach that leverages large vision-language models (VLMs) to provide guidance signals for agent training. Instead of relying on manually designed reward functions, PLARE queries a VLM for preference labels on pairs of visual trajectory segments based on a language…
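
The abstract is cut off before it states the training objective, so the following is only an illustrative sketch of how a policy could be trained directly from preference labels with no intermediate reward model. It assumes a Bradley-Terry-style contrastive loss over the policy's own action log-probabilities; that choice, the function names `segment_score` and `preference_loss`, and the parameters `alpha` and `gamma` are all assumptions made for illustration, not details taken from the paper. The VLM label is passed in as a plain integer rather than obtained from any model.

```python
import math

def segment_score(log_probs, alpha=0.1, gamma=1.0):
    """Scaled, discounted sum of per-step action log-probabilities.

    log_probs: log pi(a_t | s_t) for each step of one trajectory segment.
    alpha, gamma: hypothetical temperature and discount (assumed values).
    """
    return alpha * sum((gamma ** t) * lp for t, lp in enumerate(log_probs))

def preference_loss(log_probs_a, log_probs_b, label, alpha=0.1, gamma=1.0):
    """Negative log-likelihood of one preference label.

    label: 0 if segment A is preferred, 1 if segment B is preferred
           (in this sketch, the label is assumed to come from a VLM query).
    """
    score_a = segment_score(log_probs_a, alpha, gamma)
    score_b = segment_score(log_probs_b, alpha, gamma)
    preferred, rejected = (score_a, score_b) if label == 0 else (score_b, score_a)
    # -log sigmoid(preferred - rejected), written as a numerically stable softplus.
    x = rejected - preferred
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# Hypothetical per-step log-probabilities for two short segments.
seg_a = [-0.1, -0.1]
seg_b = [-3.0, -3.0]
print(preference_loss(seg_a, seg_b, label=0))  # lower: policy agrees with the label
print(preference_loss(seg_a, seg_b, label=1))  # higher: policy disagrees with the label
```

Minimizing a loss of this shape raises the likelihood of actions in preferred segments relative to rejected ones, so the preference labels act on the policy directly and no reward function is ever fitted.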

Taxonomy

Topics: Constraint Satisfaction and Optimization · Topic Modeling · Semantic Web and Ontologies