Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning

Udita Ghosh; Dripta S. Raychaudhuri; Jiachen Li; Konstantinos Karydis; Amit Roy-Chowdhury

arXiv:2502.01616·cs.LG·February 4, 2025

Open Access

TL;DR

PrefVLM integrates vision-language models with selective human feedback to reduce annotation costs in preference-based reinforcement learning, achieving comparable success with fewer annotations and enabling efficient transfer across tasks.

Contribution

This work introduces PrefVLM, a novel framework that leverages VLMs and selective feedback to enhance scalability and reduce human annotation in preference-based RL.

Findings

01. Achieves similar or better success rates with up to 2x fewer human annotations.

02. Uses VLMs to generate and filter preference labels effectively.

03. Enables efficient transfer of learned policies across tasks.

Abstract

Preference-based reinforcement learning (RL) offers a promising approach for aligning policies with human intent but is often constrained by the high cost of human feedback. In this work, we introduce PrefVLM, a framework that integrates Vision-Language Models (VLMs) with selective human feedback to significantly reduce annotation requirements while maintaining performance. Our method leverages VLMs to generate initial preference labels, which are then filtered to identify uncertain cases for targeted human annotation. Additionally, we adapt VLMs using a self-supervised inverse dynamics loss to improve alignment with evolving policies. Experiments on Meta-World manipulation tasks demonstrate that PrefVLM achieves comparable or superior success rates to state-of-the-art methods while using up to 2x fewer human annotations. Furthermore, we show that adapted VLMs enable efficient…
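The label-then-filter step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how uncertainty is measured, so the sketch assumes the VLM yields a probability that the first trajectory segment is preferred and treats values close to 0.5 as uncertain. The names `VLMPreference`, `route_preferences`, `margin`, and `human_budget` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class VLMPreference:
    pair_id: int    # index of the trajectory-segment pair
    p_first: float  # ASSUMED: VLM-derived probability that the first segment is preferred


def route_preferences(
    prefs: List[VLMPreference],
    margin: float = 0.2,                 # hypothetical confidence threshold
    human_budget: Optional[int] = None,  # hypothetical cap on human queries
) -> Tuple[Dict[int, int], List[int]]:
    """Split pairs into auto-labeled ones and a queue for human annotation.

    Label convention: 0 means the first segment is preferred, 1 the second.
    """
    auto_labels: Dict[int, int] = {}
    uncertain: List[Tuple[float, int]] = []

    for pref in prefs:
        # Distance from 0.5 serves as a simple confidence proxy (an assumption).
        confidence = abs(pref.p_first - 0.5)
        if confidence >= margin:
            auto_labels[pref.pair_id] = 0 if pref.p_first > 0.5 else 1
        else:
            uncertain.append((confidence, pref.pair_id))

    # Most ambiguous pairs first, so a limited budget goes where the VLM is least sure.
    uncertain.sort()
    human_queue = [pair_id for _, pair_id in uncertain]
    if human_budget is not None:
        # Pairs beyond the budget are simply left unlabeled in this sketch.
        human_queue = human_queue[:human_budget]
    return auto_labels, human_queue


example = [
    VLMPreference(0, 0.95),
    VLMPreference(1, 0.55),
    VLMPreference(2, 0.10),
    VLMPreference(3, 0.48),
]
auto_labels, human_queue = route_preferences(example, margin=0.2, human_budget=1)
print(auto_labels)   # confident VLM labels, keyed by pair_id
print(human_queue)   # pair_ids routed to a human annotator
```

In this toy run, pairs 0 and 2 keep their VLM labels, and the single human query is spent on pair 3, the most ambiguous one. Raising `margin` sends more pairs to the human queue, which is the knob that trades annotation cost against label noise under these assumptions.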

Taxonomy

Topics: Data Stream Mining Techniques · Reinforcement Learning in Robotics