Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning
Udita Ghosh, Dripta S. Raychaudhuri, Jiachen Li, Konstantinos Karydis, Amit Roy-Chowdhury

TL;DR
PrefVLM integrates vision-language models with selective human feedback to reduce annotation costs in preference-based reinforcement learning, achieving comparable success with fewer annotations and enabling efficient transfer across tasks.
Contribution
This work introduces PrefVLM, a framework that combines VLM-generated preference labels with selective human feedback to improve scalability and reduce the human annotation required for preference-based RL.
Findings
Achieves similar or better success rates with up to 2x fewer human annotations.
Uses VLMs to generate and filter preference labels effectively.
Enables efficient transfer of learned policies across tasks.
Abstract
Preference-based reinforcement learning (RL) offers a promising approach for aligning policies with human intent but is often constrained by the high cost of human feedback. In this work, we introduce PrefVLM, a framework that integrates Vision-Language Models (VLMs) with selective human feedback to significantly reduce annotation requirements while maintaining performance. Our method leverages VLMs to generate initial preference labels, which are then filtered to identify uncertain cases for targeted human annotation. Additionally, we adapt VLMs using a self-supervised inverse dynamics loss to improve alignment with evolving policies. Experiments on Meta-World manipulation tasks demonstrate that PrefVLM achieves comparable or superior success rates to state-of-the-art methods while using up to 2x fewer human annotations. Furthermore, we show that adapted VLMs enable efficient…
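The filtering step described in the abstract (VLM labels first, human annotation only for uncertain cases) can be illustrated with a minimal sketch. This is not the authors' implementation: the `PreferenceQuery` type, the `split_by_confidence` function, the `vlm_prob_a` field, and the fixed `margin` threshold are all hypothetical names and choices made for illustration. The abstract does not specify how uncertainty is measured, so a simple distance-from-0.5 rule is assumed here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PreferenceQuery:
    """A pair of trajectory segments plus a VLM preference estimate (hypothetical)."""
    segment_a: str     # placeholder identifier for the first segment
    segment_b: str     # placeholder identifier for the second segment
    vlm_prob_a: float  # assumed: VLM's probability that segment A is preferred


def split_by_confidence(
    queries: List[PreferenceQuery], margin: float = 0.2
) -> Tuple[List[Tuple[PreferenceQuery, int]], List[PreferenceQuery]]:
    """Keep confident VLM labels; route uncertain queries to human annotators.

    A query counts as confident when the VLM probability is at least
    `margin` away from 0.5. Label 0 means segment A is preferred,
    label 1 means segment B is preferred.
    """
    confident: List[Tuple[PreferenceQuery, int]] = []
    uncertain: List[PreferenceQuery] = []
    for q in queries:
        if abs(q.vlm_prob_a - 0.5) >= margin:
            label = 0 if q.vlm_prob_a > 0.5 else 1
            confident.append((q, label))
        else:
            uncertain.append(q)
    return confident, uncertain


if __name__ == "__main__":
    demo = [
        PreferenceQuery("seg_0", "seg_1", 0.90),  # confident: prefers A
        PreferenceQuery("seg_2", "seg_3", 0.55),  # uncertain: send to a human
        PreferenceQuery("seg_4", "seg_5", 0.10),  # confident: prefers B
    ]
    auto_labeled, needs_human = split_by_confidence(demo)
    print(len(auto_labeled), "auto-labeled;", len(needs_human), "for human annotation")
```

Under this assumed rule, the human annotation budget is the size of the uncertain set, so raising `margin` trades more human labels for fewer potentially noisy VLM labels.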
Taxonomy
Topics: Data Stream Mining Techniques · Reinforcement Learning in Robotics
