The Dark Side of Rich Rewards: Understanding and Mitigating Noise in VLM Rewards

Sukai Huang; Shu-Wei Liu; Nir Lipovetzky; Trevor Cohn

arXiv:2409.15922·cs.LG·November 11, 2025

Open Access

TL;DR

This paper investigates how false positive rewards in Vision-Language Model (VLM) reward signals harm the training of embodied agents, introduces BiMI to reduce reward noise, and demonstrates improved learning efficiency.

Contribution

The paper identifies false positive rewards as particularly harmful, analyzes the limitations of cosine similarity, and proposes BiMI, a novel reward function that mitigates reward noise.

Findings

01. False positive rewards significantly hinder learning.

02. BiMI improves training efficiency in navigation tasks.

03. Cosine similarity is prone to false positive errors.

Abstract

While Vision-Language Models (VLMs) are increasingly used to generate reward signals for training embodied agents to follow instructions, our research reveals that agents guided by VLM rewards often underperform compared to those employing only intrinsic (exploration-driven) rewards, contradicting expectations set by recent work. We hypothesize that false positive rewards -- instances where unintended trajectories are incorrectly rewarded -- are more detrimental than false negatives. Our analysis confirms this hypothesis, revealing that the widely used cosine similarity metric is prone to false positive reward estimates. To address this, we introduce BiMI (Binary Mutual Information), a novel reward function designed to mitigate noise. BiMI significantly enhances learning efficiency across diverse and challenging embodied navigation environments. Our findings offer a nuanced…
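To make the false positive concern concrete, the following is a minimal sketch of a cosine-similarity reward on toy vectors. The embeddings are hand-picked for illustration and do not come from any real VLM. The `binarized_reward` helper and its threshold of 0.9 are assumptions introduced here to show the general idea of a binary signal; this is not the paper's BiMI definition, which the abstract does not spell out.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def binarized_reward(score, threshold):
    """Hypothetical binarization: pay 1.0 only at or above the threshold."""
    return 1.0 if score >= threshold else 0.0


# Toy embeddings chosen by hand (assumption, not real VLM outputs).
instruction = [1.0, 0.0, 1.0]
matching_traj = [0.9, 0.1, 1.0]    # trajectory that follows the instruction
unintended_traj = [1.0, 0.0, 0.0]  # shares one feature, misses the other

sim_match = cosine_similarity(instruction, matching_traj)
sim_unintended = cosine_similarity(instruction, unintended_traj)

# Used directly as a reward, the raw similarity still pays the unintended
# trajectory a sizeable positive amount (about 0.71 here).
print(f"raw reward, matching:   {sim_match:.3f}")
print(f"raw reward, unintended: {sim_unintended:.3f}")

# With the assumed threshold, only the matching trajectory is rewarded.
THRESHOLD = 0.9  # illustrative value, not taken from the paper
print("binary reward, matching:  ", binarized_reward(sim_match, THRESHOLD))
print("binary reward, unintended:", binarized_reward(sim_unintended, THRESHOLD))
```

In this toy setting the partially related trajectory earns a raw reward of roughly 0.71, which is the kind of false positive the abstract describes. A thresholded signal removes it here, though choosing the threshold well is a separate problem that this sketch does not address.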

Taxonomy

Topics: Reinforcement Learning in Robotics · Elevator Systems and Control · Evolutionary Algorithms and Applications

Methods: Sparse Evolutionary Training