Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification

Moises Andrade; Joonhyuk Cha; Brandon Ho; Vriksha Srihari; Karmesh Yadav; Zsolt Kira

arXiv:2507.11662·cs.AI·March 10, 2026

Open Access 3 Reviews

TL;DR

This paper identifies agreement bias in multimodal large language model verifiers and introduces Self-Grounded Verification (SGV), a method that improves their alignment and accuracy in evaluating agent behaviors across diverse tasks.

Contribution

The paper proposes SGV, a novel approach that enhances MLLM verifiers by leveraging self-generated priors, leading to more human-aligned judgments and better performance in various applications.
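The two-step structure can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `self_grounded_verify`, the prompt wording, and the `stub_model` stand-in are all hypothetical, and any text-in, text-out callable is assumed for the model. The sketch only shows the ordering implied by the description: priors are generated first without access to the trajectory, and the verdict is then conditioned on those priors.

```python
from typing import Callable, List


def self_grounded_verify(
    model: Callable[[str], str], task: str, trajectory: List[str]
) -> bool:
    """Hypothetical two-step verification: priors first, then the verdict."""
    # Step 1: elicit broad priors about the task. The trajectory is
    # deliberately NOT included in this prompt.
    priors = model(
        f"Task: {task}\nDescribe what successful completion generally looks like."
    )

    # Step 2: judge the trajectory, conditioned on the self-generated priors.
    steps = "\n".join(trajectory)
    verdict = model(
        f"Task: {task}\nExpected behavior:\n{priors}\n"
        f"Trajectory:\n{steps}\nAnswer SUCCESS or FAILURE."
    )
    return verdict.strip().upper().startswith("SUCCESS")


def stub_model(prompt: str) -> str:
    """Toy stand-in for an MLLM (an assumption, for illustration only)."""
    if "Trajectory:" not in prompt:
        # Step 1 prompt: return a generic prior.
        return "The item must appear in the cart."
    # Step 2 prompt: a trivial rule in place of real reasoning.
    return "SUCCESS" if "added to cart" in prompt else "FAILURE"


ok = self_grounded_verify(stub_model, "Buy a mug", ["clicked product", "added to cart"])
bad = self_grounded_verify(stub_model, "Buy a mug", ["clicked product"])
print(ok, bad)
```

In practice `stub_model` would be replaced by a call to an actual multimodal model, and the trajectory would include screenshots or other observations rather than plain strings.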

Findings

1. SGV improves failure detection by 25 percentage points.
2. SGV increases accuracy by 14 percentage points.
3. Task completion improves across multiple AI agents, surpassing the previous state of the art.

Abstract

Verifiers--functions assigning rewards to agent behavior--have been key to AI progress in math, code, and games. However, extending gains to domains without clear-cut success criteria remains a challenge: while humans can recognize desired outcomes, translating this intuition into scalable rules is nontrivial. Multimodal LLMs (MLLMs) offer a promising solution, given their world knowledge, human-preference alignment, and reasoning capabilities. We evaluate MLLM verifiers across web navigation, computer use, and robotics, spanning 13+ models, 28+ designs, and thousands of trajectories from diverse agents. We identify a critical limitation: a strong tendency for MLLMs to over-validate agent behavior--a phenomenon we term agreement bias. This bias is pervasive, resilient to test-time scaling, and can harm applications relying on MLLM judgments/rewards (e.g., self-improvement, steering,…
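To make the notion of over-validation concrete, the sketch below computes three simple quantities for a binary verifier: accuracy, the true negative rate (the share of failed trajectories correctly flagged), and the fraction of trajectories marked as successful. The function name `verifier_metrics` and the toy labels are assumptions for illustration and do not come from the paper's data.

```python
from typing import Dict, Sequence


def verifier_metrics(labels: Sequence[bool], preds: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, true negative rate, and positive-prediction rate.

    labels[i] is True when trajectory i actually succeeded;
    preds[i] is True when the verifier judged it a success.
    """
    pairs = list(zip(labels, preds))
    tp = sum(1 for l, p in pairs if l and p)
    tn = sum(1 for l, p in pairs if not l and not p)
    fp = sum(1 for l, p in pairs if not l and p)
    n = len(pairs)
    return {
        "accuracy": (tp + tn) / n,
        # TNR = TN / (TN + FP): how many real failures were caught.
        "tnr": tn / (tn + fp) if (tn + fp) else float("nan"),
        # A rate far above the true success rate suggests over-validation.
        "positive_rate": (tp + fp) / n,
    }


# Toy data: 2 real successes, 4 real failures; the verifier approves 5 of 6.
labels = [True, True, False, False, False, False]
preds = [True, True, True, True, True, False]
metrics = verifier_metrics(labels, preds)
print(metrics)
```

In this toy case the verifier approves five of six trajectories although only two succeeded, so the true negative rate is low even though every real success is recognized.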

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. This work identifies a significant and practical problem, the "agreement bias" of MLLM verifiers. This is an important contribution as these verifiers are increasingly proposed for data filtering, self-refinement, and online agent guidance.
2. The paper demonstrates strong empirical results, particularly in improving the True Negative Rate (failure detection). This is a crucial metric that is more informative than overall accuracy for this problem, as the primary goal of a verifier is to catc

Weaknesses

1. The core mechanism of SGV, generating "broad priors" in Step 1 independent of the agent's trajectory, may be a significant flaw. Because they are ungrounded in the specific context of the agent's current state, these priors may be overly generic or common-sense hallucinations that are irrelevant to the task at hand. This could lead the verifier to be "overly strict," unfairly penalizing valid or creative solutions that deviate from the generic script, a failure mode the authors acknowledge. The pape

Reviewer 02 · Rating 8 · Confidence 3

Strengths

1. This paper proposes a straightforward extension of the idea in Pan et al. (2024), which shows that automatic evaluators can be used to improve the performance of web navigation and device control agents at training or inference time.
2. The paper makes a compelling case that models exhibit agreement bias when evaluating agent trajectories, i.e., a bias toward positive labels (Table 1a). The paper also clearly shows that its method leads to a reduction in agreement bias (Table 1b), and that th

Weaknesses

1. It would be nice to see some comparison of the proposed method with simpler strategies, e.g., different prompts to the verifier model, or prompting models to generate confidences and applying Platt scaling.
2. The paper could benefit from an additional round of proofreading. For example:
   - Line 101: missing a period
   - Line 221: “Table Table 7” -> “Table 7”
   - Line 263: broken reference (“??”)
   - Line 1234: “AgentRewarBench” -> “AgentRewardBench”
   - \citet should be replaced

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. Originality: Framing "agreement bias" as a distinct limitation from self-bias and targeting it via self-generated priors is a novel angle for MLLM verifiers.
2. Quality: Experiments use diverse benchmarks (1,200+ tasks) and models, ensuring results are generalizable rather than model-specific.
3. Clarity: The SGV method is described simply, with step-by-step breakdowns and concrete examples (e.g., Figure 5) making it easy to follow.
4. Significance: Improving MLLM verifier reliability directl

Weaknesses

1. SGV does not address underlying vision-language flaws (e.g., Figure 7’s counting error), and the paper lacks discussion on combining SGV with specialist models for fine-grained perception.
2. Current studies primarily focus on moderate-length trajectories (e.g., "We set the maximum number of steps to 30"). However, the scalability of SGV to extremely long sequences remains unclear. Such sequences are common in computer usage scenarios, and the context window pressure under extremely long sequ

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Access Control and Trust · Business Process Modeling and Analysis