EQA-RM: A Generative Embodied Reward Model with Test-time Scaling

Yuhang Chen; Zhen Tan; Tianlong Chen

arXiv:2506.10389 · cs.LG · June 13, 2025

TL;DR

EQA-RM is a generative reward model for embodied question answering (EQA) that gives interpretable feedback and supports test-time scaling. It is sample-efficient and performs strongly on the new EQARewardBench benchmark.

Contribution

The paper introduces EQA-RM, a generative multimodal reward model for EQA, trained with C-GRPO, and presents EQARewardBench for standardized evaluation.
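For context, the name C-GRPO indicates that the training strategy builds on Group Relative Policy Optimization (GRPO). The sketch below shows only the group-relative advantage step of standard GRPO, where each sampled response's reward is normalized against the other responses in its group. It is not the authors' implementation: the contrastive component of C-GRPO is not described on this page and is not shown, and the function name `group_relative_advantages` and the `eps` parameter are hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (standard GRPO).

    `rewards` holds one scalar reward per response sampled for the same prompt.
    `eps` is an assumed small constant that avoids division by zero when all
    rewards in the group are identical.
    """
    mu = mean(rewards)        # group mean reward
    sigma = pstdev(rewards)   # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for four sampled responses (made-up values).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(advantages)
```

Responses that score above the group mean receive a positive advantage and those below receive a negative one, so no separate value network is needed to form a baseline.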

Findings

01 EQA-RM achieves 61.9% accuracy with only 700 samples.

02 EQA-RM outperforms proprietary and open-source baselines.

03 Test-time scaling enables dynamic evaluation granularity.

Abstract

Reward Models (RMs), vital for large model alignment, are underexplored for complex embodied tasks like Embodied Question Answering (EQA) where nuanced evaluation of agents' spatial, temporal, and logical understanding is critical yet not considered by generic approaches. We introduce EQA-RM, a novel generative multimodal reward model specifically architected for EQA, trained via our innovative Contrastive Group Relative Policy Optimization (C-GRPO) strategy to learn fine-grained behavioral distinctions. The generative nature of EQA-RM provides interpretable, structured reward feedback (beyond simple scalars), uniquely enabling test-time scaling to dynamically adjust evaluation granularity, from concise scores to detailed critiques of reasoning and grounding, at inference without retraining. Concurrently, we introduce EQARewardBench, a new benchmark built on OpenEQA for standardized EQA…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Machine Learning in Healthcare