Reward Design for Physical Reasoning in Vision-Language Models

Derek Lilienthal; Manisha Mukherjee; and Sameera Horawalavithana

arXiv:2604.13993 · cs.AI · April 16, 2026

TL;DR

This paper systematically studies how different reward signals influence physical reasoning in vision-language models. It shows that the choice of reward shapes the model's reasoning behavior, and that its effect on performance varies across domains.

Contribution

The paper introduces a novel internal reward based on the model's attention weights and provides a comprehensive ablation study of how different rewards affect performance on physical reasoning tasks.
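The abstract is cut off before it defines the attention-based reward, so the following is only a hypothetical illustration of the general idea, not the paper's formula. It scores a rollout by the share of attention mass that generated tokens place on image tokens. The function name `attention_image_reward`, the input layout (one attention distribution per generated token, assumed already pooled over layers and heads), and the simple averaging are all assumptions.

```python
from typing import Sequence


def attention_image_reward(
    attn: Sequence[Sequence[float]],
    image_mask: Sequence[bool],
) -> float:
    """Hypothetical attention-based reward (not the paper's definition).

    attn: one attention distribution per generated token; each row holds one
        weight per context token (assumed already pooled over layers/heads).
    image_mask: True where the corresponding context token is an image patch.

    Returns the mean fraction of attention mass that generated tokens place
    on image tokens, a value in [0, 1].
    """
    if not attn:
        raise ValueError("attn must contain at least one generated token")
    shares = []
    for row in attn:
        if len(row) != len(image_mask):
            raise ValueError("each attention row must match image_mask in length")
        total = sum(row)
        if total <= 0:
            raise ValueError("attention rows must have positive total mass")
        # Divide by the row total so the share is a proportion even if
        # pooling left the row slightly off from summing to 1.
        on_image = sum(w for w, is_image in zip(row, image_mask) if is_image)
        shares.append(on_image / total)
    return sum(shares) / len(shares)


# Two generated tokens attending over three context tokens (two image patches).
attn = [[0.5, 0.25, 0.25],
        [0.1, 0.1, 0.8]]
mask = [True, True, False]
print(attention_image_reward(attn, mask))  # ≈ 0.475
```

Averaging per-token shares keeps the reward bounded in [0, 1], which makes it easy to combine with binary accuracy or format rewards; a real implementation could instead weight layers, heads, or tokens differently.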

Findings

1. Accuracy-based rewards yield the strongest overall performance gains.
2. Rubric rewards enhance structured reasoning but do not always improve accuracy.
3. Attention-based rewards improve spatial reasoning without harming symbolic reasoning.

Abstract

Physical reasoning over visual inputs demands tight integration of visual perception, domain knowledge, and multi-step symbolic inference. Yet even state-of-the-art Vision-Language Models (VLMs) fall far short of human performance on physics benchmarks. While post-training algorithms such as Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) have demonstrated strong reasoning gains in language models, how reward design shapes VLM physical reasoning behavior remains poorly understood. We present a systematic reward ablation study for GRPO-based VLM training on physical reasoning. We compare four reward signals of increasing semantic richness: format compliance, answer accuracy, a composite rubric reward (answer correctness, physics principle identification, and unit consistency), and a novel internal reward derived from model attention weights over input image…
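The abstract names the reward signals but does not give their formulas. The sketch below shows one plausible way to score the three text-based signals (format compliance, answer accuracy, composite rubric), and how GRPO turns a group of rewards for responses to the same prompt into advantages by subtracting the group mean and dividing by the group standard deviation. Everything else is a hypothetical choice, not the authors' method: the `<think>`/`<answer>` template, the 1% numeric tolerance, the keyword check for the physics principle, the suffix check for units, and the 0.5/0.25/0.25 rubric weights. The paper's rubric may, for example, be scored by a judge model. The attention-based reward is not modeled here.

```python
from __future__ import annotations

import math
import re
from statistics import mean, pstdev

# Assumed response template: reasoning in <think>, final answer in <answer>.
TEMPLATE = re.compile(r"^<think>.*?</think>\s*<answer>(.*?)</answer>$", re.DOTALL)


def extract_answer(response: str) -> str | None:
    """Return the text inside <answer>...</answer>, or None if malformed."""
    match = TEMPLATE.match(response.strip())
    return match.group(1).strip() if match else None


def format_reward(response: str) -> float:
    """Format compliance: 1.0 if the response follows the assumed template."""
    return 1.0 if extract_answer(response) is not None else 0.0


def accuracy_reward(response: str, reference: float, rel_tol: float = 0.01) -> float:
    """Answer accuracy: 1.0 if the leading number is close to the reference."""
    answer = extract_answer(response)
    if not answer:
        return 0.0
    try:
        value = float(answer.split()[0])
    except ValueError:
        return 0.0
    return 1.0 if math.isclose(value, reference, rel_tol=rel_tol, abs_tol=1e-9) else 0.0


def rubric_reward(
    response: str,
    reference: float,
    principles: list[str],
    unit: str,
    weights: tuple[float, float, float] = (0.5, 0.25, 0.25),
) -> float:
    """Composite rubric: correctness, physics principle named, unit present.

    The keyword and suffix checks are crude stand-ins for whatever scorer
    the paper uses; the weights are hypothetical.
    """
    answer = extract_answer(response) or ""
    correct = accuracy_reward(response, reference)
    text = response.lower()
    principle = 1.0 if any(p.lower() in text for p in principles) else 0.0
    unit_ok = 1.0 if unit and answer.endswith(unit) else 0.0
    w_correct, w_principle, w_unit = weights
    return w_correct * correct + w_principle * principle + w_unit * unit_ok


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalise each reward by its group's statistics."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to the same question (reference answer: 20 N).
group = [
    "<think>By Newton's second law, F = m a = 4 * 5.</think><answer>20 N</answer>",
    "<think>F = m a = 4 + 5.</think><answer>9 N</answer>",
    "The force is 20 N.",
    "<think>Newton's second law gives 20.</think><answer>20</answer>",
]
rewards = [rubric_reward(r, 20.0, ["newton's second law"], "N") for r in group]
print(rewards)                             # [1.0, 0.25, 0.0, 0.75]
print(group_relative_advantages(rewards))  # ≈ [1.26, -0.63, -1.26, 0.63]
```

Because advantages are relative within a group, a group in which every response earns the same reward yields all-zero advantages and no learning signal. This is one reason denser rewards such as a rubric can change training dynamics compared with a binary accuracy reward. This sketch uses the population standard deviation; implementations differ on that detail.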
