SVQA-R1: Reinforcing Spatial Reasoning in MLLMs via View-Consistent Reward Optimization

Peiyao Wang; Haibin Ling

arXiv:2506.01371 · cs.CV · June 3, 2025

TL;DR

This paper introduces SVQA-R1, a reinforcement learning framework that improves spatial reasoning in vision-language models on spatial VQA tasks by training with view-consistent rewards. The authors report higher accuracy on spatial VQA benchmarks and interpretable reasoning paths.

Contribution

We extend the R1 paradigm to spatial VQA with a new group-wise RL strategy called Spatial-GRPO, promoting grounded spatial understanding without supervised fine-tuning.
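The group-wise part of this strategy can be illustrated with the advantage normalization used in standard GRPO, where each sampled response is scored relative to the other responses drawn for the same prompt. The sketch below is a minimal illustration of that general idea only; the function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are assumptions, and nothing here is taken from the Spatial-GRPO implementation itself.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Assumption: plain GRPO-style normalization, (r - mean) / (std + eps).
    The exact form used by Spatial-GRPO is not given in this summary.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one question: two rewarded, two not.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Because the baseline is the group mean, no separate value model is needed, and a group whose responses all receive the same reward contributes zero advantage.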

Findings

01. Significant accuracy improvements on spatial VQA benchmarks.

02. Model exhibits interpretable reasoning paths.

03. Effective across multiple spatial reasoning tasks.

Abstract

Spatial reasoning remains a critical yet underdeveloped capability in existing vision-language models (VLMs), especially for Spatial Visual Question Answering (Spatial VQA) tasks that require understanding relative positions, distances, and object configurations. Inspired by the R1 paradigm introduced in DeepSeek-R1, which enhances reasoning in language models through rule-based reinforcement learning (RL), we propose SVQA-R1, the first framework to extend R1-style training to spatial VQA. In particular, we introduce Spatial-GRPO, a novel group-wise RL strategy that constructs view-consistent rewards by perturbing spatial relations between objects, e.g., mirror flipping, thereby encouraging the model to develop a consistent and grounded understanding of space. Our model, SVQA-R1, not only achieves dramatically improved accuracy on spatial VQA benchmarks but also exhibits interpretable…
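To make the idea of a view-consistent reward concrete, here is a minimal sketch for the mirror-flipping case. It assumes a horizontal flip swaps "left" and "right" and leaves other relations unchanged, and that the model is queried once on the original image and once on the flipped one. The names `mirror_answer` and `view_consistent_reward`, the `bonus` value, and the additive scoring are all hypothetical; the reward actually used by SVQA-R1 is not specified in this excerpt.

```python
# Hypothetical mapping: how a relation label changes under a horizontal flip.
MIRROR = {"left": "right", "right": "left"}


def mirror_answer(answer: str) -> str:
    """Expected answer after a horizontal flip; non left/right labels are unchanged."""
    a = answer.strip().lower()
    return MIRROR.get(a, a)


def view_consistent_reward(pred_orig: str, pred_flipped: str, gold: str,
                           bonus: float = 0.5) -> float:
    """Rule-based reward: correctness on the original view, plus a bonus
    when the answer on the flipped view matches the flipped ground truth.

    Assumption: additive bonus granted only if the original answer is correct.
    """
    gold_norm = gold.strip().lower()
    correct = pred_orig.strip().lower() == gold_norm
    consistent = pred_flipped.strip().lower() == mirror_answer(gold_norm)
    reward = 1.0 if correct else 0.0
    if correct and consistent:
        reward += bonus
    return reward


# Correct on both views, so the bonus applies.
r_both = view_consistent_reward("left", "right", "left")
# Correct on the original view only: the model ignored the flip.
r_orig_only = view_consistent_reward("left", "left", "left")
```

Under this sketch, a model that answers the original question correctly by guessing, but does not change its answer when the scene is mirrored, earns less than one whose answers track the actual layout.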

Taxonomy

Topics: Semantic Web and Ontologies · Constraint Satisfaction and Optimization · Logic, Reasoning, and Knowledge