Smooth Operator: Smooth Verifiable Reward Activates Spatial Reasoning Ability of Vision-Language Model

Siwen Jiao; Tianxiong Lv; Kangan Qian; Chenxu Zhao; Xiuyuan Zhu; Tianlun Li; Xiaolong Cheng; Jinyu Li; Zhihao Liao; Yang Cai

arXiv:2601.07695 · cs.CV · January 16, 2026

Open Access

TL;DR

This paper introduces the SNRA reward operator and the AP-GRPO training framework for vision-language models. Together they increase reward signal density and data efficiency for 3D scene understanding, improving spatial reasoning without changing the model architecture.

Contribution

The paper proposes the Smooth Numerical Reward Activation (SNRA) operator and the Absolute-Preserving GRPO (AP-GRPO) framework, which increase reward signal density and mitigate information loss during optimization, advancing 3D reasoning in vision-language models.

Findings

01. AP-GRPO achieves performance comparable to large supervised models.

02. The methods activate latent 3D reasoning abilities.

03. Training is more data-efficient.

Abstract

Vision-Language Models (VLMs) face a critical bottleneck in achieving precise numerical prediction for 3D scene understanding. Traditional reinforcement learning (RL) approaches, primarily based on relative ranking, often suffer from severe reward sparsity and gradient instability, failing to effectively exploit the verifiable signals provided by 3D physical constraints. Notably, in standard GRPO frameworks, relative normalization causes "near-miss" samples (characterized by small but non-zero errors) to suffer from advantage collapse. This leads to a severe data utilization bottleneck where valuable boundary samples are discarded during optimization. To address this, we introduce the Smooth Numerical Reward Activation (SNRA) operator and the Absolute-Preserving GRPO (AP-GRPO) framework. SNRA employs a dynamically parameterized Sigmoid function to transform raw feedback into a dense,…
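To make the "near-miss" problem concrete, here is a minimal sketch that contrasts a hard-threshold reward with a sigmoid-shaped one under the usual group-normalized advantage, (r - mean) / std. It is not the paper's SNRA operator: the abstract is truncated before it gives the exact parameterization, so `smooth_reward`, the tolerance `tol`, the steepness `k`, and the sample errors are all hypothetical choices made for illustration. AP-GRPO itself is not sketched here.

```python
import math
from statistics import mean, pstdev


def group_normalized_advantages(rewards, eps=1e-6):
    """Standard group-relative advantage: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def binary_reward(error, tol=0.05):
    """Hard threshold: any error above tol earns nothing (assumed baseline)."""
    return 1.0 if error <= tol else 0.0


def smooth_reward(error, tol=0.05, k=40.0):
    """Hypothetical sigmoid shaping: reward decays smoothly as error grows.

    tol and k are illustrative constants, not values from the paper.
    """
    return 1.0 / (1.0 + math.exp(k * (error - tol)))


# Relative errors for one sampled group; every sample misses the tolerance,
# but the first two are near-misses.
errors = [0.06, 0.08, 0.30, 0.90]

adv_binary = group_normalized_advantages([binary_reward(e) for e in errors])
adv_smooth = group_normalized_advantages([smooth_reward(e) for e in errors])

print("binary:", adv_binary)
print("smooth:", adv_smooth)
```

With the threshold reward every sample in the group scores zero, so all advantages vanish and the group contributes no gradient. With the smooth reward the near-miss samples rank above the large-error ones, so the group still provides a usable learning signal.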

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning