Reinforcing 3D Understanding in Point-VLMs via Geometric Reward Credit Assignment

Jingkun Chen; Ruoshi Xu; Mingqi Gao; Shengda Luo; Jungong Han

arXiv:2604.21160 · cs.CV · April 24, 2026

TL;DR

This paper introduces a reinforcement learning framework for Point-Vision-Language Models that improves 3D structure accuracy and physical consistency. It splits the sequence-level reward into field-specific signals, routes each signal to the token span responsible for that field, and adds geometric constraints to the reward.

Contribution

It proposes Geometric Reward Credit Assignment and a Reprojection-Consistency term to enhance 3D spatial reasoning and physical plausibility in point-based vision-language models.
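One way to picture a reprojection-consistency score is to project a predicted 3D box into the image and compare it with the predicted 2D box. The sketch below is an illustration only, not the paper's formulation: the pinhole camera model, the axis-aligned box parameterization, and the function names (`box_corners`, `project`, `iou_2d`, `reprojection_consistency`) are all assumptions made for this example.

```python
import numpy as np

def box_corners(center, size):
    """Eight corners of an axis-aligned 3D box in camera coordinates (assumed parameterization)."""
    signs = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    return np.asarray(center, dtype=float) + 0.5 * signs * np.asarray(size, dtype=float)

def project(points, fx, fy, cx, cy):
    """Pinhole projection to pixel coordinates; assumes every point has z > 0."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([fx * x / z + cx, fy * y / z + cy], axis=1)

def iou_2d(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def reprojection_consistency(center, size, box_2d, fx, fy, cx, cy):
    """IoU between the enclosing rectangle of the projected 3D box and the 2D box."""
    uv = project(box_corners(center, size), fx, fy, cx, cy)
    projected = (uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max())
    return iou_2d(projected, box_2d)
```

A score near 1 means the 3D prediction agrees with the 2D prediction under the assumed camera; a score near 0 flags the kind of 2D/3D contradiction the paper calls geometric hallucination.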

Findings

01. Boosted 3D KPA from 0.64 to 0.93
02. Increased 3D bounding box IoU to 0.686
03. Raised reprojection consistency to 0.852

Abstract

Point-Vision-Language Models promise to empower embodied agents with executable spatial reasoning, yet they frequently succumb to geometric hallucination where predicted 3D structures contradict the observed 2D reality. We identify a key cause of this failure not as a representation bottleneck but as a structural misalignment in reinforcement learning, where sparse geometric tokens are drowned out by noisy and broadcasted sequence-level rewards. To resolve this causal dilution, we propose Geometric Reward Credit Assignment, a framework that disentangles holistic supervision into field-specific signals and routes them exclusively to their responsible token spans. This mechanism transforms vague feedback into precise gradient updates and effectively turns generic policy optimization into targeted structural alignment. Furthermore, we internalize physical constraints via a…
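The contrast between a broadcast sequence-level reward and span-routed field rewards can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the field names (`box_2d`, `box_3d`), the token spans, the reward values, and both helper functions are hypothetical.

```python
import numpy as np

def route_field_rewards(seq_len, field_spans, field_rewards, fallback=0.0):
    """Build a per-token reward vector from field-specific rewards.

    field_spans: field name -> (start, end) token indices, end exclusive.
    field_rewards: field name -> scalar reward for that field.
    Tokens outside every span receive `fallback`.
    """
    token_rewards = np.full(seq_len, fallback, dtype=np.float64)
    for name, (start, end) in field_spans.items():
        token_rewards[start:end] = field_rewards[name]
    return token_rewards

def broadcast_reward(seq_len, field_rewards):
    """Baseline: a single averaged reward copied to every token."""
    return np.full(seq_len, float(np.mean(list(field_rewards.values()))))

# Hypothetical 20-token response with two structured fields.
spans = {"box_2d": (4, 8), "box_3d": (8, 15)}
rewards = {"box_2d": 0.9, "box_3d": 0.2}

routed = route_field_rewards(20, spans, rewards)
broadcast = broadcast_reward(20, rewards)
```

In the broadcast case every token sees the same averaged value, so the poorly predicted `box_3d` tokens get the same signal as the well predicted `box_2d` tokens. In the routed case each span receives only the reward of its own field, which is the targeting the abstract describes.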
