SPAN: Spatial-Projection Alignment for Monocular 3D Object Detection

Yifan Wang; Yian Zhao; Fanqi Pu; Xiaochen Yang; Yang Tang; Xi Chen; Wenming Yang

arXiv:2511.06702 · cs.CV · March 11, 2026

TL;DR

This paper introduces SPAN, a novel method for monocular 3D object detection that enforces geometric consistency through spatial and projection alignment, significantly improving detection accuracy.

Contribution

The paper proposes Spatial-Projection Alignment (SPAN) with spatial point and 3D-2D projection alignment, addressing geometric inconsistency in decoupled 3D attribute prediction.

Findings

01. Improves monocular 3D detection accuracy

02. Integrates seamlessly with existing detectors

03. Demonstrates significant performance gains

Abstract

Existing monocular 3D detectors typically tame the pronounced nonlinear regression of the 3D bounding box through a decoupled prediction paradigm, which employs multiple branches to estimate the geometric center, depth, dimensions, and rotation angle separately. Although this decoupling strategy simplifies the learning process, it inherently ignores the geometric collaborative constraints between different attributes. The resulting lack of a geometric consistency prior leads to suboptimal performance. To address this issue, we propose a novel Spatial-Projection Alignment (SPAN) with two pivotal components: (i) Spatial Point Alignment enforces an explicit global spatial constraint between the predicted and ground-truth 3D bounding boxes, thereby rectifying spatial drift caused by decoupled attribute regression. (ii) 3D-2D Projection Alignment ensures that the projected 3D box is…
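To make the geometry in the abstract concrete, the sketch below assembles the separately predicted attributes (center, dimensions, rotation angle) into the eight corners of a 3D box, compares predicted and ground-truth corners, and projects corners into the image. It is an illustration under stated assumptions, not the paper's implementation: the KITTI-like camera convention, the mean L1 corner distance, and all function names (box_corners, corner_l1, project, enclosing_box) are hypothetical choices made here. The abstract is cut off before it states the projection constraint, so the code stops at computing the projected box and does not guess a loss for it.

```python
import math


def box_corners(center, dims, yaw):
    """Return the 8 corners of a 3D box in camera coordinates.

    Assumed convention (KITTI-like, not stated in the source): x right,
    y down, z forward; dims = (h, w, l); yaw rotates about the y axis;
    center is the geometric center of the box.
    """
    cx, cy, cz = center
    h, w, l = dims
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (l / 2, -l / 2):
        for dy in (h / 2, -h / 2):
            for dz in (w / 2, -w / 2):
                # Rotate the local offset about the y axis, then translate.
                x = cx + c * dx + s * dz
                y = cy + dy
                z = cz - s * dx + c * dz
                corners.append((x, y, z))
    return corners


def corner_l1(pred, gt):
    """Mean absolute difference over all corner coordinates (assumed metric)."""
    total = sum(abs(p - g) for pc, gc in zip(pred, gt) for p, g in zip(pc, gc))
    return total / (3 * len(pred))


def project(corners, fx, fy, u0, v0):
    """Pinhole projection of camera-frame points to pixel coordinates."""
    return [(fx * x / z + u0, fy * y / z + v0) for x, y, z in corners]


def enclosing_box(points):
    """Axis-aligned 2D box (u_min, v_min, u_max, v_max) around the points."""
    us = [u for u, _ in points]
    vs = [v for _, v in points]
    return (min(us), min(vs), max(us), max(vs))


# A prediction whose depth is off by 1 m while every other attribute is exact.
gt = box_corners((0.0, 1.0, 20.0), (1.5, 1.6, 4.0), 0.0)
pred = box_corners((0.0, 1.0, 21.0), (1.5, 1.6, 4.0), 0.0)
drift = corner_l1(pred, gt)

# Projected extent of a 2 m cube centered 10 m in front of the camera.
cube = box_corners((0.0, 0.0, 10.0), (2.0, 2.0, 2.0), 0.0)
box2d = enclosing_box(project(cube, 100.0, 100.0, 0.0, 0.0))
```

In the first example every corner shifts by 1 m along z only, so the mean over the three coordinates is about 1/3. Because each corner depends on center, dimensions, and rotation angle together, a corner-level distance couples the attributes that the decoupled branches predict independently.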

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization