Mono3DV: Monocular 3D Object Detection with 3D-Aware Bipartite Matching and Variational Query DeNoising
Kiet Dang Vu, Trung Thai Tran, Kien Nguyen Do Trung, Duc Dung Nguyen

TL;DR
Mono3DV introduces a Transformer-based monocular 3D object detection framework that incorporates 3D-aware matching, a denoising scheme, and variational query denoising to improve accuracy without external data.
Contribution
The paper presents a novel 3D-aware bipartite matching strategy, a 3D denoising scheme, and a variational query denoising mechanism, advancing monocular 3D detection performance.
Findings
Achieves state-of-the-art results on KITTI benchmark
Effectively incorporates 3D geometry into matching process
Significantly improves stability and accuracy of 3D predictions
Abstract
While DETR-like architectures have demonstrated significant potential for monocular 3D object detection, they are often hindered by a critical limitation: the exclusion of 3D attributes from the bipartite matching process. This exclusion arises from the inherent ill-posed nature of 3D estimation from monocular image, which introduces instability during training. Consequently, high-quality 3D predictions can be erroneously suppressed by 2D-only matching criteria, leading to suboptimal results. To address this, we propose Mono3DV, a novel Transformer-based framework. Our approach introduces three key innovations. First, we develop a 3D-Aware Bipartite Matching strategy that directly incorporates 3D geometric information into the matching cost, resolving the misalignment caused by purely 2D criteria. Second, it is important to stabilize the Bipartite Matching to resolve the instability…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Neural Network Applications · 3D Shape Modeling and Analysis · Image and Signal Denoising Methods
