MonoVQD: Monocular 3D Object Detection with Variational Query Denoising and Self-Distillation
Kiet Dang Vu, Trung Thai Tran, Duc Dung Nguyen

TL;DR
MonoVQD is a framework for monocular 3D object detection that strengthens DETR-based architectures with variational query denoising, self-distillation, and a new self-attention mechanism, and reports state-of-the-art results on the KITTI benchmark.
Contribution
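The paper presents three innovations for DETR-based monocular 3D detection: a Mask Separated Self-Attention mechanism that allows the denoising process to be integrated into a DETR architecture, Variational Query Denoising to address the gradient vanishing problem of conventional denoising methods, and a self-distillation strategy.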
Findings
Achieves state-of-the-art performance on the KITTI benchmark.
Enhances detection accuracy in multi-view 3D scenarios.
Demonstrates broad applicability and robustness of the proposed methods.
Abstract
Precisely localizing 3D objects from a single image constitutes a central challenge in monocular 3D detection. While DETR-like architectures offer a powerful paradigm, their direct application in this domain encounters inherent limitations, preventing optimal performance. Our work addresses these challenges by introducing MonoVQD, a novel framework designed to fundamentally advance DETR-based monocular 3D detection. We propose three main contributions. First, we propose the Mask Separated Self-Attention mechanism that enables the integration of the denoising process into a DETR architecture. This improves the stability of Hungarian matching to achieve a consistent optimization objective. Second, we present the Variational Query Denoising technique to address the gradient vanishing problem of conventional denoising methods, which severely restricts the efficiency of the denoising…
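To make the idea of separating denoising queries from matching queries in self-attention more concrete, here is a minimal sketch of a block attention mask. It is an illustration under stated assumptions, not the authors' implementation: the function name `separated_attention_mask`, the query ordering (denoising queries first), and the choice to block attention in both directions between the two groups are all hypothetical. The convention that `True` means "blocked" follows the boolean `attn_mask` convention of PyTorch's `nn.MultiheadAttention`.

```python
def separated_attention_mask(num_denoising, num_matching):
    """Build a square boolean mask over all decoder queries.

    Assumed layout (hypothetical): the first `num_denoising` queries are
    denoising queries, the remaining `num_matching` are the learnable
    queries used for Hungarian matching.
    True  = attention from row query to column query is blocked.
    False = attention is allowed.
    """
    total = num_denoising + num_matching
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            i_is_denoising = i < num_denoising
            j_is_denoising = j < num_denoising
            # Assumption: block every pair that crosses the group boundary,
            # so neither group can read information from the other.
            mask[i][j] = i_is_denoising != j_is_denoising
    return mask


# Example: 2 denoising queries followed by 3 matching queries.
mask = separated_attention_mask(num_denoising=2, num_matching=3)
for row in mask:
    print("".join("X" if blocked else "." for blocked in row))
```

With a mask like this, the matching queries never see queries derived from noised ground truth, so the bipartite matching step is not influenced by the denoising branch. Whether MonoVQD uses exactly this pattern is not stated in the excerpt above.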
Taxonomy
Topics: Industrial Vision Systems and Defect Detection
