ALIGN: Advanced Query Initialization with LiDAR-Image Guidance for Occlusion-Robust 3D Object Detection
Janghyun Baek, Mincheol Chang, Seokha Moon, Seung Joon Lee, Jinkyu Kim

TL;DR
ALIGN introduces a novel query initialization method for 3D object detection that leverages LiDAR and image data to improve occlusion robustness and detection accuracy, especially in crowded scenes.
Contribution
The paper presents a query initialization approach with three components, Occlusion-aware Center Estimation (OCE), Adaptive Neighbor Sampling (ANS), and Dynamic Query Balancing (DQB), that together improve occlusion handling in 3D detection.
Findings
Improves detection performance by up to +0.9 mAP and +1.2 NDS on nuScenes
Enhances accuracy for occluded and crowded objects
Consistently outperforms existing query initialization strategies
Abstract
Recent query-based 3D object detection methods using camera and LiDAR inputs have shown strong performance, but existing query initialization strategies, such as random sampling or BEV heatmap-based sampling, often result in inefficient query usage and reduced accuracy, particularly for occluded or crowded objects. To address this limitation, we propose ALIGN (Advanced query initialization with LiDAR and Image GuidaNce), a novel approach for occlusion-robust, object-aware query initialization. Our model consists of three key components: (i) Occlusion-aware Center Estimation (OCE), which integrates LiDAR geometry and image semantics to estimate object centers accurately; (ii) Adaptive Neighbor Sampling (ANS), which generates object candidates from LiDAR clustering and supplements each object by sampling spatially and semantically aligned points around it; and (iii) Dynamic Query Balancing…
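As a rough illustration of the neighbor-sampling idea described for ANS, the sketch below supplements each candidate center with a few nearby LiDAR points in the BEV plane. It is not the authors' implementation: the function names `sample_neighbors` and `init_queries`, the `radius` and `k` parameters, and the purely geometric selection rule are all assumptions made for this example, and the semantic alignment with image features mentioned in the abstract is omitted.

```python
import math
import random


def sample_neighbors(center, points, radius=1.0, k=4, seed=0):
    """Return up to k points within `radius` (BEV distance) of `center`.

    Hypothetical helper: the radius/k rule is an assumption, not the
    paper's sampling criterion.
    """
    rng = random.Random(seed)
    cx, cy = center
    nearby = [p for p in points if math.hypot(p[0] - cx, p[1] - cy) <= radius]
    if len(nearby) <= k:
        return nearby
    # Too many candidates: draw k of them without replacement.
    return rng.sample(nearby, k)


def init_queries(centers, points, radius=1.0, k=4, seed=0):
    """Build query positions: each center followed by its sampled neighbors."""
    queries = []
    for c in centers:
        queries.append(c)
        queries.extend(sample_neighbors(c, points, radius, k, seed))
    return queries


if __name__ == "__main__":
    centers = [(0.0, 0.0)]                      # e.g. one cluster center
    points = [(0.5, 0.0), (3.0, 3.0), (0.0, 0.2)]  # toy LiDAR points in BEV
    print(init_queries(centers, points))
```

In this toy setup the point at (3.0, 3.0) lies outside the radius and is not used, so the object contributes its center plus two nearby points as query positions.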
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
