EVT: Efficient View Transformation for Multi-Modal 3D Object Detection
Yongjin Lee, Hyeon-Mun Jeong, Yurim Jeon, Sanghyun Kim

TL;DR
EVT is a view transformation framework for multi-modal 3D object detection that uses LiDAR guidance and geometry-aware attention to improve both accuracy and efficiency, reaching state-of-the-art results at real-time inference speed.
Contribution
The paper proposes EVT, a new view transformation method that improves BEV representation quality and detection accuracy while reducing computational overhead.
Findings
Achieves 75.3% NDS on the nuScenes test set.
Provides real-time inference speed.
Outperforms existing methods in accuracy and efficiency.
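For readers unfamiliar with the metric, NDS (nuScenes Detection Score) combines mean average precision with five mean true-positive error terms (translation, scale, orientation, velocity, attribute). The sketch below shows the standard formula as defined by the nuScenes benchmark; the function name `nds` and the input values in the usage note are illustrative assumptions, not figures reported for EVT.

```python
def nds(mean_ap, tp_errors):
    """nuScenes Detection Score.

    mean_ap   : mAP in [0, 1]
    tp_errors : the five mean true-positive errors, in the order
                (mATE, mASE, mAOE, mAVE, mAAE); each is clipped to 1
                before being turned into a score.
    """
    if len(tp_errors) != 5:
        raise ValueError("expected exactly five true-positive error terms")
    # Each error e becomes a score 1 - min(1, e), so larger errors score lower.
    tp_scores = [1.0 - min(1.0, e) for e in tp_errors]
    # mAP carries weight 5, each of the five TP scores carries weight 1.
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0
```

As a usage example with made-up inputs, `nds(0.5, [0.5] * 5)` evaluates to 0.5: half of the score comes from mAP and half from the error terms.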
Abstract
Multi-modal sensor fusion in Bird's Eye View (BEV) representation has become the leading approach for 3D object detection. However, existing methods often rely on depth estimators or transformer encoders to transform image features into BEV space, which reduces robustness or introduces significant computational overhead. Moreover, the insufficient geometric guidance in view transformation results in ray-directional misalignments, limiting the effectiveness of BEV representations. To address these challenges, we propose Efficient View Transformation (EVT), a novel 3D object detection framework that constructs a well-structured BEV representation, improving both accuracy and efficiency. Our approach focuses on two key aspects. First, Adaptive Sampling and Adaptive Projection (ASAP), which utilizes LiDAR guidance to generate 3D sampling points and adaptive kernels, enables more effective…
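To make the idea of transforming image features into BEV space without a depth estimator concrete, here is a minimal NumPy sketch of generic projection-based sampling: 3D points are projected into a camera with a pinhole model and image features are gathered at the resulting pixels. This is not the authors' ASAP module, which according to the abstract derives its sampling points and adaptive kernels from LiDAR guidance; here the points are simply given. The function names, the nearest-neighbour gather, and the camera convention (z axis pointing forward) are all assumptions made for illustration.

```python
import numpy as np

def project_points(points_xyz, intrinsics, cam_from_ego):
    """Project ego-frame 3D points (N, 3) to pixel coordinates (N, 2).

    intrinsics   : (3, 3) pinhole camera matrix (assumed convention)
    cam_from_ego : (4, 4) rigid transform from ego frame to camera frame
    Returns the pixel coordinates and a mask of points in front of the camera.
    """
    n = points_xyz.shape[0]
    homo = np.concatenate([points_xyz, np.ones((n, 1))], axis=1)  # (N, 4)
    cam = (cam_from_ego @ homo.T).T[:, :3]                        # (N, 3)
    depth = cam[:, 2]
    valid = depth > 1e-6
    # Avoid dividing by zero or negative depth; those rows are masked out anyway.
    safe_depth = np.where(valid, depth, 1.0)
    pix = (intrinsics @ cam.T).T                                  # (N, 3)
    uv = pix[:, :2] / safe_depth[:, None]
    return uv, valid

def sample_features(feature_map, uv, valid):
    """Gather features from a (C, H, W) map at the nearest projected pixel.

    Points that are behind the camera or fall outside the image get zeros.
    """
    c, h, w = feature_map.shape
    cols = np.rint(uv[:, 0]).astype(int)
    rows = np.rint(uv[:, 1]).astype(int)
    inside = valid & (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    out = np.zeros((uv.shape[0], c), dtype=feature_map.dtype)
    out[inside] = feature_map[:, rows[inside], cols[inside]].T
    return out, inside
```

In a full pipeline the gathered per-point features would be aggregated into their BEV cells and fused with LiDAR features; a bilinear gather would usually replace the nearest-neighbour lookup used here for brevity.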
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization
Methods: Sparse Evolutionary Training
