EVT: Efficient View Transformation for Multi-Modal 3D Object Detection

Yongjin Lee; Hyeon-Mun Jeong; Yurim Jeon; Sanghyun Kim

arXiv:2411.10715 · cs.CV · July 14, 2025

Open Access

TL;DR

EVT is a view transformation framework for multi-modal 3D object detection. It uses LiDAR guidance and geometry-aware attention to improve both accuracy and efficiency, reaching state-of-the-art results at real-time inference speed.

Contribution

The paper proposes EVT, a new view transformation method that improves BEV representation quality and detection accuracy while reducing computational overhead.

Findings

01  Achieves 75.3% NDS on the nuScenes test set.

02  Provides real-time inference speed.

03  Outperforms existing methods in accuracy and efficiency.
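
For readers unfamiliar with the metric, NDS (nuScenes Detection Score) combines mean average precision with five mean true-positive error terms (translation, scale, orientation, velocity, attribute). The sketch below follows the standard nuScenes definition; the function name `nds` and any input values are illustrative assumptions and are not taken from the paper.

```python
def nds(mean_ap, tp_errors):
    """nuScenes Detection Score from mAP and the five mean true-positive errors.

    mean_ap   : mean average precision in [0, 1]
    tp_errors : five values (mATE, mASE, mAOE, mAVE, mAAE), each >= 0
    """
    if len(tp_errors) != 5:
        raise ValueError("expected exactly five true-positive error terms")
    # Each error is clipped to at most 1 and converted to a score (1 - error).
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    # mAP carries weight 5, each of the five TP scores carries weight 1.
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0
```

With made-up inputs, `nds(0.5, [0.5] * 5)` evaluates to 0.5, and a perfect detector (`mean_ap = 1.0`, all errors 0) scores 1.0.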

Abstract

Multi-modal sensor fusion in Bird's Eye View (BEV) representation has become the leading approach for 3D object detection. However, existing methods often rely on depth estimators or transformer encoders to transform image features into BEV space, which reduces robustness or introduces significant computational overhead. Moreover, the insufficient geometric guidance in view transformation results in ray-directional misalignments, limiting the effectiveness of BEV representations. To address these challenges, we propose Efficient View Transformation (EVT), a novel 3D object detection framework that constructs a well-structured BEV representation, improving both accuracy and efficiency. Our approach focuses on two key aspects. First, Adaptive Sampling and Adaptive Projection (ASAP), which utilizes LiDAR guidance to generate 3D sampling points and adaptive kernels, enables more effective…
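
As background for the view transformation the abstract describes, the sketch below shows the generic step that any projection-based approach relies on: projecting a 3D sampling point into a camera image and looking up the image feature there. It is a plain pinhole-camera illustration, not the paper's ASAP module. The function names, the intrinsics `fx, fy, cx, cy`, and the nested-list feature map are all assumptions made for this example.

```python
def project_point(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3D point (camera frame) to pixel coordinates.

    Returns None when the point lies on or behind the image plane.
    """
    x, y, z = point_cam
    if z <= 0.0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def sample_feature(feature_map, uv):
    """Nearest-neighbour lookup in a feature map indexed as [row][col].

    Returns None for points that project outside the image.
    """
    if uv is None:
        return None
    u, v = uv
    col, row = int(round(u)), int(round(v))
    height, width = len(feature_map), len(feature_map[0])
    if 0 <= row < height and 0 <= col < width:
        return feature_map[row][col]
    return None


# Hypothetical 4x4 feature map whose "feature" is just the flat pixel index.
features = [[r * 4 + c for c in range(4)] for r in range(4)]
uv = project_point((2.0, 0.0, 2.0), fx=1.0, fy=1.0, cx=1.0, cy=2.0)
value = sample_feature(features, uv)
```

In a real detector the sampled features would be aggregated into BEV cells; how EVT chooses the sampling points and weights them is specific to the paper and is not reproduced here.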

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization

Methods: Sparse Evolutionary Training