Open 3D World in Autonomous Driving
Xinlong Cheng, Lei Li

TL;DR
This paper introduces a multimodal framework that combines 3D LiDAR point cloud data with textual information to improve open-vocabulary object detection in autonomous driving, evaluated on new and existing datasets.
Contribution
The paper presents a method for fusing 3D point cloud features with textual data to enhance open-vocabulary perception in autonomous driving environments.
Findings
Effective fusion of BEV features with textual data improves detection accuracy.
The approach demonstrates strong zero-shot performance on unseen datasets.
The framework advances open vocabulary perception in large-scale 3D outdoor environments.
Abstract
The capability for open-vocabulary perception represents a significant advancement in autonomous driving systems, facilitating the comprehension and interpretation of a wide array of textual inputs in real time. Despite extensive research in open-vocabulary tasks within 2D computer vision, the application of such methodologies to 3D environments, particularly within large-scale outdoor contexts, remains relatively underdeveloped. This paper presents a novel approach that integrates 3D point cloud data, acquired from LiDAR sensors, with textual information. The primary focus is on the utilization of textual data to directly localize and identify objects within the autonomous driving context. We introduce an efficient framework for the fusion of bird's-eye view (BEV) region features with textual features, thereby enabling the system to seamlessly adapt to novel textual inputs and…
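The abstract above is truncated and does not spell out how BEV region features and textual features are fused. As a rough illustration only, and not the authors' method, one common way to relate the two is to score each region against each text prompt by cosine similarity and take a softmax over the prompts. The function name `match_regions_to_text`, the `temperature` value, and the feature shapes below are all assumptions made for this sketch.

```python
import numpy as np


def match_regions_to_text(region_feats, text_feats, temperature=0.07):
    """Assign each BEV region to its most similar text prompt.

    Hypothetical sketch: cosine similarity plus softmax, not the paper's
    actual fusion module.

    region_feats: (num_regions, dim) array of pooled BEV region features.
    text_feats:   (num_prompts, dim) array of text embeddings.
    Returns (labels, probs), where labels has shape (num_regions,) and
    probs has shape (num_regions, num_prompts).
    """
    # L2-normalize both sets so the dot product equals cosine similarity.
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)

    # Similarity logits, sharpened by an assumed temperature.
    logits = (r @ t.T) / temperature

    # Numerically stable softmax over the prompt axis.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)

    return probs.argmax(axis=1), probs
```

Because the prompts enter only as a matrix of embeddings, a new category can be added at inference time by appending another row to `text_feats`, which is the property that open-vocabulary detection depends on.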
Taxonomy
Topics: Autonomous Vehicle Technology and Safety
Methods: Focus
