LidaRefer: Context-aware Outdoor 3D Visual Grounding for Autonomous Driving
Yeong-Seung Baek, Heung-Seon Oh

TL;DR
LidaRefer is a framework for outdoor 3D visual grounding in autonomous driving. It combines object-centric, context-aware feature selection, a transformer architecture, and a new supervision strategy to improve object localization accuracy.
Contribution
It introduces a context-aware 3D visual grounding method for outdoor scenes that combines object-centric feature selection, transformer-based cross-modal alignment, and a novel supervision strategy.
Findings
Achieves state-of-the-art performance on the Talk2Car-3D dataset.
Effectively models spatial relationships between objects.
Improves accuracy in outdoor 3D visual grounding tasks.
Abstract
3D visual grounding (VG) aims to locate objects or regions within 3D scenes guided by natural language descriptions. While indoor 3D VG has advanced, outdoor 3D VG remains underexplored due to two challenges: (1) large-scale outdoor LiDAR scenes are dominated by background points and contain limited foreground information, making cross-modal alignment and contextual understanding more difficult; and (2) most outdoor datasets lack spatial annotations for referential non-target objects, which hinders explicit learning of referential context. To this end, we propose LidaRefer, a context-aware 3D VG framework for outdoor scenes. LidaRefer incorporates an object-centric feature selection strategy to focus on semantically relevant visual features while reducing computational overhead. Then, its transformer-based encoder-decoder architecture excels at establishing fine-grained cross-modal…
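The abstract does not say how the object-centric feature selection is computed. As a rough illustration only, the sketch below assumes each visual feature comes with a hypothetical foreground score and keeps the k highest-scoring features, which is one common way to discard background-dominated features and cut downstream cost. The function name `select_top_k_features`, the score input, and the value of `k` are assumptions made for this example, not details from the paper.

```python
from typing import List, Sequence, Tuple


def select_top_k_features(
    features: Sequence[Sequence[float]],
    scores: Sequence[float],
    k: int,
) -> Tuple[List[int], List[List[float]]]:
    """Keep the k feature vectors with the highest (hypothetical) foreground scores.

    Returns the kept indices in their original order and the matching features.
    """
    if len(features) != len(scores):
        raise ValueError("features and scores must have the same length")
    # Clamp k so that it never exceeds the number of available features.
    k = max(0, min(k, len(scores)))
    # Rank indices by descending score; sorted() is stable, so ties keep input order.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    # Restore the original ordering of the surviving features.
    kept = sorted(ranked[:k])
    return kept, [list(features[i]) for i in kept]


# Toy example: four 2-D features, of which two look like foreground.
features = [[0.1, 0.2], [0.9, 0.8], [0.3, 0.3], [0.7, 0.1]]
scores = [0.05, 0.92, 0.10, 0.61]
kept_indices, selected = select_top_k_features(features, scores, k=2)
```

In this toy input, the features at indices 1 and 3 survive, so any later cross-modal alignment step would operate on two features instead of four.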
Taxonomy
Topics: Robotics and Sensor-Based Localization · Computer Graphics and Visualization Techniques · Augmented Reality Applications
