VLM-Loc: Localization in Point Cloud Maps via Vision-Language Models
Shuhao Kang, Youqi Liao, Peijie Wang, Wenlong Liao, Qilin Zhang, Benjamin Busam, Xieyuanli Chen, Yun Liu

TL;DR
VLM-Loc introduces a novel framework that uses vision-language models with spatial reasoning to improve text-based localization in 3D point cloud maps, outperforming existing methods.
Contribution
The paper presents VLM-Loc, a new approach that leverages large vision-language models with structured spatial representations for accurate point cloud localization from natural language.
Findings
VLM-Loc achieves higher accuracy than previous methods.
The approach demonstrates robustness across diverse environments.
The CityLoc benchmark enables systematic evaluation of text-to-point-cloud (T2P) localization.
Abstract
Text-to-point-cloud (T2P) localization aims to infer precise spatial positions within 3D point cloud maps from natural language descriptions, reflecting how humans perceive and communicate spatial layouts through language. However, existing methods largely rely on shallow text-point cloud correspondence without effective spatial reasoning, limiting their accuracy in complex environments. To address this limitation, we propose VLM-Loc, a framework that leverages the spatial reasoning capability of large vision-language models (VLMs) for T2P localization. Specifically, we transform point clouds into bird's-eye-view (BEV) images and scene graphs that jointly encode geometric and semantic context, providing structured inputs for the VLM to learn cross-modal representations bridging linguistic and spatial semantics. On top of these representations, we introduce a partial node assignment…
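The abstract mentions converting point clouds into bird's-eye-view (BEV) images. As a rough illustration of what such a projection can look like, the sketch below rasterizes semantically labeled points onto a top-down grid, keeping the label of the highest point in each cell. This is not the authors' implementation: the function name `rasterize_bev`, the `(x, y, z, label)` point layout, the "highest point wins" rule, and the `-1` empty-cell marker are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical point layout for this sketch: (x, y, z, semantic_label).
Point = Tuple[float, float, float, int]


def rasterize_bev(
    points: Sequence[Point],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    cell_size: float,
) -> List[List[int]]:
    """Project labeled 3D points onto a top-down label grid.

    Each cell stores the label of the highest point (largest z) that
    falls inside it; cells with no points are marked -1.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    width = math.ceil((x_max - x_min) / cell_size)
    height = math.ceil((y_max - y_min) / cell_size)

    labels = [[-1] * width for _ in range(height)]
    top_z = [[float("-inf")] * width for _ in range(height)]

    for x, y, z, label in points:
        # floor (not int) so points just outside the lower bound map to -1
        col = math.floor((x - x_min) / cell_size)
        row = math.floor((y - y_min) / cell_size)
        if 0 <= col < width and 0 <= row < height and z > top_z[row][col]:
            top_z[row][col] = z
            labels[row][col] = label

    return labels


# Tiny usage example with made-up points and labels.
demo_points = [
    (0.5, 0.5, 1.0, 2),   # low point in cell (0, 0)
    (0.6, 0.4, 3.0, 5),   # higher point in the same cell, so label 5 wins
    (1.5, 0.5, 0.0, 7),   # cell (0, 1)
    (-0.2, 0.5, 0.0, 9),  # outside the x range, ignored
]
demo_grid = rasterize_bev(demo_points, (0.0, 2.0), (0.0, 1.0), 1.0)
```

A label grid like this can be color-mapped into an image for a vision-language model; how VLM-Loc actually renders its BEV inputs and builds its scene graphs is not specified in the text above.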
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Robotics and Sensor-Based Localization
