VLG-Loc: Vision-Language Global Localization from Labeled Footprint Maps

Mizuho Aoki; Kohei Honda; Yasuhiro Yoshimura; Takeshi Ishita; Ryo Yonetani

arXiv:2512.12793 · cs.RO · December 19, 2025

Open Access

TL;DR

VLG-Loc is a vision-language-based global localization method. It uses a vision-language model to find the landmarks named in a labeled footprint map within the robot's camera images, and it localizes robots more robustly than scan-based methods in changing environments.

Contribution

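The paper proposes a localization method that combines a vision-language model with labeled footprint maps, which record only the names and areas of landmarks. This enables robust localization from a map that carries no detailed geometry or appearance information.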

Findings

01 Outperforms existing scan-based localization methods in robustness.

02 Effective in simulated and real-world retail environments.

03 Improved accuracy through probabilistic fusion of visual and scan data.
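
The summary does not say how the visual and scan data are fused. As a rough illustration only, the sketch below assumes the two measurements are conditionally independent given the pose, so their per-hypothesis likelihoods multiply. The function name `fuse_log_likelihoods` and all numbers are hypothetical and are not taken from the paper.

```python
import math


def fuse_log_likelihoods(log_vis, log_scan):
    """Fuse per-hypothesis log-likelihoods and return normalized weights.

    Assumption (not from the paper): the visual and scan measurements are
    conditionally independent given the pose, so likelihoods multiply and
    log-likelihoods add.
    """
    fused = [lv + ls for lv, ls in zip(log_vis, log_scan)]
    peak = max(fused)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(f - peak) for f in fused]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Made-up likelihoods for three pose hypotheses.
vis = [0.6, 0.3, 0.1]   # vision favours hypothesis 0
scan = [0.2, 0.2, 0.6]  # scan favours hypothesis 2
weights = fuse_log_likelihoods(
    [math.log(v) for v in vis],
    [math.log(s) for s in scan],
)
```

Working in log space keeps the product stable when many small likelihoods are combined, which is the usual reason to prefer it in particle filters.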

Abstract

This paper presents Vision-Language Global Localization (VLG-Loc), a novel global localization method that uses human-readable labeled footprint maps containing only names and areas of distinctive visual landmarks in an environment. While humans naturally localize themselves using such maps, translating this capability to robotic systems remains highly challenging due to the difficulty of establishing correspondences between observed landmarks and those in the map without geometric and appearance details. To address this challenge, VLG-Loc leverages a vision-language model (VLM) to search the robot's multi-directional image observations for the landmarks noted in the map. The method then identifies robot poses within a Monte Carlo localization framework, where the found landmarks are used to evaluate the likelihood of each pose hypothesis. Experimental validation in simulated and…
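
The idea of scoring pose hypotheses by the landmarks found in multi-directional images can be sketched as a toy Monte Carlo localization measurement update. Everything specific here is an assumption for illustration: the map contents, the four camera directions, the field of view, the agreement probabilities, and the visibility test based on footprint centres. The VLM itself is not called; its output is stood in for by one set of landmark names per camera direction.

```python
import math
import random

# Hypothetical labeled footprint map: landmark name -> axis-aligned
# rectangle (xmin, ymin, xmax, ymax) in metres. Names and sizes are made up.
FOOTPRINT_MAP = {
    "bakery": (0.0, 8.0, 4.0, 10.0),
    "checkout": (8.0, 0.0, 10.0, 4.0),
    "produce": (0.0, 0.0, 3.0, 2.0),
}

CAMERA_YAWS = [0.0, math.pi / 2, math.pi, -math.pi / 2]  # assumed 4 views
HALF_FOV = math.radians(45.0)    # assumed horizontal half field of view
P_AGREE, P_DISAGREE = 0.8, 0.2   # assumed detection reliability


def wrap(angle):
    """Wrap an angle to [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def expected_labels(pose, cam_yaw):
    """Landmarks whose footprint centre lies inside one camera's FOV.

    Occlusion and range limits are ignored in this sketch.
    """
    x, y, theta = pose
    visible = set()
    for name, (x0, y0, x1, y1) in FOOTPRINT_MAP.items():
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        bearing = math.atan2(cy - y, cx - x)
        if abs(wrap(bearing - theta - cam_yaw)) <= HALF_FOV:
            visible.add(name)
    return visible


def pose_likelihood(pose, detections):
    """Likelihood of the per-camera label sets given a pose hypothesis."""
    likelihood = 1.0
    for cam_yaw, seen in zip(CAMERA_YAWS, detections):
        expected = expected_labels(pose, cam_yaw)
        for name in FOOTPRINT_MAP:
            agree = (name in expected) == (name in seen)
            likelihood *= P_AGREE if agree else P_DISAGREE
    return likelihood


def mcl_update(particles, detections, rng):
    """One measurement update: weight the particles, then resample."""
    weights = [pose_likelihood(p, detections) for p in particles]
    total = sum(weights)
    weights = [w / total for w in weights]
    resampled = rng.choices(particles, weights=weights, k=len(particles))
    return resampled, weights
```

A pose whose predicted landmark set matches the reported labels in every direction receives the highest weight, so resampling concentrates particles around consistent poses. A real system would also need a motion update and a visibility model that accounts for occlusion.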

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Face recognition and analysis