TL;DR
This paper introduces TOL, a large-scale benchmark, and TOLoc, a localization framework that estimates urban positions from textual descriptions using OpenStreetMap data, without relying on geometric observations.
Contribution
The paper formulates the Text-to-OSM (T2O) localization task, creates the TOL benchmark, and proposes TOLoc, a coarse-to-fine framework that uses semantic and directional information for accurate localization.
Findings
TOLoc outperforms existing methods by over 6% in localization accuracy at the 5 m threshold.
The benchmark covers 316 km of urban environments across three cities.
TOLoc demonstrates strong generalization to unseen environments.
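The summary does not define how accuracy at 5 m is computed. A common reading, assumed here, is the fraction of queries whose predicted 2D position falls within 5 m of the ground truth, with positions expressed in a local metric frame. The sketch below illustrates that reading only; the function name `recall_at_threshold` and the sample coordinates are hypothetical and do not come from the paper.

```python
import math


def recall_at_threshold(predicted, ground_truth, threshold_m=5.0):
    """Fraction of queries localized within threshold_m metres of ground truth.

    Assumes both inputs are equal-length sequences of (x, y) positions in
    metres in the same local frame. This is an assumed metric definition,
    not one confirmed by the paper.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        return 0.0
    hits = sum(
        1
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
        if math.hypot(px - gx, py - gy) <= threshold_m
    )
    return hits / len(predicted)


# Made-up positions for illustration: errors of 0 m, 5 m, and 10 m.
predicted = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
ground_truth = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
recall = recall_at_threshold(predicted, ground_truth, threshold_m=5.0)
```

Under this definition, a gain of "over 6%" would mean that the share of queries localized within 5 m rises by more than six percentage points, though the summary does not say whether the figure is absolute or relative.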
Abstract
Natural language provides an intuitive way to express spatial intent in geospatial applications. While existing localization methods often rely on dense point cloud maps or high-resolution imagery, OpenStreetMap (OSM) offers a compact and freely available map representation that encodes rich semantic and structural information, making it well-suited for large-scale localization. However, text-to-OSM (T2O) localization remains largely unexplored. In this paper, we formulate the T2O localization task, which aims to estimate accurate 2D positions in urban environments from textual scene descriptions without relying on geometric observations or GNSS-based initial location. To support the proposed task, we introduce TOL, a large-scale benchmark spanning multiple continents and diverse urban environments. TOL contains approximately 121K textual queries paired with OSM map tiles and covers…
