Pix2Map: Cross-modal Retrieval for Inferring Street Maps from Images
Xindi Wu, KwunFung Lau, Francesco Ferroni, Aljoša Ošep, Deva Ramanan

TL;DR
Pix2Map introduces a novel cross-modal retrieval approach that infers urban street map topology directly from ego-view images, enabling map updating and expansion for autonomous navigation.
Contribution
The paper presents a new method for inferring street maps from images by learning a joint embedding space for images and map graphs, facilitating accurate retrieval of map topologies.
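The retrieval step implied by a joint embedding space can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes image and map-graph embeddings have already been produced by some learned encoders, uses cosine similarity as the ranking score (an assumption, since this summary does not state the metric), and all names and vectors (`retrieve`, `graph_library`, the toy values) are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))


def retrieve(image_embedding, graph_embeddings, k=1):
    """Return the names of the k map graphs closest to the image embedding.

    graph_embeddings maps a graph name to its embedding vector. In a real
    system both sides would come from trained encoders; here they are
    hard-coded toy values (an assumption made for illustration only).
    """
    ranked = sorted(
        graph_embeddings.items(),
        key=lambda item: cosine_similarity(image_embedding, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical library of map-graph embeddings (toy 3-D vectors).
graph_library = {
    "four_way_intersection": [0.9, 0.1, 0.0],
    "straight_road": [0.0, 1.0, 0.1],
    "t_junction": [0.5, 0.0, 0.8],
}

# Hypothetical embedding of an ego-view image.
query = [0.8, 0.2, 0.1]

top_two = retrieve(query, graph_library, k=2)
print(top_two)
```

Because retrieval returns an existing graph rather than generating one, every result is a valid road topology by construction; the trade-off is that only topologies present in the graph library can be returned.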
Findings
Accurately retrieves street maps from images for seen and unseen roads.
Enables map updating and expansion using image data.
Demonstrates a proof of concept for visual localization and image retrieval from spatial graphs.
Abstract
Self-driving vehicles rely on urban street maps for autonomous navigation. In this paper, we introduce Pix2Map, a method for inferring urban street map topology directly from ego-view images, as needed to continually update and expand existing maps. This is a challenging task, as we need to infer a complex urban road topology directly from raw image data. The main insight of this paper is that this problem can be posed as cross-modal retrieval by learning a joint, cross-modal embedding space for images and existing maps, represented as discrete graphs that encode the topological layout of the visual surroundings. We conduct our experimental evaluation using the Argoverse dataset and show that it is indeed possible to accurately retrieve street maps corresponding to both seen and unseen roads solely from image data. Moreover, we show that our retrieved maps can be used to update or…
Taxonomy
Topics: Automated Road and Building Extraction · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques
