TL;DR
This paper introduces an end-to-end method that synthesizes realistic street views from satellite images to improve cross-view geo-localization, reporting state-of-the-art results on the CVUSA and CVACT benchmarks.
Contribution
To the authors' knowledge, it is the first approach to generate realistic street views from satellite images while jointly performing geo-localization in an end-to-end framework.
Findings
Achieved state-of-the-art performance on CVUSA and CVACT benchmarks.
Demonstrated effective satellite-to-street view synthesis with compelling qualitative results.
Proposed a multi-task architecture combining image synthesis and retrieval.
Abstract
The goal of cross-view image-based geo-localization is to determine the location of a given street view image by matching it against a collection of geo-tagged satellite images. This task is notoriously challenging due to the drastic viewpoint and appearance differences between the two domains. We show that we can address this discrepancy explicitly by learning to synthesize realistic street views from satellite inputs. Following this observation, we propose a novel multi-task architecture in which image synthesis and retrieval are considered jointly. The rationale behind this is that we can bias our network to learn latent feature representations that are useful for retrieval if we utilize them to generate images across the two input domains. To the best of our knowledge, ours is the first approach that creates realistic street views from satellite images and localizes the…
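The joint treatment of synthesis and retrieval described in the abstract can be illustrated with a small sketch. The abstract does not name the loss functions, the similarity measure, or any weighting, so everything below is an assumption made for illustration: a hinge-style triplet loss on cosine similarity stands in for the retrieval objective, a mean absolute error stands in for the synthesis objective, and `synthesis_weight` is a hypothetical balancing parameter. Embeddings and images are plain lists of floats, and all function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def retrieve(street_embedding, satellite_embeddings, top_k=1):
    """Indices of the top_k satellite embeddings most similar to the street query."""
    scores = [cosine_similarity(street_embedding, s) for s in satellite_embeddings]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Assumed retrieval loss: the matching pair should beat the non-matching pair by a margin."""
    gap = cosine_similarity(anchor, positive) - cosine_similarity(anchor, negative)
    return max(0.0, margin - gap)


def l1_loss(generated, target):
    """Assumed synthesis loss: mean absolute difference between flattened images."""
    return sum(abs(g - t) for g, t in zip(generated, target)) / len(target)


def joint_loss(anchor, positive, negative, generated, target, synthesis_weight=1.0):
    """Multi-task objective: retrieval term plus a weighted synthesis term."""
    retrieval_term = triplet_loss(anchor, positive, negative)
    synthesis_term = l1_loss(generated, target)
    return retrieval_term + synthesis_weight * synthesis_term


if __name__ == "__main__":
    # Toy gallery of satellite embeddings and one street-view query.
    gallery = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    print(retrieve([0.9, 0.1], gallery, top_k=2))
    print(joint_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 0.0]))
```

In a trained system the embeddings and the generated image would come from shared network features, so the gradient of the synthesis term also shapes the representation used for retrieval. That sharing is the rationale the abstract gives for considering the two tasks jointly.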
