Localizing and Orienting Street Views Using Overhead Imagery
Nam Vo, James Hays

TL;DR
This paper introduces a new dataset and deep learning methods for accurately localizing and orienting street view images using overhead imagery, addressing challenges like viewpoint and orientation differences.
Contribution
The paper proposes novel deep CNN architectures and a new loss function for improved cross-domain image matching in geolocalization tasks.
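For readers unfamiliar with embedding losses, the sketch below shows the two standard objectives that Siamese and Triplet networks are commonly trained with: a contrastive loss over pairs and a hinge-style triplet loss. These are the textbook forms, not the new loss the paper proposes, which this summary does not spell out. The function names and the default `margin=1.0` are assumptions chosen for illustration.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def contrastive_loss(a, b, is_match, margin=1.0):
    """Standard pairwise (Siamese) loss.

    Matching pairs are pulled together; non-matching pairs are pushed
    apart until they are at least `margin` away from each other.
    """
    d = math.sqrt(squared_distance(a, b))
    if is_match:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss on squared distances.

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)
```

In a cross-view setting, `anchor` would be the embedding of a street view image, `positive` the embedding of its matching overhead image, and `negative` the embedding of a non-matching overhead image.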
Findings
The best-performing architectures are roughly 2.5 times as accurate as the commonly used Siamese network baseline.
Explicit orientation supervision enhances location prediction accuracy.
A new large-scale dataset with one million image pairs was collected for this task.
Abstract
In this paper we aim to determine the location and orientation of a ground-level query image by matching to a reference database of overhead (e.g. satellite) images. For this task we collect a new dataset with one million pairs of street view and overhead images sampled from eleven U.S. cities. We explore several deep CNN architectures for cross-domain matching -- Classification, Hybrid, Siamese, and Triplet networks. Classification and Hybrid architectures are accurate but slow since they allow only partial feature precomputation. We propose a new loss function which significantly improves the accuracy of Siamese and Triplet embedding networks while maintaining their applicability to large-scale retrieval tasks like image geolocalization. This image matching task is challenging not just because of the dramatic viewpoint difference between ground-level and overhead imagery but because…
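The abstract's point about feature precomputation can be illustrated with a minimal sketch. With an embedding network (Siamese or Triplet), every overhead reference image can be embedded once offline, so answering a query reduces to a nearest-neighbor search over stored vectors. The helper names `build_index` and `retrieve` are hypothetical, the embeddings are assumed to come from an already-trained network, and a linear scan stands in for whatever search structure a real system would use.

```python
def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v] if norm > 0 else list(v)


def build_index(reference_embeddings):
    """Offline step: normalize and store one embedding per overhead image."""
    return [l2_normalize(e) for e in reference_embeddings]


def retrieve(query_embedding, index, k=1):
    """Online step: return indices of the k closest reference embeddings."""
    q = l2_normalize(query_embedding)
    scored = [
        (sum((a - b) ** 2 for a, b in zip(q, ref)), i)
        for i, ref in enumerate(index)
    ]
    scored.sort()  # ascending squared distance, ties broken by index
    return [i for _, i in scored[:k]]


# Toy example with 2-D embeddings standing in for CNN features.
index = build_index([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
nearest = retrieve([0.9, 0.1], index, k=2)
```

By contrast, an architecture that scores a query and a reference image jointly has to run part of the network for every query-reference pair, which is why the abstract describes the Classification and Hybrid models as slow.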
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Advanced Neural Network Applications
