TL;DR
This paper introduces PlaNet, a deep learning system that geolocates a photo by classifying it into one of thousands of multi-scale geographic cells. It outperforms previous methods and in some cases exceeds human accuracy. For photo albums, it exploits temporal coherence between photos to localize uncertain images.
Contribution
The paper presents a novel classification-based approach to photo geolocation. It uses deep CNNs trained on millions of images instead of the image retrieval methods usually applied to this problem, and it surpasses prior methods.
Findings
PlaNet outperforms previous geolocation approaches.
The model achieves superhuman accuracy in some cases.
Extending PlaNet to photo albums with an LSTM improves performance by 50% over the single-image model.
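The LSTM album model is not reproduced here. The sketch below is a much simpler, swapped-in illustration of why album context can help. It averages per-photo cell probabilities over an album and picks one cell for every photo, so a confident photo can settle an ambiguous one. The function names and probability values are hypothetical. Plain averaging also ignores photo order, which an LSTM can exploit.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def album_cell(per_photo_probs: Sequence[Sequence[float]]) -> int:
    """Pick one cell for a whole album by averaging each photo's per-cell probabilities.

    Hypothetical baseline for illustration only; it is not the paper's LSTM.
    """
    if not per_photo_probs:
        raise ValueError("album must contain at least one photo")
    n_photos = len(per_photo_probs)
    n_cells = len(per_photo_probs[0])
    mean = [sum(p[i] for p in per_photo_probs) / n_photos for i in range(n_cells)]
    return argmax(mean)


if __name__ == "__main__":
    # Hypothetical classifier outputs over 3 cells for a 2-photo album.
    album = [
        [0.40, 0.35, 0.25],  # ambiguous photo: on its own, argmax picks cell 0
        [0.05, 0.90, 0.05],  # confident photo (e.g. a landmark): cell 1
    ]
    print(argmax(album[0]))   # prints 0
    print(album_cell(album))  # prints 1: the confident photo pulls the album to cell 1
```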
Abstract
Is it possible to build a system to determine the location where a photo was taken using just its pixels? In general, the problem seems exceptionally difficult: it is trivial to construct situations where no location can be inferred. Yet images often contain informative cues such as landmarks, weather patterns, vegetation, road markings, and architectural details, which in combination may allow one to determine an approximate location and occasionally an exact location. Websites such as GeoGuessr and View from your Window suggest that humans are relatively good at integrating these cues to geolocate images, especially en masse. In computer vision, the photo geolocation problem is usually approached using image retrieval methods. In contrast, we pose the problem as one of classification by subdividing the surface of the earth into thousands of multi-scale geographic cells, and train a…
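The abstract's central idea, turning geolocation into classification over multi-scale geographic cells, can be sketched in Python. This is a minimal illustration under stated assumptions, not PlaNet's implementation. It uses a simple quadtree over latitude/longitude rectangles and does not reproduce the paper's own cell construction. The thresholds `max_per_cell` and `min_per_cell`, the helper names, and the sample coordinates are all hypothetical.

Cells holding many photos are split into finer cells and nearly empty cells are dropped, so each surviving cell becomes one class. A prediction is mapped back to a coordinate by taking the center of the most probable cell. The error is measured as great-circle distance.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

LatLng = Tuple[float, float]


@dataclass
class Cell:
    """A lat/lng rectangle; a hypothetical stand-in for a geographic cell."""
    lat_min: float
    lat_max: float
    lng_min: float
    lng_max: float

    def contains(self, lat: float, lng: float) -> bool:
        # Half-open intervals so a point on a shared edge falls in exactly one cell;
        # the global north (90) and east (180) edges are closed so no coordinate is lost.
        in_lat = self.lat_min <= lat < self.lat_max or lat == self.lat_max == 90.0
        in_lng = self.lng_min <= lng < self.lng_max or lng == self.lng_max == 180.0
        return in_lat and in_lng

    def center(self) -> LatLng:
        return ((self.lat_min + self.lat_max) / 2, (self.lng_min + self.lng_max) / 2)

    def split(self) -> List["Cell"]:
        # Quadtree split into four equal lat/lng quadrants.
        lat_mid, lng_mid = self.center()
        return [
            Cell(self.lat_min, lat_mid, self.lng_min, lng_mid),
            Cell(self.lat_min, lat_mid, lng_mid, self.lng_max),
            Cell(lat_mid, self.lat_max, self.lng_min, lng_mid),
            Cell(lat_mid, self.lat_max, lng_mid, self.lng_max),
        ]


def partition(photos: Sequence[LatLng], max_per_cell: int, min_per_cell: int,
              max_depth: int = 12) -> List[Cell]:
    """Split cells with many photos into finer cells; drop cells with too few photos.

    Dense regions end up with small cells and sparse regions with large ones,
    which is what makes the partition multi-scale.
    """
    cells: List[Cell] = []
    stack = [(Cell(-90.0, 90.0, -180.0, 180.0), list(photos), 0)]
    while stack:
        cell, pts, depth = stack.pop()
        if len(pts) > max_per_cell and depth < max_depth:
            for child in cell.split():
                stack.append((child, [p for p in pts if child.contains(*p)], depth + 1))
        elif len(pts) >= min_per_cell:
            cells.append(cell)  # each kept cell becomes one class label
    return cells


def haversine_km(a: LatLng, b: LatLng) -> float:
    """Great-circle distance in km on a spherical Earth (radius 6371 km)."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def predict_location(cell_probs: Sequence[float], cells: Sequence[Cell]) -> LatLng:
    """Map per-cell probabilities to a coordinate: the center of the most probable cell."""
    best = max(range(len(cells)), key=lambda i: cell_probs[i])
    return cells[best].center()


if __name__ == "__main__":
    # Hypothetical training photos: three near Paris, two near New York, one near Sydney.
    photos = [(48.85, 2.35), (48.86, 2.34), (48.87, 2.36),
              (40.71, -74.00), (40.72, -74.01), (-33.90, 151.20)]
    cells = partition(photos, max_per_cell=3, min_per_cell=2)
    for cell in cells:
        print(cell.center())  # prints (45.0, 90.0) then (45.0, -90.0); Sydney's cell is dropped
    # Hypothetical classifier output for one query photo over those two cells.
    guess = predict_location([0.7, 0.3], cells)
    print(guess, round(haversine_km(guess, (48.85, 2.35))), "km from Paris")
```

Lat/lng rectangles shrink in area toward the poles, so a real system would likely prefer an equal-area or spherical cell scheme. The rectangle version is used here only because it fits in a few lines. The coarse cells in the usage example also show why cell size bounds accuracy: a correct cell prediction can still land far from the true location.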
Videos
10 Even Cooler Deep Learning Applications | Two Minute Papers #59 (YouTube)
