PIGEON: Predicting Image Geolocations
Lukas Haas, Michal Skreta, Silas Alberti, Chelsea Finn

TL;DR
This paper introduces two image geolocalization models, PIGEON for street-level imagery and PIGEOTTO for general-purpose images, that improve accuracy and generalize to unseen locations by combining new training techniques with retrieval over location clusters.
Contribution
The paper presents a new geolocalization system combining semantic geocell creation, multi-task contrastive pretraining, and location cluster retrieval, achieving state-of-the-art results and better generalization.
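The summary above says only that retrieval over location clusters is used to refine a guess; it does not give the procedure. As a rough illustration of what such a refinement step could look like, the sketch below picks the cluster whose representative embedding is most similar to the image embedding and returns that cluster's coordinates. The function names, the dictionary layout, and the use of cosine similarity are all assumptions made for this example, not details taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def refine_guess(image_embedding, clusters):
    """Return the (lat, lon) of the cluster most similar to the image embedding.

    `clusters` is a hypothetical list of dicts, each with:
      - "embedding": a representative embedding for the cluster (assumed layout)
      - "latlon":    the cluster's representative (latitude, longitude)
    """
    if not clusters:
        raise ValueError("at least one candidate cluster is required")
    best = max(
        clusters,
        key=lambda c: cosine_similarity(image_embedding, c["embedding"]),
    )
    return best["latlon"]


# Toy usage with two made-up clusters and 2-D embeddings.
candidate_clusters = [
    {"embedding": [1.0, 0.0], "latlon": (10.0, 20.0)},
    {"embedding": [0.0, 1.0], "latlon": (30.0, 40.0)},
]
refined = refine_guess([0.1, 0.9], candidate_clusters)
```

In a real system the candidate clusters would presumably be restricted to those inside the predicted geocell(s), so that retrieval only sharpens a coarse guess rather than searching the whole planet; that restriction is likewise an assumption here.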
Findings
PIGEON places over 40% of its guesses within 25 km of the target location, worldwide.
PIGEOTTO outperforms the previous state of the art by up to 7.7 percentage points on city-level accuracy.
PIGEOTTO generalizes effectively to unseen locations.
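A "share of guesses within 25 km" figure like the one above is typically computed from great-circle distances between predicted and true coordinates. The sketch below shows one way to compute such a metric with the haversine formula. The function names, the mean Earth radius of 6371 km, and the sample coordinates are choices made for this example and are not taken from the paper's evaluation code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def fraction_within(guesses, targets, radius_km=25.0):
    """Fraction of guesses that land within `radius_km` of their target."""
    if len(guesses) != len(targets):
        raise ValueError("guesses and targets must have the same length")
    if not guesses:
        return 0.0
    hits = sum(
        haversine_km(g[0], g[1], t[0], t[1]) <= radius_km
        for g, t in zip(guesses, targets)
    )
    return hits / len(guesses)


# Toy usage: one guess about 11 km off, one about 111 km off (both on the equator).
sample_guesses = [(0.0, 0.0), (0.0, 0.0)]
sample_targets = [(0.0, 0.1), (0.0, 1.0)]
share = fraction_within(sample_guesses, sample_targets, radius_km=25.0)
```

Changing `radius_km` gives the same metric at other granularities, which is how benchmarks commonly report accuracy at several distance thresholds.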
Abstract
Planet-scale image geolocalization remains a challenging problem due to the diversity of images originating from anywhere in the world. Although approaches based on vision transformers have made significant progress in geolocalization accuracy, success in prior literature is constrained to narrow distributions of images of landmarks, and performance has not generalized to unseen places. We present a new geolocalization system that combines semantic geocell creation, multi-task contrastive pretraining, and a novel loss function. Additionally, our work is the first to perform retrieval over location clusters for guess refinements. We train two models for evaluations on street-level data and general-purpose image geolocalization; the first model, PIGEON, is trained on data from the game of Geoguessr and is capable of placing over 40% of its guesses within 25 kilometers of the target…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Linear Layer · Layer Normalization · Dense Connections · Residual Connection · Vision Transformer · Contrastive Language-Image Pre-training
