PIGEON: Predicting Image Geolocations

Lukas Haas; Michal Skreta; Silas Alberti; Chelsea Finn

arXiv:2307.05845 · cs.CV · May 30, 2024 · 2 cites

Open Access · 1 Repo

TL;DR

This paper introduces PIGEON and PIGEOTTO, two image geolocalization models that improve accuracy and generalization to unseen locations by combining semantic geocell creation, multi-task contrastive pretraining, and retrieval over location clusters.

Contribution

The paper presents a new geolocalization system combining semantic geocell creation, multi-task contrastive pretraining, and location cluster retrieval, achieving state-of-the-art results and better generalization.
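As a rough illustration of what retrieval over location clusters could look like, the sketch below picks, among candidate clusters, the one whose representative embedding lies closest to the query image embedding and returns that cluster's coordinates as the refined guess. The `LocationCluster` type, the `refine_guess` function, the use of Euclidean distance, and the toy embeddings are all assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class LocationCluster:
    """Hypothetical cluster: a representative coordinate plus an embedding."""
    lat: float
    lon: float
    embedding: Sequence[float]  # assumed: e.g. mean embedding of member images


def refine_guess(query_embedding: Sequence[float],
                 clusters: Sequence[LocationCluster]) -> Tuple[float, float]:
    """Return the (lat, lon) of the cluster nearest to the query in embedding space."""
    if not clusters:
        raise ValueError("at least one candidate cluster is required")
    # Euclidean distance is an assumed similarity measure for this sketch.
    best = min(clusters, key=lambda c: math.dist(query_embedding, c.embedding))
    return best.lat, best.lon


# Toy usage with made-up two-dimensional embeddings.
candidates = [
    LocationCluster(48.85, 2.35, [1.0, 0.0]),
    LocationCluster(40.71, -74.00, [0.0, 1.0]),
]
refined = refine_guess([0.9, 0.1], candidates)
```

In a full system the candidate clusters would presumably come from the geocells the classifier ranks highest, so the retrieval step only refines a guess inside regions the model already considers plausible.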

Findings

1. PIGEON places over 40% of its guesses within 25 km of the target globally.

2. PIGEOTTO outperforms the previous SOTA by up to 7.7 percentage points on city-level accuracy.

3. PIGEOTTO generalizes effectively to unseen locations.
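A "share of guesses within 25 km" figure of this kind can be computed from great-circle distances between predicted and true coordinates. The sketch below is a minimal version using the haversine formula; the function names, the mean Earth radius constant, and the sample coordinates are assumptions for illustration and do not come from the paper's evaluation code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def fraction_within(guesses, targets, radius_km=25.0):
    """Share of (lat, lon) guesses that fall within radius_km of their targets."""
    if len(guesses) != len(targets):
        raise ValueError("guesses and targets must have the same length")
    if not guesses:
        return 0.0
    hits = sum(haversine_km(*g, *t) <= radius_km
               for g, t in zip(guesses, targets))
    return hits / len(guesses)


# Toy usage: the first guess is about 11 km off, the second about 111 km off.
share = fraction_within([(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.1), (0.0, 1.0)])
```

Swapping `radius_km` for other thresholds gives the coarser accuracy levels (for example city or country scale) that geolocalization benchmarks commonly report.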

Abstract

Planet-scale image geolocalization remains a challenging problem due to the diversity of images originating from anywhere in the world. Although approaches based on vision transformers have made significant progress in geolocalization accuracy, success in prior literature is constrained to narrow distributions of images of landmarks, and performance has not generalized to unseen places. We present a new geolocalization system that combines semantic geocell creation, multi-task contrastive pretraining, and a novel loss function. Additionally, our work is the first to perform retrieval over location clusters for guess refinements. We train two models for evaluations on street-level data and general-purpose image geolocalization; the first model, PIGEON, is trained on data from the game of Geoguessr and is capable of placing over 40% of its guesses within 25 kilometers of the target…

Code & Models

Repositories

LukasHaas/PIGEON
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Linear Layer · Layer Normalization · Dense Connections · Residual Connection · Vision Transformer · Contrastive Language-Image Pre-training