G^3: Geolocation via Guidebook Grounding
Grace Luo, Giscard Biamby, Trevor Darrell, Daniel Fried, Anna Rohrbach

TL;DR
This paper introduces a geolocation method that uses human-written guidebooks to improve image location prediction, outperforming a state-of-the-art image-only approach by over 5% in Top-1 accuracy.
Contribution
The paper proposes the task of Geolocation via Guidebook Grounding, which uses textual clues extracted from a GeoGuessr guidebook to improve country-level image geolocation, and introduces a dataset and a method for this task.
Findings
Outperforms a state-of-the-art image-only geolocation method by over 5% in Top-1 accuracy.
Supervising attention with country-level pseudo labels gives the best performance.
Uses a dataset of StreetView images and associated guidebooks for training and evaluation.
Abstract
We demonstrate how language can improve geolocation: the task of predicting the location where an image was taken. Here we study explicit knowledge from human-written guidebooks that describe the salient and class-discriminative visual features humans use for geolocation. We propose the task of Geolocation via Guidebook Grounding that uses a dataset of StreetView images from a diverse set of locations and an associated textual guidebook for GeoGuessr, a popular interactive geolocation game. Our approach predicts a country for each image by attending over the clues automatically extracted from the guidebook. Supervising attention with country-level pseudo labels achieves the best performance. Our approach substantially outperforms a state-of-the-art image-only geolocation method, with an improvement of over 5% in Top-1 accuracy. Our dataset and code can be found at…
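The attention step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the scaled dot-product scoring, the toy two-dimensional embeddings, and the rule that a clue counts as relevant when it is tagged with the image's country are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend_over_clues(image_vec, clue_vecs):
    """Scaled dot-product attention of one image embedding over clue embeddings.

    Returns the attention weights and the attention-weighted clue vector.
    (Hypothetical helper; the scoring function is an assumption.)
    """
    scale = math.sqrt(len(image_vec))
    weights = softmax([dot(image_vec, c) / scale for c in clue_vecs])
    dim = len(clue_vecs[0])
    attended = [sum(w * c[i] for w, c in zip(weights, clue_vecs))
                for i in range(dim)]
    return weights, attended

def attention_supervision_loss(weights, clue_countries, true_country):
    """Negative log of the attention mass on clues pseudo-labeled as relevant.

    Assumption: a clue is relevant if its country tags include the
    image's ground-truth country.
    """
    mass = sum(w for w, tags in zip(weights, clue_countries)
               if true_country in tags)
    return -math.log(mass + 1e-9)

# Toy example with made-up embeddings and country tags.
image_vec = [1.0, 0.0]
clue_vecs = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]
clue_countries = [{"Japan"}, {"Brazil"}, {"Japan", "Kenya"}]

weights, attended = attend_over_clues(image_vec, clue_vecs)
loss_japan = attention_supervision_loss(weights, clue_countries, "Japan")
loss_brazil = attention_supervision_loss(weights, clue_countries, "Brazil")
```

In this toy setup the loss is lower when the ground-truth country is "Japan" than when it is "Brazil", because most of the attention mass already falls on a clue tagged with Japan. In a trainable model, a term like this would be added to the country classification loss so that attention is pushed toward clues relevant to the correct country.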
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
