CityGuessr: City-Level Video Geo-Localization on a Global Scale
Parth Parag Kulkarni, Gaurav Kumar Nayak, Mubarak Shah

TL;DR
This paper introduces CityGuessr, a new large-scale dataset and a transformer-based model for global video geolocalization, predicting city, state, country, and continent from videos.
Contribution
It presents the first comprehensive approach for worldwide video geolocalization, including a new dataset and a novel transformer architecture with scene and text label integration.
Findings
Achieved state-of-the-art performance on CityGuessr68k and Mapillary datasets.
Demonstrated the effectiveness of self-cross attention and text label alignment.
Provided a new benchmark for global video geolocalization.
Abstract
Video geolocalization is a crucial problem in current times. Given just a video, ascertaining where it was captured from can have a plethora of advantages. The problem of worldwide geolocalization has been tackled before, but only using the image modality. Its video counterpart remains relatively unexplored. Meanwhile, video geolocalization has also garnered some attention in the recent past, but the existing methods are all restricted to specific regions. This motivates us to explore the problem of video geolocalization at a global scale. Hence, we propose a novel problem of worldwide video geolocalization with the objective of hierarchically predicting the correct city, state/province, country, and continent, given a video. However, no large scale video datasets that have extensive worldwide coverage exist, to train models for solving this problem. To this end, we introduce a new…
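To make the hierarchical objective concrete, here is a minimal sketch of scoring predictions at the city, state/province, country, and continent levels. It is an illustration only, not the paper's model or evaluation code. The `HIERARCHY` lookup, the example cities, and the function names `expand` and `hierarchical_accuracy` are all hypothetical. The sketch also assumes the coarser levels are derived from the predicted city, which the source does not state.

```python
from collections import Counter

# Hypothetical city -> (state/province, country, continent) lookup.
# Illustrative entries only; not taken from the CityGuessr68k dataset.
HIERARCHY = {
    "Orlando": ("Florida", "United States", "North America"),
    "Miami": ("Florida", "United States", "North America"),
    "Mumbai": ("Maharashtra", "India", "Asia"),
    "Pune": ("Maharashtra", "India", "Asia"),
    "Paris": ("Ile-de-France", "France", "Europe"),
}

LEVELS = ("city", "state", "country", "continent")


def expand(city):
    """Return the label at every hierarchy level for a given city."""
    state, country, continent = HIERARCHY[city]
    return {"city": city, "state": state, "country": country, "continent": continent}


def hierarchical_accuracy(predicted_cities, true_cities):
    """Compute accuracy at each level (assumption: coarser labels follow from the city)."""
    correct = Counter()
    for pred, true in zip(predicted_cities, true_cities):
        p, t = expand(pred), expand(true)
        for level in LEVELS:
            # A bool adds as 0 or 1, so this counts matches per level.
            correct[level] += p[level] == t[level]
    n = len(true_cities)
    return {level: correct[level] / n for level in LEVELS}


preds = ["Miami", "Mumbai", "Paris", "Pune"]
truth = ["Orlando", "Mumbai", "Paris", "Orlando"]
scores = hierarchical_accuracy(preds, truth)
print(scores)
```

In this toy example a wrong city can still count as correct at a coarser level (Miami versus Orlando share a state), which is why accuracy is reported per level rather than as a single number.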
Taxonomy
Topics: Geographic Information Systems Studies
Methods: Softmax · Attention Is All You Need
