Audio Geolocation: A Natural Sounds Benchmark

Mustafa Chasmai; Wuao Liu; Subhransu Maji; Grant Van Horn

arXiv:2505.18726 · cs.SD · July 23, 2025

TL;DR

This paper examines whether geographic location can be determined from audio recordings alone, using spectrograms and species vocalization cues, and explores multimodal geolocation that combines audio and visual data through case studies from movies.

Contribution

It formalizes the global-scale audio geolocation problem, benchmarks image geolocation techniques on audio data, and proposes methods leveraging species ranges and multimodal cues for improved localization.

Findings

01 Spectrogram-based methods provide baseline geolocation performance.

02 Species vocalizations offer strong geographic cues due to their range restrictions.

03 Multimodal audio-visual approaches enhance geolocation accuracy.
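
To illustrate why range-restricted species are a strong cue, here is a minimal sketch of intersecting species ranges on a coarse grid. It is not the authors' method. The grid size, the species names, and the ranges are all made up for illustration, and a real system would presumably use predicted range maps rather than hand-written masks.

```python
import numpy as np

# Hypothetical 4x8 world grid (rows = latitude bands, cols = longitude bands).
GRID_SHAPE = (4, 8)

def make_range(cells):
    """Build a boolean range mask from a list of (row, col) grid cells."""
    mask = np.zeros(GRID_SHAPE, dtype=bool)
    for r, c in cells:
        mask[r, c] = True
    return mask

# Toy, made-up ranges; these are not real species data.
RANGES = {
    "species_a": make_range([(1, 2), (1, 3), (2, 2), (2, 3)]),
    "species_b": make_range([(1, 3), (1, 4), (2, 3)]),
    "species_c": make_range([(2, 3), (3, 3)]),
}

def candidate_cells(detected):
    """Keep only the grid cells that lie inside the range of every detected species."""
    mask = np.ones(GRID_SHAPE, dtype=bool)
    for name in detected:
        mask &= RANGES[name]
    return mask

one = candidate_cells(["species_a"])
three = candidate_cells(["species_a", "species_b", "species_c"])
print(int(one.sum()), int(three.sum()))  # number of candidate cells left in each case
```

In this toy setup, each additional detected species can only shrink the candidate region, which is one intuition for why species-rich recordings might be easier to localize.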

Abstract

Can we determine someone's geographic location purely from the sounds they hear? Are acoustic signals enough to localize within a country, state, or even city? We tackle the challenge of global-scale audio geolocation, formalize the problem, and conduct an in-depth analysis with wildlife audio from the iNatSounds dataset. Adopting a vision-inspired approach, we convert audio recordings to spectrograms and benchmark existing image geolocation techniques. We hypothesize that species vocalizations offer strong geolocation cues due to their defined geographic ranges and propose an approach that integrates species range prediction with retrieval-based geolocation. We further evaluate whether geolocation improves when analyzing species-rich recordings or when aggregating across spatiotemporal neighborhoods. Finally, we introduce case studies from movies to explore multimodal geolocation using…
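
The spectrogram-plus-retrieval idea described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's pipeline: `embed` is a hypothetical stand-in for a learned network, the audio is synthetic sine tones, the gallery coordinates are arbitrary, and great-circle distance is assumed as the error measure.

```python
import numpy as np

def log_spectrogram(wave, n_fft=256, hop=128):
    """Log-magnitude STFT spectrogram with shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(mag).T

def embed(spec):
    """Hypothetical stand-in for a learned encoder: time-averaged spectrum, L2-normalized."""
    v = spec.mean(axis=1)
    return v / (np.linalg.norm(v) + 1e-12)

def retrieve_location(query_emb, gallery_embs, gallery_latlon):
    """Return the (lat, lon) of the gallery item with the highest cosine similarity."""
    sims = gallery_embs @ query_emb
    return gallery_latlon[int(np.argmax(sims))]

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = np.radians([a[0], a[1], b[0], b[1]])
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(h))

# Synthetic one-second tones stand in for geotagged recordings.
sr = 8000
t = np.arange(sr) / sr

def tone(freq):
    return np.sin(2 * np.pi * freq * t)

# Arbitrary gallery coordinates, chosen only for illustration.
gallery_latlon = np.array([[40.0, -75.0], [-35.0, 150.0]])
gallery_embs = np.stack([
    embed(log_spectrogram(tone(500.0))),
    embed(log_spectrogram(tone(2000.0))),
])

# The query is acoustically close to the first gallery item.
query = embed(log_spectrogram(tone(520.0)))
pred = retrieve_location(query, gallery_embs, gallery_latlon)
error_km = haversine_km(pred, gallery_latlon[0])
print(pred, error_km)
```

A nearest-neighbor lookup like this simply transfers the location of the best match; aggregating over several neighbors or combining the result with species range information would be natural extensions.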

Code & Models

Repositories

cvl-umass/nat-sound2loc-code (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing