Geo-ATBench: A Benchmark for Geospatial Audio Tagging with Geospatial Semantic Context
Yuanbo Hou, Yanru Wu, Qiaoqiao Ren, Shengchen Li, Stephen Roberts, Dick Botteldooren

TL;DR
This paper introduces Geo-ATBench, a new benchmark dataset for geospatial audio tagging that incorporates geographic semantic context to improve multi-label environmental sound recognition.
Contribution
It defines the geospatial audio tagging (Geo-AT) task, presents Geo-ATBench, a benchmark dataset with geospatial annotations, and proposes GeoFusion-AT, a framework that fuses audio and geospatial data for sound event tagging.
Findings
Geospatial semantic context (GSC) improves audio tagging accuracy, especially for labels that are easily confused on acoustic evidence alone.
Geo-ATBench aligns closely with human performance in sound recognition.
GeoFusion-AT effectively integrates geospatial information with audio data.
Abstract
Environmental sound understanding in computational auditory scene analysis (CASA) is often formulated as an audio-only recognition problem. This formulation leaves a persistent drawback in multi-label audio tagging (AT): acoustic similarity can make certain events difficult to separate from waveforms alone. In such cases, disambiguating cues often lie outside the waveform. Geospatial semantic context (GSC), derived from geographic information system data, e.g., points of interest (POI), provides location-tied environmental priors that can help reduce this ambiguity. A systematic study of this direction is enabled through the proposed geospatial audio tagging (Geo-AT) task, which conditions multi-label sound event tagging on GSC alongside audio. To benchmark Geo-AT, Geo-ATBench is introduced as a polyphonic audio benchmark with geographical annotations, containing 10.71 hours of audio…
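To make the idea of conditioning multi-label tagging on GSC alongside audio concrete, here is a minimal sketch in Python. It is not the paper's GeoFusion-AT framework, whose design is not described in the text above. It assumes a simple concatenation-based fusion, and the POI categories, label names, embedding size, and random weights are all hypothetical placeholders chosen only for illustration.

```python
import math
import random

# Hypothetical POI categories and sound labels (not taken from the paper).
POI_CATEGORIES = ["park", "road", "station"]
LABELS = ["birdsong", "traffic", "speech"]


def poi_context_vector(poi_counts, categories=POI_CATEGORIES):
    """Turn counts of nearby POIs into a normalized context vector."""
    counts = [float(poi_counts.get(c, 0)) for c in categories]
    total = sum(counts)
    # With no POIs nearby, return the all-zero vector unchanged.
    return [c / total for c in counts] if total > 0 else counts


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tag_with_context(audio_emb, gsc_vec, weights, biases):
    """Concatenate audio and GSC features, then score each label
    independently with a sigmoid (multi-label, not softmax)."""
    fused = list(audio_emb) + list(gsc_vec)
    return [
        sigmoid(sum(w * x for w, x in zip(row, fused)) + b)
        for row, b in zip(weights, biases)
    ]


# Stand-in values: a real system would use a trained audio encoder
# and learned weights instead of random numbers.
rng = random.Random(0)
audio_emb = [rng.gauss(0, 1) for _ in range(8)]
gsc_vec = poi_context_vector({"park": 3, "road": 1})
weights = [
    [rng.gauss(0, 0.1) for _ in range(len(audio_emb) + len(gsc_vec))]
    for _ in LABELS
]
biases = [0.0] * len(LABELS)

probs = tag_with_context(audio_emb, gsc_vec, weights, biases)
for label, p in zip(LABELS, probs):
    print(f"{label}: {p:.3f}")
```

Because each label gets its own sigmoid, several events can be active in the same clip, which matches the polyphonic, multi-label setting. The location-derived vector only shifts the per-label scores, so two acoustically similar events can receive different probabilities when their surroundings differ.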
Taxonomy
Topics: Music and Audio Processing · Animal Vocal Communication and Behavior · Speech and Audio Processing
