An evaluation of data augmentation methods for sound scene geotagging
Helen L. Bear, Veronica Morfi, Emmanouil Benetos

TL;DR
This paper evaluates various data augmentation techniques to enhance the accuracy of sound scene geotagging, significantly improving city-level geolocation performance in audio classification tasks.
Contribution
It systematically compares common audio data augmentation methods and demonstrates a 23% accuracy improvement over the existing state-of-the-art city geotagging approach.
Findings
Data augmentation methods can significantly improve geotagging accuracy.
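The reported approach improves classification accuracy by 23% over the state-of-the-art city geotagging method.
The work is motivated by audio surveillance, where locating where a recording was captured goes beyond describing the scene in it.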
Abstract
Sound scene geotagging is a new topic of research which has evolved from acoustic scene classification. It is motivated by the idea of audio surveillance. Not content with only describing a scene in a recording, a machine which can locate where the recording was captured would be of use to many. In this paper we explore a series of common audio data augmentation methods to evaluate which best improves the accuracy of audio geotagging classifiers. Our work improves on the state-of-the-art city geotagging method by 23% in terms of classification accuracy.
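The abstract does not name the augmentation methods that were compared, so the sketch below only illustrates what "common audio data augmentation" typically means for a waveform classifier. The three transforms (additive noise, time shift, gain change), all function names, and all parameter values are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def add_noise(wave, snr_db, rng):
    """Mix in white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def time_shift(wave, shift):
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(wave, shift)

def apply_gain(wave, gain_db):
    """Scale the amplitude by a gain expressed in decibels."""
    return wave * (10 ** (gain_db / 20))

def augment(wave, rng):
    """Return the original clip plus one augmented copy per transform."""
    max_shift = len(wave) // 10  # hypothetical limit: shift by at most 10% of the clip
    return [
        wave,
        add_noise(wave, snr_db=20.0, rng=rng),
        time_shift(wave, int(rng.integers(-max_shift, max_shift + 1))),
        apply_gain(wave, gain_db=float(rng.uniform(-6.0, 6.0))),
    ]

# Synthetic one-second 440 Hz tone standing in for a real recording.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 440.0 * t)
variants = augment(clip, rng)
```

Each augmented copy keeps the label (here, the city) of the clip it was derived from, so the training set grows without new recordings. Which transforms actually help a geotagging classifier is the empirical question the paper evaluates.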
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
