An evaluation of data augmentation methods for sound scene geotagging

Helen L. Bear; Veronica Morfi; Emmanouil Benetos

arXiv:2110.04585·eess.AS·October 12, 2021

TL;DR

This paper evaluates common audio data augmentation techniques for sound scene geotagging, the task of identifying the city in which a recording was captured, and reports a substantial gain in classification accuracy.

Contribution

It systematically compares common audio data augmentation methods and demonstrates a 23% accuracy improvement over the existing state-of-the-art city geotagging approach.

Findings

01

Data augmentation methods can significantly improve geotagging accuracy.

02

The best-performing augmentation setup improved on the state-of-the-art city geotagging method by 23% in classification accuracy.

03

The work is motivated by audio surveillance, where locating the place a recording was captured is more useful than describing the scene alone.

Abstract

Sound scene geotagging is a new topic of research which has evolved from acoustic scene classification. It is motivated by the idea of audio surveillance. Not content with only describing a scene in a recording, a machine which can locate where the recording was captured would be of use to many. In this paper we explore a series of common audio data augmentation methods to evaluate which best improves the accuracy of audio geotagging classifiers. Our work improves on the state-of-the-art city geotagging method by 23% in terms of classification accuracy.
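To make "common audio data augmentation methods" concrete, the sketch below shows three generic waveform-level augmentations: additive white noise at a target signal-to-noise ratio, a circular time shift, and a random gain change. These are illustrative assumptions only; this page does not list which methods the paper evaluates, and every function name and parameter value here is hypothetical. Plain Python lists are used to keep the example dependency-free; an array library would be the usual choice in practice.

```python
import math
import random


def add_white_noise(wave, snr_db, rng):
    """Add Gaussian white noise at a target signal-to-noise ratio (dB)."""
    signal_power = sum(x * x for x in wave) / len(wave)
    noise_std = math.sqrt(signal_power / 10 ** (snr_db / 10))
    return [x + rng.gauss(0.0, noise_std) for x in wave]


def time_shift(wave, shift):
    """Circularly shift the waveform to the right by `shift` samples."""
    shift %= len(wave)
    return wave[-shift:] + wave[:-shift]


def random_gain(wave, low_db, high_db, rng):
    """Scale the waveform by a gain drawn uniformly from [low_db, high_db]."""
    gain = 10 ** (rng.uniform(low_db, high_db) / 20)
    return [x * gain for x in wave]


def augment(wave, rng):
    """Return the original clip plus one augmented copy per method.

    The parameter values (20 dB SNR, +/-6 dB gain) are arbitrary
    illustrations, not settings taken from the paper.
    """
    return [
        wave,
        add_white_noise(wave, snr_db=20.0, rng=rng),
        time_shift(wave, shift=rng.randrange(1, len(wave))),
        random_gain(wave, low_db=-6.0, high_db=6.0, rng=rng),
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # One second of a 440 Hz tone at 16 kHz stands in for a real recording.
    clip = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
    variants = augment(clip, rng)
    print(len(variants), "clips of", len(variants[0]), "samples each")
```

Each augmented copy keeps the label (here, the city) of the clip it was derived from, so the training set grows without new recordings. Which transformations actually help geotagging accuracy is the question the paper evaluates.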

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies