Generative AI-based data augmentation for improved bioacoustic classification in noisy environments

Anthony Gibbons; Emma King; Ian Donohue; Andrew Parnell

arXiv:2412.01530 · cs.SD · December 16, 2025 · 2 cites

Open Access

TL;DR

This paper explores the use of generative AI models, specifically ACGAN and DDPM, to synthesize spectrograms for augmenting training data in bioacoustic classification, improving accuracy especially in noisy environments.

Contribution

It introduces Denoising Diffusion Probabilistic Models (DDPMs) for spectrogram synthesis and provides a new dataset of bird calls from wind farm sites for training robust AI models.

Findings

01. DDPM generated more realistic spectrograms and improved classification accuracy.

02. Synthetic data augmentation enhanced classifier performance across models.

03. The approach is effective for rare species detection in noisy environments.

Abstract

Obtaining data to train robust artificial intelligence (AI)-based models for species classification can be challenging, particularly for rare species. Data augmentation can boost classification accuracy by increasing the diversity of training data and is cheaper to obtain than expert-labelled data. However, many classic image-based augmentation techniques are not suitable for audio spectrograms. We investigate two generative AI models as data augmentation tools to synthesise spectrograms and supplement audio data: Auxiliary Classifier Generative Adversarial Networks (ACGAN) and Denoising Diffusion Probabilistic Models (DDPMs). The latter performed particularly well in terms of both realism of generated spectrograms and accuracy in a resulting classification task. Alongside these new approaches, we present a new audio data set of 640 hours of bird calls from wind farm sites in Ireland,…
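
To make the DDPM idea concrete, the sketch below shows the forward (noising) process that such models learn to reverse: a clean spectrogram is mixed with Gaussian noise according to a variance schedule. It is a minimal illustration under stated assumptions, not the authors' implementation. The linear schedule endpoints (1e-4 to 0.02 over 1000 steps) follow common DDPM defaults and are assumed here, the toy 4x8 "spectrogram" is random data, and all function names are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed default values)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def forward_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I).

    x0 is a 2D list (frequency bins x time frames); t is a 0-indexed step.
    """
    a = alpha_bars[t]
    signal_scale, noise_scale = math.sqrt(a), math.sqrt(1.0 - a)
    return [
        [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in row]
        for row in x0
    ]


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

# Toy stand-in for a normalised spectrogram (4 frequency bins x 8 frames).
spectrogram = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(4)]

noisy_early = forward_noise(spectrogram, 10, alpha_bars, rng)   # mostly signal
noisy_late = forward_noise(spectrogram, 999, alpha_bars, rng)   # almost pure noise
```

At early steps the sample stays close to the original spectrogram, while at the final step almost none of the signal remains. A trained DDPM runs this chain in reverse, starting from noise, to produce new synthetic spectrograms that can be added to the training set.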

Taxonomy

Topics: Music and Audio Processing · Animal Vocal Communication and Behavior · Underwater Acoustics Research