Robust Bioacoustic Detection via Richly Labelled Synthetic Soundscape Augmentation
Kaspar Soltero, Tadeu Siqueira, Stefanie Gutschmidt

TL;DR
This paper presents a synthetic soundscape generation framework that enhances bioacoustic detection robustness, reduces manual labelling effort, and improves model generalisation from limited data.
Contribution
The paper introduces a synthetic data augmentation method that creates richly labelled soundscapes, enabling robust bioacoustic detection with minimal source data.
Findings
Models trained on synthetic data generalise well to real-world soundscapes.
Performance remains high despite reduced diversity in source vocalisations.
Synthetic data generation significantly reduces manual labelling effort.
Abstract
Passive Acoustic Monitoring (PAM) analysis is often hindered by the intensive manual effort needed to create labelled training data. This study introduces a synthetic data framework to generate large volumes of richly labelled training data from very limited source material, improving the robustness of bioacoustic detection models. Our framework synthesises realistic soundscapes by combining clean background noise with isolated target vocalisations (little owl), automatically generating dynamic labels like bounding boxes during synthesis. A model fine-tuned on this data generalised well to real-world soundscapes, with performance remaining high even when the diversity of source vocalisations was drastically reduced, indicating the model learned generalised features without overfitting. This demonstrates that synthetic data generation is a highly effective strategy for training robust…
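The synthesis step described in the abstract (mixing isolated vocalisations into clean background audio and recording labels as a by-product) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `synthesise_soundscape`, the fixed `snr_db` parameter, the uniform random placement, and the time-only label format are all assumptions made for illustration. The bounding boxes mentioned in the abstract presumably also carry frequency limits, which are omitted here.

```python
import numpy as np

def synthesise_soundscape(background, calls, sample_rate, snr_db=10.0, seed=0):
    """Mix isolated calls into a background clip; return (audio, labels).

    Hypothetical helper: labels are time intervals in seconds, produced
    automatically because the placement of every call is known.
    """
    rng = np.random.default_rng(seed)
    mix = background.astype(np.float64).copy()
    # Mean power of the background, used as the noise reference for SNR.
    noise_power = np.mean(mix ** 2) + 1e-12
    labels = []
    for call in calls:
        call = call.astype(np.float64)
        if len(call) > len(mix):
            continue  # skip calls longer than the background clip
        call_power = np.mean(call ** 2) + 1e-12
        # Gain that gives the call the requested SNR relative to the background.
        gain = np.sqrt(noise_power * 10 ** (snr_db / 10) / call_power)
        # Uniform random start position (an assumption, not the paper's scheme).
        start = int(rng.integers(0, len(mix) - len(call) + 1))
        mix[start:start + len(call)] += gain * call
        labels.append({
            "start_s": start / sample_rate,
            "end_s": (start + len(call)) / sample_rate,
            "snr_db": snr_db,
        })
    return mix, labels
```

Calling the function repeatedly with different seeds and SNR values would yield many labelled training clips from a handful of source recordings, which is the general idea the abstract describes.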
Taxonomy
Topics: Animal Vocal Communication and Behavior · Music and Audio Processing · Speech and Audio Processing
