BioSEN: A Bio-acoustic Signal Enhancement Network for Animal Vocalizations
Tianyu Song, Ton Viet Ta, Ngamta Thamwattana, Hisako Nomura, Linh Thi Hoai Nguyen

TL;DR
BioSEN is a neural network built to enhance animal vocalizations in noisy recordings. It matches or exceeds state-of-the-art speech enhancement models while using far less computation, which makes it useful for biodiversity monitoring.
Contribution
The paper introduces BioSEN, a bioacoustic signal enhancement network whose three modules are tailored to animal sounds, addressing the lack of enhancement models built for bioacoustic audio.
Findings
BioSEN matches or exceeds state-of-the-art speech enhancement models.
BioSEN requires far less computation than the speech enhancement models it is compared against.
BioSEN is effective across the three bioacoustic datasets tested.
Abstract
Most work in audio enhancement targets human speech, while bioacoustics is less studied due to noisy recordings and the distinct traits of animal sounds. To fill this gap, we adapt speech enhancement methods and build BioSEN, a model made for bioacoustic signals. BioSEN has three modules: a multi-scale dual-axis attention unit for time-frequency feature extraction, a bio-harmonic multi-scale enhancement unit for capturing harmonic structures, and an energy-adaptive gating connection unit that uses frequency weights to keep vocalizations from being removed as noise. Tests on three bioacoustic datasets show that BioSEN matches or exceeds state-of-the-art speech enhancement models while using far less computation. These results show BioSEN's strength for bioacoustic audio enhancement and its promise for biodiversity monitoring and conservation.
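The abstract does not give the formula behind the energy-adaptive gating connection unit, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a magnitude spectrogram stored as one list of values per frequency bin, derives a weight per bin from its mean energy, and uses those weights to scale a skip connection so that high-energy bins (likely vocalizations) pass through more strongly. The function names, the sigmoid form, and the `sharpness` parameter are all hypothetical.

```python
import math


def frequency_energy_weights(spec, sharpness=4.0):
    """Return one weight in (0, 1) per frequency bin.

    spec: list of frequency bins, each a non-empty list of magnitudes over time.
    sharpness: hypothetical parameter controlling how steeply weights
               rise with energy (not taken from the paper).
    """
    # Mean energy of each frequency bin across time.
    energies = [sum(m * m for m in row) / len(row) for row in spec]
    mean_e = sum(energies) / len(energies)
    if mean_e == 0.0:
        # Silent input: no bin is preferred over another.
        return [0.5] * len(spec)
    # Sigmoid of relative energy: bins above the mean get weights above 0.5.
    return [
        1.0 / (1.0 + math.exp(-sharpness * (e / mean_e - 1.0)))
        for e in energies
    ]


def gated_skip(encoder_feat, decoder_feat, weights):
    """Add encoder features to decoder features, scaled per frequency bin."""
    return [
        [d + w * e for e, d in zip(e_row, d_row)]
        for e_row, d_row, w in zip(encoder_feat, decoder_feat, weights)
    ]


# Toy example: three frequency bins, two time frames.
spec = [[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]]
weights = frequency_energy_weights(spec)
fused = gated_skip(spec, [[0.0, 0.0] for _ in spec], weights)
```

In this toy example the highest-energy bin passes through the skip connection almost unchanged, while the low-energy bins are strongly attenuated. A learned gate would replace the fixed sigmoid in practice.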
