Learnable Acoustic Frontends in Bird Activity Detection
Mark Anderson, Naomi Harte

TL;DR
This paper evaluates traditional fixed-parameter and learnable acoustic frontends for bird activity detection, showing that learnable frontends generally outperform fixed methods, with Per-Channel Energy Normalization the best overall performer at 89.9% accuracy.
Contribution
It benchmarks learnable acoustic frontends against traditional fixed-parameter frontends for bird activity detection, using data from the DCASE2018 BAD Challenge.
Findings
Learnable frontends outperform traditional fixed-parameter methods.
Per-Channel Energy Normalization achieves the highest accuracy of 89.9%.
Challenges exist in learning filterbanks for bird audio.
Abstract
Autonomous recording units and passive acoustic monitoring present minimally intrusive methods of collecting bioacoustics data. Combining this data with species agnostic bird activity detection systems enables the monitoring of activity levels of bird populations. Unfortunately, variability in ambient noise levels and subject distance contribute to difficulties in accurately detecting bird activity in recordings. The choice of acoustic frontend directly affects the impact these issues have on system performance. In this paper, we benchmark traditional fixed-parameter acoustic frontends against the new generation of learnable frontends on a wide-ranging bird audio detection task using data from the DCASE2018 BAD Challenge. We observe that Per-Channel Energy Normalization is the best overall performer, achieving an accuracy of 89.9%, and that in general learnable frontends significantly…
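To make the best-performing frontend concrete, the following is a minimal sketch of Per-Channel Energy Normalization in its commonly published form, not the authors' implementation. It smooths each filterbank channel over time, divides the energy by that smoothed estimate (an adaptive gain control that reduces sensitivity to loudness and recording distance), and then applies root compression. The function name `pcen`, the input layout, and the default values for `s`, `alpha`, `delta`, `r` and `eps` are illustrative assumptions; in a learnable frontend these parameters would be trained, typically per channel, rather than fixed.

```python
import numpy as np


def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-Channel Energy Normalization (illustrative sketch).

    energy: array of shape (n_frames, n_channels) holding non-negative
            filterbank energies (e.g. a mel spectrogram, time on axis 0).
    All default parameter values are assumptions chosen for illustration.
    """
    energy = np.asarray(energy, dtype=float)

    # First-order IIR smoother per channel:
    # M[t] = (1 - s) * M[t - 1] + s * E[t], initialised with the first frame.
    smoothed = np.empty_like(energy)
    smoothed[0] = energy[0]
    for t in range(1, energy.shape[0]):
        smoothed[t] = (1.0 - s) * smoothed[t - 1] + s * energy[t]

    # Adaptive gain control followed by root compression:
    # PCEN = (E / (eps + M)^alpha + delta)^r - delta^r
    return (energy / (eps + smoothed) ** alpha + delta) ** r - delta ** r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_energies = rng.uniform(0.1, 1.0, size=(50, 8))  # synthetic input
    quiet = pcen(fake_energies, alpha=1.0)
    loud = pcen(100.0 * fake_energies, alpha=1.0)
    # With alpha = 1 the output is almost unchanged by a constant gain.
    print(np.max(np.abs(quiet - loud)))
```

With `alpha` set to 1, scaling the whole input by a constant gain leaves the output nearly unchanged, which illustrates why this kind of normalization can help when ambient noise level and subject distance vary between recordings.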
Taxonomy
Topics: Animal Vocal Communication and Behavior · Music and Audio Processing · Speech and Audio Processing
