Long-distance Detection of Bioacoustic Events with Per-channel Energy Normalization
Vincent Lostanlen, Kaitlin Palmer, Elly Knight, Christopher Clark, Holger Klinck, Andrew Farnsworth, Tina Wong, Jason Cramer, Juan Pablo Bello

TL;DR
This paper demonstrates that per-channel energy normalization (PCEN) improves unsupervised detection of bioacoustic events: it enhances vocalizations and reduces false alarms relative to a pointwise logarithm in noisy environments, at moderate computational cost.
Contribution
It introduces the application of PCEN for unsupervised bioacoustic event detection, showing its advantages over logarithmic spectral flux in various noise conditions.
Findings
Compared with a pointwise logarithm, PCEN reduces the false alarm rate by 50x in the near field and 5x in the far field, on both avian and marine datasets.
PCEN generalizes spectral flux with a tunable time scale.
The method requires no human intervention and has moderate computational cost.
Abstract
This paper proposes to perform unsupervised detection of bioacoustic events by pooling the magnitudes of spectrogram frames after per-channel energy normalization (PCEN). Although PCEN was originally developed for speech recognition, it also has beneficial effects in enhancing animal vocalizations, despite the presence of atmospheric absorption and intermittent noise. We prove that PCEN generalizes logarithm-based spectral flux, yet with a tunable time scale for background noise estimation. In comparison with pointwise logarithm, PCEN reduces false alarm rate by 50x in the near field and 5x in the far field, both on avian and marine bioacoustic datasets. Such improvements come at moderate computational cost and require no human intervention, thus heralding a promising future for PCEN in bioacoustics.
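As a rough illustration of the pipeline the abstract describes, the sketch below applies PCEN in its commonly published form (a first-order IIR smoother per channel, followed by adaptive gain control and root compression) and then pools the result across frequency into a detection function. Several things here are assumptions rather than details taken from the paper: the function names `pcen` and `detection_function`, the parameter values (illustrative defaults, not the settings tuned by the authors), the initialization of the smoother, and the choice of max-pooling over channels.

```python
import numpy as np


def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a nonnegative spectrogram.

    E has shape (n_channels, n_frames). Parameter values are illustrative
    assumptions, not the settings used in the paper.
    """
    E = np.asarray(E, dtype=float)
    M = np.empty_like(E)
    # Assumption: start the smoother at the first frame of each channel.
    M[:, 0] = E[:, 0]
    # First-order IIR low-pass filter: per-channel background estimate.
    # A smaller s means a longer time scale for the noise estimate.
    for t in range(1, E.shape[1]):
        M[:, t] = (1.0 - s) * M[:, t - 1] + s * E[:, t]
    # Adaptive gain control (division by M**alpha), then root compression.
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r


def detection_function(E, **kwargs):
    """Pool PCEN magnitudes over frequency, one value per frame.

    Max-pooling is an assumption; the abstract only says that magnitudes
    are pooled.
    """
    return pcen(E, **kwargs).max(axis=0)


if __name__ == "__main__":
    # Synthetic example: flat background with one loud frame in one channel.
    E = np.ones((4, 200))
    E[2, 120] = 100.0
    d = detection_function(E)
    print("peak frame:", int(np.argmax(d)))
```

In this sketch, a threshold on the pooled curve would turn it into a detector. The smoothing coefficient `s` sets how quickly the background estimate adapts, which is the tunable time scale the abstract refers to.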
