Detection and classification of vocal productions in large scale audio recordings
Guillem Bonafos, Pierre Pudlo, Jean-Marc Freyermuth, Thierry Legou, Joël Fagot, Samuel Tronçon, Arnaud Rey

TL;DR
This paper introduces an automated deep learning pipeline for detecting and classifying vocalizations in large-scale, noisy natural audio recordings, effectively handling diverse conditions with minimal labeled data.
Contribution
The novel end-to-end pipeline combines multiple computational techniques to train neural networks on limited data, enabling scalable vocal production analysis in natural environments.
Findings
Achieved over 94% accuracy on two datasets
Processed hundreds of hours of recordings to create new vocal databases
Effective in noisy, real-world recording conditions
Abstract
We propose an automatic data processing pipeline to extract vocal productions from large-scale natural audio recordings and classify these vocal productions. The pipeline is based on a deep neural network and addresses both tasks (extraction and classification) simultaneously. Through a series of computational steps (windowing, creation of a noise class, data augmentation, re-sampling, transfer learning, Bayesian optimisation), it automatically trains a neural network without requiring a large sample of labeled data or substantial computing resources. Our end-to-end methodology can handle noisy recordings made under different recording conditions. We test it on two different natural audio data sets, one from a group of Guinea baboons recorded at a primate research center and one from human babies recorded at home. The pipeline trains a model on 72 and 77 minutes of labeled audio recordings, with an accuracy of 94.58%…
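Three of the steps listed in the abstract (windowing, creation of a noise class, and re-sampling) can be illustrated with a minimal sketch. It assumes annotations are given as hypothetical `(start_s, end_s, label)` tuples in seconds, that a window takes the label of the annotation covering at least half of it (otherwise it falls into a `"noise"` class), and that re-sampling means random oversampling of minority classes. The function names and thresholds are illustrative assumptions, not the authors' implementation; data augmentation, transfer learning and Bayesian optimisation are not shown.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical annotation format: (start in seconds, end in seconds, label).
Annotation = Tuple[float, float, str]
Window = Tuple[int, int]  # (start sample, end sample), end exclusive


def window_recording(n_samples: int, sample_rate: int,
                     win_s: float, hop_s: float) -> List[Window]:
    """Cut a recording into fixed-length, possibly overlapping windows."""
    win = int(round(win_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    return [(s, s + win) for s in range(0, n_samples - win + 1, hop)]


def label_windows(windows: Sequence[Window], sample_rate: int,
                  annotations: Sequence[Annotation],
                  min_overlap: float = 0.5,
                  noise_label: str = "noise") -> List[str]:
    """Give each window the label of the annotation covering most of it.

    Windows covered less than `min_overlap` by every annotation form the
    noise class (an assumed rule for building that class).
    """
    labels = []
    for start, end in windows:
        best_label, best_frac = noise_label, 0.0
        for a_start, a_end, label in annotations:
            a0 = int(round(a_start * sample_rate))
            a1 = int(round(a_end * sample_rate))
            overlap = max(0, min(end, a1) - max(start, a0))
            frac = overlap / (end - start)
            if frac >= min_overlap and frac > best_frac:
                best_label, best_frac = label, frac
        labels.append(best_label)
    return labels


def oversample(items: Sequence, labels: Sequence[str], seed: int = 0):
    """Randomly duplicate minority-class examples until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out = list(zip(items, labels))
    for label, count in counts.items():
        pool = [(x, y) for x, y in zip(items, labels) if y == label]
        out.extend(rng.choice(pool) for _ in range(target - count))
    rng.shuffle(out)
    return out


if __name__ == "__main__":
    # Toy 10 s recording at 10 Hz, 1 s windows with a 0.5 s hop.
    wins = window_recording(100, 10, 1.0, 0.5)
    labs = label_windows(wins, 10, [(2.0, 3.0, "grunt"), (6.0, 6.8, "bark")])
    print(Counter(labs))
    print(Counter(y for _, y in oversample(wins, labs)))
```

In this toy example, 19 windows are produced, most of which become noise; oversampling then equalises the three classes. Oversampling is only one way to re-sample an imbalanced training set; undersampling the noise class is an equally plausible alternative.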
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
Methods: Test
