Detection and classification of vocal productions in large scale audio recordings

Guillem Bonafos; Pierre Pudlo; Jean-Marc Freyermuth; Thierry Legou; Joël Fagot; Samuel Tronçon; Arnaud Rey

arXiv:2302.07640·cs.SD·August 14, 2023·1 cites

Open Access · 1 Repo

TL;DR

This paper introduces an automated deep learning pipeline for detecting and classifying vocalizations in large-scale, noisy natural audio recordings, effectively handling diverse conditions with minimal labeled data.

Contribution

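The end-to-end pipeline combines several computational steps (windowing, creation of a noise class, data augmentation, re-sampling, transfer learning, Bayesian optimisation) to train a neural network on limited labeled data, enabling scalable analysis of vocal productions in natural environments.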

Findings

01 Achieved over 94% accuracy on two datasets

02 Processed hundreds of hours of recordings to create new vocal databases

03 Effective in noisy, real-world recording conditions

Abstract

We propose an automatic data processing pipeline to extract vocal productions from large-scale natural audio recordings and classify these vocal productions. The pipeline is based on a deep neural network and addresses both issues simultaneously. Through a series of computational steps (windowing, creation of a noise class, data augmentation, re-sampling, transfer learning, Bayesian optimisation), it automatically trains a neural network without requiring a large sample of labeled data or substantial computing resources. Our end-to-end methodology can handle noisy recordings made under different recording conditions. We test it on two different natural audio data sets, one from a group of Guinea baboons recorded at a primate research center and one from human babies recorded at home. The pipeline trains a model on 72 and 77 minutes of labeled audio recordings, with an accuracy of 94.58%…
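
The first two steps named in the abstract, windowing and the creation of a noise class, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `make_windows` and `label_windows`, the `NOISE` label, and the `min_overlap` threshold are hypothetical choices made for illustration. The sketch assumes annotations are given as `(start, end, label)` tuples in sample indices, and that any window not sufficiently covered by an annotated vocalization is assigned to the noise class.

```python
from typing import List, Tuple

NOISE = "noise"  # hypothetical name for the catch-all noise class


def make_windows(n_samples: int, win: int, hop: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of fixed-length, possibly overlapping windows."""
    if win <= 0 or hop <= 0:
        raise ValueError("win and hop must be positive")
    # Only full windows are kept; a trailing partial window is dropped.
    return [(s, s + win) for s in range(0, n_samples - win + 1, hop)]


def label_windows(
    windows: List[Tuple[int, int]],
    annotations: List[Tuple[int, int, str]],
    min_overlap: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[str]:
    """Label each window with the annotation covering the largest fraction of it.

    A window whose best coverage is below min_overlap is labeled NOISE.
    """
    labels = []
    for w_start, w_end in windows:
        best_label, best_cover = NOISE, 0.0
        for a_start, a_end, a_label in annotations:
            # Number of samples shared by the window and the annotation.
            overlap = max(0, min(w_end, a_end) - max(w_start, a_start))
            cover = overlap / (w_end - w_start)
            if cover >= min_overlap and cover > best_cover:
                best_label, best_cover = a_label, cover
        labels.append(best_label)
    return labels


if __name__ == "__main__":
    windows = make_windows(n_samples=10, win=4, hop=2)
    print(label_windows(windows, [(2, 6, "bark")]))
```

With a design like this, every window of a long recording receives a label, so a single classifier can in principle handle detection (noise versus vocalization) and classification (which vocalization) together, which is the sense in which the abstract says both issues are addressed simultaneously.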

Code & Models

Repositories

https://gitlab.com/papers4375727/detection-and-classification-of-vocal-productions
tf · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies