Artificially Synthesising Data for Audio Classification and Segmentation to Improve Speech and Music Detection in Radio Broadcast

Satvik Venkatesh; David Moffat; Alexis Kirke; Gözel Shakeri; Stephen Brewster; Jörg Fachner; Helen Odell-Miller; Alex Street; Nicolas Farina; Sube Banerjee; and Eduardo Reck Miranda

arXiv:2102.09959·eess.AS·February 22, 2021

TL;DR

This paper introduces a novel data synthesis method that mimics radio signals to train deep learning models for audio segmentation, significantly reducing the need for real annotated data and improving detection accuracy.

Contribution

The study presents a new data synthesis procedure for radio-like audio signals, enhancing training datasets for deep neural networks in speech and music detection tasks.

Findings

01

Synthesised data improves model performance over existing methods.

02

A CRNN trained on synthesised data outperforms state-of-the-art algorithms for music-speech detection.

03

The approach reduces reliance on costly annotated datasets.

Abstract

Segmenting audio into homogeneous sections such as music and speech helps us understand the content of audio. It is useful as a pre-processing step to index, store, and modify audio recordings, radio broadcasts and TV programmes. Deep learning models for segmentation are generally trained on copyrighted material, which cannot be shared. Annotating these datasets is time-consuming and expensive, and it therefore significantly slows down research progress. In this study, we present a novel procedure that artificially synthesises data that resembles radio signals. We replicate the workflow of a radio DJ in mixing audio and investigate parameters like fade curves and audio ducking. We trained a Convolutional Recurrent Neural Network (CRNN) on this synthesised data and outperformed state-of-the-art algorithms for music-speech detection. This paper demonstrates the data synthesis procedure as…
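
To make the two mixing parameters named in the abstract more concrete, here is a minimal sketch of a fade curve and of audio ducking (lowering the music level while speech plays), together with the per-sample labels such a synthetic mix yields for free. This is not the authors' implementation: the function names `fade_curve` and `duck_music_under_speech`, the two curve shapes, the default `duck_gain`, and the `(music, speech)` label layout are all assumptions chosen for illustration, and signals are plain Python lists of samples.

```python
import math


def fade_curve(n, shape="linear"):
    """Return n gain values rising from 0.0 to 1.0 (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return [1.0]
    gains = []
    for i in range(n):
        t = i / (n - 1)  # normalised position in the fade, 0.0 .. 1.0
        if shape == "linear":
            gains.append(t)
        elif shape == "sine":
            gains.append(math.sin(0.5 * math.pi * t))
        else:
            raise ValueError("unknown fade shape: " + shape)
    return gains


def duck_music_under_speech(music, speech, speech_start, fade_len,
                            duck_gain=0.3, shape="linear"):
    """Mix speech over music, ducking the music to duck_gain during speech.

    Returns (mix, music_gain, labels), where labels[i] is an assumed
    (music_active, speech_active) pair for sample i.
    """
    n = len(music)
    speech_end = speech_start + len(speech)
    if speech_start - fade_len < 0 or speech_end + fade_len > n:
        raise ValueError("speech plus fades must fit inside the music")

    ramp = fade_curve(fade_len, shape)
    gain = [1.0] * n
    # Fade the music down just before the speech starts.
    for i, r in enumerate(ramp):
        gain[speech_start - fade_len + i] = 1.0 - (1.0 - duck_gain) * r
    # Hold the ducked level while the speech plays.
    for i in range(speech_start, speech_end):
        gain[i] = duck_gain
    # Fade the music back up once the speech ends.
    for i, r in enumerate(ramp):
        gain[speech_end + i] = duck_gain + (1.0 - duck_gain) * r

    mix = [m * g for m, g in zip(music, gain)]
    for i, s in enumerate(speech):
        mix[speech_start + i] += s

    # The annotation comes directly from the known mixing positions.
    labels = [(1, 1 if speech_start <= i < speech_end else 0)
              for i in range(n)]
    return mix, gain, labels


# Toy signals: constant "music" and a short constant "speech" burst.
music = [1.0] * 20
speech = [0.5] * 4
mix, gain, labels = duck_music_under_speech(
    music, speech, speech_start=8, fade_len=4, duck_gain=0.25)
```

Because the mixing positions are chosen by the program, the labels are exact by construction, which is the sense in which synthesis can replace manual annotation. A real pipeline would operate on sampled audio arrays and aggregate labels per frame rather than per sample.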
