MusicNet: Compact Convolutional Neural Network for Real-time Background Music Detection

Chandan K.A. Reddy; Vishak Gopal; Harishchandra Dubey; Sergiy Matusevych; Ross Cutler; Robert Aichner

arXiv:2110.04331·eess.AS·April 18, 2022

TL;DR

MusicNet is a compact neural network designed for real-time background music detection in online meetings, outperforming larger models in accuracy and speed, thus enhancing user experience in noisy audio environments.

Contribution

The paper introduces MusicNet, a small, efficient CNN model that accurately detects background music without complex feature extraction, suitable for real-time communication systems.

Findings

1. MusicNet achieves an 81.3% true positive rate (TPR) at a 0.1% false positive rate (FPR).

2. It is 10 times smaller and 4 times faster than comparable models.

3. It outperforms 20 state-of-the-art models in accuracy and efficiency.
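
The headline metric above, TPR at a fixed FPR, can be computed from per-clip scores by choosing the loosest threshold that still keeps false positives within budget. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name `tpr_at_fpr`, the strict `score > threshold` decision rule, and the toy scores are all assumptions made for illustration.

```python
import math


def tpr_at_fpr(scores, labels, target_fpr):
    """Return (tpr, threshold) at the loosest threshold with FPR <= target_fpr.

    scores: model outputs, higher means "more likely music" (assumed convention).
    labels: truthy for clips that contain music, falsy otherwise.
    A clip is predicted as music when score > threshold (assumed rule).
    """
    neg = sorted((s for s, y in zip(scores, labels) if not y), reverse=True)
    pos = [s for s, y in zip(scores, labels) if y]
    if not neg or not pos:
        raise ValueError("need at least one positive and one negative clip")

    # Number of negative clips that are allowed to be flagged as music.
    allowed_fp = math.floor(target_fpr * len(neg))

    # Using the (allowed_fp + 1)-th largest negative score as the threshold
    # means at most allowed_fp negatives score strictly above it.
    threshold = neg[allowed_fp] if allowed_fp < len(neg) else -math.inf

    tpr = sum(s > threshold for s in pos) / len(pos)
    return tpr, threshold


# Toy data: four non-music clips and four music clips.
toy_scores = [0.10, 0.20, 0.30, 0.90, 0.95, 0.80, 0.40, 0.25]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
tpr, thr = tpr_at_fpr(toy_scores, toy_labels, target_fpr=0.25)
print(f"TPR={tpr:.2f} at threshold {thr:.2f}")
```

At a budget as tight as 0.1% FPR, a test set of 1000 clips permits at most one false positive (fewer if not all clips are negatives), so the measured TPR is sensitive to a handful of hard negative clips.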

Abstract

With the recent growth of remote work, online meetings often encounter challenging audio contexts such as background noise, music, and echo. Accurate real-time detection of music events can help to improve the user experience. In this paper, we present MusicNet, a compact neural model for detecting background music in the real-time communications pipeline. In video meetings, music frequently co-occurs with speech and background noise, making accurate classification quite challenging. We propose a compact convolutional neural network core preceded by an in-model featurization layer. MusicNet takes 9 seconds of raw audio as input and does not require any model-specific featurization in the product stack. We train our model on the balanced subset of the Audio Set (Gemmeke et al., 2017) data and validate it on 1000 crowd-sourced real test clips. Finally, we compare MusicNet…
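
To make the idea of an in-model featurization layer concrete, the sketch below turns a 9-second raw waveform into a time-frequency representation that a convolutional core could consume. The abstract excerpt does not specify the features, so everything except the 9-second clip length is an assumption for illustration: the 16 kHz sample rate, the 20 ms Hann window, the 10 ms hop, and the choice of a log power spectrogram.

```python
import numpy as np

CLIP_SECONDS = 9        # clip length stated in the abstract
SAMPLE_RATE = 16_000    # assumption: not given in this excerpt
FRAME = 320             # assumption: 20 ms analysis window at 16 kHz
HOP = 160               # assumption: 10 ms hop at 16 kHz


def log_power_spectrogram(audio, frame=FRAME, hop=HOP):
    """Map a 1-D raw waveform to a (num_frames, num_bins) log power spectrogram."""
    audio = np.asarray(audio, dtype=np.float32)
    num_frames = 1 + (audio.size - frame) // hop

    # Index matrix that slices the waveform into overlapping frames.
    idx = np.arange(frame)[None, :] + hop * np.arange(num_frames)[:, None]
    frames = audio[idx] * np.hanning(frame)

    # Power of the one-sided FFT, with a small floor so the log stays finite.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-6)


# A silent 9-second clip stands in for real meeting audio.
clip = np.zeros(CLIP_SECONDS * SAMPLE_RATE, dtype=np.float32)
features = log_power_spectrogram(clip)
print(features.shape)
```

Keeping this step inside the model, as the abstract describes, means the surrounding product only has to hand over raw audio; any change to the features ships with the model rather than requiring changes to the product stack.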

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Test