MusicNet: Compact Convolutional Neural Network for Real-time Background Music Detection
Chandan K.A. Reddy, Vishak Gopal, Harishchandra Dubey, Sergiy Matusevych, Ross Cutler, Robert Aichner

TL;DR
MusicNet is a compact neural network designed for real-time background music detection in online meetings, outperforming larger models in accuracy and speed, thus enhancing user experience in noisy audio environments.
Contribution
The paper introduces MusicNet, a small, efficient CNN model that accurately detects background music without complex feature extraction, suitable for real-time communication systems.
Findings
MusicNet achieves an 81.3% true positive rate (TPR) at a 0.1% false positive rate (FPR).
It is 10 times smaller and 4 times faster than comparable models.
It outperforms 20 state-of-the-art models in accuracy and efficiency.
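The headline metric, TPR at a fixed FPR, can be sketched as follows. This is a minimal illustration and not the authors' evaluation code: the function name `tpr_at_fpr`, the "predict music when score > threshold" convention, and the toy scores are all assumptions made here for the example.

```python
import numpy as np

def tpr_at_fpr(scores, labels, target_fpr):
    """Return (tpr, fpr, threshold) at the most permissive threshold
    whose false positive rate does not exceed target_fpr.

    Assumed convention: a clip is predicted "music" when score > threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    neg = np.sort(scores[~labels])[::-1]  # non-music scores, highest first
    pos = scores[labels]                  # music scores

    # Number of non-music clips we may misclassify without exceeding target_fpr.
    k = int(np.floor(target_fpr * neg.size))
    # Using the (k+1)-th highest negative score as the threshold lets at most
    # k negatives score strictly above it.
    threshold = float(neg[k]) if k < neg.size else float("-inf")

    tpr = float(np.mean(pos > threshold))
    fpr = float(np.mean(neg > threshold))
    return tpr, fpr, threshold

# Toy data (hypothetical): four non-music clips, then four music clips.
scores = [0.1, 0.2, 0.3, 0.9, 0.8, 0.95, 0.4, 0.7]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(tpr_at_fpr(scores, labels, target_fpr=0.25))  # (1.0, 0.25, 0.3)
print(tpr_at_fpr(scores, labels, target_fpr=0.0))   # (0.25, 0.0, 0.9)
```

Note that at a target FPR of 0.1%, a test set with fewer than 1000 non-music clips allows zero false positives under this definition, so the threshold is set by the single highest-scoring non-music clip.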
Abstract
With the recent growth of remote work, online meetings often encounter challenging audio contexts such as background noise, music, and echo. Accurate real-time detection of music events can help to improve the user experience. In this paper, we present MusicNet, a compact neural model for detecting background music in the real-time communications pipeline. In video meetings, music frequently co-occurs with speech and background noise, making accurate classification quite challenging. We propose a compact convolutional neural network core preceded by an in-model featurization layer. MusicNet takes 9 seconds of raw audio as input and does not require any model-specific featurization in the product stack. We train our model on the balanced subset of the Audio Set (Gemmeke et al., 2017) data and validate it on 1000 crowd-sourced real test clips. Finally, we compare MusicNet…
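The idea of an in-model featurization layer, where the caller passes raw audio and the model computes its own features, can be sketched roughly as below. This is only an illustration under stated assumptions: the abstract gives the 9-second input length but not the sample rate, frame size, hop, or feature type, so `SAMPLE_RATE`, `FRAME_LEN`, `HOP`, and the log-power spectrogram are placeholders chosen here, and `classifier` is a hypothetical stand-in for the CNN core.

```python
import numpy as np

CLIP_SECONDS = 9        # from the abstract
SAMPLE_RATE = 16_000    # assumption: not stated in the abstract
FRAME_LEN = 512         # assumption: analysis window length in samples
HOP = 256               # assumption: hop between windows in samples

def log_power_spectrogram(waveform):
    """Turn a 1-D waveform into a (frames, frequency bins) log-power spectrogram."""
    waveform = np.asarray(waveform, dtype=np.float32)
    n_frames = 1 + (waveform.size - FRAME_LEN) // HOP
    # Index matrix that slices the waveform into overlapping frames.
    idx = np.arange(FRAME_LEN)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = waveform[idx] * np.hanning(FRAME_LEN)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-6)  # small offset avoids log(0) on silence

def music_score(waveform, classifier):
    """Featurize inside the model, then hand the features to the classifier.

    `classifier` is a hypothetical callable standing in for the CNN core.
    """
    expected = (SAMPLE_RATE * CLIP_SECONDS,)
    if np.shape(waveform) != expected:
        raise ValueError(f"expected raw audio of shape {expected}")
    return classifier(log_power_spectrogram(waveform))

# Usage with a dummy classifier that just reports the feature shape.
silence = np.zeros(SAMPLE_RATE * CLIP_SECONDS, dtype=np.float32)
print(music_score(silence, classifier=lambda feats: feats.shape))  # (561, 257)
```

Keeping featurization inside the model function means the surrounding product code only needs to buffer raw samples, which matches the abstract's point that no model-specific featurization is required in the product stack.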
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Test
