Semi-Supervised NMF-CNN For Sound Event Detection
Chan Teck Kai, Chin Cheng Siong, and Li Ye

TL;DR
This paper introduces a semi-supervised approach that combines NMF and CNNs for sound event detection, reaching an event-based F1-score of 45.7% (48.6% with ensembling) on the DCASE 2020 Challenge Task 4 validation dataset and outperforming the baseline model by over 8%.
Contribution
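It proposes a semi-supervised framework in which NMF approximates strong labels for weakly labeled data; the approximated labels are then used to train two CNNs, one for clip-level prediction and one for frame-level prediction.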
Findings
Achieved an event-based F1-score of 45.7% on the DCASE 2020 Challenge Task 4 validation dataset.
Ensembling models by averaging their posterior outputs increased the event-based F1-score to 48.6%.
Outperformed the baseline model by over 8%.
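The ensembling step reported above, averaging the posterior outputs of several models, can be illustrated with a minimal sketch. The function names, the toy posteriors, and the 0.5 decision threshold are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def ensemble_average(posteriors):
    """Average posterior arrays (frames x classes) from several models.

    All arrays must share the same shape. This is plain arithmetic
    averaging; any weighting the authors may have used is not modeled.
    """
    stacked = np.stack(posteriors, axis=0)  # models x frames x classes
    return stacked.mean(axis=0)

def binarize(probs, threshold=0.5):
    """Turn averaged posteriors into 0/1 decisions (threshold is assumed)."""
    return (probs >= threshold).astype(int)

# Hypothetical posteriors from two models: 3 frames, 2 event classes.
model_a = np.array([[0.9, 0.2], [0.6, 0.4], [0.1, 0.8]])
model_b = np.array([[0.7, 0.4], [0.2, 0.2], [0.3, 0.6]])

avg = ensemble_average([model_a, model_b])
decisions = binarize(avg)
```

In this toy case the second frame shows the effect of averaging: model_a alone would fire on class 0 (0.6), but the averaged posterior (0.4) falls below the assumed threshold.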
Abstract
In this paper, a combinative approach using Nonnegative Matrix Factorization (NMF) and Convolutional Neural Network (CNN) is proposed for audio clip Sound Event Detection (SED). The main idea begins with the use of NMF to approximate strong labels for the weakly labeled data. Subsequently, using the approximated strongly labeled data, two different CNNs are trained in a semi-supervised framework where one CNN is used for clip-level prediction and the other for frame-level prediction. Based on this idea, our model can achieve an event-based F1-score of 45.7% on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge Task 4 validation dataset. By ensembling models through averaging the posterior outputs, event-based F1-score can be increased to 48.6%. By comparing with the baseline model, our proposed models outperform the baseline model by over 8%. By…
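The first step of the abstract, using NMF to approximate strong (frame-level) labels from weak (clip-level) labels, might look roughly like the sketch below. The abstract does not give the factorization details, so everything specific here is an assumption: the multiplicative-update NMF, the rank of 1, the relative threshold of 0.5, the rule that every tagged class is marked active on the same frames, and all function names. The two CNNs trained afterwards are not shown.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize a nonnegative matrix V (freq x frames) as W @ H.

    Uses the standard multiplicative updates for the Euclidean cost;
    this is an assumed choice, not necessarily the paper's variant.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def approximate_strong_labels(V, weak_labels, rank=1, rel_threshold=0.5):
    """Return a frames x classes 0/1 matrix of approximate strong labels.

    Frames whose summed activation reaches rel_threshold times the peak
    activation are treated as active (a hypothetical rule), and the
    clip-level tags are copied onto those frames.
    """
    _, H = nmf(V, rank)
    activation = H.sum(axis=0)  # one value per frame
    active = activation >= rel_threshold * activation.max()
    return np.outer(active, weak_labels).astype(int)

# Toy spectrogram: 4 frequency bins, 10 frames, event in frames 3..6.
spectrum = np.array([1.0, 2.0, 0.5, 0.1])
envelope = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0], dtype=float)
V = np.outer(spectrum, envelope) + 0.01  # small noise floor keeps V positive

weak = np.array([1, 0, 1])  # clip tagged with classes 0 and 2
strong = approximate_strong_labels(V, weak)
```

Here `strong` marks classes 0 and 2 as active only on frames 3 to 6. A clip with several overlapping events would need a per-class treatment that this sketch does not attempt.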
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
