Non-Negative Matrix Factorization-Convolutional Neural Network (NMF-CNN) For Sound Event Detection
Teck Kai Chan, Cheng Siong Chin, Ye Li

TL;DR
This paper introduces a deep learning model that combines NMF with a CNN for sound event detection. NMF supplies approximate strong labels for weakly labeled data, which raises the event-based F1-score above the baseline system in DCASE Task 4.
Contribution
Integrating NMF with a CNN so that weakly labeled data receives approximate strong labels, which improves the event-based F1-score over the baseline system.
Findings
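Higher event-based F1-score than the baseline system (evaluation dataset: 30.39% vs. 23.7%; validation dataset: 31% vs. 25.8%)
Ranked 8th among 19 teams (including the baseline system) in DCASE Task 4, based on validation results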
Demonstrated effectiveness of NMF-guided labeling in deep learning models
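For readers unfamiliar with the metric, the following is a minimal sketch of how an event-based F1-score can be computed for a single clip. It is a simplification and not the official challenge scoring code: the function name `event_based_f1`, the greedy matching, the 200 ms onset collar, and the offset tolerance of max(200 ms, 20% of the reference event length) are all assumptions made for illustration and are not stated in this summary.

```python
def event_based_f1(reference, estimated, collar=0.2, offset_ratio=0.2):
    """Event-based F1 for one clip (illustrative sketch, assumed tolerances).

    reference, estimated: lists of (label, onset_seconds, offset_seconds).
    An estimated event counts as a true positive when it has the same label
    as an unmatched reference event and both its onset and offset fall
    within the tolerances.
    """
    matched = set()  # indices of reference events already paired
    tp = 0
    for label, onset, offset in estimated:
        for i, (r_label, r_onset, r_offset) in enumerate(reference):
            if i in matched or label != r_label:
                continue
            # Offset tolerance grows with the length of the reference event.
            offset_tol = max(collar, offset_ratio * (r_offset - r_onset))
            if abs(onset - r_onset) <= collar and abs(offset - r_offset) <= offset_tol:
                matched.add(i)
                tp += 1
                break
    fp = len(estimated) - tp   # estimated events with no matching reference
    fn = len(reference) - tp   # reference events that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example with made-up annotations.
reference = [("Speech", 1.0, 3.0), ("Dog", 4.0, 5.0)]
estimated = [("Speech", 1.1, 3.2), ("Dog", 4.5, 5.0), ("Cat", 0.0, 1.0)]
f1 = event_based_f1(reference, estimated)
```

In this toy case only the Speech event is matched: the Dog onset is 0.5 s late and the Cat event has no reference, giving one true positive, two false positives, and one false negative.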
Abstract
The main scientific question of this year's DCASE challenge, Task 4 - Sound Event Detection in Domestic Environments, is to investigate the types of data (strongly labeled synthetic data, weakly labeled data, unlabeled in-domain data) required to achieve the best-performing system. In this paper, we propose a deep learning model that integrates Non-Negative Matrix Factorization (NMF) with a Convolutional Neural Network (CNN). The key idea of this integration is to use NMF to provide an approximate strong label for the weakly labeled data. The integrated model achieved a higher event-based F1-score than the baseline system (Evaluation Dataset: 30.39% vs. 23.7%, Validation Dataset: 31% vs. 25.8%). Based on a comparison of validation results with other participants, the proposed system ranked 8th among 19 teams (inclusive of the baseline system) in this year's Task 4 challenge.
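The idea of using NMF to turn a clip-level (weak) label into an approximate frame-level (strong) label can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily the authors' exact procedure: the function names `nmf` and `approximate_strong_label`, the rank of 1, the multiplicative-update solver, and the 0.5 threshold on the normalized activation are all hypothetical choices made for the example.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative matrix V (freq x frames) as W @ H
    using multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps   # spectral templates
    H = rng.random((rank, n_frames)) + eps  # per-frame activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def approximate_strong_label(V, weak_label, rank=1, threshold=0.5):
    """Return a (frames x classes) 0/1 matrix.

    weak_label is a multi-hot clip-level vector. Frames whose normalized
    NMF activation exceeds the (assumed) threshold inherit every class
    that is present in the weak label.
    """
    _, H = nmf(V, rank)
    activation = H.sum(axis=0)
    activation = activation / (activation.max() + 1e-9)
    active = activation > threshold
    strong = np.zeros((V.shape[1], len(weak_label)), dtype=np.int8)
    for c in np.flatnonzero(weak_label):
        strong[active, c] = 1
    return strong


# Toy spectrogram: quiet background with one event in frames 5..11.
V = np.full((8, 20), 0.01)
V[2:5, 5:12] += 1.0
weak = np.array([0, 1, 0])  # clip-level label: only class 1 is present
strong = approximate_strong_label(V, weak)
```

On this toy input the activation is large only where the event's energy lies, so class 1 is marked active in frames 5 to 11 and nowhere else. The resulting frame-level matrix could then serve as a training target for a CNN in place of the clip-level label.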
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
