Non-Negative Matrix Factorization-Convolutional Neural Network (NMF-CNN) For Sound Event Detection

Teck Kai Chan; Cheng Siong Chin; Ye Li

arXiv:2001.07874 · cs.SD · January 23, 2020 · 1 citation

Open Access

TL;DR

This paper introduces a deep learning model that combines NMF with a CNN for sound event detection. NMF supplies approximate strong labels for weakly labeled data, which raises the event-based F1-score over the baseline system in DCASE challenge Task 4.

Contribution

Integrating NMF with a CNN gives weakly labeled clips approximate strong labels, which raises the event-based F1-score over the baseline system.

Findings

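01. Higher event-based F1-score than the baseline system on the evaluation dataset (30.39% vs. 23.7%) and on the validation dataset (31% vs. 25.8%)

02. Ranked 8th among 19 teams (inclusive of the baseline system) in DCASE challenge Task 4, based on validation results

03. Demonstrated the effectiveness of NMF-guided labeling in deep learning models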

Abstract

The main scientific question of this year's DCASE challenge Task 4, Sound Event Detection in Domestic Environments, is to investigate which types of data (strongly labeled synthetic data, weakly labeled data, unlabeled in-domain data) are required to achieve the best-performing system. In this paper, we propose a deep learning model that integrates Non-Negative Matrix Factorization (NMF) with a Convolutional Neural Network (CNN). The key idea of this integration is to use NMF to provide an approximate strong label for the weakly labeled data. This integration achieved a higher event-based F1-score than the baseline system (evaluation dataset: 30.39% vs. 23.7%; validation dataset: 31% vs. 25.8%). Comparing validation results with those of other participants, the proposed system ranked 8th among 19 teams (inclusive of the baseline system) in this year's Task 4 challenge.
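The abstract does not spell out how NMF produces the approximate strong label, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a magnitude spectrogram `V` (frequency bins by frames), factors it with plain multiplicative-update NMF, and thresholds the normalized activation matrix to mark the frames in which a clip-level (weak) tag is likely active. The function names, the rank of 1, the threshold of 0.5, and the toy spectrogram are all hypothetical choices made for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor V (freq x frames) into W (freq x rank) and H (rank x frames)
    using multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def approximate_strong_label(V, clip_has_event, rank=1, threshold=0.5):
    """Turn a clip-level (weak) tag into a frame-level 0/1 mask.

    Assumption: frames whose summed NMF activation exceeds `threshold`
    times the peak activation are treated as containing the event.
    """
    n_frames = V.shape[1]
    if not clip_has_event:
        # A clip without the tag gets an all-zero frame mask.
        return np.zeros(n_frames, dtype=int)
    _, H = nmf(V, rank)
    activation = H.sum(axis=0)
    activation = activation / (activation.max() + 1e-9)
    return (activation > threshold).astype(int)

# Toy spectrogram: 8 frequency bins, 20 frames, low-level background,
# with an "event" occupying bins 2..4 during frames 5..11.
V = np.full((8, 20), 0.01)
V[2:5, 5:12] += 1.0

mask = approximate_strong_label(V, clip_has_event=True)
print(mask.tolist())
```

In a pipeline of the kind the abstract describes, a frame-level mask like `mask` would stand in for the missing onset and offset annotation of a weakly labeled clip when training the CNN. A real system would need per-class handling and a more careful choice of rank and threshold than this toy example uses.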

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis