Detecting Sound Events Using Convolutional Macaron Net With Pseudo Strong Labels
Teck Kai Chan, Cheng Siong Chin

TL;DR
This paper introduces a semi-supervised approach to sound event detection. Pseudo strong labels are generated with Convolutive Nonnegative Matrix Factorization, and two Convolutional Macaron Nets are trained together using curriculum consistency costs. The resulting system outperforms the DCASE 2020 Challenge Task 4 baseline by over 10%.
Contribution
The paper presents a new semi-supervised framework combining pseudo labeling and a dual-model training scheme with curriculum consistency for sound event detection.
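To make the dual-model idea concrete, here is a minimal NumPy sketch of a consistency cost and an interpolated consistency cost between two models, scaled by a ramp-up weight. It rests on several assumptions that do not come from the paper: the exponential ramp-up schedule, the mean-squared-error form of both costs, and the toy linear "models" are illustrative stand-ins, and all function names are hypothetical. The paper's actual curriculum and loss definitions may differ.

```python
import numpy as np

def ramp_up(step, ramp_steps):
    """Weight that grows smoothly from exp(-5) to 1 (assumed schedule, not from the paper)."""
    t = np.clip(step / ramp_steps, 0.0, 1.0)
    return float(np.exp(-5.0 * (1.0 - t) ** 2))

def consistency_cost(pred_a, pred_b):
    """Mean squared difference between the two models' predictions."""
    return float(np.mean((pred_a - pred_b) ** 2))

def interpolated_consistency_cost(model_a, model_b, x1, x2, lam):
    """Model A's output on a mixed input should match the same mix of model B's outputs."""
    mixed_input = lam * x1 + (1.0 - lam) * x2
    mixed_target = lam * model_b(x1) + (1.0 - lam) * model_b(x2)
    return float(np.mean((model_a(mixed_input) - mixed_target) ** 2))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical stand-ins for the two networks: fixed linear maps plus a sigmoid.
rng = np.random.default_rng(0)
W_a = rng.normal(size=(8, 3))
W_b = rng.normal(size=(8, 3))

def model_a(x):
    return sigmoid(x @ W_a)

def model_b(x):
    return sigmoid(x @ W_b)

# Two unlabeled mini-batches of 4 examples with 8 features each.
x1 = rng.normal(size=(4, 8))
x2 = rng.normal(size=(4, 8))

weight = ramp_up(step=500, ramp_steps=1000)
total_cost = weight * (
    consistency_cost(model_a(x1), model_b(x1))
    + interpolated_consistency_cost(model_a, model_b, x1, x2, lam=0.3)
)
```

In a real training loop this unsupervised term would be added to the supervised loss on labeled clips, so that the weight keeps the consistency terms small while early predictions are still unreliable.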
Findings
Outperforms DCASE 2020 baseline by over 10%
Achieves higher accuracy than DCASE 2019 top submission by 1.8%
Maintains robustness on unseen YouTube data
Abstract
In this paper, we propose addressing the lack of strongly labeled data by using pseudo strongly labeled data approximated using Convolutive Nonnegative Matrix Factorization. Using this set of data, we then train a novel architecture called the Convolutional Macaron Net (CMN), which combines a Convolutional Neural Network (CNN) with a Macaron Net (MN), in a semi-supervised manner. Instead of training only a single model or using the Mean-teacher approach, we train two different CMNs synchronously using a curriculum consistency cost and a curriculum interpolated consistency cost. In the inference stage, one of the models provides the frame-level prediction while the other provides the clip-level prediction. Our system outperforms the baseline system of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge Task 4 by a margin of over 10% based on our proposed…
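The pseudo strong labeling step can be illustrated with a small sketch. It assumes plain NMF with multiplicative updates, not the convolutive variant named in the abstract, and a simple relative threshold on the activations. The function names, the rank, and the threshold are hypothetical choices for illustration, not the authors' settings.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF via multiplicative updates (Euclidean cost): V is approximated by W @ H."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps   # spectral templates
    H = rng.random((rank, n_frames)) + eps  # per-frame activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def pseudo_strong_labels(V, rank=1, rel_threshold=0.5):
    """Mark a frame as active when its summed activation exceeds a fraction of the maximum."""
    _, H = nmf(V, rank)
    activation = H.sum(axis=0)
    return activation > rel_threshold * activation.max()

# Toy magnitude spectrogram (16 frequency bins x 60 frames):
# a weakly labeled clip whose event occupies frames 20 to 39.
rng = np.random.default_rng(1)
V = 0.01 * rng.random((16, 60))
V[:, 20:40] += 1.0

labels = pseudo_strong_labels(V)  # boolean vector, one entry per frame
```

Applied to a clip that carries only a clip-level (weak) tag, the boolean vector serves as an approximate frame-level (strong) label for that event class.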
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Water Systems and Optimization
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Residual Connection · Softmax · Adam
