A Hybrid System of Sound Event Detection Transformer and Frame-wise Model for DCASE 2022 Task 4
Yiming Li, Zhifang Guo, Zhirong Ye, Xiangdong Wang, Hong Liu, Yueliang Qian, Rui Tao, Long Yan, Kazushige Ouchi

TL;DR
This paper presents a hybrid sound event detection system for DCASE 2022 Task 4. It combines an event-wise, transformer-based model (SEDT) with a frame-wise CNN (MLFL-CNN) and uses semi-supervised learning to improve detection accuracy.
Contribution
It introduces a hybrid system that integrates an end-to-end Sound Event Detection Transformer (SEDT) with a frame-wise CNN (MLFL-CNN), using self-supervised pre-training and semi-supervised learning to improve sound event detection.
Findings
The hybrid system outperforms either model used alone.
It achieves a PSDS1 of 0.420 and a PSDS2 of 0.783 on the validation set.
It uses self-supervised pre-training and semi-supervised learning.
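As the abstract explains, semi-supervised learning for SEDT relies on an online teacher that is updated from the student by an Exponential Moving Average (EMA) of its weights: theta_teacher <- alpha * theta_teacher + (1 - alpha) * theta_student. The following is a minimal NumPy sketch of that update rule and is not the authors' code. It assumes, hypothetically, that parameters are stored as a dict of arrays, and the decay `alpha` is an illustrative value.

```python
import numpy as np


def ema_update(teacher, student, alpha=0.999):
    """Move each teacher parameter toward the matching student parameter.

    `teacher` and `student` are dicts mapping parameter names to float NumPy
    arrays (a hypothetical stand-in for real model state). The teacher arrays
    are modified in place and the teacher dict is also returned.
    """
    for name, s_param in student.items():
        # theta_t <- alpha * theta_t + (1 - alpha) * theta_s
        teacher[name] *= alpha
        teacher[name] += (1.0 - alpha) * s_param
    return teacher


if __name__ == "__main__":
    teacher = {"w": np.zeros(3)}
    student = {"w": np.ones(3)}
    ema_update(teacher, student, alpha=0.5)
    print(teacher["w"])
```

A large `alpha` (close to 1) makes the teacher a slowly moving average of recent students. The goal is to make its predictions, and therefore the pseudo labels it produces, more stable than the student's own.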
Abstract
In this paper, we describe in detail our system for DCASE 2022 Task 4. The system combines two considerably different models: an end-to-end Sound Event Detection Transformer (SEDT) and a frame-wise model, the Metric Learning and Focal Loss CNN (MLFL-CNN). The former is an event-wise model that learns event-level representations and predicts sound event categories and boundaries directly. The latter follows the widely adopted frame-classification scheme, in which each frame is classified into event categories and event boundaries are obtained by post-processing such as thresholding and smoothing. For SEDT, self-supervised pre-training on unlabeled data is applied, and semi-supervised learning is adopted through an online teacher, which is updated from the student model using the Exponential Moving Average (EMA) strategy and generates reliable pseudo labels for weakly-labeled…
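The frame-classification scheme described above turns per-frame class probabilities into events through thresholding and smoothing. The following minimal NumPy sketch uses one common choice: a fixed threshold followed by a median filter on the binary mask. It is not the MLFL-CNN post-processing itself. The threshold, window size, frame hop `hop_s`, and the helper names `median_smooth` and `frames_to_events` are all illustrative assumptions.

```python
import numpy as np


def median_smooth(mask, win):
    """Median-filter each class column of a binary (frames x classes) mask.

    Edges are padded by repeating the first/last frame; `win` must be odd so
    the median of 0/1 values is itself 0 or 1.
    """
    half = win // 2
    padded = np.pad(mask, ((half, half), (0, 0)), mode="edge")
    out = np.empty_like(mask)
    for t in range(mask.shape[0]):
        out[t] = np.median(padded[t:t + win], axis=0)
    return out


def frames_to_events(probs, labels, threshold=0.5, win=5, hop_s=0.02):
    """Convert (frames x classes) probabilities into (label, onset_s, offset_s).

    Hypothetical helper: threshold, smooth, then read off runs of active frames.
    """
    binary = median_smooth((probs > threshold).astype(int), win)
    events = []
    for c, label in enumerate(labels):
        # Pad with inactive frames so every run has a rising and a falling edge.
        col = np.concatenate(([0], binary[:, c], [0]))
        diff = np.diff(col)
        onsets = np.flatnonzero(diff == 1)    # first active frame of a run
        offsets = np.flatnonzero(diff == -1)  # first inactive frame after it
        for on, off in zip(onsets, offsets):
            events.append((label, on * hop_s, off * hop_s))
    return events


if __name__ == "__main__":
    probs = np.array([[0.1], [0.2], [0.9], [0.8], [0.1],
                      [0.9], [0.9], [0.7], [0.2], [0.1]])
    print(frames_to_events(probs, ["Speech"], win=3))
```

In this example, the single low-probability frame at index 4 splits the thresholded mask into two runs. A 3-frame median filter fills that gap, so the function reports one event instead of two. This illustrates why frame-wise models need smoothing, whereas an event-wise model such as SEDT predicts boundaries directly.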
Taxonomy
Topics: Music and Audio Processing · Anomaly Detection Techniques and Applications · Machine Learning and Data Classification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Softmax · Adam · Byte Pair Encoding · Absolute Position Encodings · Layer Normalization
