A Multi-grained based Attention Network for Semi-supervised Sound Event Detection
Ying Hu, Xiujuan Zhu, Yunlong Li, Hao Huang, and Liang He

TL;DR
This paper introduces MGA-Net, a semi-supervised sound event detection model that uses multi-grained attention and hybrid convolution to improve detection accuracy, especially with limited data.
Contribution
The paper proposes MGA-Net with a residual hybrid convolution block and a multi-grained attention module, enhancing feature extraction and temporal resolution for better sound event detection.
Findings
Outperforms state-of-the-art methods on benchmark datasets.
Achieves macro F1 scores of 53.27% and 56.96% on the validation and test sets, respectively.
Demonstrates the effectiveness of the spatial shift data augmentation.
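For readers unfamiliar with the metric, a macro F1 score is the unweighted mean of per-class F1 scores, so rare event classes count as much as frequent ones. The sketch below illustrates only that averaging step. How true positives, false positives, and false negatives are counted for sound events (for example event-based versus segment-based matching) depends on the evaluation protocol, which this summary does not state; the function name `macro_f1`, the class names, and the counts are hypothetical and not taken from the paper.

```python
def macro_f1(per_class_counts):
    """Average per-class F1 scores, giving every class equal weight.

    per_class_counts maps a class name to a (tp, fp, fn) tuple.
    """
    scores = []
    for tp, fp, fn in per_class_counts.values():
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); treat a class with no counts as 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical counts for two event classes (illustrative only).
counts = {"speech": (8, 2, 2), "dog_bark": (1, 1, 3)}
score = macro_f1(counts)
print(f"macro F1 = {score:.4f}")
```

Because the average is unweighted, the weak class in this toy example pulls the overall score well below the F1 of the strong class.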
Abstract
Sound event detection (SED) is an interesting but challenging task due to the scarcity of data and the diversity of sound events in real life. This paper presents a multi-grained based attention network (MGA-Net) for semi-supervised sound event detection. To obtain feature representations related to sound events, a residual hybrid convolution (RH-Conv) block is designed to boost the vanilla convolution's ability to extract time-frequency features. Moreover, a multi-grained attention (MGA) module is designed to learn temporal resolution features from coarse level to fine level. With the MGA module, the network can capture the characteristics of target events of short or long duration and so determine the onset and offset of sound events more accurately. Furthermore, to effectively boost the performance of the Mean Teacher (MT) method, a spatial shift (SS) module as a data…
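The abstract builds on the Mean Teacher method. As a rough illustration of the general technique (not of MGA-Net itself), the teacher's weights are kept as an exponential moving average of the student's weights, and a consistency loss penalizes disagreement between the two models' predictions. The sketch below assumes plain Python dictionaries for weights, a mean-squared-error consistency term, and a smoothing factor `alpha`; all names and values are hypothetical, and the paper's spatial shift module is not modeled here.

```python
def ema_update(teacher, student, alpha=0.999):
    """Return teacher weights moved toward the student by one EMA step."""
    return {k: alpha * teacher[k] + (1 - alpha) * student[k] for k in teacher}


def consistency_loss(student_probs, teacher_probs):
    """Mean squared error between student and teacher predictions."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


# Hypothetical single-weight models (illustrative only).
teacher = {"w": 0.0}
student = {"w": 1.0}

# After a student gradient step, the teacher follows with an EMA update.
teacher = ema_update(teacher, student, alpha=0.9)

# Consistency term on predictions for the same (possibly unlabeled) clip.
loss = consistency_loss([0.8, 0.2], [0.6, 0.4])
```

The teacher receives no gradient updates of its own; it changes only through the moving average, which is what makes its predictions a smoother target for unlabeled data.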
