Semi-supervised Sound Event Detection with Local and Global Consistency Regularization
Yiming Li, Xiangdong Wang, Hong Liu, Rui Tao, Long Yan, Kazushige Ouchi

TL;DR
This paper introduces a semi-supervised sound event detection method using local and global consistency regularization, improving feature learning on partially labeled datasets by leveraging unlabeled data more effectively.
Contribution
The work proposes a novel Local and Global Consistency (LGC) regularization scheme that enhances semi-supervised sound event detection by combining frame-level and feature-level consistency with audio CutMix.
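As a rough illustration of the audio CutMix idea mentioned above, the sketch below pastes a random time segment of one clip into another and mixes the frame-level targets the same way. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `audio_cutmix`, the `max_ratio` parameter, and the choice to cut along the time axis at the same position in both clips are all hypothetical.

```python
import numpy as np

def audio_cutmix(feats_a, feats_b, preds_a, preds_b, rng, max_ratio=0.5):
    """Paste a random time segment of clip B into clip A (illustrative only).

    feats_*: (frames, mel_bins) spectrogram-like features.
    preds_*: (frames, classes) frame-level targets or model predictions.
    Assumes features and predictions share the same number of frames.
    """
    n_frames = feats_a.shape[0]
    # Segment length between 1 frame and max_ratio of the clip (assumption).
    seg_len = int(rng.integers(1, max(2, int(n_frames * max_ratio) + 1)))
    start = int(rng.integers(0, n_frames - seg_len + 1))
    end = start + seg_len

    mixed_feats = feats_a.copy()
    mixed_preds = preds_a.copy()
    # Replace the same frame range in the features and in the frame-level
    # targets, so each frame keeps its own label while its context changes.
    mixed_feats[start:end] = feats_b[start:end]
    mixed_preds[start:end] = preds_b[start:end]
    return mixed_feats, mixed_preds, (start, end)
```

Because the targets are mixed frame by frame, each frame is still paired with its own label even though the surrounding context now comes from a different clip.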
Findings
LGC outperforms the baseline and existing methods on the DESED dataset.
Combining LGC with other methods yields further improvements.
LGC effectively leverages unlabeled data for better feature representation.
Abstract
Learning meaningful frame-wise features on a partially labeled dataset is crucial to semi-supervised sound event detection. Prior works either maintain consistency on frame-level predictions or seek feature-level similarity among neighboring frames, which cannot exploit the potential of unlabeled data. In this work, we design a Local and Global Consistency (LGC) regularization scheme to enhance the model on both label- and feature-level. The audio CutMix is introduced to change the contextual information of clips. Then, the local consistency is adopted to encourage the model to leverage local features for frame-level predictions, and the global consistency is applied to force features to align with global prototypes through a specially designed contrastive loss. Experiments on the DESED dataset indicate the superiority of LGC, surpassing its respective competitors largely with the same…
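The two consistency terms described in the abstract can be sketched roughly as follows. The abstract does not spell out the "specially designed contrastive loss", so the code below substitutes a generic InfoNCE-style loss over prototypes as a stand-in; it is not the paper's loss. The function names, the mean-squared-error form of the local term, the `temperature` value, and the way frames are assigned to prototypes are all assumptions made for illustration.

```python
import numpy as np

def local_consistency(student_probs, mixed_target_probs):
    """Mean squared error between frame-level predictions on a mixed clip
    and the correspondingly mixed targets (MSE is an assumed choice)."""
    return float(np.mean((student_probs - mixed_target_probs) ** 2))

def prototype_contrastive(frame_feats, prototypes, assignments,
                          temperature=0.1, eps=1e-8):
    """Generic InfoNCE-style loss pulling each frame toward its prototype.

    frame_feats: (frames, dim) frame-wise features.
    prototypes:  (n_proto, dim) global prototypes (construction not shown).
    assignments: (frames,) integer index of the prototype for each frame.
    """
    # Cosine similarity between every frame and every prototype.
    f = frame_feats / (np.linalg.norm(frame_feats, axis=1, keepdims=True) + eps)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + eps)
    logits = (f @ p.T) / temperature

    # Numerically stable log-softmax over prototypes.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Negative log-likelihood of the assigned prototype, averaged over frames.
    idx = np.arange(len(assignments))
    return float(-log_probs[idx, assignments].mean())
```

In this sketch, the local term compares predictions frame by frame, while the prototype term acts on the features themselves, which mirrors the label-level and feature-level split described in the abstract.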
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
Methods: CutMix · ALIGN
