Semi-supervised Sound Event Detection with Local and Global Consistency Regularization

Yiming Li; Xiangdong Wang; Hong Liu; Rui Tao; Long Yan; Kazushige Ouchi

arXiv:2309.08355 · eess.AS · September 18, 2023

TL;DR

This paper introduces a semi-supervised sound event detection method using local and global consistency regularization, improving feature learning on partially labeled datasets by leveraging unlabeled data more effectively.

Contribution

The work proposes a novel Local and Global Consistency (LGC) regularization scheme that enhances semi-supervised sound event detection by combining frame-level and feature-level consistency with audio CutMix.
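
The audio CutMix step can be illustrated with a minimal NumPy sketch. It assumes that spectrograms are shaped (freq_bins, frames), that frame-level targets (labels or teacher probabilities) share the same frame grid, and that the pasted segment keeps its original time position. The function name `audio_cutmix` and the segment-length bounds `min_frac`/`max_frac` are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def audio_cutmix(x_a, x_b, y_a, y_b, rng, min_frac=0.2, max_frac=0.5):
    """Paste a random time segment of clip b into clip a (hypothetical sketch).

    x_a, x_b: spectrograms shaped (freq_bins, frames).
    y_a, y_b: frame-level targets (labels or teacher probabilities) shaped
              (classes, frames), assumed to share the spectrogram frame grid.
    rng: a numpy.random.Generator.
    Returns the mixed spectrogram, the mixed targets and the pasted span.
    """
    n_frames = x_a.shape[1]
    # Segment length drawn uniformly between min_frac and max_frac of the clip.
    lo = max(1, int(min_frac * n_frames))
    hi = max(lo, int(max_frac * n_frames))
    seg_len = int(rng.integers(lo, hi + 1))
    # The segment keeps its time position, so targets can be mixed frame by frame.
    start = int(rng.integers(0, n_frames - seg_len + 1))
    stop = start + seg_len

    x_mix = x_a.copy()
    y_mix = y_a.copy()
    x_mix[:, start:stop] = x_b[:, start:stop]
    y_mix[:, start:stop] = y_b[:, start:stop]
    return x_mix, y_mix, (start, stop)
```

Because the targets are cut and pasted together with the audio, frames inside the pasted span inherit clip b's labels or teacher predictions while the remaining frames keep clip a's. The mixed clip therefore has the same local content per frame but a different surrounding context, which is what a frame-level consistency term can then exploit.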

Findings

1. LGC outperforms the baseline and existing methods on the DESED dataset.
2. Combining LGC with other methods yields further improvements.
3. LGC leverages unlabeled data effectively to learn better feature representations.

Abstract

Learning meaningful frame-wise features on a partially labeled dataset is crucial to semi-supervised sound event detection. Prior works either maintain consistency on frame-level predictions or seek feature-level similarity among neighboring frames, which cannot exploit the potential of unlabeled data. In this work, we design a Local and Global Consistency (LGC) regularization scheme to enhance the model at both the label and feature levels. Audio CutMix is introduced to change the contextual information of clips. Then, local consistency is adopted to encourage the model to leverage local features for frame-level predictions, and global consistency is applied to force features to align with global prototypes through a specially designed contrastive loss. Experiments on the DESED dataset indicate the superiority of LGC, which surpasses its respective competitors by a large margin with the same…
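
The two consistency terms can be sketched in NumPy under stated assumptions. The local term is written as a mean squared error between the student's frame-level predictions on a CutMix clip and the correspondingly mixed teacher predictions. The paper's "specially designed" contrastive loss is not reproduced here; a generic InfoNCE-style prototype loss stands in for it, using L2-normalized features, momentum-updated class prototypes, and one hard class id per frame (a simplification, since sound event detection is multi-label). All function names, the temperature `tau` and the `momentum` value are hypothetical.

```python
import numpy as np


def local_consistency_loss(student_mix_pred, teacher_mix_target):
    # MSE between frame-level probabilities, both shaped (classes, frames).
    return float(np.mean((student_mix_pred - teacher_mix_target) ** 2))


def l2_normalize(v, axis=-1, eps=1e-8):
    # Scale vectors to unit length; eps avoids division by zero.
    return v / (np.linalg.norm(v, axis=axis, keepdims=True) + eps)


def update_prototypes(prototypes, feats, labels, momentum=0.9):
    """Momentum (EMA) update of class prototypes (hypothetical sketch).

    prototypes: (classes, dim); feats: (frames, dim);
    labels: (frames,) hard class ids, e.g. derived from teacher outputs.
    """
    feats = l2_normalize(feats)
    new = prototypes.copy()
    for k in range(prototypes.shape[0]):
        mask = labels == k
        if mask.any():
            # Blend the old prototype with the mean feature of frames labeled k.
            new[k] = momentum * prototypes[k] + (1 - momentum) * feats[mask].mean(axis=0)
    return l2_normalize(new)


def prototype_contrastive_loss(feats, labels, prototypes, tau=0.1):
    """InfoNCE-style loss: pull each frame toward its class prototype and
    push it away from the other prototypes (generic stand-in)."""
    f = l2_normalize(feats)
    p = l2_normalize(prototypes)
    logits = f @ p.T / tau                                # (frames, classes)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

Each frame is compared with prototypes aggregated over many clips rather than with its neighboring frames only, which is what makes this term global; the local term, by contrast, only constrains each frame's own prediction.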

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies

Methods: CutMix · ALIGN