Cross-Referencing Self-Training Network for Sound Event Detection in Audio Mixtures

Sangwook Park; David K. Han; Mounya Elhilali

arXiv:2105.13392 · cs.SD · December 31, 2024

TL;DR

This paper introduces a semi-supervised self-training network with cross-referencing for sound event detection, reducing the need for extensive labeled data and improving detection accuracy in audio mixtures.

Contribution

It proposes a novel student-teacher semi-supervised framework with cross-training and post-processing for enhanced sound event detection performance.

Findings

01 Significant improvement over state-of-the-art semi-supervised methods.

02 Effective pseudo-label generation from unlabeled data.

03 Enhanced detection accuracy on DCASE2020 challenge dataset.

Abstract

Sound event detection is an important facet of audio tagging that aims to identify sounds of interest and define both the sound category and time boundaries for each sound event in a continuous recording. With advances in deep neural networks, there has been tremendous improvement in the performance of sound event detection systems, although at the expense of costly data collection and labeling efforts. In fact, current state-of-the-art methods employ supervised training methods that leverage large amounts of data samples and corresponding labels in order to facilitate identification of sound category and time stamps of events. As an alternative, the current study proposes a semi-supervised method for generating pseudo-labels from unsupervised data using a student-teacher scheme that balances self-training and cross-training. Additionally, this paper explores post-processing which…
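
The balance between self-training and cross-training described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `blend_pseudo_labels`, the `self_weight` mixing factor, and the fixed `threshold` are all hypothetical, and the sketch simply assumes that each student receives frame-level class probabilities from its own teacher and from a second teacher, mixes them, and binarizes the result into pseudo-labels.

```python
from typing import List

# Frame-level class probabilities, indexed as [frame][class], values in [0, 1].
Frames = List[List[float]]


def blend_pseudo_labels(
    own_teacher: Frames,
    other_teacher: Frames,
    self_weight: float = 0.5,   # hypothetical: 1.0 = pure self-training, 0.0 = pure cross-training
    threshold: float = 0.5,     # hypothetical decision threshold
) -> List[List[int]]:
    """Mix two teachers' predictions and binarize them into pseudo-labels."""
    if not 0.0 <= self_weight <= 1.0:
        raise ValueError("self_weight must lie in [0, 1]")
    if len(own_teacher) != len(other_teacher):
        raise ValueError("both teachers must cover the same number of frames")

    labels: List[List[int]] = []
    for own_frame, other_frame in zip(own_teacher, other_teacher):
        row = []
        for p_own, p_other in zip(own_frame, other_frame):
            # Weighted average of the two teachers' probabilities for this class.
            mixed = self_weight * p_own + (1.0 - self_weight) * p_other
            row.append(1 if mixed >= threshold else 0)
        labels.append(row)
    return labels


if __name__ == "__main__":
    own = [[0.9, 0.2], [0.4, 0.6]]      # two frames, two sound classes
    other = [[0.7, 0.4], [0.2, 0.9]]
    print(blend_pseudo_labels(own, other))
```

Setting `self_weight` to 1.0 reduces the sketch to plain self-training on the student's own teacher, while 0.0 relies entirely on the other teacher; intermediate values trade the two off. How the paper actually weights and schedules the two signals is not stated in the abstract above.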

Code & Models

Repositories

JHU-LCAP/CRSTmodel (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Diverse Musicological Studies · Speech and Audio Processing