Selective Pseudo-labeling and Class-wise Discriminative Fusion for Sound Event Detection
Yunhao Liang, Yanhua Long, Yijie Li, Jiaen Liang

TL;DR
This paper introduces a selective pseudo-labeling method and a class-wise discriminative fusion scheme that use sound separation outputs to improve sound event detection, with gains in F1, PSDS1, and PSDS2 on the DCASE 2021 dataset.
Contribution
The study proposes a new selective pseudo-labeling approach and a class-wise discriminative fusion method to improve sound event detection accuracy using sound separation outputs.
Findings
Significant improvements in F1, PSDS1, and PSDS2 scores on the DCASE 2021 dataset.
Effective use of separated sound signals for better event detection.
Outperforms the official baseline in all evaluated metrics.
Abstract
In recent years, the use of effective sound separation (SSep) techniques to improve overlapping sound event detection (SED) has attracted increasing attention. Producing accurate separated signals, so as to avoid catastrophic error accumulation during SED model training, is both important and challenging. In this study, we first propose a novel selective pseudo-labeling approach, termed SPL, to produce high-confidence separated target events from blind sound separation outputs. These target events are then used to fine-tune the original SED model, which was pre-trained on the sound mixtures, in a multi-objective learning style. Then, to further leverage the SSep outputs, a class-wise discriminative fusion is proposed to improve the final SED performance by combining multiple frame-level event predictions from both the sound mixtures and their separated signals. All experiments are performed on the public…
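The abstract does not give the selection criterion or the fusion rule, so the following is only a minimal sketch of the two ideas under stated assumptions, not the authors' method. It assumes that a separated source is kept as a pseudo-labeled target event when the SED model's top clip-level class probability reaches a threshold, and that fusion is a per-class weighted average of the frame-level predictions for the mixture and for a separated signal. The function names, the threshold value, and the weights are all hypothetical.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    clip_probs: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical value, not taken from the paper
) -> List[Tuple[int, int]]:
    """Keep separated sources whose top class probability is confident enough.

    clip_probs[i][c] is the (assumed) clip-level probability that separated
    source i contains event class c. Returns (source_index, class_index) pairs.
    """
    selected = []
    for src_idx, probs in enumerate(clip_probs):
        # Index of the most probable class for this separated source.
        best_class = max(range(len(probs)), key=probs.__getitem__)
        if probs[best_class] >= threshold:
            selected.append((src_idx, best_class))
    return selected


def classwise_fusion(
    mix_probs: Sequence[Sequence[float]],
    sep_probs: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[List[float]]:
    """Fuse frame-level predictions with one weight per event class.

    mix_probs[t][c] and sep_probs[t][c] are frame-level probabilities for the
    mixture and for a separated signal. weights[c] in [0, 1] is the (assumed)
    share given to the mixture prediction for class c.
    """
    return [
        [w * m + (1.0 - w) * s for m, s, w in zip(mix_frame, sep_frame, weights)]
        for mix_frame, sep_frame in zip(mix_probs, sep_probs)
    ]


if __name__ == "__main__":
    # Three separated sources, two event classes.
    print(select_pseudo_labels([[0.95, 0.02], [0.4, 0.5], [0.1, 0.92]]))
    # One frame, two classes: trust the mixture for class 0, separation for class 1.
    print(classwise_fusion([[1.0, 0.0]], [[0.0, 1.0]], [1.0, 0.0]))
```

In this sketch the per-class weights would be tuned on held-out data, so that classes which benefit from separation lean on the separated-signal predictions while the others stay with the mixture predictions.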
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
