Noise-Robust Sound Event Detection and Counting via Language-Queried Sound Separation
Yuanjian Chen, Yang Xiao, Han Yin, Yadong Guan, and Xubo Liu

TL;DR
This paper introduces a multi-task learning framework that co-trains event appearance detection (EAD) with sound event detection (SED), making SED more robust in noisy environments and giving language-queried sound separation more reliable clip-level predictions.
Contribution
It proposes a new co-training-based multi-task framework that pairs SED with a counting-based task (EAD) and adds an explicit task-based constraint to keep the two tasks' predictions consistent, improving SED performance under noisy conditions.
Findings
Outperforms existing methods on DESED and WildDESED datasets.
Shows increased robustness at higher noise levels.
Provides more reliable clip-level and timestamp predictions.
Abstract
Most sound event detection (SED) systems perform well on clean datasets but degrade significantly in noisy environments. Language-queried audio source separation (LASS) models show promise for robust SED by separating target events; however, existing methods require elaborate multi-stage training and lack explicit guidance for target events. To address these challenges, we introduce event appearance detection (EAD), a counting-based approach that counts event occurrences at both the clip and frame levels. Based on EAD, we propose a co-training-based multi-task learning framework for EAD and SED to enhance SED's performance in noisy environments. First, SED struggles to learn the same patterns as EAD. Then, a task-based constraint is designed to improve prediction consistency between SED and EAD. This framework provides more reliable clip-level predictions for LASS models and strengthens…
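The abstract describes EAD as counting event occurrences at the clip and frame levels, with a constraint that keeps SED and EAD predictions consistent, but it does not give the formulas. The sketch below is only one plausible reading, not the authors' implementation: it assumes the frame-level count is the number of classes active in a frame, the clip-level count is the number of onsets (0 to 1 transitions) per class, and the constraint is a mean squared difference between counts. The names `count_events` and `consistency_penalty` are hypothetical.

```python
from typing import Dict, List


def count_events(frames: List[List[int]]) -> Dict[str, List[int]]:
    """Derive counts from binary frame-level activity (assumed definitions).

    frames[t][c] is 1 if class c is active in frame t, else 0.
    """
    num_classes = len(frames[0]) if frames else 0
    # Assumed frame-level count: how many classes are active in each frame.
    frame_counts = [sum(row) for row in frames]
    # Assumed clip-level count: number of onsets (0 -> 1 transitions) per class.
    clip_counts = [0] * num_classes
    for c in range(num_classes):
        prev = 0
        for row in frames:
            if row[c] == 1 and prev == 0:
                clip_counts[c] += 1
            prev = row[c]
    return {"frame_counts": frame_counts, "clip_counts": clip_counts}


def consistency_penalty(sed_counts: List[float], ead_counts: List[float]) -> float:
    """Mean squared disagreement between SED-derived and EAD-predicted counts.

    This is an illustrative stand-in for the paper's task-based constraint.
    """
    if len(sed_counts) != len(ead_counts):
        raise ValueError("count vectors must have the same length")
    if not sed_counts:
        return 0.0
    total = sum((s - e) ** 2 for s, e in zip(sed_counts, ead_counts))
    return total / len(sed_counts)


# Four frames, two classes: class 0 has two separate events, class 1 has one.
frames = [[1, 0], [1, 1], [0, 1], [1, 0]]
counts = count_events(frames)
penalty = consistency_penalty(counts["clip_counts"], [2, 3])
```

Under these assumptions, a penalty of zero means the counts implied by the SED output agree exactly with the EAD output; in a training setup such a term would be added to the usual SED and EAD losses.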
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
