Exploring Text-Queried Sound Event Detection with Audio Source Separation

Han Yin; Jisheng Bai; Yang Xiao; Hui Wang; Siqi Zheng; Yafeng Chen; Rohan Kumar Das; Chong Deng; Jianfeng Chen

arXiv:2409.13292·eess.AS·January 13, 2025

TL;DR

This paper introduces a text-queried sound event detection framework that leverages a pre-trained language-queried source separation model, enhanced with a dual-path RNN, to improve detection accuracy in overlapping sound scenarios.
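
The query-then-detect flow described above can be sketched as follows. This is a minimal illustration, not the code in the apple-yinhan/tq-sed repository: it assumes the separator and each per-event SED branch are opaque callables, that each branch emits frame-level probabilities, and that events are obtained by simple thresholding. The names `tq_sed` and `frames_to_events`, the 0.5 threshold and the 0.02 s frame hop are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interfaces: a waveform is a list of samples; a separator maps
# (mixture, text query) -> separated track; an SED branch maps a track to
# frame-level probabilities for its target event.
Waveform = List[float]
Separator = Callable[[Waveform, str], Waveform]
SedBranch = Callable[[Waveform], List[float]]
Event = Tuple[float, float]


def frames_to_events(probs: Sequence[float], threshold: float, hop_s: float) -> List[Event]:
    """Threshold frame probabilities into (onset_s, offset_s) segments."""
    events: List[Event] = []
    onset = None
    for i, p in enumerate(probs):
        if p >= threshold and onset is None:
            onset = i  # event starts at this frame
        elif p < threshold and onset is not None:
            events.append((onset * hop_s, i * hop_s))  # event ended at the previous frame
            onset = None
    if onset is not None:  # event still active at the end of the clip
        events.append((onset * hop_s, len(probs) * hop_s))
    return events


def tq_sed(
    mixture: Waveform,
    queries: Sequence[str],
    separate: Separator,
    branches: Dict[str, SedBranch],
    threshold: float = 0.5,
    hop_s: float = 0.02,
) -> Dict[str, List[Event]]:
    """Separate one track per text query, then run that query's SED branch on it."""
    results: Dict[str, List[Event]] = {}
    for query in queries:
        track = separate(mixture, query)  # LASS step, e.g. a pre-trained AudioSep-DP model
        probs = branches[query](track)    # target SED branch for this event
        results[query] = frames_to_events(probs, threshold, hop_s)
    return results


if __name__ == "__main__":
    # Toy stand-ins: an identity "separator" and branches with fixed outputs.
    branches = {
        "dog barking": lambda wav: [0.1, 0.8, 0.9, 0.2, 0.1, 0.1, 0.7, 0.6],
        "speech": lambda wav: [0.0] * 8,
    }
    print(tq_sed([0.0] * 8, ["dog barking", "speech"], lambda wav, q: wav, branches, hop_s=0.5))
```

With a 0.5 s hop, the toy run prints {'dog barking': [(0.5, 1.5), (3.0, 4.0)], 'speech': []}. Keeping one branch per query mirrors the idea that each target event is detected on its own separated track rather than on the overlapped mixture.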

Contribution

It proposes a novel TQ-SED framework that combines language-queried audio source separation with a dual-path RNN; the resulting separation model, AudioSep-DP, took first place in DCASE 2024 Task 9 on language-queried audio source separation.

Findings

01

TQ-SED improves F1 score by 7.22% over conventional methods.

02

AudioSep-DP achieves first place in DCASE 2024 Task 9.

03

Increasing model complexity affects separation performance.
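
Finding 01 reports an F1 improvement, but this page does not say whether the metric is segment- or event-based, or whether 7.22% is an absolute or relative gain. For reference only, the sketch below computes a micro-averaged segment-based F1 in the style commonly used for DCASE SED evaluation, assuming binary per-segment, per-class activity matrices. The function `segment_f1` is hypothetical and is not taken from the tq-sed repository or from sed_eval.

```python
from typing import Sequence


def segment_f1(reference: Sequence[Sequence[int]], estimate: Sequence[Sequence[int]]) -> float:
    """Micro-averaged segment-based F1 from binary activity matrices.

    reference[t][c] == 1 means class c is active in segment t (likewise for estimate).
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same number of segments")
    tp = fp = fn = 0
    for ref_row, est_row in zip(reference, estimate):
        if len(ref_row) != len(est_row):
            raise ValueError("rows must have the same number of classes")
        for r, e in zip(ref_row, est_row):
            tp += int(r == 1 and e == 1)  # active in both
            fp += int(r == 0 and e == 1)  # predicted but not in reference
            fn += int(r == 1 and e == 0)  # missed reference activity
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); return 0.0 when there is nothing to score.
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    ref = [[1, 0], [1, 1], [0, 1]]
    est = [[1, 0], [0, 1], [1, 1]]
    print(segment_f1(ref, est))  # TP=3, FP=1, FN=1 -> 0.75
```

The form 2TP / (2TP + FP + FN) is algebraically the harmonic mean of precision TP / (TP + FP) and recall TP / (TP + FN). Returning 0.0 for an empty reference and estimate is a convention chosen for this sketch.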

Abstract

In sound event detection (SED), overlapping sound events pose a significant challenge: some events are easily masked by background noise or by other events, which degrades detection performance. To address this issue, we propose the text-queried SED (TQ-SED) framework. Specifically, we first pre-train a language-queried audio source separation (LASS) model to separate the audio tracks corresponding to different events from the input audio. Then, multiple target SED branches are employed to detect individual events. AudioSep is a state-of-the-art LASS model, but it is limited in extracting dynamic audio information because its separation structure is purely convolutional. To address this, we integrate a dual-path recurrent neural network block into the model. We refer to this structure as AudioSep-DP, which achieves first place in DCASE 2024 Task 9 on language-queried…
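
The dual-path idea behind AudioSep-DP can be illustrated with the generic segmentation scheme of the dual-path RNN (Luo et al., ICASSP 2020): a long feature sequence is cut into 50%-overlapping chunks, one model runs within each chunk (local context), another runs across chunks at each within-chunk offset (global context), and the chunks are overlap-added back. This is a minimal pure-Python sketch under stated assumptions: scalar frame features, plain callables instead of BiLSTMs, no residual connections or normalization, and overlaps averaged rather than summed. It does not reproduce where or how AudioSep-DP inserts the block. `segment`, `overlap_add`, `dual_path` and the toy `running_mean` recurrence are hypothetical names.

```python
from typing import Callable, List

Seq = List[float]


def segment(x: Seq, chunk: int) -> List[Seq]:
    """Split x into chunks of length `chunk` with 50% overlap, zero-padding both ends."""
    if chunk < 2 or chunk % 2:
        raise ValueError("chunk must be an even number >= 2")
    hop = chunk // 2
    padded = [0.0] * hop + list(x) + [0.0] * hop
    rest = (len(padded) - chunk) % hop
    if rest:  # pad so the last chunk ends exactly at the end of the buffer
        padded += [0.0] * (hop - rest)
    return [padded[s:s + chunk] for s in range(0, len(padded) - chunk + 1, hop)]


def overlap_add(chunks: List[Seq], chunk: int, length: int) -> Seq:
    """Inverse of `segment`: overlap-add the chunks, average overlaps, drop padding."""
    hop = chunk // 2
    total = hop * (len(chunks) - 1) + chunk
    out = [0.0] * total
    count = [0] * total
    for i, c in enumerate(chunks):
        for j, v in enumerate(c):
            out[i * hop + j] += v
            count[i * hop + j] += 1
    return [out[k] / count[k] for k in range(hop, hop + length)]


def dual_path(x: Seq, chunk: int, intra: Callable[[Seq], Seq], inter: Callable[[Seq], Seq]) -> Seq:
    """One dual-path pass: intra-chunk modeling, then inter-chunk modeling."""
    chunks = [list(intra(c)) for c in segment(x, chunk)]  # local, within each chunk
    for j in range(chunk):  # global: same offset j across all chunks
        column = inter([c[j] for c in chunks])
        for i, v in enumerate(column):
            chunks[i][j] = v
    return overlap_add(chunks, chunk, len(x))


def running_mean(seq: Seq) -> Seq:
    """Toy recurrence standing in for an RNN: h_t = h_{t-1} + (x_t - h_{t-1}) / t."""
    out, h = [], 0.0
    for t, v in enumerate(seq, start=1):
        h += (v - h) / t
        out.append(h)
    return out


if __name__ == "__main__":
    x = [float(v) for v in range(10)]
    print(dual_path(x, 4, running_mean, running_mean))
```

With identity functions for `intra` and `inter`, `dual_path` returns its input unchanged, which is a quick check that segmentation and overlap-add are exact inverses. The appeal of the scheme is that each recurrent pass only sees sequences of roughly sqrt(L) length, so long inputs stay tractable while still mixing information across the whole clip.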

Code & Models

Repositories

apple-yinhan/tq-sed
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Advanced Text Analysis Techniques