Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection

Han Yin; Yang Xiao; Jisheng Bai; Rohan Kumar Das

arXiv:2411.01174 · eess.AS · January 14, 2025

TL;DR

This paper combines large language models (LLMs) with language-queried audio source separation (LASS) to improve sound event detection (SED) in noisy environments. It targets two challenges: overlapping sounds and unknown target events.

Contribution

The method uses LLMs to analyze acoustic data and select specific noise types for augmentation. Fine-tuning on the augmented data makes SED models more robust in noisy conditions.
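The noise augmentation step can be illustrated with a minimal sketch. It assumes the LLM has already produced a list of noise types and that a bank of noise clips is available for each type. Mixing each training clip with noise at a random signal-to-noise ratio (SNR) is a common augmentation recipe. Here it is an assumption, not the paper's confirmed procedure. The names `mix_at_snr`, `augment`, and `noise_bank`, along with the 0 to 10 dB SNR range, are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple


def mean_power(signal: Sequence[float]) -> float:
    """Mean power (average squared sample value) of a signal."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add `noise` to `clean`, scaled so that 10*log10(P_clean / P_noise) == snr_db.

    The noise is tiled (repeated) or truncated to the length of the clean clip.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    tiled = [noise[i % len(noise)] for i in range(len(clean))]
    p_clean = mean_power(clean)
    p_noise = mean_power(tiled)
    if p_noise == 0.0:
        return list(clean)  # silent noise: nothing to add
    # Gain g chosen so that P_clean / (g^2 * P_noise) == 10^(snr_db / 10).
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, tiled)]


def augment(
    clean: Sequence[float],
    noise_bank: Dict[str, List[Sequence[float]]],
    selected_types: List[str],
    snr_range: Tuple[float, float] = (0.0, 10.0),
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], str, float]:
    """Hypothetical augmentation step: mix in one clip of an LLM-selected noise type.

    `selected_types` stands in for the noise types an LLM picked, and
    `noise_bank` maps each type to example noise clips. Both are assumptions.
    """
    rng = rng or random.Random()
    noise_type = rng.choice(selected_types)
    noise = rng.choice(noise_bank[noise_type])
    snr_db = rng.uniform(*snr_range)
    return mix_at_snr(clean, noise, snr_db), noise_type, snr_db
```

The gain follows from requiring 10·log10(P_clean / (g²·P_noise)) = SNR, which gives g = sqrt(P_clean / (P_noise · 10^(SNR/10))). Computing the noise power after tiling keeps the target SNR exact even when the noise clip is shorter than the clean clip.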

Findings

01. Improved SED performance in noisy environments.
02. Noise augmentation guided by LLM-selected noise types.
03. An early application of LLMs to noise-robust SED.

Abstract

Sound Event Detection (SED) is challenging in noisy environments where overlapping sounds obscure target events. Language-queried audio source separation (LASS) aims to isolate the target sound events from a noisy clip. However, this approach can fail when the exact target sound is unknown, particularly in noisy test sets, leading to reduced performance. To address this issue, we leverage the capabilities of large language models (LLMs) to analyze and summarize acoustic data. By using LLMs to identify and select specific noise types, we implement a noise augmentation method for noise-robust fine-tuning. The fine-tuned model is then used to produce clip-wise event predictions, which serve as text queries for the LASS model. Our studies demonstrate that the proposed method improves SED performance in noisy environments. This work represents an early application of LLMs in noise-robust SED and suggests a…
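The last step of the pipeline turns the fine-tuned model's clip-wise predictions into text queries for the LASS model. It might look like the sketch below. Several details are hypothetical: the 0.5 threshold, the query template, the example class labels and probabilities, and the `separate` callable that stands in for a LASS model. The official repository may build its queries differently.

```python
from typing import Callable, Dict, List, Sequence


def clip_predictions_to_queries(
    clip_probs: Dict[str, float],
    threshold: float = 0.5,
    template: str = "{label}",
) -> List[str]:
    """Turn clip-wise event probabilities into text queries for a LASS model.

    Classes at or above `threshold` are kept, most confident first, and each
    label (underscores replaced by spaces) is rendered through `template`.
    """
    detected = [(label, p) for label, p in clip_probs.items() if p >= threshold]
    detected.sort(key=lambda item: item[1], reverse=True)
    return [template.format(label=label.replace("_", " ")) for label, _ in detected]


def separate_by_queries(
    mixture: Sequence[float],
    clip_probs: Dict[str, float],
    separate: Callable[[Sequence[float], str], List[float]],
    threshold: float = 0.5,
) -> Dict[str, List[float]]:
    """Run a (hypothetical) LASS model once per predicted event.

    `separate(mixture, query)` stands in for the separation model and should
    return the estimated target signal for that text query.
    """
    queries = clip_predictions_to_queries(clip_probs, threshold)
    return {query: separate(mixture, query) for query in queries}


# Example with made-up probabilities for three example class labels.
probs = {"Dog": 0.91, "Speech": 0.64, "Vacuum_cleaner": 0.12}
print(clip_predictions_to_queries(probs))  # ['Dog', 'Speech']
```

Each query is separated independently, so the confidence ordering only affects how the queries are listed, not the separated outputs.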

Code & Models

Repositories

apple-yinhan/noise-robust-sed (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Music Technology and Sound Studies