Text-Queried Target Sound Event Localization
Jinzheng Zhao, Xinyuan Qian, Yong Xu, Haohe Liu, Yin Cao, Davide Berghi, Wenwu Wang

TL;DR
This paper introduces a novel text-queried sound event localization method that enables users to specify sound events via text, allowing flexible detection and localization beyond predefined classes, validated through experiments on simulated and real data.
Contribution
It proposes a new paradigm for sound source localization driven by text queries, expanding the capabilities of existing SELD systems and providing a benchmark for future research.
Findings
Effective localization of sound events based on text queries.
Validated on simulated and real RIR datasets.
Benchmark dataset provided for future research.
Abstract
Sound event localization and detection (SELD) aims to determine the appearance of sound classes, together with their Direction of Arrival (DOA). However, current SELD systems can only predict the activities of specific classes, for example, 13 classes in DCASE challenges. In this paper, we propose text-queried target sound event localization (SEL), a new paradigm in which the user inputs text describing a sound event and the SEL model predicts the location of that event. The proposed task offers a more user-friendly form of human-computer interaction. We provide a benchmark study for the proposed task and perform experiments on datasets created with simulated and real room impulse responses (RIRs) to validate the effectiveness of the proposed methods. We hope that our benchmark will inspire interest and additional research in text-queried sound source…
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Music Technology and Sound Studies
