Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection
Yuichiro Koyama, Kazuhide Shigemi, Masafumi Takahashi, Kazuki Shimada, Naoya Takahashi, Emiru Tsunoo, Shusuke Takahashi, Yuki Mitsufuji

TL;DR
This paper introduces a simulation-based data augmentation method using room impulse responses to improve sound event localization and detection, especially when real data is scarce.
Contribution
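The paper presents a novel impulse response simulation framework (IRS) that augments the spatial characteristics of SELD training data. It simulates room impulse responses (RIRs) for a microphone array placed in various rooms, extracts the target source signals from recorded mixtures, and convolves the simulated RIRs with those extracted signals.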
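The summary does not say which room simulator IRS uses. As a rough illustration of what "simulating an RIR for a microphone array in a room" produces, here is a toy stochastic model, assumed for illustration only: a direct-path impulse delayed by the source-to-microphone distance divided by the speed of sound and scaled by 1/r, plus an exponentially decaying Gaussian-noise tail whose energy falls by 60 dB after `rt60` seconds. The function name `toy_array_rir`, the 24 kHz default sampling rate, and `tail_gain` are hypothetical. The model omits early reflections and draws the tail independently per microphone, so it is far cruder than an image-source or ray-tracing simulator.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def toy_array_rir(src_pos, mic_positions, fs=24000, rt60=0.4,
                  length_s=0.5, tail_gain=0.1, seed=0):
    """Toy multichannel RIR (hypothetical; not the IRS simulator).

    Returns an array of shape (n_mics, n_samples).
    """
    rng = np.random.default_rng(seed)
    n = int(length_s * fs)
    t = np.arange(n) / fs
    mics = np.asarray(mic_positions, dtype=float)
    src = np.asarray(src_pos, dtype=float)
    dists = np.linalg.norm(mics - src, axis=1)

    # Amplitude envelope: energy drops 60 dB (amplitude x 1e-3) at t = rt60.
    envelope = np.exp(-3.0 * np.log(10.0) * t / rt60)

    rirs = np.zeros((len(mics), n))
    for m, d in enumerate(dists):
        delay = int(round(d / SPEED_OF_SOUND * fs))  # direct-path delay in samples
        if delay >= n:
            raise ValueError("RIR too short for this source distance")
        # Diffuse tail: decaying Gaussian noise, drawn independently per mic.
        tail = tail_gain * envelope * rng.standard_normal(n)
        tail[: delay + 1] = 0.0  # nothing arrives before (or on) the direct path
        rirs[m] = tail
        rirs[m, delay] = 1.0 / max(d, 1e-3)  # direct path with 1/r spreading loss
    return rirs
```

Because the direct-path delay differs per microphone, the inter-channel time differences in the output encode the source direction, which is the spatial cue a SELD model learns from.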
Findings
IRS improves SELD performance on the TAU-NIGENS dataset.
Simulated RIRs effectively augment spatial characteristics in training data.
An ablation study highlights the contribution of each IRS component.
Abstract
Recording and annotating real sound events for a sound event localization and detection (SELD) task is time-consuming, and data augmentation techniques are often favored when the amount of data is limited. However, how to augment the spatial information in a dataset, including unlabeled directional interference events, remains an open research question. Furthermore, directional interference events make it difficult to accurately extract spatial characteristics from target sound events. To address this problem, we propose an impulse response simulation framework (IRS) that augments spatial characteristics using simulated room impulse responses (RIRs). RIRs corresponding to a microphone array assumed to be placed in various rooms are accurately simulated, and the source signals of the target sound events are extracted from a mixture. The simulated RIRs are then convolved with the extracted…
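The abstract's last step convolves the simulated RIRs with the extracted source signals (the sentence is truncated in this listing). A minimal sketch of that convolution step, assuming each extracted source is a single-channel array, each RIR is an `(n_mics, rir_len)` array, and the spatialized sources are simply summed into one multichannel mixture. The helper name `spatialize_and_mix` and the summation are assumptions, not details taken from the paper.

```python
import numpy as np


def spatialize_and_mix(sources, rirs):
    """Hypothetical helper: convolve each dry source with its multichannel RIR
    and sum the results into one mixture of shape (n_mics, out_len)."""
    if len(sources) != len(rirs):
        raise ValueError("need exactly one RIR per source")
    n_mics = rirs[0].shape[0]
    # Full linear convolution length: len(src) + rir_len - 1, longest source wins.
    out_len = max(len(s) + r.shape[1] - 1 for s, r in zip(sources, rirs))
    mix = np.zeros((n_mics, out_len))
    for src, rir in zip(sources, rirs):
        for m in range(n_mics):
            y = np.convolve(src, rir[m])  # mode="full" by default
            mix[m, : len(y)] += y
    return mix
```

Each channel uses `np.convolve` in full mode, so every spatialized source has length `len(src) + rir_len - 1`. For RIRs with thousands of taps, `scipy.signal.fftconvolve` is usually much faster than direct convolution.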
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation
