Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection

Yuichiro Koyama; Kazuhide Shigemi; Masafumi Takahashi; Kazuki Shimada; Naoya Takahashi; Emiru Tsunoo; Shusuke Takahashi; Yuki Mitsufuji

arXiv:2110.06501 · cs.SD · April 29, 2022

TL;DR

This paper introduces a simulation-based data augmentation method using room impulse responses to improve sound event localization and detection, especially when real data is scarce.

Contribution

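The paper presents a novel impulse response simulation framework (IRS) that augments the spatial characteristics of SELD training data: room impulse responses (RIRs) are simulated for a microphone array placed in various rooms and convolved with the source signals of target sound events extracted from the original mixtures.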
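To illustrate what a simulated multichannel RIR encodes, here is a toy sketch. It is not the paper's simulator and not an image-source method: it models only the direct path (a per-microphone propagation delay and 1/r attenuation) plus an uncorrelated, exponentially decaying noise tail whose decay is set by an assumed RT60. The function name `toy_array_rir`, the speed of sound of 343 m/s, the 0.05 tail gain, and the array geometry in the usage note are all illustrative assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def toy_array_rir(src_pos, mic_pos, fs=24000, rt60=0.5, length_s=0.5, seed=0):
    """Hypothetical toy multichannel RIR: direct path + decaying noise tail.

    Not an image-source simulator and not the IRS simulator from the paper.
    Direction is encoded only through per-microphone delays and 1/r gains.
    Returns an array of shape (n_mics, n_samples).
    """
    rng = np.random.default_rng(seed)
    src_pos = np.asarray(src_pos, dtype=float)
    mic_pos = np.asarray(mic_pos, dtype=float)
    n = int(round(length_s * fs))
    t = np.arange(n) / fs
    # Amplitude envelope that falls by 60 dB (a factor of 1e-3) after rt60 seconds.
    envelope = 10.0 ** (-3.0 * t / rt60)
    rirs = np.zeros((len(mic_pos), n))
    for m, p in enumerate(mic_pos):
        dist = np.linalg.norm(src_pos - p)
        delay = int(round(dist / SPEED_OF_SOUND * fs))
        if delay >= n:
            raise ValueError("RIR length is shorter than the direct-path delay")
        rirs[m, delay] = 1.0 / max(dist, 1e-3)  # direct sound with 1/r spreading
        # Uncorrelated diffuse tail per microphone, starting after the direct sound.
        tail = 0.05 * rng.standard_normal(n) * envelope
        tail[: delay + 1] = 0.0
        rirs[m] += tail
    return rirs
```

For example, with a hypothetical 4-microphone tetrahedral array of 4.2 cm radius centred at (2, 2, 1.5) m and a source at (4, 3, 1.5) m, `toy_array_rir(src, mics)` returns a (4, 12000) array whose per-channel onset differences carry the direction of arrival. A realistic pipeline would replace the noise tail with a room-acoustics simulator so that early reflections depend on the room geometry and wall materials.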

Findings

1. IRS improves SELD performance on the TAU-NIGENS dataset.
2. Simulated RIRs effectively augment the spatial characteristics of the training data.
3. An ablation study highlights the contribution of each IRS component.

Abstract

Recording and annotating real sound events for a sound event localization and detection (SELD) task is time-consuming, and data augmentation techniques are often favored when the amount of data is limited. However, how to augment the spatial information in a dataset, including unlabeled directional interference events, remains an open research question. Furthermore, directional interference events make it difficult to accurately extract spatial characteristics from target sound events. To address this problem, we propose an impulse response simulation framework (IRS) that augments spatial characteristics using simulated room impulse responses (RIRs). RIRs corresponding to a microphone array assumed to be placed in various rooms are accurately simulated, and the source signals of the target sound events are extracted from a mixture. The simulated RIRs are then convolved with the extracted…
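The core augmentation step named in the abstract, convolving simulated RIRs with extracted source signals, can be sketched as follows. This is a minimal sketch assuming the target sources are already available as dry mono signals and that one RIR per microphone is given for each event (for example from a room simulator). The function names `spatialize` and `mix_events` are hypothetical, and the paper's handling of interference events and labels is not reproduced here, since the abstract is truncated.

```python
import numpy as np


def spatialize(source, rirs):
    """Convolve a mono (dry) source with one RIR per microphone.

    source: shape (n_samples,); rirs: shape (n_mics, rir_len).
    Returns shape (n_mics, n_samples + rir_len - 1) (full linear convolution).
    """
    source = np.asarray(source, dtype=float)
    rirs = np.asarray(rirs, dtype=float)
    out = np.empty((rirs.shape[0], len(source) + rirs.shape[1] - 1))
    for m in range(rirs.shape[0]):
        out[m] = np.convolve(source, rirs[m])  # mode="full" by default
    return out


def mix_events(sources, rir_sets, onsets, total_len):
    """Place each spatialized event at its onset sample and sum into one mixture.

    Each event keeps the direction implied by its own RIR set, so the
    direction-of-arrival label of every event is known by construction.
    """
    n_mics = np.asarray(rir_sets[0]).shape[0]
    mix = np.zeros((n_mics, total_len))
    for src, rirs, onset in zip(sources, rir_sets, onsets):
        if onset >= total_len:
            continue  # event starts after the clip ends; skip it
        y = spatialize(src, rirs)
        end = min(onset + y.shape[1], total_len)
        mix[:, onset:end] += y[:, : end - onset]  # truncate events that overrun
    return mix
```

Convolution is linear, so overlapping events can be spatialized independently and summed; this is what lets the same extracted sources be re-rendered with RIRs from many simulated rooms and directions.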

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation