TL;DR
StoRIR is a novel stochastic method for generating room impulse responses that simplifies data augmentation for audio tasks by avoiding complex geometric modeling, leading to improved speech enhancement results.
Contribution
This paper introduces StoRIR, a geometry-independent, easy-to-implement stochastic approach for generating room impulse responses for audio data augmentation.
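For context, augmenting audio with a room impulse response usually means convolving a dry recording with the RIR. The sketch below illustrates that step only; it is not taken from the StoRIR code. The function name `reverberate`, the peak-matching normalization, and the three-tap toy RIR are assumptions made for illustration.

```python
import numpy as np

def reverberate(signal, rir):
    """Convolve a dry signal with an RIR, keeping the original length and peak level."""
    # Full convolution is longer than the input; truncate to the input length.
    wet = np.convolve(signal, rir)[: len(signal)]
    peak = np.max(np.abs(wet))
    if peak > 0:
        # Rescale so the reverberant signal has the same peak as the dry one.
        wet = wet * (np.max(np.abs(signal)) / peak)
    return wet

# Toy example (hypothetical data): a unit impulse and an RIR with one echo.
dry = np.zeros(8)
dry[0] = 1.0
toy_rir = np.array([1.0, 0.0, 0.5])
wet = reverberate(dry, toy_rir)
```

In a training pipeline, a different RIR would typically be drawn for each utterance so the model sees varied acoustic conditions.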
Findings
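Used for data augmentation in a speech enhancement task, StoRIR improves many evaluation metrics by more than 5% relative to the conventional image-source method.
Models trained with StoRIR augmentation achieve better results than those trained with image-source augmentation across a wide range of metrics.
The method is intuitive, easy to implement, and can generate RIRs of very complicated enclosures.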
Abstract
In this paper we introduce StoRIR, a stochastic room impulse response (RIR) generation method dedicated to audio data augmentation in machine learning applications. In contrast to geometrical methods such as image-source or ray tracing, this technique does not require prior definition of room geometry, absorption coefficients, or microphone and source placement, and depends solely on the acoustic parameters of the room. The method is intuitive, easy to implement, and can generate RIRs of very complicated enclosures. We show that StoRIR, when used for audio data augmentation in a speech enhancement task, allows deep learning models to achieve better results on a wide range of metrics than the conventional image-source method, improving many of them by more than 5%. We publish a Python implementation of StoRIR online.
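The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of producing an RIR from an acoustic parameter rather than from room geometry. It is a generic decaying-noise model, not the published StoRIR implementation. The function name `decaying_noise_rir`, the choice of reverberation time (RT60) as the only input parameter, and the unit direct-path sample are all assumptions.

```python
import numpy as np

def decaying_noise_rir(rt60, sample_rate=16000, length_s=None, seed=None):
    """Toy stochastic RIR: Gaussian noise shaped by an exponential decay envelope."""
    rng = np.random.default_rng(seed)
    if length_s is None:
        length_s = rt60  # assumption: make the RIR as long as the reverberation time
    n = int(round(length_s * sample_rate))
    t = np.arange(n) / sample_rate
    # Amplitude envelope that has dropped by 60 dB at t = rt60.
    envelope = 10.0 ** (-3.0 * t / rt60)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # assumption: model the direct path as a unit first sample
    return rir / np.max(np.abs(rir))  # normalize to unit peak

rir = decaying_noise_rir(rt60=0.5, sample_rate=16000, seed=0)
```

No room dimensions, absorption coefficients, or source and microphone positions appear anywhere in this sketch; only the decay time controls the result, which is the property the abstract highlights.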
