DARAS: Dynamic Audio-Room Acoustic Synthesis for Blind Room Impulse Response Estimation
Chunxi Wang, Maoshen Jia, Wenyu Jin

TL;DR
DARAS is a deep learning framework for blind estimation of room impulse responses from monaural reverberant speech. It targets acoustic modeling for AR/VR applications by combining a dedicated audio encoder, Mamba-based room parameter estimation, and adaptive synthesis.
Contribution
The paper introduces DARAS, a novel deep learning model combining a dedicated encoder, self-supervised parameter estimation, and adaptive acoustic tuning for blind RIR estimation.
Findings
DARAS outperforms existing models in subjective listening tests.
The system effectively captures room acoustic parameters from monaural speech.
Experimental results show improved realism in synthesized RIRs.
Abstract
Room Impulse Responses (RIRs) accurately characterize acoustic properties of indoor environments and play a crucial role in applications such as speech enhancement, speech recognition, and audio rendering in augmented reality (AR) and virtual reality (VR). Existing blind estimation methods struggle to achieve practical accuracy. To overcome this challenge, we propose the dynamic audio-room acoustic synthesis (DARAS) model, a novel deep learning framework that is explicitly designed for blind RIR estimation from monaural reverberant speech signals. First, a dedicated deep audio encoder effectively extracts relevant nonlinear latent space features. Second, the Mamba-based self-supervised blind room parameter estimation (MASS-BRPE) module, utilizing the efficient Mamba state space model (SSM), accurately estimates key room acoustic parameters and features. Third, the system incorporates a…
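To make the problem setting concrete, here is a minimal sketch of the signal model behind blind RIR estimation: reverberant speech is dry speech convolved with the room's impulse response, and a blind estimator sees only the reverberant signal. This is not the DARAS model. The toy exponentially decaying noise RIR, the choice of reverberation time (RT60) as an example room acoustic parameter, and the helper names `synthetic_rir` and `estimate_rt60` are all assumptions made for illustration.

```python
import numpy as np

def synthetic_rir(rt60, fs=16000, length_s=1.0, seed=0):
    """Toy RIR (assumption): white noise shaped by an exponential decay."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * length_s)) / fs
    # Amplitude envelope 10**(-3 t / rt60) puts the level at -60 dB when t == rt60.
    envelope = 10.0 ** (-3.0 * t / rt60)
    return rng.standard_normal(t.size) * envelope

def estimate_rt60(rir, fs=16000):
    """Estimate RT60 from a known RIR via Schroeder backward integration (T20 fit)."""
    energy = rir ** 2
    # Energy decay curve: energy remaining after each time index.
    edc = np.cumsum(energy[::-1])[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0])
    t = np.arange(rir.size) / fs
    # Fit a line to the -5 dB .. -25 dB portion and extrapolate to -60 dB.
    mask = (edc_db <= -5.0) & (edc_db >= -25.0)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # dB per second
    return -60.0 / slope

fs = 16000
rir = synthetic_rir(rt60=0.5, fs=fs)

# Stand-in for one second of dry (anechoic) speech; real speech would be loaded from a file.
dry = np.random.default_rng(1).standard_normal(fs)

# Forward model: the microphone records dry speech convolved with the RIR.
reverberant = np.convolve(dry, rir)

# Non-blind reference: RT60 measured directly from the RIR.
rt60_hat = estimate_rt60(rir, fs)
print(f"RT60 estimated from the RIR: {rt60_hat:.2f} s")
```

A blind method has access only to `reverberant`, not to `rir` or `dry`, which is what makes the task hard. Quantities such as `rt60_hat` computed from a ground-truth RIR are one common way to check an estimated RIR against a reference.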
