Matching Reverberant Speech Through Learned Acoustic Embeddings and Feedback Delay Networks
Philipp Götz, Gloria Dal Santo, Sebastian J. Schlecht, Vesa Välimäki, Emanuël A. P. Habets

TL;DR
This paper introduces a method for real-time, perceptually plausible reverberation generation in auditory augmented reality by using learned acoustic embeddings and a feedback delay network to match reverberant speech characteristics.
Contribution
It presents a novel approach combining learned acoustic priors with a feedback delay network for blind reverberation parameter estimation and reproduction.
Findings
Improved estimation of room-acoustic parameters.
Enhanced perceptual plausibility of artificial reverberant speech.
Both improvements demonstrated against a leading automatic FDN tuning method.
Abstract
Reverberation conveys critical acoustic cues about the environment, supporting spatial awareness and immersion. For auditory augmented reality (AAR) systems, generating perceptually plausible reverberation in real time remains a key challenge, especially when explicit acoustic measurements are unavailable. We address this by formulating blind estimation of artificial reverberation parameters as a reverberant signal matching task, leveraging a learned room-acoustic prior. Furthermore, we propose a feedback delay network (FDN) structure that reproduces both frequency-dependent decay times and the direct-to-reverberation ratio of a target space. Experimental evaluation against a leading automatic FDN tuning method demonstrates improvements in estimated room-acoustic parameters and perceptual plausibility of artificial reverberant speech. These results highlight the potential of our…
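Illustrative sketch
For readers unfamiliar with feedback delay networks, the following is a minimal generic sketch, not the authors' structure. It assumes a Householder feedback matrix, a single broadband decay time (the paper's FDN is frequency-dependent, which would require attenuation filters in place of the scalar gains used here), and a hypothetical `direct_gain` parameter that scales the direct path to shift the direct-to-reverberation ratio. All function names, delay lengths, and parameter values are assumptions chosen for illustration.

```python
import numpy as np


def fdn_impulse_response(delays, t60, fs, n_samples, direct_gain=1.0):
    """Impulse response of a simple broadband FDN (illustrative only)."""
    delays = np.asarray(delays, dtype=int)
    n = len(delays)
    # Householder reflection: an orthogonal (lossless) feedback matrix.
    A = np.eye(n) - (2.0 / n) * np.ones((n, n))
    # Per-line gain proportional to delay length, chosen so that the
    # response decays by 60 dB after t60 seconds (assumed broadband).
    g = 10.0 ** (-3.0 * delays / (t60 * fs))
    b = np.ones(n) / np.sqrt(n)  # input gains (assumed uniform)
    c = np.ones(n) / np.sqrt(n)  # output gains (assumed uniform)

    buffers = [np.zeros(m) for m in delays]  # one circular buffer per line
    idx = np.zeros(n, dtype=int)
    x = np.zeros(n_samples)
    x[0] = 1.0  # unit impulse input
    y = np.zeros(n_samples)

    for t in range(n_samples):
        # Read the sample that entered each delay line delays[i] steps ago.
        s = np.array([buffers[i][idx[i]] for i in range(n)])
        # Output: scaled direct path plus the mixed delay-line outputs.
        y[t] = direct_gain * x[t] + c @ s
        # Attenuate, mix through the feedback matrix, and add the input.
        v = A @ (g * s) + b * x[t]
        for i in range(n):
            buffers[i][idx[i]] = v[i]
            idx[i] = (idx[i] + 1) % delays[i]
    return y


def drr_db(h):
    """Direct-to-reverberation ratio, treating h[0] as the direct sound."""
    return 10.0 * np.log10(h[0] ** 2 / np.sum(h[1:] ** 2))


if __name__ == "__main__":
    fs = 8000  # low sample rate keeps the pure-Python loop quick
    h = fdn_impulse_response([149, 211, 263, 293], t60=0.5, fs=fs,
                             n_samples=fs, direct_gain=1.0)
    print("DRR (dB):", drr_db(h))
```

In this sketch the decay time is set only by the per-line gains and the DRR only by `direct_gain`, so the two can be adjusted independently. The method described in the abstract estimates such parameters blindly from reverberant speech, which this sketch does not attempt.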
