Modèle physique variationnel pour l'estimation de réponses impulsionnelles de salles (Variational physical model for room impulse response estimation)

Louis Lalay (LTCI, IP Paris, S2A), Mathieu Fontaine (LTCI, IP Paris, S2A), Roland Badeau (S2A, LTCI, IP Paris)

arXiv:2507.08051 · cs.SD · July 14, 2025

TL;DR

This paper introduces a physically grounded variational model for room impulse response estimation, combining statistical and physical insights to improve speech dereverberation, especially in noisy conditions.

Contribution

The paper presents a variational approach that integrates physical room-acoustics modeling with statistical estimation of the room impulse response (RIR), and reports better results than classical deconvolution in noisy conditions.

Findings

01 Outperforms classical deconvolution in noisy environments
02 Provides interpretable parameters for room acoustics modeling
03 Validated with objective metrics on speech signals
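
For readers unfamiliar with the baseline named in the first finding, the sketch below shows one common form of classical deconvolution: regularized spectral division of the reverberant signal by the dry signal. The page does not say which deconvolution variant the paper compares against, so this particular form is an assumption, and the names `dft`, `convolve`, `deconvolve`, `rir_len`, and `reg` are hypothetical. It uses a naive DFT and a short noise-free toy signal, so it only illustrates the principle.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (fine for short toy signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def convolve(x, h):
    """Linear convolution of two real sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def deconvolve(dry, wet, rir_len, reg=1e-9):
    """Estimate h from wet = dry * h by regularized spectral division.

    Assumed baseline, not necessarily the one used in the paper.
    """
    n = len(wet)
    X = dft(list(dry) + [0.0] * (n - len(dry)))  # zero-pad dry signal to len(wet)
    Y = dft(wet)
    # H = Y * conj(X) / (|X|^2 + reg); reg keeps the division bounded where |X| is small.
    H = [y * x.conjugate() / (abs(x) ** 2 + reg) for x, y in zip(X, Y)]
    return [v.real for v in dft(H, inverse=True)[:rir_len]]


rng = random.Random(1)
dry = [rng.gauss(0.0, 1.0) for _ in range(32)]   # stand-in for a dry speech signal
true_rir = [1.0, 0.5, -0.25, 0.125]              # toy impulse response
wet = convolve(dry, true_rir)                    # noise-free reverberant signal
estimate = deconvolve(dry, wet, rir_len=len(true_rir))
print([round(v, 4) for v in estimate])
```

When noise is added to `wet`, frequency bins where the dry spectrum is weak amplify that noise, and raising `reg` trades this amplification against bias in the estimate. This sensitivity is the kind of weakness a model-based estimator is meant to address.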

Abstract

Room impulse response estimation is essential for tasks like speech dereverberation, which improves automatic speech recognition. Most existing methods rely on either statistical signal processing or deep neural networks designed to replicate signal processing principles. However, combining statistical and physical modeling for RIR estimation remains largely unexplored. This paper proposes a novel approach integrating both aspects through a theoretically grounded model. The RIR is decomposed into interpretable parameters: white Gaussian noise filtered by a frequency-dependent exponential decay (e.g. modeling wall absorption) and an autoregressive filter (e.g. modeling microphone response). A variational free-energy cost function enables practical parameter estimation. As a proof of concept, we show that given dry and reverberant speech signals, the proposed method outperforms classical…
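
The generative structure described in the abstract can be illustrated with a toy synthesis: white Gaussian noise is shaped by a decay that differs between frequency bands, then passed through an autoregressive (all-pole) filter. The sketch below is a simplification under stated assumptions, not the authors' implementation: it uses only two bands (a one-pole low-pass and its complement) where the paper describes a general frequency-dependent decay, and every name and parameter (`decay_rate`, `ar_filter`, `synth_rir`, `t60_low`, `t60_high`, `smooth`) is hypothetical. It covers synthesis only; the variational free-energy estimation is not shown.

```python
import math
import random


def decay_rate(t60):
    """Amplitude decay rate (1/s) that gives a 60 dB drop after t60 seconds."""
    return 3.0 * math.log(10.0) / t60


def ar_filter(x, a):
    """All-pole filter: y[n] = x[n] - sum_k a[k] * y[n-1-k]."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * y[n - 1 - k]
        y.append(acc)
    return y


def synth_rir(n, fs, t60_low, t60_high, ar_coeffs, smooth=0.9, seed=0):
    """Draw one RIR from a two-band toy version of the model (assumed form)."""
    rng = random.Random(seed)
    a_lo, a_hi = decay_rate(t60_low), decay_rate(t60_high)
    low_state = 0.0
    shaped = []
    for i in range(n):
        w = rng.gauss(0.0, 1.0)                              # white Gaussian noise
        low_state = smooth * low_state + (1.0 - smooth) * w  # one-pole low-pass band
        high = w - low_state                                 # complementary high band
        t = i / fs
        # Each band gets its own exponential decay (e.g. wall absorption).
        shaped.append(low_state * math.exp(-a_lo * t) + high * math.exp(-a_hi * t))
    # The AR filter stands in for, e.g., the microphone response.
    return ar_filter(shaped, ar_coeffs)


rir = synth_rir(n=4000, fs=8000, t60_low=0.4, t60_high=0.2, ar_coeffs=[-0.5])
early = sum(v * v for v in rir[:1000])
late = sum(v * v for v in rir[-1000:])
print(f"early energy {early:.3g}, late energy {late:.3g}")
```

In this toy version the interpretable parameters are the two reverberation times and the AR coefficients. An estimation procedure would fit such parameters to observed dry and reverberant signals rather than fixing them by hand.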

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Speech Recognition and Synthesis