Blind Acoustic Parameter Estimation Through Task-Agnostic Embeddings Using Latent Approximations

Philipp Götz, Cagdas Tuna, Andreas Brendel, Andreas Walther, and Emanuël A. P. Habets

arXiv:2407.19989 · eess.AS · July 30, 2024 · IWAENC

Open Access

TL;DR

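This paper introduces a three-stage method for blind acoustic parameter estimation from single-channel reverberant speech: a variational auto-encoder learns latent representations of acoustic impulse responses, a separate speech encoder learns to estimate low-dimensional representations from short segments of reverberant speech, and the pre-trained speech encoder combined with a small regression model outperforms a fully end-to-end trained baseline.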

Contribution

It proposes combining a variational auto-encoder trained on acoustic impulse responses with a separately trained, task-agnostic speech encoder for blind acoustic parameter estimation.

Findings

01

Outperforms a fully end-to-end trained baseline model on two parameter regression tasks.

02

Estimates acoustic parameters blindly from short segments of single-channel reverberant speech.

03

Uses latent representations to improve estimation accuracy.

Abstract

We present a method for blind acoustic parameter estimation from single-channel reverberant speech. The method is structured into three stages. In the first stage, a variational auto-encoder is trained to extract latent representations of acoustic impulse responses represented as mel-spectrograms. In the second stage, a separate speech encoder is trained to estimate low-dimensional representations from short segments of reverberant speech. Finally, the pre-trained speech encoder is combined with a small regression model and evaluated on two parameter regression tasks. Experimentally, the proposed method is shown to outperform a fully end-to-end trained baseline model.
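The abstract does not state the training objectives, so the following is only a minimal sketch of the loss terms such a pipeline could use, under explicit assumptions. For the first stage it shows the standard variational auto-encoder objective with a diagonal Gaussian posterior and a standard normal prior (squared-error reconstruction plus a KL term). For the second stage it assumes, as the title's "latent approximations" suggests but this page does not confirm, that the speech encoder is trained to match the latent vector of the impulse response. All function names and the `beta` weight are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def negative_elbo(x, x_hat, mu, log_var, beta=1.0):
    """Stage-1 style loss (assumed): squared reconstruction error plus weighted KL term.

    Squared error corresponds to a Gaussian likelihood up to constants; the paper's
    actual reconstruction loss and weighting are not given on this page.
    """
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return reconstruction + beta * kl_to_standard_normal(mu, log_var)


def latent_approximation_loss(z_pred, z_target):
    """Stage-2 style loss (assumed): mean squared error between the speech encoder's
    output and the latent vector obtained from the impulse response."""
    return sum((p - t) ** 2 for p, t in zip(z_pred, z_target)) / len(z_target)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sample is reproducible
    mu, log_var = [0.3, -0.1], [-1.0, -2.0]
    z = reparameterize(mu, log_var, rng)
    print("sampled latent:", z)
    print("KL term:", kl_to_standard_normal(mu, log_var))
    print("latent approximation loss:", latent_approximation_loss([0.2, 0.0], z))
```

In the third stage described above, the pre-trained speech encoder would feed a small regression model; that part is omitted here because the abstract gives no detail about the regressor or the two target parameters.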

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing