Contrastive Representation Learning for Acoustic Parameter Estimation
Philipp Götz, Cagdas Tuna, Andreas Walther, Emanuël A. P. Habets

TL;DR
This paper introduces a contrastive learning method to derive low-dimensional acoustic environment representations from reverberant speech, enabling effective parameter estimation and room classification with generalization capabilities.
Contribution
The paper presents a contrastive learning approach that uses convolution with room impulse responses (RIRs) as data augmentation, yielding embeddings that support acoustic parameter estimation and room classification from single-channel speech.
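To make the augmentation idea concrete, the following is a minimal sketch of how RIR convolution could produce two "views" that share the same acoustic environment. It is not the authors' pipeline: the function names (`synthetic_rir`, `reverberate`, `make_positive_pair`), the toy noise-based RIR, and the choice to pair two different dry signals convolved with the same RIR as positives are all assumptions made for illustration.

```python
import numpy as np

def synthetic_rir(rt60, fs=16000, length_s=0.5, seed=0):
    """Toy RIR (assumption): white noise under an exponential envelope
    whose amplitude drops by 60 dB after rt60 seconds."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(length_s * fs)) / fs
    envelope = np.exp(-3.0 * np.log(10.0) * t / rt60)  # 10**-3 in amplitude at t = rt60
    rir = rng.standard_normal(t.size) * envelope
    return rir / np.max(np.abs(rir))

def reverberate(dry, rir):
    """Convolve a dry (anechoic) signal with an RIR, crop to the input
    length, and peak-normalize."""
    wet = np.convolve(dry, rir)[: dry.size]
    return wet / (np.max(np.abs(wet)) + 1e-12)

def make_positive_pair(dry_a, dry_b, rir):
    """Hypothetical pairing rule: two different source signals rendered
    in the same room form a positive pair for the contrastive task."""
    return reverberate(dry_a, rir), reverberate(dry_b, rir)
```

In this sketch, negatives would be signals convolved with a different RIR. Because any dry signal can be combined with any RIR, the pairing rule is easy to change, which is the flexibility the abstract attributes to this augmentation.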
Findings
Embeddings generalize well to unseen data
Performance is similar to a fully-supervised baseline
Evaluated on three downstream tasks: RT60 regression, C50 regression, and small/large room classification
Abstract
A study is presented in which a contrastive learning approach is used to extract low-dimensional representations of the acoustic environment from single-channel, reverberant speech signals. Convolution of room impulse responses (RIRs) with anechoic source signals is leveraged as a data augmentation technique that offers considerable flexibility in the design of the upstream task. We evaluate the embeddings across three different downstream tasks, which include the regression of acoustic parameters reverberation time RT60 and clarity index C50, and the classification into small and large rooms. We demonstrate that the learned representations generalize well to unseen data and perform similarly to a fully-supervised baseline.
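For readers unfamiliar with the two regression targets, the sketch below shows one common way to compute them from a known RIR. It is an illustration under stated assumptions, not the authors' labeling procedure: `clarity_c50` assumes the direct sound arrives at sample 0, and `rt60_schroeder` uses Schroeder backward integration with a line fit between -5 dB and -25 dB extrapolated to 60 dB (a T20-style estimate). Both function names and the fit range are choices made here.

```python
import numpy as np

def clarity_c50(rir, fs):
    """Clarity index C50 in dB: energy in the first 50 ms over the
    remaining energy. Assumes the direct sound is at sample 0."""
    n50 = int(round(0.050 * fs))
    energy = rir ** 2
    early = energy[:n50].sum()
    late = energy[n50:].sum()
    return 10.0 * np.log10(early / late)

def rt60_schroeder(rir, fs, lo_db=-5.0, hi_db=-25.0):
    """Reverberation time from the Schroeder energy decay curve (EDC).
    Fits a line to the EDC between lo_db and hi_db and extrapolates
    the slope to a 60 dB decay."""
    energy = rir ** 2
    edc = np.cumsum(energy[::-1])[::-1]       # backward integration
    edc_db = 10.0 * np.log10(edc / edc[0])    # normalize to 0 dB at t = 0
    idx = np.where((edc_db <= lo_db) & (edc_db >= hi_db))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # slope in dB per second
    return -60.0 / slope

# Usage on an idealized, noise-free exponential decay with RT60 = 0.4 s
fs = 8000
t = np.arange(fs) / fs
ideal_rir = np.exp(-3.0 * np.log(10.0) * t / 0.4)
rt60_est = rt60_schroeder(ideal_rir, fs)
c50_est = clarity_c50(ideal_rir, fs)
```

Longer reverberation pushes more energy past the 50 ms boundary, so C50 falls as RT60 rises. The two targets are therefore related but not interchangeable.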
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation
Methods: Convolution · Contrastive Learning
