Contrastive Representation Learning for Acoustic Parameter Estimation

Philipp Götz; Cagdas Tuna; Andreas Walther; Emanuël A. P. Habets

arXiv:2302.11205 · eess.AS · March 14, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a contrastive learning method that derives low-dimensional representations of the acoustic environment from reverberant speech. The representations support acoustic parameter estimation and room classification, and they generalize to unseen data.
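This summary does not say which contrastive objective is used. As a rough illustration only, the sketch below assumes an InfoNCE-style loss over a batch of embedding pairs, where row i of `anchors` and row i of `positives` are taken to come from the same acoustic environment. The function name `info_nce`, the temperature value, and the pairing convention are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss (assumed objective, not confirmed by the source).

    anchors, positives: arrays of shape (batch, dim). Row i of each array is
    treated as a positive pair; all other rows act as negatives.
    """
    # L2-normalize so the dot product is a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching pair sits on the diagonal.
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
emb = rng.standard_normal((8, 16))                  # placeholder embeddings
aligned = emb + 0.01 * rng.standard_normal((8, 16)) # near-identical positives
loss_aligned = info_nce(emb, aligned)
loss_shuffled = info_nce(emb, np.roll(aligned, 1, axis=0))
```

With correctly paired rows the loss should be much lower than with the pairs shifted by one row, which is the behaviour a contrastive objective of this kind is meant to reward.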

Contribution

The paper presents a contrastive learning approach that uses convolution with room impulse responses (RIRs) as data augmentation, and applies it to acoustic parameter estimation and room classification from single-channel speech.
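The augmentation idea can be sketched as follows: convolving different dry signals with the same RIR gives two "views" that share an acoustic environment. This is a minimal sketch under stated assumptions. The helper names `synthetic_rir` and `reverberate` are hypothetical, the toy RIR (exponentially decaying noise) stands in for measured or simulated RIRs, random noise stands in for anechoic speech, and the pairing rule (same RIR, different source) is an assumption rather than the paper's confirmed design.

```python
import numpy as np

def synthetic_rir(rt60, fs=16000, length_s=1.0, seed=0):
    """Toy RIR: white noise whose amplitude falls by 60 dB after rt60 seconds."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * length_s)) / fs
    envelope = 10.0 ** (-3.0 * t / rt60)   # 10^-3 in amplitude = -60 dB
    return rng.standard_normal(t.size) * envelope

def reverberate(dry, rir):
    """Convolve a dry signal with an RIR via FFT, crop to the dry length,
    and peak-normalize."""
    n = dry.size + rir.size - 1
    wet = np.fft.irfft(np.fft.rfft(dry, n) * np.fft.rfft(rir, n), n)[: dry.size]
    return wet / (np.max(np.abs(wet)) + 1e-12)

fs = 16000
rng = np.random.default_rng(1)
dry_a = rng.standard_normal(fs)   # placeholder for one anechoic utterance
dry_b = rng.standard_normal(fs)   # placeholder for a different utterance
rir = synthetic_rir(rt60=0.4, fs=fs)

# Two views of the same (toy) room: different content, same RIR.
view_1 = reverberate(dry_a, rir)
view_2 = reverberate(dry_b, rir)
```

Because the RIR is applied on the fly, the choice of which signals and which RIRs form a pair can be varied freely, which is one way to read the flexibility mentioned in the abstract.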

Findings

01 Embeddings perform well on unseen data

02 Comparable to fully-supervised methods

03 Effective across multiple downstream tasks

Abstract

A study is presented in which a contrastive learning approach is used to extract low-dimensional representations of the acoustic environment from single-channel, reverberant speech signals. Convolution of room impulse responses (RIRs) with anechoic source signals is leveraged as a data augmentation technique that offers considerable flexibility in the design of the upstream task. We evaluate the embeddings on three downstream tasks: regression of the reverberation time (RT60), regression of the clarity index (C50), and classification of rooms as small or large. We demonstrate that the learned representations generalize well to unseen data and perform similarly to a fully supervised baseline.
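For readers unfamiliar with the two regression targets, both can be computed from an RIR using their standard definitions: C50 is the ratio of energy in the first 50 ms to the remaining energy, in dB, and RT60 is the time for the energy decay curve to drop by 60 dB. The sketch below is an illustration under assumptions, not the paper's labeling pipeline. The function names are hypothetical, the RIR is assumed to start at the direct sound, and RT60 is extrapolated from a line fit to the -5 dB to -25 dB range of the Schroeder decay curve.

```python
import numpy as np

def c50_db(rir, fs):
    """Clarity index: early (first 50 ms) over late energy, in dB.
    Assumes the RIR starts at the direct-path arrival."""
    energy = np.asarray(rir, dtype=float) ** 2
    n50 = int(round(0.050 * fs))
    return float(10.0 * np.log10(energy[:n50].sum() / energy[n50:].sum()))

def rt60_schroeder(rir, fs, lo_db=-5.0, hi_db=-25.0):
    """RT60 from Schroeder backward integration, with a linear fit on the
    [hi_db, lo_db] range of the decay curve extrapolated to 60 dB."""
    energy = np.asarray(rir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]            # energy remaining after t
    edc_db = 10.0 * np.log10(edc / edc[0])
    idx = np.where((edc_db <= lo_db) & (edc_db >= hi_db))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # decay rate in dB/s
    return float(-60.0 / slope)

# Idealized exponential decay with a known RT60 of 0.5 s.
fs = 16000
t = np.arange(2 * fs) / fs
ideal_rir = 10.0 ** (-3.0 * t / 0.5)

rt60_est = rt60_schroeder(ideal_rir, fs)
c50_est = c50_db(ideal_rir, fs)
```

On the idealized decay the RT60 estimate should land close to the 0.5 s used to build it. Measured RIRs contain a noise floor, so the fit range usually has to be chosen with more care.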

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation

Methods: Convolution · Contrastive Learning