Variational Bayesian Adaptive Learning of Deep Latent Variables for Acoustic Knowledge Transfer

Hu Hu; Sabato Marco Siniscalchi; Chao-Han Huck Yang; Chin-Hui Lee

arXiv:2501.15496·eess.AS·January 28, 2025

Open Access

TL;DR

This paper introduces a variational Bayesian adaptive learning method for deep neural networks that improves cross-domain acoustic recognition by effectively transferring knowledge despite domain mismatches.

Contribution

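The paper proposes a Bayesian approach to acoustic knowledge transfer that places uncertainty on a manageable number of deep latent variables rather than on the full set of model parameters, which avoids the dimensionality problems of parameter-level Bayesian methods and handles different data availability scenarios.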

Findings

01 Achieved significant improvements in device and noise adaptation tasks.

02 Outperformed existing state-of-the-art knowledge transfer methods.

03 Validated on acoustic scene classification and spoken command recognition.

Abstract

In this work, we propose a novel variational Bayesian adaptive learning approach for cross-domain knowledge transfer that addresses acoustic mismatches between training and testing conditions, such as differences in recording devices and environmental noise. Unlike traditional Bayesian approaches, which impose uncertainties on model parameters and risk the curse of dimensionality because of the huge number of parameters, we focus on estimating a manageable number of latent variables in deep neural models. Knowledge learned from a source domain is thus encoded in prior distributions of deep latent variables and optimally combined, in a Bayesian sense, with a small set of adaptation data from a target domain to approximate the corresponding posterior distributions. Two different strategies are proposed and investigated to estimate the posterior distributions: Gaussian mean-field variational inference,…
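As an illustration of the Gaussian mean-field idea named in the abstract, the sketch below computes the closed-form KL divergence between a diagonal Gaussian approximate posterior over latent variables and a diagonal Gaussian prior. In a mean-field setup, a term of this kind typically regularizes the target-domain fit toward the source-domain prior. This is a minimal sketch, not the authors' implementation: the function names, the weighting factor `beta`, and the way the prior statistics are obtained are all assumptions made for illustration.

```python
import math


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between two diagonal Gaussians.

    Each argument is a list with one entry per latent dimension.
    q is the approximate posterior (assumed to be fitted on target data),
    p is the prior (assumed to be estimated on source data).
    """
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += (
            math.log(sp / sq)
            + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2)
            - 0.5
        )
    return total


def regularized_loss(nll, mu_q, sigma_q, mu_p, sigma_p, beta=1.0):
    """Hypothetical adaptation objective: data-fit term plus weighted KL.

    nll is the negative log-likelihood on the adaptation data; beta is an
    illustrative weighting factor, not a value taken from the paper.
    """
    return nll + beta * kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p)
```

When the posterior equals the prior the KL term vanishes, so the objective reduces to the data-fit term alone; the further the adapted latent statistics drift from the source-domain prior, the larger the penalty.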

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Sparse Evolutionary Training · Focus