A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer

Hu Hu; Sabato Marco Siniscalchi; Chao-Han Huck Yang; Chin-Hui Lee

arXiv:2110.08598 · eess.AS · February 22, 2022

TL;DR

This paper introduces a variational Bayesian method for learning latent variables in deep neural networks to improve cross-domain acoustic knowledge transfer, effectively addressing domain mismatches and outperforming existing algorithms.

Contribution

It presents a novel VB inference framework for estimating latent variables in DNNs, enabling effective transfer of acoustic knowledge across domains with limited adaptation data.

Findings

01 Significant improvement in device adaptation accuracy.

02 Outperforms 13 state-of-the-art transfer algorithms.

03 Effective handling of acoustic mismatches.

Abstract

We propose a variational Bayesian (VB) approach to learning distributions of latent variables in deep neural network (DNN) models for cross-domain knowledge transfer, to address acoustic mismatches between training and testing conditions. Instead of carrying out point estimation in conventional maximum a posteriori estimation with a risk of having a curse of dimensionality in estimating a huge number of model parameters, we focus our attention on estimating a manageable number of latent variables of DNNs via a VB inference framework. To accomplish model transfer, knowledge learnt from a source domain is encoded in prior distributions of latent variables and optimally combined, in a Bayesian sense, with a small set of adaptation data from a target domain to approximate the corresponding posterior distributions. Experimental results on device adaptation in acoustic scene classification…
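The idea of encoding source-domain knowledge in a prior and combining it with a small amount of target-domain data can be illustrated with a toy example. The sketch below is not the paper's method: it assumes a single scalar latent variable with a Gaussian prior and Gaussian observation noise, so the posterior has a closed form and no variational approximation is needed. In a DNN no such closed form exists, which is why the abstract describes approximating the posterior through VB inference. All names and numbers (`gaussian_posterior`, `kl_gaussian`, the source and target values) are hypothetical.

```python
import math

def gaussian_posterior(prior_mean, prior_var, data, noise_var):
    """Posterior over a scalar latent z with prior z ~ N(prior_mean, prior_var)
    and observations x_i ~ N(z, noise_var). Returns (mean, variance)."""
    n = len(data)
    # Precisions (inverse variances) add: one term from the prior, one per sample.
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    # The posterior mean is a precision-weighted blend of the prior mean and the data.
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var

def kl_gaussian(q_mean, q_var, p_mean, p_var):
    """KL(q || p) between two univariate Gaussians."""
    return 0.5 * (math.log(p_var / q_var)
                  + (q_var + (q_mean - p_mean) ** 2) / p_var
                  - 1.0)

# Hypothetical prior standing in for knowledge learnt from the source domain.
source_mean, source_var = 0.0, 1.0
# Hypothetical small adaptation set from the target domain.
target_data = [1.0, 1.0, 1.0, 1.0]
noise_var = 1.0

post_mean, post_var = gaussian_posterior(source_mean, source_var, target_data, noise_var)
# How far the adapted distribution has moved away from the source prior.
shift = kl_gaussian(post_mean, post_var, source_mean, source_var)
print(post_mean, post_var, shift)
```

With only four target samples, the posterior mean lands between the source prior mean and the target sample mean, and the posterior variance shrinks below the prior variance. The KL term is the same quantity that a variational objective typically uses to keep the approximate posterior close to the prior.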

Code & Models

Repositories

mihawkhu/asc_knowledge_transfer (tf · Official)

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing