Robust Speech Representation Learning via Flow-based Embedding Regularization

Woo Hyun Kang; Jahangir Alam; Abderrahim Fathan

arXiv:2112.03454 · eess.AS · December 8, 2021 · 1 citation

TL;DR

This paper introduces a flow-based regularization method for speech embedding learning that minimizes nuisance attribute information, improving robustness across various speech processing tasks.

Contribution

The paper proposes a novel training strategy that integrates the information bottleneck principle with flow-based mutual information estimation to make speech embeddings more robust.
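For reference, the standard information bottleneck objective and two textbook identities it is commonly paired with are written out below. The symbols are our own notation, not the paper's: X is the input speech, Z the embedding produced by an encoder with parameters θ, Y the main-task label, q_φ the main-task classifier, f an invertible normalizing flow with base density p_U, and β > 0 a trade-off weight. The objective is maximized over θ. This summary does not say how the paper combines these terms, so treat the block as background under that assumption.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{IB}}(\theta) &= I(Z;Y) - \beta\, I(X;Z) \\
I(Z;Y) &= H(Y) - H(Y \mid Z) \;\ge\; H(Y) + \mathbb{E}_{p(y,z)}\!\left[\log q_{\phi}(y \mid z)\right] \\
\log p_{Z}(z) &= \log p_{U}\!\big(f(z)\big) + \log\left|\det \frac{\partial f(z)}{\partial z}\right|
\end{aligned}
```

The second line follows because the classifier's cross-entropy upper-bounds H(Y | Z); the third is the change-of-variables formula that lets a flow assign an exact density to each embedding.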

Findings

1. Improved performance over standard training in multiple speech tasks.
2. Effective reduction of nuisance attribute information in embeddings.
3. Consistent gains across different experimental conditions.

Abstract

In recent years, various deep learning-based methods have been proposed for extracting a fixed-dimensional embedding vector from speech signals. Although these embedding extraction methods have shown good performance in numerous tasks, including speaker verification, language identification, and anti-spoofing, their performance is limited under mismatched conditions because of variability in the speech that is unrelated to the main task. To alleviate this problem, we propose a novel training strategy that regularizes the embedding network to retain minimal information about nuisance attributes. To achieve this, the proposed method directly incorporates the information bottleneck scheme into the training process, where the mutual information is estimated using the main task classifier and an auxiliary normalizing flow network. The proposed method was…
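The abstract names the two estimators (the main-task classifier and an auxiliary normalizing flow) but, in the excerpt above, not the exact loss. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes an information-bottleneck-style objective built from (a) the classifier's mean cross-entropy, which lower-bounds I(Z;Y) up to the constant H(Y), and (b) the mean negative log-likelihood of the embeddings under a flow, used as a compression proxy weighted by `beta`. The function names, the single elementwise affine flow layer, and the parameters `mu`, `log_sigma`, and `beta` are all hypothetical; the paper's flow architecture and loss may differ.

```python
import math
from typing import Sequence, Tuple


# Hypothetical helper: softmax cross-entropy of the main-task classifier
# q(y|z) for one example. Its batch mean upper-bounds H(Y|Z), so
# H(Y) - mean_CE is a lower bound on I(Z;Y).
def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]


# Hypothetical one-layer elementwise affine flow u = (z - mu) * exp(-log_sigma)
# with a standard-normal base density. By the change-of-variables formula,
# log p(z) = sum_i [log N(u_i; 0, 1) - log_sigma_i].
def affine_flow_log_prob(z: Sequence[float], mu: Sequence[float],
                         log_sigma: Sequence[float]) -> float:
    total = 0.0
    for zi, mi, ls in zip(z, mu, log_sigma):
        u = (zi - mi) * math.exp(-ls)
        total += -0.5 * (u * u + math.log(2.0 * math.pi)) - ls
    return total


def ib_style_loss(embeddings: Sequence[Sequence[float]],
                  logits: Sequence[Sequence[float]],
                  labels: Sequence[int],
                  mu: Sequence[float],
                  log_sigma: Sequence[float],
                  beta: float) -> Tuple[float, float, float]:
    """Illustrative IB-style objective (an assumption, not the paper's loss).

    Returns (total, ce, nll): ce is the mean classifier cross-entropy
    (relevance term) and nll is the mean flow negative log-likelihood of
    the embeddings (compression proxy), combined as ce + beta * nll.
    """
    n = len(embeddings)
    ce = sum(softmax_cross_entropy(lg, y) for lg, y in zip(logits, labels)) / n
    nll = -sum(affine_flow_log_prob(z, mu, log_sigma) for z in embeddings) / n
    return ce + beta * nll, ce, nll


if __name__ == "__main__":
    # Toy batch: two 2-D embeddings with their 2-class logits and labels.
    total, ce, nll = ib_style_loss(
        embeddings=[[0.0, 0.0], [1.0, -1.0]],
        logits=[[2.0, 0.0], [0.0, 2.0]],
        labels=[0, 1],
        mu=[0.0, 0.0],
        log_sigma=[0.0, 0.0],
        beta=0.1,
    )
    print(f"total={total:.4f} ce={ce:.4f} nll={nll:.4f}")
```

In a real training loop, the flow parameters would be fitted by maximum likelihood on the embeddings while the encoder also sees the `beta`-weighted NLL. Because shrinking the embeddings raises their density, a sketch like this needs a fixed noise level or normalization on z to keep the compression term meaningful. That is a property of this toy objective, not a claim about the paper.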

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing