Robust Speech Representation Learning via Flow-based Embedding Regularization
Woo Hyun Kang, Jahangir Alam, Abderrahim Fathan

TL;DR
This paper introduces a flow-based regularization method for speech embedding learning that minimizes nuisance attribute information, improving robustness across various speech processing tasks.
Contribution
It proposes a novel training strategy integrating the information bottleneck with flow-based mutual information estimation for more robust speech embeddings.
Findings
Improved performance over standard training in multiple speech tasks.
Effective reduction of nuisance attribute information in embeddings.
Consistent gains across different experimental conditions.
Abstract
In recent years, various deep learning-based methods have been proposed for extracting a fixed-dimensional embedding vector from speech signals. Although these embedding extraction methods perform well in numerous tasks, including speaker verification, language identification, and anti-spoofing, their performance degrades under mismatched conditions because the embeddings retain variability unrelated to the main task. To alleviate this problem, we propose a novel training strategy that regularizes the embedding network so that its embeddings carry minimal information about the nuisance attributes. To achieve this, the proposed method incorporates the information bottleneck scheme directly into the training process, where the mutual information is estimated using the main task classifier and an auxiliary normalizing flow network. The proposed method was…
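The abstract names two ingredients: a normalizing flow used for density estimation and a regularization term added to the main task objective. The sketch below illustrates only these generic building blocks and is not the authors' implementation. The one-dimensional affine flow, the function names (`affine_flow_logpdf`, `regularized_loss`), and the weight `beta` are all assumptions made for illustration; the paper's actual mutual information estimator is not reproduced here.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of a standard normal base distribution at z."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def affine_flow_logpdf(x, shift, log_scale):
    """Log-density of x under a hypothetical 1-D affine flow.

    The flow maps x to z = (x - shift) * exp(-log_scale), with z ~ N(0, 1).
    Change of variables: log p(x) = log p_base(z) + log |dz/dx|.
    """
    z = (x - shift) * math.exp(-log_scale)
    log_abs_det = -log_scale  # log |dz/dx| for the affine map
    return standard_normal_logpdf(z) + log_abs_det


def regularized_loss(task_loss, nuisance_penalty, beta):
    """Main task loss plus a weighted nuisance-information penalty.

    `nuisance_penalty` stands in for an estimate of the information the
    embedding carries about nuisance attributes (an assumption; how that
    estimate is computed is specific to the paper).
    """
    return task_loss + beta * nuisance_penalty


# Density of x = 1.0 under a flow equivalent to N(mean=1, std=2).
log_p = affine_flow_logpdf(1.0, shift=1.0, log_scale=math.log(2.0))
total = regularized_loss(task_loss=1.0, nuisance_penalty=0.5, beta=0.1)
```

In a real system the affine map would be replaced by a deep invertible network over the embedding space, and `beta` would trade off task accuracy against how strongly nuisance information is suppressed.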
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
