Voice Conversion Based Speaker Normalization for Acoustic Unit Discovery
Thomas Glarner, Janek Ebbers, Reinhold Häb-Umbach

TL;DR
This paper introduces an unsupervised, multilingual speaker normalization method using adversarial contrastive predictive coding to improve acoustic unit discovery without requiring transcriptions or speaker labels.
Contribution
It presents a novel unsupervised speaker normalization technique that enhances acoustic unit discovery across multiple languages without labeled data.
Findings
Improved acoustic unit discovery performance on English, Yoruba, and Mboshi.
Effective speaker normalization achieved even with limited target language data.
Method is compatible with various unit discovery systems.
Abstract
Discovering speaker-independent acoustic units purely from spoken input is known to be a hard problem. In this work, we propose an unsupervised speaker normalization technique applied prior to unit discovery. It is based on separating speaker-related from content-induced variations in a speech signal with an adversarial contrastive predictive coding approach. This technique requires neither transcribed speech nor speaker labels and, furthermore, can be trained in a multilingual fashion, thus achieving speaker normalization even if only little unlabeled data is available from the target language. The speaker normalization is done by mapping all utterances to a medoid style that is representative of the whole database. We demonstrate the effectiveness of the approach by conducting acoustic unit discovery with a hidden Markov model variational autoencoder, noting, however, that the proposed…
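To make the "medoid style" idea concrete, here is a minimal sketch of how a medoid could be picked from a set of per-utterance style embeddings. The abstract does not state the distance measure or the form of the embeddings, so the Euclidean distance, the toy 2-D vectors, and the function name `medoid_index` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def medoid_index(embeddings):
    """Return the index of the medoid of a list of vectors.

    The medoid is the member of the set whose summed distance to all
    other members is smallest. Euclidean distance is an assumption here;
    the paper summary does not specify the metric.
    """
    if not embeddings:
        raise ValueError("need at least one embedding")
    best_index, best_total = 0, math.inf
    for i, candidate in enumerate(embeddings):
        # Summed distance from this candidate to every vector in the set
        total = sum(math.dist(candidate, other) for other in embeddings)
        if total < best_total:
            best_index, best_total = i, total
    return best_index


# Hypothetical style embeddings, one per utterance (toy 2-D values)
style_embeddings = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.4, 0.4), (5.0, 5.0)]
idx = medoid_index(style_embeddings)
target_style = style_embeddings[idx]
```

Unlike a mean, the medoid is always an actual member of the set, so the target style corresponds to an embedding that really occurs in the database. In a pipeline of this kind, every utterance would then be converted using `target_style` in place of its own style embedding.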
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: InfoNCE · Contrastive Predictive Coding
