Bayesian Subspace HMM for the Zerospeech 2020 Challenge

Bolaji Yusuf; Lucas Ondel

arXiv:2005.09282·eess.AS·July 28, 2020

TL;DR

This paper presents a Bayesian Subspace Hidden Markov Model (SHMM) approach to unsupervised speech unit discovery. In the Zerospeech 2020 challenge, the system compares favorably with the baseline on human-evaluated character error rate while using a significantly lower unit bitrate.

Contribution

It applies the Bayesian SHMM to unsupervised speech unit discovery: each unit is modeled as an HMM whose parameters are constrained to a low-dimensional subspace of the parameter space, and that subspace is trained to model phonetic variability.

Findings

01 Compares favorably with the baseline on human-evaluated character error rate

02 Achieves a significantly lower unit bitrate than the baseline

03 Maintains high synthesis quality

Abstract

In this paper we describe our submission to the Zerospeech 2020 challenge, where the participants are required to discover latent representations from unannotated speech and to use those representations to perform speech synthesis, with synthesis quality used as a proxy metric for the unit quality. In our system, we use the Bayesian Subspace Hidden Markov Model (SHMM) for unit discovery. The SHMM models each unit as an HMM whose parameters are constrained to lie in a low-dimensional subspace of the total parameter space; this subspace is trained to model phonetic variability. Our system compares favorably with the baseline on the human-evaluated character error rate while maintaining a significantly lower unit bitrate.
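
The subspace constraint described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each unit's full HMM parameter vector is an affine function of a small per-unit embedding; the abstract does not state the exact mapping or the Bayesian inference procedure, so the names `W`, `b`, `H`, `unit_parameters`, and all dimensions below are hypothetical.

```python
import numpy as np


def unit_parameters(W, b, h):
    """Map a low-dimensional unit embedding h to a full parameter vector.

    Assumption for illustration: an affine map eta = W @ h + b. The real
    model may apply a further transformation and place priors on W, b, h.
    """
    return W @ h + b


rng = np.random.default_rng(0)

# Hypothetical sizes: 200 HMM parameters per unit, 10-dimensional subspace.
param_dim, subspace_dim, n_units = 200, 10, 50

W = rng.normal(size=(param_dim, subspace_dim))  # subspace basis, shared by all units
b = rng.normal(size=param_dim)                  # offset, shared by all units
H = rng.normal(size=(n_units, subspace_dim))    # one embedding per unit

# Full parameter vectors for every unit, shape (n_units, param_dim).
etas = np.stack([unit_parameters(W, b, h) for h in H])

# After removing the shared offset, all units lie in a subspace whose
# dimension is at most subspace_dim, however large param_dim is.
rank = np.linalg.matrix_rank(etas - b, tol=1e-8)
print(etas.shape, rank)
```

In this sketch only the embeddings in `H` differ between units, so each unit is described by 10 numbers rather than 200; the shared `W` and `b` carry what the units have in common.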

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing