Bayesian Subspace HMM for the Zerospeech 2020 Challenge
Bolaji Yusuf, Lucas Ondel

TL;DR
This paper presents a Bayesian Subspace Hidden Markov Model approach for unsupervised speech unit discovery, improving speech synthesis quality and reducing bitrate in the Zerospeech 2020 challenge.
Contribution
It introduces the Bayesian SHMM for discovering speech units in an unsupervised manner, effectively modeling phonetic variability with low-dimensional parameter constraints.
Findings
Compares favorably with the baseline on human-evaluated character error rate
Achieves a significantly lower unit bitrate than the baseline
Maintains high synthesis quality
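To make the bitrate finding concrete, here is a minimal sketch of one way to measure the bitrate of a discovered unit sequence: the empirical entropy of the unit symbols multiplied by the number of units per second. This formula is an assumption for illustration and is not taken from the text above; the function name `unit_bitrate` and the toy inputs are hypothetical.

```python
import math
from collections import Counter


def unit_bitrate(units, duration_seconds):
    """Bits per second of a unit sequence (assumed definition).

    Computes n * H / D, where n is the number of emitted units,
    H is the empirical entropy (in bits) of the unit symbols,
    and D is the total audio duration in seconds.
    """
    n = len(units)
    counts = Counter(units)
    # Empirical entropy of the symbol distribution, in bits per unit.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return n * entropy / duration_seconds


# Toy example: four units drawn from two equally likely symbols over 2 seconds.
rate = unit_bitrate(["a", "b", "a", "b"], 2.0)
```

Under this definition, a system lowers its bitrate by emitting fewer units per second, by using a smaller or more skewed unit inventory, or both.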
Abstract
In this paper, we describe our submission to the Zerospeech 2020 challenge, where the participants are required to discover latent representations from unannotated speech and to use those representations to perform speech synthesis, with synthesis quality used as a proxy metric for the unit quality. In our system, we use the Bayesian Subspace Hidden Markov Model (SHMM) for unit discovery. The SHMM models each unit as an HMM whose parameters are constrained to lie in a low-dimensional subspace of the total parameter space; this subspace is trained to model phonetic variability. Our system compares favorably with the baseline on the human-evaluated character error rate while maintaining significantly lower unit bitrate.
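The subspace constraint described in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that each unit's HMM parameters are an affine function of a low-dimensional unit embedding, and that the parameters are just the Gaussian state means. The affine form, all dimensions, and the names `W`, `b`, `H`, and `unit_means` are hypothetical and not taken from the paper; the sketch is also not Bayesian, since it uses point values instead of posterior distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are hypothetical, chosen only to keep the example small.
n_units = 5        # number of acoustic units
n_states = 3       # HMM states per unit
feat_dim = 4       # acoustic feature dimension
subspace_dim = 2   # dimension of the shared low-dimensional subspace

# Full parameter vector per unit: one mean vector per HMM state.
param_dim = n_states * feat_dim

# Shared subspace (assumed affine): basis W and offset b.
W = rng.normal(size=(param_dim, subspace_dim))
b = rng.normal(size=param_dim)

# One low-dimensional embedding per unit.
H = rng.normal(size=(n_units, subspace_dim))


def unit_means(h):
    """Map a unit embedding to the state means of that unit's HMM."""
    return (W @ h + b).reshape(n_states, feat_dim)


# Shape: (n_units, n_states, feat_dim)
means = np.stack([unit_means(h) for h in H])

# Every unit's parameter vector lies in the same affine subspace:
# after removing the offset, the vectors span at most subspace_dim directions.
rank = np.linalg.matrix_rank(means.reshape(n_units, -1) - b)
```

In this sketch each unit is described by only `subspace_dim` free numbers, while `W` and `b` are shared across all units, which is the sense in which the parameters are constrained to a low-dimensional subspace.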
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
