Simple and Effective Unsupervised Speech Synthesis

Alexander H. Liu; Cheng-I Jeff Lai; Wei-Ning Hsu; Michael Auli; Alexei Baevski; James Glass

arXiv:2204.02524 · cs.SD · April 21, 2022 · 1 citation

Open Access

TL;DR

This paper presents the first unsupervised speech synthesis system that generates natural and intelligible speech using only unlabeled speech audio, unlabeled text, and a lexicon, eliminating the need for a human-labeled corpus.

Contribution

It introduces an unsupervised speech synthesis framework that combines recent work in unsupervised speech recognition with existing neural speech synthesis, so that a synthesizer can be built without human-labeled data.

Findings

01

Synthesizes speech similar in naturalness to a supervised counterpart, as measured by human evaluation

02

Achieves intelligibility similar to a supervised counterpart, as measured by human evaluation

03

Requires only unlabeled speech, unlabeled text, and a lexicon

Abstract

We introduce the first unsupervised speech synthesis system based on a simple, yet effective recipe. The framework leverages recent work in unsupervised speech recognition as well as existing neural-based speech synthesis. Using only unlabeled speech audio and unlabeled text as well as a lexicon, our method enables speech synthesis without the need for a human-labeled corpus. Experiments demonstrate the unsupervised system can synthesize speech similar to a supervised counterpart in terms of naturalness and intelligibility measured by human evaluation.
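Read as a pipeline, one plausible arrangement of this recipe (an assumption, following wav2vec-U-style unsupervised recognition; this page does not spell out the steps) is: phonemize the unlabeled text with the lexicon, use that phoneme text to train an unsupervised recognizer, transcribe the unlabeled speech into pseudo phoneme labels, and train an ordinary neural TTS model on the resulting pairs. The sketch below covers only the data-preparation steps. The lexicon entries, `phonemize`, `phone_inventory`, `build_pseudo_corpus`, `PseudoPair`, and the `unsup_asr` callable are hypothetical names; the recognizer is replaced by a lookup table, and neither recognizer training nor the TTS model is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical lexicon type: lowercase word -> phoneme sequence (illustrative only).
Lexicon = Dict[str, List[str]]


def phonemize(sentence: str, lexicon: Lexicon) -> List[str]:
    """Convert unlabeled text to phonemes via the lexicon; out-of-vocabulary words are skipped."""
    phones: List[str] = []
    for word in sentence.lower().split():
        phones.extend(lexicon.get(word, []))
    return phones


def phone_inventory(phonemized_text: Sequence[List[str]]) -> Set[str]:
    """Phoneme set observed in the phonemized text, i.e. the output space an unsupervised ASR would target."""
    return {p for sent in phonemized_text for p in sent}


@dataclass
class PseudoPair:
    """One (audio, pseudo-transcript) pair for TTS training."""
    audio_id: str
    phonemes: List[str]


def build_pseudo_corpus(
    audio_ids: Sequence[str],
    unsup_asr: Callable[[str], List[str]],
) -> List[PseudoPair]:
    """Transcribe unlabeled audio into pseudo phoneme labels.

    Utterances with an empty transcript are dropped (a hypothetical rule,
    not taken from the paper).
    """
    corpus: List[PseudoPair] = []
    for audio_id in audio_ids:
        phones = unsup_asr(audio_id)
        if phones:
            corpus.append(PseudoPair(audio_id, phones))
    return corpus


if __name__ == "__main__":
    # Toy inputs; a real system uses a full pronunciation lexicon and many hours of audio.
    lexicon: Lexicon = {"hello": ["HH", "AH", "L", "OW"], "world": ["W", "ER", "L", "D"]}
    phonemized = [phonemize(s, lexicon) for s in ["Hello world", "world"]]
    print(sorted(phone_inventory(phonemized)))

    # Stand-in for a trained unsupervised recognizer: a fixed lookup table.
    fake_outputs = {"utt1": ["HH", "AH", "L", "OW"], "utt2": []}
    pairs = build_pseudo_corpus(["utt1", "utt2"], lambda a: fake_outputs.get(a, []))
    print([p.audio_id for p in pairs])  # ['utt1']
```

Phoneme-level pseudo-labels are assumed here because they let the downstream TTS model consume the recognizer's output exactly as it would consume phonemized human transcripts; whether the paper uses phonemes or another unit is not stated on this page.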

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Speech and dialogue systems