Speaker Generation

Daisy Stanton; Matt Shannon; Soroosh Mariooryad; RJ Skerry-Ryan; Eric Battenberg; Tom Bagby; David Kao

arXiv:2111.05095·cs.SD·November 10, 2021·1 cites

TL;DR

This paper introduces TacoSpawn, a recurrent attention-based text-to-speech system that synthesizes speech in diverse, human-sounding voices of nonexistent speakers, and proposes objective evaluation metrics that correlate with human perception of speaker similarity.

Contribution

It presents TacoSpawn, a speaker generation model that learns a distribution over a speaker embedding space without transfer learning from speaker ID systems, which enables sampling of novel and diverse voices.

Findings

01

TacoSpawn performs competitively at the speaker generation task.

02

The proposed objective metrics correlate with human perception of speaker similarity.

03

The method is easy to implement and does not require transfer learning from speaker ID systems.

Abstract

This work explores the task of synthesizing speech in nonexistent human-sounding voices. We call this task "speaker generation", and present TacoSpawn, a system that performs competitively at this task. TacoSpawn is a recurrent attention-based text-to-speech model that learns a distribution over a speaker embedding space, which enables sampling of novel and diverse speakers. Our method is easy to implement, and does not require transfer learning from speaker ID systems. We present objective and subjective metrics for evaluating performance on this task, and demonstrate that our proposed objective metrics correlate with human perception of speaker similarity. Audio samples are available on our demo page.
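To make the idea of "sampling novel speakers from a learned distribution over a speaker embedding space" concrete, here is a minimal sketch in Python. The abstract does not specify the form of the distribution, so the diagonal-covariance Gaussian mixture below is an assumption chosen for illustration, not the authors' parameterization. The function name `sample_speaker_embeddings`, the embedding size, and the randomly initialized mixture parameters are all hypothetical; in a real system the parameters would be learned and the sampled vectors would condition the text-to-speech model.

```python
import numpy as np


def sample_speaker_embeddings(weights, means, log_stds, num_speakers, rng):
    """Draw speaker embeddings from a diagonal-covariance Gaussian mixture.

    Assumption: the speaker distribution is a Gaussian mixture. The source
    only states that a distribution over the embedding space is learned.
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.exp(np.asarray(log_stds, dtype=float))

    # Pick one mixture component per new speaker, then add scaled noise.
    components = rng.choice(len(weights), size=num_speakers, p=weights)
    noise = rng.standard_normal((num_speakers, means.shape[1]))
    return means[components] + stds[components] * noise


# Hypothetical stand-ins for learned parameters (illustrative values only).
rng = np.random.default_rng(0)
EMBED_DIM = 8
NUM_COMPONENTS = 3
weights = np.full(NUM_COMPONENTS, 1.0 / NUM_COMPONENTS)
means = rng.standard_normal((NUM_COMPONENTS, EMBED_DIM))
log_stds = np.full((NUM_COMPONENTS, EMBED_DIM), -1.0)

# Each row is an embedding for a speaker that was never in the training set.
novel_speakers = sample_speaker_embeddings(weights, means, log_stds, 5, rng)
print(novel_speakers.shape)  # (5, 8)
```

Each sampled row would play the same role as a training speaker's embedding at synthesis time, which is what allows generating voices that belong to no real speaker.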

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Music and Audio Processing