HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks

Yingting Li; Rishabh Bhardwaj; Ambuj Mehrish; Bo Cheng; Soujanya Poria

arXiv:2404.04645·cs.CL·April 9, 2024·2 cites

TL;DR

HyperTTS introduces a hypernetwork-based approach to dynamically generate adapter parameters for text-to-speech, enabling efficient and effective speaker adaptation without full model fine-tuning.

Contribution

This work presents HyperTTS, a novel hypernetwork framework that conditions adapter parameters on speaker representations for parameter-efficient multi-speaker TTS adaptation.

Findings

01. Achieves state-of-the-art performance in parameter-efficient speaker adaptation.
02. Demonstrates effectiveness of hypernetwork-generated adapter parameters in TTS.
03. Enables dynamic, speaker-specific adaptation without full model retraining.

Abstract

Neural speech synthesis, or text-to-speech (TTS), aims to transform a signal from the text domain to the speech domain. While TTS architectures that train and test on the same set of speakers have improved significantly, performance on out-of-domain speakers still faces enormous limitations. Domain adaptation to a new set of speakers can be achieved by fine-tuning the whole model for each new domain, which is parameter-inefficient. Adapters address this problem by providing a parameter-efficient alternative for domain adaptation. Although Adapters are popular in NLP, speech synthesis has not seen much improvement from them. In this work, we present HyperTTS, which comprises a small learnable network, a "hypernetwork", that generates the parameters of the Adapter blocks, allowing us to condition Adapters on speaker representations and making them dynamic. Extensive evaluations…
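
The conditioning idea in the abstract can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: the class name `HyperAdapter`, all dimensions, the two-layer hypernetwork, and the frozen `nn.Linear` standing in for a TTS backbone layer are assumptions made for the example. A small hypernetwork maps a speaker embedding to the down- and up-projection weights of a bottleneck adapter, so the adapter changes with the speaker while the backbone stays frozen.

```python
import torch
import torch.nn as nn


class HyperAdapter(nn.Module):
    """Bottleneck adapter whose weights are generated by a hypernetwork.

    All names and sizes are illustrative assumptions, not values taken
    from the paper or from the declare-lab/hypertts repository.
    """

    def __init__(self, d_model=256, bottleneck=8, d_speaker=64, d_hidden=32):
        super().__init__()
        self.d_model = d_model
        self.bottleneck = bottleneck
        # Number of generated weights: down-projection plus up-projection (no biases).
        n_params = 2 * d_model * bottleneck
        # The hypernetwork: the only trainable part of this module.
        self.hyper = nn.Sequential(
            nn.Linear(d_speaker, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, n_params),
        )

    def forward(self, h, speaker_emb):
        # h: (batch, time, d_model); speaker_emb: (batch, d_speaker)
        batch = h.size(0)
        flat = self.hyper(speaker_emb)  # (batch, n_params)
        split = self.d_model * self.bottleneck
        w_down = flat[:, :split].reshape(batch, self.d_model, self.bottleneck)
        w_up = flat[:, split:].reshape(batch, self.bottleneck, self.d_model)
        z = torch.relu(torch.bmm(h, w_down))  # (batch, time, bottleneck)
        return h + torch.bmm(z, w_up)  # residual connection around the adapter


torch.manual_seed(0)

# Stand-in for one frozen backbone layer (assumption: a real TTS model has many).
backbone = nn.Linear(256, 256)
for p in backbone.parameters():
    p.requires_grad_(False)

adapter = HyperAdapter()
h = backbone(torch.randn(2, 50, 256))  # hidden states for 2 utterances
speaker_emb = torch.randn(2, 64)       # hypothetical speaker embeddings
out = adapter(h, speaker_emb)
print(out.shape)
```

In this sketch the adapter weights are never stored as parameters; they are regenerated from each speaker embedding on every forward pass, and only the hypernetwork receives gradients.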

Code & Models

Repositories

declare-lab/hypertts · PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Sparse Evolutionary Training · Adapter