Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech

Hyungchan Yoon; Changhwan Kim; Eunwoo Song; Hyun-Wook Yoon; Hong-Goo Kang

arXiv:2308.14909 · cs.SD · August 30, 2023


TL;DR

This paper introduces a differentiable pruning method for transformer-based TTS models, enhancing their ability to generalize to unseen speakers in zero-shot multi-speaker speech synthesis.

Contribution

The paper proposes a differentiable sparse-attention pruning technique that learns the pruning thresholds automatically, with the goal of improving out-of-domain generalization in TTS.
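Learning a threshold by gradient descent requires the pruning decision to be differentiable, which a hard keep/drop cutoff is not. One common way to get a gradient is to replace the cutoff with a sigmoid mask whose sharpness is controlled by a temperature. The sketch below illustrates that idea under stated assumptions; it is not the authors' formulation, and the sigmoid relaxation, the `temperature` parameter, and all function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_mask(weights, tau, temperature):
    """Hypothetical relaxed mask: near 1 above the threshold tau, near 0 below it."""
    return [sigmoid((w - tau) / temperature) for w in weights]


def kept_mass(weights, tau, temperature):
    """Attention mass surviving the soft mask; a smooth function of tau."""
    return sum(w * m for w, m in zip(weights, soft_mask(weights, tau, temperature)))


def kept_mass_grad_tau(weights, tau, temperature):
    """Analytic derivative of kept_mass with respect to tau.

    d/dtau sigmoid((w - tau) / T) = -s * (1 - s) / T, where s is the mask value.
    """
    return sum(
        -w * m * (1.0 - m) / temperature
        for w, m in zip(weights, soft_mask(weights, tau, temperature))
    )


def soft_prune_row(weights, tau, temperature):
    """Apply the soft mask to one attention row and renormalize it to sum to 1."""
    masked = [w * m for w, m in zip(weights, soft_mask(weights, tau, temperature))]
    total = sum(masked)
    if total == 0.0:
        # Assumption: fall back to the unpruned row if every entry underflows to 0.
        return list(weights)
    return [x / total for x in masked]


# A negative gradient means raising tau removes attention mass, so an optimizer
# can trade pruning strength against a task loss through this signal.
row = [0.6, 0.25, 0.1, 0.05]
print(kept_mass_grad_tau(row, tau=0.12, temperature=0.05))
# With a very small temperature the soft mask behaves like a hard threshold.
print(soft_prune_row([0.6, 0.3, 0.1], tau=0.2, temperature=1e-4))
```

As the temperature shrinks, the soft mask approaches a hard cutoff at tau. In an actual model, tau would be a trainable parameter (whether per layer or per head is an assumption here) updated by backpropagation alongside the rest of the network.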

Findings

01. Improved voice quality in zero-shot multi-speaker TTS.
02. Enhanced speaker similarity with the proposed pruning method.
03. Effective reduction of redundant self-attention connections.

Abstract

For personalized speech generation, a neural text-to-speech (TTS) model must perform well given only limited data from a target speaker. To this end, the baseline TTS model needs to generalize well to out-of-domain data (i.e., the target speaker's speech). However, approaches to this out-of-domain generalization problem in TTS have yet to be thoroughly studied. In this work, we propose an effective pruning method for transformers, known as sparse attention, to improve the generalization ability of the TTS model. Specifically, we prune redundant connections in self-attention layers whose attention weights fall below a threshold. To flexibly determine the pruning strength when searching for the optimal degree of generalization, we also propose a new differentiable pruning method that lets the model learn the thresholds automatically. Evaluations on zero-shot…
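The hard-threshold pruning described in the abstract can be sketched for a single attention matrix: compute the softmax weights, drop connections whose weight falls below a threshold, and renormalize each row. This is a minimal sketch assuming a fixed scalar threshold and row-wise renormalization after pruning; the function names are hypothetical, and the paper may differ in these details.

```python
import math


def softmax(scores):
    """Row softmax with max-subtraction for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prune_attention(weights, threshold):
    """Zero out attention weights below `threshold` and renormalize each row.

    `weights` is a list of rows (one per query) of softmax probabilities over keys.
    Assumption: surviving weights are renormalized so each row sums to 1.
    """
    pruned = []
    for row in weights:
        kept = [w if w >= threshold else 0.0 for w in row]
        total = sum(kept)
        if total == 0.0:
            # Assumption: keep the original row if the threshold removes everything.
            pruned.append(list(row))
        else:
            pruned.append([w / total for w in kept])
    return pruned


def sparsity(weights):
    """Fraction of connections that were pruned (exactly zero)."""
    n = sum(len(row) for row in weights)
    return sum(1 for row in weights for w in row if w == 0.0) / n


scores = [[2.0, 1.0, 0.1, -1.0], [0.5, 0.4, 0.3, 0.2]]
attn = [softmax(row) for row in scores]
sparse = prune_attention(attn, threshold=0.1)
print(sparse)
print(sparsity(sparse))
```

With a threshold of 0.1, the two weakest connections of the first query are dropped, while the nearly uniform second row survives intact. A threshold at or below 1/n (n being the number of keys) can never empty a row, because the largest softmax weight in a row is at least 1/n.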


Taxonomy

Methods: Pruning