SNIPER Training: Single-Shot Sparse Training for Text-to-Speech
Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman, Dorien Herremans

TL;DR
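SNIPER training applies decaying sparsity to TTS models: training starts at a high sparsity, and the sparsity rate is then progressively reduced. This speeds up early training and gives better final performance than constant-sparsity or dense training, with training time comparable to dense models.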
Contribution
The paper proposes SNIPER training, a novel decaying sparsity method for TTS models that improves training efficiency and final performance over constant-sparsity and dense models.
Findings
SNIPER training accelerates early training loss reduction.
SNIPER models outperform constant-sparsity and dense models.
Training time remains comparable to dense models.
Abstract
Text-to-speech (TTS) models have achieved remarkable naturalness in recent years, yet like most deep neural models, they have more parameters than necessary. Sparse TTS models can improve on dense models via pruning and extra retraining, or converge faster than dense models with some performance loss. Thus, we propose training TTS models using decaying sparsity, i.e. a high initial sparsity to accelerate training first, followed by a progressive rate reduction to obtain better eventual performance. This decremental approach differs from current methods of incrementing sparsity to a desired target, which costs significantly more time than dense training. We call our method SNIPER training: Single-shot Initialization Pruning Evolving-Rate training. Our experiments on FastSpeech2 show that we were able to obtain better losses in the first few training epochs with SNIPER, and that the final…
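The decaying-sparsity idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not taken from the paper: the linear decay shape, the per-epoch update, the function names `sparsity_schedule` and `mask_from_scores`, and the random placeholder scores standing in for saliency scores computed once at initialization. How newly unmasked weights are initialized is not covered.

```python
import random


def sparsity_schedule(initial_sparsity, num_epochs, decay_epochs):
    """Hypothetical schedule: decay sparsity linearly to 0 over
    `decay_epochs` epochs, then train densely for the rest."""
    schedule = []
    for epoch in range(num_epochs):
        remaining = max(0.0, 1.0 - epoch / decay_epochs)
        schedule.append(initial_sparsity * remaining)
    return schedule


def mask_from_scores(scores, sparsity):
    """Return a 0/1 mask that prunes the lowest-scoring `sparsity`
    fraction of weights and keeps the rest."""
    n_prune = int(round(sparsity * len(scores)))
    # Indices sorted from least to most important.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(scores))]


# Placeholder scores (assumption): in a single-shot setting these would be
# computed once at initialization and reused for every mask.
random.seed(0)
scores = [random.random() for _ in range(10)]

schedule = sparsity_schedule(initial_sparsity=0.8, num_epochs=6, decay_epochs=4)
masks = [mask_from_scores(scores, s) for s in schedule]

for epoch, (s, m) in enumerate(zip(schedule, masks)):
    print(f"epoch {epoch}: sparsity={s:.2f}, active weights={sum(m)}/{len(m)}")
```

Because the scores are fixed, each mask keeps every weight that the previous mask kept, so the active set only grows as sparsity decays. This is the reverse of incremental-sparsity schedules, which start dense and prune toward a target.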
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
Methods: Pruning · SNIPER
