Training Universal Vocoders with Feature Smoothing-Based Augmentation Methods for High-Quality TTS Systems
Jeongmin Liu, Eunwoo Song

TL;DR
This paper introduces a feature smoothing-based augmentation method for training universal vocoders, improving their generalization and naturalness in high-quality TTS systems without architectural changes.
Contribution
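The proposed augmentation technique improves universal vocoder training by randomly applying linear smoothing filters to the input acoustic features. This reduces the training-inference mismatch that arises when an acoustic model produces overly smoothed features, and it improves the quality of the synthetic speech.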
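The idea of randomly smoothing acoustic features during vocoder training can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `random_linear_smooth`, the choice of a moving-average kernel applied along the time axis, the `max_width` and `apply_prob` parameters, and the edge padding. The paper's actual filter family and sampling scheme may differ.

```python
import numpy as np


def random_linear_smooth(features, rng, max_width=5, apply_prob=0.5):
    """Randomly smooth a (frames, bins) feature matrix along the time axis.

    Hypothetical helper: a moving-average kernel stands in for the
    "linear smoothing filters" described in the text.
    """
    # With probability 1 - apply_prob, leave the features untouched so the
    # vocoder still sees unsmoothed (ground-truth-like) inputs.
    if rng.random() >= apply_prob:
        return features.copy()

    # Sample an odd kernel width (3, 5, ..., max_width) so the filter is centered.
    widths = np.arange(3, max_width + 1, 2)
    width = int(rng.choice(widths))
    kernel = np.full(width, 1.0 / width)

    # Pad by repeating the edge frames so the output keeps the same length
    # and the first and last frames are not pulled toward zero.
    half = width // 2
    padded = np.pad(features, ((half, half), (0, 0)), mode="edge")

    # Filter each feature bin independently along time.
    smoothed = np.stack(
        [np.convolve(padded[:, b], kernel, mode="valid")
         for b in range(features.shape[1])],
        axis=1,
    )
    return smoothed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mel = rng.standard_normal((100, 80))  # stand-in for a mel-spectrogram
    augmented = random_linear_smooth(mel, rng, apply_prob=1.0)
    print(mel.shape, augmented.shape)
```

In a training loop, such a function would be called on each input feature sequence before it is passed to the vocoder, while the target waveform stays unchanged. Because only the inputs are altered, no change to the vocoder architecture is needed, which matches the claim in the text.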
Findings
Achieved approximately 12% improvement in mean opinion scores with Tacotron 2.
Achieved approximately 12% improvement in mean opinion scores with FastSpeech 2.
The method is applicable to any vocoder without architectural modifications and does not depend on a specific acoustic model.
Abstract
While universal vocoders have achieved proficient waveform generation across diverse voices, their integration into text-to-speech (TTS) tasks often results in degraded synthetic quality. To address this challenge, we present a novel augmentation technique for training universal vocoders. Our training scheme randomly applies linear smoothing filters to input acoustic features, facilitating vocoder generalization across a wide range of smoothings. It significantly mitigates the training-inference mismatch, enhancing the naturalness of synthetic output even when the acoustic model produces overly smoothed features. Notably, our method is applicable to any vocoder without requiring architectural modifications or dependencies on specific acoustic models. The experimental results validate the superiority of our vocoder over conventional methods, achieving 11.99% and 12.05% improvements in…
Taxonomy
Topics: Advanced Data Compression Techniques · Advanced Algorithms and Applications
Methods: Attention Is All You Need · Sigmoid Activation · Dilated Causal Convolution · Softmax · Long Short-Term Memory · Layer Normalization · Zoneout · WaveNet · Position-Wise Feed-Forward Layer · Highway Layer
