Training Universal Vocoders with Feature Smoothing-Based Augmentation Methods for High-Quality TTS Systems

Jeongmin Liu; Eunwoo Song

arXiv:2409.02517·cs.SD·September 5, 2024

TL;DR

This paper introduces a feature smoothing-based augmentation method for training universal vocoders. It improves vocoder generalization and the naturalness of synthesized speech in TTS systems without architectural changes.

Contribution

The proposed augmentation technique randomly applies linear smoothing filters to the vocoder's input acoustic features during training, which reduces the training-inference mismatch and improves synthetic speech quality.

Findings

01 Achieved approximately 12% improvement in mean opinion scores with Tacotron 2.

02 Achieved approximately 12% improvement in mean opinion scores with FastSpeech 2.

03 Method is applicable to any vocoder without architectural modifications.

Abstract

While universal vocoders have achieved proficient waveform generation across diverse voices, their integration into text-to-speech (TTS) tasks often results in degraded synthetic quality. To address this challenge, we present a novel augmentation technique for training universal vocoders. Our training scheme randomly applies linear smoothing filters to input acoustic features, facilitating vocoder generalization across a wide range of smoothings. It significantly mitigates the training-inference mismatch, enhancing the naturalness of synthetic output even when the acoustic model produces overly smoothed features. Notably, our method is applicable to any vocoder without requiring architectural modifications or dependencies on specific acoustic models. The experimental results validate the superiority of our vocoder over conventional methods, achieving 11.99% and 12.05% improvements in…
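The augmentation described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the filter family (a centered moving average), the width range, the application probability, and the names `moving_average` and `random_smoothing_augment` are all assumptions made here, since this page does not specify them.

```python
import random


def moving_average(frames, width):
    """Smooth a T x D feature matrix along time with a centered moving average.

    Near the edges the window is truncated to the frames that exist.
    """
    half = width // 2
    n = len(frames)
    dims = len(frames[0])
    out = []
    for t in range(n):
        window = frames[max(0, t - half):min(n, t + half + 1)]
        out.append([sum(f[d] for f in window) / len(window) for d in range(dims)])
    return out


def random_smoothing_augment(frames, max_width=9, apply_prob=0.5, rng=random):
    """Return a possibly smoothed copy of `frames` (list of per-frame feature lists).

    Assumption: with probability `apply_prob`, an odd filter width between 3 and
    `max_width` is drawn uniformly; otherwise the features pass through unchanged.
    """
    if not frames or rng.random() >= apply_prob:
        return [list(f) for f in frames]
    width = rng.choice(range(3, max_width + 1, 2))
    return moving_average(frames, width)


# Hypothetical usage inside a data loader: smooth the input features only,
# and leave the target waveform untouched.
mel = [[0.0, 1.0], [3.0, 1.0], [0.0, 1.0], [3.0, 1.0]]
augmented = random_smoothing_augment(mel, rng=random.Random(0))
```

Because only the input features are altered, the vocoder still learns to produce the original waveform from features of varying smoothness, which is the property the abstract credits for reducing the mismatch with over-smoothed acoustic model outputs.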

Taxonomy

Topics: Advanced Data Compression Techniques · Advanced Algorithms and Applications

Methods: Attention Is All You Need · Sigmoid Activation · Dilated Causal Convolution · Softmax · Long Short-Term Memory · Layer Normalization · Zoneout · WaveNet · Position-Wise Feed-Forward Layer · Highway Layer