Aligner-Guided Training Paradigm: Advancing Text-to-Speech Models with Aligner Guided Duration

Haowei Lou; Helen Paik; Wen Hu; Lina Yao

arXiv:2412.08112·cs.SD·December 12, 2024

Open Access

TL;DR

This paper introduces an Aligner-Guided Training Paradigm for TTS models that improves duration accuracy and speech naturalness by training an aligner beforehand, reducing reliance on external tools.

Contribution

The paper presents a novel aligner-guided training approach that enhances duration labelling accuracy and alignment in TTS systems, outperforming existing methods.

Findings

01 Up to 16% reduction in word error rate

02 Improved phoneme and tone alignment accuracy

03 Enhanced speech naturalness and intelligibility
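
For readers unfamiliar with the metric in the first finding, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and a recognised transcript, divided by the number of reference words. The sketch below is a generic illustration of that definition, not the paper's evaluation code; the function name `word_error_rate` and the whitespace tokenisation are assumptions made for the example, and this page does not say how the reported figure was measured.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenises on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

In a TTS evaluation, the hypothesis would typically come from running a speech recogniser on the synthesised audio, so a lower WER suggests more intelligible speech.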

Abstract

Recent advancements in text-to-speech (TTS) systems, such as FastSpeech and StyleSpeech, have significantly improved speech generation quality. However, these models often rely on durations generated by external tools like the Montreal Forced Aligner, which can be time-consuming and inflexible. The importance of accurate durations is often underestimated, despite their crucial role in achieving natural prosody and intelligibility. To address these limitations, we propose a novel Aligner-Guided Training Paradigm that prioritizes accurate duration labelling by training an aligner before the TTS model. This approach reduces dependence on external tools and enhances alignment accuracy. We further explore the impact of different acoustic features, including Mel-Spectrograms, MFCCs, and latent features, on TTS model performance. Our experimental results show that aligner-guided duration…
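
To make the notion of a duration label concrete, the sketch below shows one common way to turn a frame-level alignment into per-phoneme durations: count how many acoustic frames are assigned to each phoneme. This is a minimal illustration under stated assumptions, not the authors' method; it assumes the aligner emits one phoneme index per frame, and the name `durations_from_alignment` is hypothetical. The abstract does not describe the aligner's actual output format.

```python
from typing import List, Sequence


def durations_from_alignment(
    frame_to_phoneme: Sequence[int], num_phonemes: int
) -> List[int]:
    """Count the frames assigned to each phoneme position.

    Assumes frame_to_phoneme[t] is the index of the phoneme that frame t
    was aligned to (an assumption made for this example).
    """
    durations = [0] * num_phonemes
    for phoneme_index in frame_to_phoneme:
        if not 0 <= phoneme_index < num_phonemes:
            raise ValueError(f"phoneme index {phoneme_index} out of range")
        durations[phoneme_index] += 1
    return durations
```

Under this assumption the durations sum to the number of frames, which is the property a duration-based TTS model relies on when expanding phoneme-level representations to frame length.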

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling