ParaStyleTTS: Toward Efficient and Robust Paralinguistic Style Control for Expressive Text-to-Speech Generation

Haowei Lou; Hye-Young Paik; Wen Hu; Lina Yao

arXiv:2510.18308·cs.SD·October 22, 2025

TL;DR

ParaStyleTTS is a lightweight, interpretable TTS system that controls expressive speaking style from text prompts alone. It produces high-quality speech while staying robust and efficient enough for real-time use, and it is faster and less resource-intensive than LLM-based methods.

Contribution

The paper introduces a novel two-level style adaptation architecture for controllable, robust, and efficient expressive TTS driven by text prompts, with no reliance on reference audio or large language models.

Findings

01. Generates high-quality speech comparable to state-of-the-art LLM-based systems.

02. Runs 30x faster than LLM-based methods, with 8x fewer parameters and lower memory use.

03. Exhibits superior robustness and controllability in paralinguistic style control.

Abstract

Controlling speaking style in text-to-speech (TTS) systems has become a growing focus in both academia and industry. While many existing approaches rely on reference audio to guide style generation, such methods are often impractical due to privacy concerns and limited accessibility. More recently, large language models (LLMs) have been used to control speaking style through natural language prompts; however, their high computational cost, lack of interpretability, and sensitivity to prompt phrasing limit their applicability in real-time and resource-constrained environments. In this work, we propose ParaStyleTTS, a lightweight and interpretable TTS framework that enables expressive style control from text prompts alone. ParaStyleTTS features a novel two-level style adaptation architecture that separates prosodic and paralinguistic speech style modeling. It allows fine-grained and…

Code & Models

Models

haoweilou/ParaStyleTTS (Hugging Face model, 10 downloads, 3 likes)
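
For readers who want to inspect the released model, the following is a minimal sketch of fetching the repository listed above from the Hugging Face Hub. It assumes the `huggingface_hub` package is installed and that the repository is publicly readable. The helper names `model_page_url` and `download_checkpoint`, and the target directory name, are hypothetical and chosen for illustration. The file layout inside the repository is not described on this page, so the sketch only downloads the files and does not attempt to load the model.

```python
from pathlib import Path

# Repository name exactly as listed on this page.
REPO_ID = "haoweilou/ParaStyleTTS"


def model_page_url(repo_id: str) -> str:
    """Return the Hugging Face Hub page URL for a model repository."""
    return f"https://huggingface.co/{repo_id}"


def download_checkpoint(repo_id: str = REPO_ID,
                        target_dir: str = "parastyletts_ckpt") -> Path:
    """Download all files of the repository into target_dir and return the path.

    Assumption: `pip install huggingface_hub` has been run and the network
    is reachable. What the repository contains is not stated on this page.
    """
    # Imported lazily so model_page_url works without the dependency installed.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(repo_id=repo_id, local_dir=target_dir)
    return Path(local_path)
```

Calling `download_checkpoint()` returns the local directory, which can then be listed with `sorted(p for p in path.rglob("*") if p.is_file())` to see which checkpoints and configuration files the authors published.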
