Improving French Synthetic Speech Quality via SSML Prosody Control

Nassima Ould Ouali; Awais Hussain Sani; Ruben Bueno; Jonah Dauvet; Tim Luka Horstmann; Eric Moulines

arXiv:2508.17494 · cs.CL · August 26, 2025

TL;DR

This paper presents an end-to-end pipeline that inserts SSML tags into French text to control prosody in TTS, raising the perceptual quality score (MOS) from 3.20 to 3.87.

Contribution

It introduces a cascaded architecture of two QLoRA-fine-tuned Qwen 2.5-7B models: one predicts phrase-break positions, and the other regresses prosodic targets (pitch, speaking rate, volume) and generates SSML markup compatible with commercial TTS systems.

Findings

01. Achieved 99.2% F1 for phrase-break placement.

02. Reduced mean absolute error on pitch, rate, and volume by 25-40% compared with prompting-only LLMs and a BiLSTM baseline.

03. Improved perceptual quality: MOS rose from 3.20 to 3.87.
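The two headline metrics above (F1 for break placement and mean absolute error on prosodic targets) can be made concrete with a short sketch. This is only an illustration under assumptions: the function names are hypothetical, break positions are assumed to be token indices after which a pause occurs, and the paper's actual evaluation script may differ (for example in how near-miss breaks are scored).

```python
def break_f1(predicted, reference):
    """F1 between two collections of break positions.

    Assumption: a position is the index of the token after which a pause
    is inserted, and only exact matches count as correct.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def mean_absolute_error(predicted, reference):
    """Average absolute difference between predicted and reference values."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def relative_reduction(baseline_error, model_error):
    """Fraction by which the model lowers the baseline error (0.25 means 25%)."""
    return (baseline_error - model_error) / baseline_error
```

Under this reading, a reported "25-40% reduction" corresponds to `relative_reduction` returning a value between 0.25 and 0.40 for each of pitch, rate, and volume.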

Abstract

Despite recent advances, synthetic voices often lack expressiveness due to limited prosody control in commercial text-to-speech (TTS) systems. We introduce the first end-to-end pipeline that inserts Speech Synthesis Markup Language (SSML) tags into French text to control pitch, speaking rate, volume, and pause duration. We employ a cascaded architecture with two QLoRA-fine-tuned Qwen 2.5-7B models: one predicts phrase-break positions and the other performs regression on prosodic targets, generating commercial TTS-compatible SSML markup. Evaluated on a 14-hour French podcast corpus, our method achieves 99.2% F1 for break placement and reduces mean absolute error on pitch, rate, and volume by 25-40% compared with prompting-only large language models (LLMs) and a BiLSTM baseline. In perceptual evaluation involving 18 participants across over 9 hours of synthesized audio, SSML-enhanced…
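To illustrate what the final stage of such a pipeline produces, the sketch below assembles SSML from per-segment predictions. It is not the authors' code: the segment dictionary layout (`text`, `pitch_pct`, `rate_pct`, `volume_db`, `break_ms`) and the function name `to_ssml` are assumptions made for this example. Only the `<speak>`, `<prosody>`, and `<break>` elements and their `pitch`, `rate`, `volume`, and `time` attributes come from the SSML standard.

```python
from xml.sax.saxutils import escape


def to_ssml(segments):
    """Wrap each text segment in a <prosody> tag and add pauses between them.

    Assumed input: a list of dicts with hypothetical keys
      text      - the segment's words
      pitch_pct - relative pitch change in percent (signed)
      rate_pct  - speaking rate as a percentage of the default
      volume_db - volume change in decibels (signed)
      break_ms  - pause after the segment in milliseconds, or None
    """
    parts = ["<speak>"]
    for seg in segments:
        pitch = f"{seg['pitch_pct']:+.0f}%"
        rate = f"{seg['rate_pct']:.0f}%"
        volume = f"{seg['volume_db']:+.1f}dB"
        # Escape &, <, > so the text cannot break the markup.
        text = escape(seg["text"])
        parts.append(
            f'<prosody pitch="{pitch}" rate="{rate}" volume="{volume}">{text}</prosody>'
        )
        if seg.get("break_ms"):
            parts.append(f'<break time="{int(seg["break_ms"])}ms"/>')
    parts.append("</speak>")
    return "".join(parts)


example = [
    {"text": "Bonjour à tous", "pitch_pct": 5, "rate_pct": 95,
     "volume_db": 2, "break_ms": 300},
    {"text": "Rock & roll", "pitch_pct": -3, "rate_pct": 110,
     "volume_db": -1.5, "break_ms": None},
]
ssml = to_ssml(example)
```

In the cascaded design described in the abstract, the break positions would come from the first model and the numeric targets from the second. Here both are simply supplied by hand.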
