MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control

Jialong Mai; Xiaofen Xing; Xiangmin Xu

arXiv:2604.21164 · cs.SD · April 28, 2026

TL;DR

MAGIC-TTS is a text-to-speech model with explicit token-level timing control. It follows specified token durations and pauses more closely than spontaneous synthesis, and it keeps synthesis natural and high-quality when no timing control is given.

Contribution

To the best of the authors' knowledge, MAGIC-TTS is the first TTS model with explicit local timing control over token-level content duration and pauses, which enables fine-grained local speech editing.

Findings

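01 On the authors' timing-control benchmark, substantially improves token-level duration and pause following over spontaneous synthesis.

02 Maintains natural, high-quality synthesis even when no timing control is provided.

03 Is evaluated in practical local editing scenarios such as navigation and code reading.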

Abstract

Fine-grained local timing control is still absent from modern text-to-speech systems: existing approaches typically provide only utterance-level duration or global speaking-rate control, while precise token-level timing manipulation remains unavailable. To the best of our knowledge, MAGIC-TTS is the first TTS model with explicit local timing control over token-level content duration and pause. MAGIC-TTS is enabled by explicit token-level duration conditioning, carefully prepared high-confidence duration supervision, and training mechanisms that correct zero-value bias and make the model robust to missing local controls. On our timing-control benchmark, MAGIC-TTS substantially improves token-level duration and pause following over spontaneous synthesis. Even when no timing control is provided, MAGIC-TTS maintains natural high-quality synthesis. We further evaluate practical local editing…
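
To illustrate what token-level duration conditioning with missing local controls could look like at the input level, the following Python sketch encodes optional per-token duration and pause targets as a value list plus a mask, so that "no control given" stays distinguishable from an explicit zero. It is not the authors' implementation: `TokenTiming`, `encode_controls`, and `mean_abs_error` are hypothetical names, and the value-plus-mask layout and the error measure are assumptions chosen for clarity.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class TokenTiming:
    """Hypothetical per-token control; None means 'no control for this field'."""
    token: str
    duration_s: Optional[float] = None     # target spoken duration of the token
    pause_after_s: Optional[float] = None  # target pause following the token


def encode_controls(
    timings: Sequence[TokenTiming],
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (values, mask), one [duration, pause] row per token.

    mask is 1.0 where a control was specified and 0.0 where it was not,
    so an explicit 0.0 s pause is not confused with a missing control.
    """
    values: List[List[float]] = []
    mask: List[List[float]] = []
    for t in timings:
        row_values, row_mask = [], []
        for x in (t.duration_s, t.pause_after_s):
            if x is None:
                row_values.append(0.0)  # placeholder, ignored via the mask
                row_mask.append(0.0)
            else:
                if x < 0:
                    raise ValueError(f"negative timing for token {t.token!r}")
                row_values.append(float(x))
                row_mask.append(1.0)
        values.append(row_values)
        mask.append(row_mask)
    return values, mask


def mean_abs_error(
    targets: Sequence[Optional[float]], realized: Sequence[float]
) -> Optional[float]:
    """Mean absolute timing error over controlled positions only (assumed metric)."""
    pairs = [(t, r) for t, r in zip(targets, realized) if t is not None]
    if not pairs:
        return None
    return sum(abs(t - r) for t, r in pairs) / len(pairs)


# Example: only some tokens carry controls; "left" has an explicit zero pause.
timings = [
    TokenTiming("turn"),
    TokenTiming("left", duration_s=0.5, pause_after_s=0.0),
    TokenTiming("now", pause_after_s=0.3),
]
values, mask = encode_controls(timings)
```

In this sketch, the row for "left" carries a pause value of 0.0 with mask 1.0, while the row for "turn" carries the same placeholder value with mask 0.0. Keeping that distinction explicit is one simple way to avoid treating a missing control as a request for zero duration.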

Code & Models

Models

🤗 maimai11/MAGIC-TTS (model)