Kinetic-Optimal Scheduling with Moment Correction for Metric-Induced Discrete Flow Matching in Zero-Shot Text-to-Speech

Dong Yang; Yiyi Cai; Haoyu Zhang; Yuki Saito; Hiroshi Saruwatari

arXiv:2605.09386·eess.AS·May 12, 2026

TL;DR

GibbsTTS combines a kinetic-optimal scheduler with a finite-step moment correction for metric-induced discrete flow matching (MI-DFM). In codec-based zero-shot text-to-speech, it is reported to achieve the best subjective naturalness and superior speaker similarity on most test sets.

Contribution

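The paper derives a training-free numerical scheduler for MI-DFM that traverses the probability path at constant Fisher-Rao speed, removing the hyperparameter search that heuristic schedulers require. It also introduces a finite-step moment correction that adjusts the jump probability while preserving the CTMC jump destination distribution.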

Findings

01. GibbsTTS achieves the best naturalness in subjective evaluations.

02. It shows superior speaker similarity on most test sets.

03. It outperforms masked discrete generative baselines and other state-of-the-art (SOTA) TTS systems.

Abstract

Metric-induced discrete flow matching (MI-DFM) exploits token-latent geometry for discrete generation, but its practical use is limited by two issues: heuristic schedulers requiring hyperparameter search, and finite-step path-tracking error from its first-order continuous-time Markov chain (CTMC) solver. We address both issues. First, we derive a kinetic-optimal scheduler for prescribed scalar-parameterized probability paths, and instantiate it for MI-DFM as a training-free numerical schedule that traverses the path at constant Fisher-Rao speed. Second, we introduce a finite-step moment correction that adjusts the jump probability while preserving the CTMC jump destination distribution. We validate the resulting method, GibbsTTS, on codec-based zero-shot text-to-speech (TTS). Under controlled comparisons with a unified architecture and large-scale dataset, GibbsTTS achieves the best…
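To make the idea of a constant Fisher-Rao speed schedule concrete, here is a minimal numerical sketch. It is not the paper's implementation. It assumes a Gibbs-type path p_beta(x) proportional to exp(-beta * d(x, x1)) over a toy vocabulary, a hypothetical distance vector `dist`, and a truncated range [0, `beta_max`]; all function names are illustrative. For this exponential-family path the Fisher information in beta equals the variance of d under p_beta, so the Fisher-Rao arc length is the integral of the square root of that variance (up to a constant factor that does not affect the schedule).

```python
import numpy as np

def gibbs_path(beta, dist):
    """Assumed path: p_beta(x) proportional to exp(-beta * dist[x])."""
    logits = -beta * dist
    logits = logits - logits.max()  # stabilise the softmax
    p = np.exp(logits)
    return p / p.sum()

def fisher_information(beta, dist):
    """Fisher information in beta, i.e. Var_{p_beta}[dist] for this path."""
    p = gibbs_path(beta, dist)
    mean = (p * dist).sum()
    return (p * (dist - mean) ** 2).sum()

def constant_speed_schedule(dist, beta_max=20.0, grid=2001, steps=8):
    """Return beta values that split the Fisher-Rao arc length into equal parts."""
    betas = np.linspace(0.0, beta_max, grid)
    speed = np.sqrt([fisher_information(b, dist) for b in betas])
    # Cumulative arc length via the trapezoidal rule on the beta grid.
    segments = 0.5 * (speed[1:] + speed[:-1]) * np.diff(betas)
    arc = np.concatenate([[0.0], np.cumsum(segments)])
    # Invert arc(beta) at equally spaced arc-length targets.
    targets = np.linspace(0.0, arc[-1], steps + 1)
    return np.interp(targets, arc, betas), betas, arc

# Hypothetical distances from each of 6 tokens to the target token.
dist = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0])
schedule, betas, arc = constant_speed_schedule(dist)
print(schedule)
```

The schedule involves no learned parameters: it is obtained by numerical integration and inversion alone, which is the sense in which such a scheduler can be training-free. In this toy setup the beta values are not equally spaced, because equal arc-length steps correspond to small beta steps where the distribution changes quickly and large ones where it barely changes.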

Code & Models

Repositories

https://ydqmkkx.github.io/GibbsTTSProject
