TL;DR
GibbsTTS introduces a kinetic-optimal scheduler and a moment correction for metric-induced discrete flow matching (MI-DFM), improving naturalness and speaker similarity in zero-shot text-to-speech.
Contribution
The paper develops a training-free numerical scheduler and a finite-step moment correction for MI-DFM. Together they improve discrete generation for TTS and remove the scheduler hyperparameter search that heuristic schedulers require.
Findings
GibbsTTS achieves the best naturalness in subjective evaluations.
It shows superior speaker similarity on most test sets.
It outperforms masked discrete generative baselines and other SOTA TTS systems.
Abstract
Metric-induced discrete flow matching (MI-DFM) exploits token-latent geometry for discrete generation, but its practical use is limited by two issues: heuristic schedulers requiring hyperparameter search, and finite-step path-tracking error from its first-order continuous-time Markov chain (CTMC) solver. We address both issues. First, we derive a kinetic-optimal scheduler for prescribed scalar-parameterized probability paths, and instantiate it for MI-DFM as a training-free numerical schedule that traverses the path at constant Fisher-Rao speed. Second, we introduce a finite-step moment correction that adjusts the jump probability while preserving the CTMC jump destination distribution. We validate the resulting method, GibbsTTS, on codec-based zero-shot text-to-speech (TTS). Under controlled comparisons with a unified architecture and large-scale dataset, GibbsTTS achieves the best…
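The scheduler idea in the abstract is to traverse a scalar-parameterized path at constant Fisher-Rao speed, and it can be illustrated numerically. The sketch below is not the authors' implementation. It assumes a toy metric-induced path p_β(x) ∝ exp(-β·d(x, x1)) for a single target token x1 with hand-picked distances d, truncated at a hypothetical β_max because the path only reaches a point mass as β → ∞. For this exponential-family path, the Fisher-Rao speed per unit β equals the standard deviation of d under p_β. The code integrates that speed to get the arc length s(β), then inverts s(β) = t·s(β_max) to obtain β(t). All function names are hypothetical. Constant factors in the Fisher-Rao metric convention do not change the resulting schedule.

```python
import bisect
import math


def path_probs(beta, dists):
    """Hypothetical toy path: p_beta(x) proportional to exp(-beta * d(x, x1))."""
    m = min(dists)  # shift by the minimum distance for numerical stability
    w = [math.exp(-beta * (d - m)) for d in dists]
    z = sum(w)
    return [wi / z for wi in w]


def fisher_rao_speed(beta, dists):
    """sqrt(sum_x (dp/dbeta)^2 / p); for this path it equals the std of d under p_beta."""
    p = path_probs(beta, dists)
    mean = sum(pi * d for pi, d in zip(p, dists))
    var = sum(pi * (d - mean) ** 2 for pi, d in zip(p, dists))
    return math.sqrt(var)


def constant_speed_schedule(dists, beta_max, n_grid=2000):
    """Return t -> beta(t) so that Fisher-Rao arc length grows linearly in t."""
    betas = [beta_max * i / n_grid for i in range(n_grid + 1)]
    speeds = [fisher_rao_speed(b, dists) for b in betas]
    # Cumulative arc length s(beta) on the grid via the trapezoidal rule.
    arc = [0.0]
    for i in range(n_grid):
        step = betas[i + 1] - betas[i]
        arc.append(arc[-1] + 0.5 * (speeds[i] + speeds[i + 1]) * step)
    total = arc[-1]

    def beta_of_t(t):
        # Invert s(beta) = t * total by linear interpolation between grid points.
        target = min(max(t, 0.0), 1.0) * total
        j = bisect.bisect_left(arc, target)
        if j == 0:
            return betas[0]
        frac = (target - arc[j - 1]) / (arc[j] - arc[j - 1])
        return betas[j - 1] + frac * (betas[j] - betas[j - 1])

    return beta_of_t
```

For example, `sched = constant_speed_schedule([0.0, 1.0, 2.0, 3.0], beta_max=20.0)` maps t in [0, 1] to β. This path changes fastest near β = 0, so uniform steps in t take small steps in β early on, unlike a linear ramp. The abstract does not say how the real method aggregates over target tokens and codec-token geometry.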
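The abstract separates a CTMC sampler step into a jump probability and a jump destination distribution. The moment correction changes only the former. The following is a minimal sketch of that decomposition, assuming the outgoing rates u(y, x) from the current token x are given as a list indexed by token. A first-order (Euler) step jumps with probability h·λ, clipped to 1, where λ is the total outgoing rate. As a stand-in adjustment, the sketch swaps in the frozen-rate exponential 1 - exp(-h·λ). This is NOT the paper's moment correction, which the abstract does not specify. All function names are hypothetical.

```python
import math
import random


def split_euler_step(rates, x):
    """Split rates u(y, x) out of state x into total jump rate and destination law.

    rates[x] (the diagonal entry) is ignored.
    """
    out = {y: r for y, r in enumerate(rates) if y != x and r > 0.0}
    lam = sum(out.values())
    dest = {y: r / lam for y, r in out.items()} if lam > 0.0 else {}
    return lam, dest


def jump_prob_first_order(lam, h):
    """First-order (Euler) jump probability, clipped to 1."""
    return min(1.0, h * lam)


def jump_prob_exponential(lam, h):
    """Stand-in adjustment: P(at least one jump in time h) if the rate stayed at lam."""
    return 1.0 - math.exp(-h * lam)


def ctmc_step(rates, x, h, jump_prob, rng):
    """One sampler step: decide whether to jump, then draw the destination."""
    lam, dest = split_euler_step(rates, x)
    if not dest or rng.random() >= jump_prob(lam, h):
        return x
    # The destination law depends only on the rates, not on jump_prob.
    ys = list(dest)
    return rng.choices(ys, weights=[dest[y] for y in ys], k=1)[0]
```

Swapping `jump_prob` changes how often a token moves in a step but not where it moves. That matches the property the abstract states: the correction adjusts the jump probability while preserving the CTMC jump destination distribution.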
