SplitMeanFlow: Interval Splitting Consistency in Few-Step Generative Modeling
Yi Guo, Wei Wang, Zhihang Yuan, Rong Cao, Kuan Chen, Zhengyang Chen, Yuanyuan Huo, Yang Zhang, Yuping Wang, Shouda Liu, Yuxuan Wang

TL;DR
SplitMeanFlow introduces an algebraic principle called Interval Splitting Consistency for more efficient, stable, and generalizable few-step generative modeling, demonstrated by significant speedups in speech synthesis.
Contribution
It derives a novel algebraic identity for average velocity fields, generalizing differential-based methods like MeanFlow, and improves efficiency and stability in generative modeling.
Findings
Achieves 20x speedup in speech synthesis applications.
Eliminates the need for JVP computations, simplifying implementation.
Provides a more stable and hardware-compatible training framework.
Abstract
Generative models like Flow Matching have achieved state-of-the-art performance but are often hindered by a computationally expensive iterative sampling process. To address this, recent work has focused on few-step or one-step generation by learning the average velocity field, which directly maps noise to data. MeanFlow, a leading method in this area, learns this field by enforcing a differential identity that connects the average and instantaneous velocities. In this work, we argue that this differential formulation is a limiting special case of a more fundamental principle. We return to the first principles of average velocity and leverage the additivity property of definite integrals. This leads us to derive a novel, purely algebraic identity we term Interval Splitting Consistency. This identity establishes a self-referential relationship for the average velocity field across…
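For reference, the additivity argument mentioned in the abstract can be written out in a few lines. This is a sketch rather than the paper's own derivation: it assumes the MeanFlow-style definition of average velocity, with $u$ for the average velocity, $v$ for the instantaneous velocity, $z_\tau$ for the state on a flow trajectory, and times $r \le s \le t$.

```latex
% Assumed definition: average velocity over [r, t] along the trajectory z_\tau
(t - r)\, u(z_t, r, t) = \int_r^t v(z_\tau, \tau)\, d\tau

% Additivity of the definite integral for any r \le s \le t
\int_r^t v(z_\tau, \tau)\, d\tau
  = \int_r^s v(z_\tau, \tau)\, d\tau + \int_s^t v(z_\tau, \tau)\, d\tau

% Substituting the definition into each term gives the algebraic identity
(t - r)\, u(z_t, r, t) = (s - r)\, u(z_s, r, s) + (t - s)\, u(z_t, s, t)

% Rearranging, dividing by (t - s), and letting s \to t gives the differential form
\frac{d}{dt}\big[(t - r)\, u(z_t, r, t)\big] = v(z_t, t)
\quad\Longrightarrow\quad
u(z_t, r, t) = v(z_t, t) - (t - r)\, \frac{d}{dt} u(z_t, r, t)
```

The last line has the form of the differential identity that MeanFlow enforces, which is consistent with the abstract's claim that the differential formulation is a limiting special case.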
Peer Reviews
Decision: Submitted to ICLR 2026
1. Strong theoretical foundation with elegant algebraic formulation. The paper provides a rigorous mathematical derivation of the Interval Splitting Consistency identity from first principles (Equations 8-11), showing how the additivity property of integrals naturally leads to an algebraic constraint on average velocity fields. The theoretical analysis in Appendix A convincingly demonstrates that MeanFlow's differential identity (Equation 7) emerges as a special limiting case, establishing Split
1. Missing key comparisons weaken empirical contributions. First, the paper lacks direct comparison with the shortcut model (Frans et al., 2025) despite acknowledging its partial equivalence to SplitMeanFlow for the special case s=(r+t)/2 (lines 307-311). The dismissal that "design philosophies differ significantly" is inadequate without empirical comparison showing performance differences. Second, Table 2 (page 8) reports MeanFlow results with reduced batch size (0.1× original) to "accommodate
- SplitMeanFlow replaces the differential MeanFlow identity with a purely algebraic self-consistency condition derived from the additivity of definite integrals: $(t-r)\,u(z_t,r,t) = (s-r)\,u(z_s,r,s) + (t-s)\,u(z_t,s,t)$. This algebraic constraint removes the JVP operation and gives a simple, theoretically grounded training objective for learning the average-velocity field.
- Another strength is that the TTS experiments show that SplitMeanFlow can perform one-step/few-step generation on top of multiple different architectures/m
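The identity quoted in this review can be checked numerically on a toy problem. The sketch below is an illustration only, not the paper's code: it assumes a one-dimensional flow dz/dτ = z, chosen because its average velocity has a closed form, and it assumes the intermediate state is obtained as z_s = z_t - (t - s) u(z_t, s, t). The names `avg_velocity` and `split_residual` are hypothetical.

```python
import math


def avg_velocity(z_end, start, end):
    """Closed-form average velocity for the toy flow dz/dtau = z (an assumption).

    The trajectory through z_end at time `end` is z(tau) = z_end * exp(tau - end),
    so the average velocity over [start, end] is displacement / interval length.
    """
    if end == start:
        return z_end  # zero-length interval: average equals instantaneous velocity
    z_start = z_end * math.exp(start - end)
    return (z_end - z_start) / (end - start)


def split_residual(u, z_t, r, s, t):
    """Residual of (t-r) u(z_t,r,t) = (s-r) u(z_s,r,s) + (t-s) u(z_t,s,t)."""
    u_st = u(z_t, s, t)
    z_s = z_t - (t - s) * u_st  # step back from time t to time s
    lhs = (t - r) * u(z_t, r, t)
    rhs = (s - r) * u(z_s, r, s) + (t - s) * u_st
    return lhs - rhs


# The exact average velocity satisfies the identity up to rounding error.
exact_residual = split_residual(avg_velocity, z_t=1.7, r=0.1, s=0.45, t=0.9)

# Using the instantaneous velocity v(z) = z as if it were the average violates it.
wrong_residual = split_residual(lambda z, a, b: z, z_t=1.7, r=0.1, s=0.45, t=0.9)

print(exact_residual, wrong_residual)
```

Only plain function evaluations of `u` appear in the residual, with no derivative with respect to time, which is the property the review describes as removing the JVP operation.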
The major limitation of this work as a scientific paper is its experimental evaluation. Specifically, the experimental validation is limited to TTS even though, outside the abstract, the paper rarely mentions speech anywhere from the introduction through the related work. The sudden focus on TTS-only evaluation in the experiments feels abrupt, and even within this evaluation, the comparison against existing diffusion-based, flow-matching-based, and their accelerated TTS
- The core identity comes directly from integral additivity and is conceptually clear. The derivation showing that MeanFlow is the limiting special case gives the method a theoretical connection to prior work.
- Compared to MeanFlow, SplitMeanFlow avoids the JVP calculation, lowering memory consumption.
- Relatively strong empirical results in TTS, achieving 1-4 NFEs with quality at parity with 32-step Flow Matching.
- Has a clear reproducibility statement. Experiments leverage public codebases (F5-TTS, Cosy
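To make the "1-4 NFEs" point concrete, the sketch below shows how an average-velocity field can be used for few-step sampling. It is a toy illustration under stated assumptions, not the paper's sampler: it reuses a one-dimensional flow dz/dτ = z with a closed-form average velocity in place of a trained network, and it assumes the MeanFlow-style convention of integrating from t = 1 down to t = 0 with the update z_r = z_t - (t - r) u(z_t, r, t). The names `exact_avg_velocity` and `sample` are hypothetical.

```python
import math


def exact_avg_velocity(z_t, r, t):
    """Closed-form average velocity over [r, t] for the toy flow dz/dtau = z."""
    if t == r:
        return z_t
    return (z_t - z_t * math.exp(r - t)) / (t - r)


def sample(u, z_start, num_steps, t_start=1.0, t_end=0.0):
    """Move from t_start to t_end in num_steps jumps of the average velocity."""
    times = [t_start + (t_end - t_start) * k / num_steps for k in range(num_steps + 1)]
    z = z_start
    for t, r in zip(times[:-1], times[1:]):
        z = z - (t - r) * u(z, r, t)  # one function evaluation (NFE) per jump
    return z


z_noise = 2.0
expected = z_noise * math.exp(-1.0)  # exact solution of the toy ODE at t = 0

one_step = sample(exact_avg_velocity, z_noise, num_steps=1)
four_step = sample(exact_avg_velocity, z_noise, num_steps=4)

# For contrast: a single Euler step with the instantaneous velocity v(z) = z.
euler_one_step = sample(lambda z, r, t: z, z_noise, num_steps=1)

print(one_step, four_step, expected, euler_one_step)
```

With an exact average velocity, one jump and four jumps land on the same endpoint, while a single Euler step on the instantaneous velocity does not. A learned field is only approximate, so in practice a few extra steps may still help.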
- All main experiments are in TTS. While compelling, it is unclear how well the identity and training dynamics transfer to image generation, where evaluation and inductive biases differ.
- Reliance on distillation teacher supervision. The strongest results come from a Flow-Matching teacher → SplitMeanFlow student procedure. Also, the current training needs to use the teacher’s velocity $v(z_t, t)$ to ensure boundary conditions, avoiding degenerate solutions. Combined with comparable performance to
Taxonomy
Topics: Semantic Web and Ontologies · Simulation Techniques and Applications · Natural Language Processing Techniques
