Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation
Amin Karimi Monsefi, Dominic Culver, Nikhil Bhendawade, Manuel R. Ciosici, Yizhe Zhang, Irina Belousova

TL;DR
This paper introduces TS-DFM, a guided navigation approach that improves discrete flow matching by evaluating and selecting better continuations during training, resulting in faster and more accurate language generation.
Contribution
TS-DFM replaces blind stochastic jumps with guided energy-based navigation, significantly enhancing discrete flow matching efficiency and performance without increasing inference cost.
Findings
The shaped student model achieves 32% lower perplexity than its teacher.
TS-DFM outperforms all compared discrete-generation baselines.
A model with 170M parameters achieves state-of-the-art perplexity.
Abstract
Discrete flow matching generates text by iteratively transforming noise tokens into coherent language, but may require hundreds of forward passes. Distillation uses the multi-step trajectory to train a student to reproduce the process in a few steps. When the student underperforms, the usual explanation is insufficient capacity. We argue the opposite: the trajectory is the bottleneck, not the student. Each training trajectory is built through a chain of blind stochastic jumps with no evaluation of sequence quality; a single bad decision at an early midpoint propagates through subsequent steps, yet the student must imitate the result. Trajectory-Shaped Discrete Flow Matching (TS-DFM) replaces these blind jumps with guided navigation: a lightweight energy compass evaluates candidate continuations at each midpoint, selecting the most coherent. All shaping is training-only; inference cost…
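The contrast between blind jumps and guided navigation can be illustrated with a toy sketch. This is not the authors' implementation: the energy function (`toy_energy`), the mask id, the fixed number of positions filled per step, and every function name below are hypothetical stand-ins chosen only to show the selection pattern, in which several candidate continuations are sampled at each midpoint and the one with the lowest energy is kept.

```python
import random

MASK = -1  # hypothetical id for a noise/masked token


def toy_energy(seq):
    """Hypothetical energy: lower means 'more coherent'.

    In this toy, a token is coherent when it equals its position index.
    Masked positions contribute nothing.
    """
    return sum(abs(tok - i) for i, tok in enumerate(seq) if tok != MASK)


def blind_jump(seq, n_unmask, vocab_size, rng):
    """One stochastic jump: fill up to n_unmask masked positions at random."""
    masked = [i for i, tok in enumerate(seq) if tok == MASK]
    out = list(seq)
    for i in rng.sample(masked, min(n_unmask, len(masked))):
        out[i] = rng.randrange(vocab_size)
    return out


def select_lowest_energy(candidates, energy_fn):
    """Keep the candidate continuation with the lowest energy."""
    return min(candidates, key=energy_fn)


def build_trajectory(length, steps, vocab_size, num_candidates, rng,
                     energy_fn=toy_energy):
    """Build a training trajectory from all-mask noise to a full sequence.

    num_candidates=1 gives a blind trajectory; larger values give a
    shaped trajectory in which each midpoint is chosen by energy.
    """
    seq = [MASK] * length
    trajectory = [seq]
    per_step = -(-length // steps)  # ceiling division
    for _ in range(steps):
        candidates = [blind_jump(seq, per_step, vocab_size, rng)
                      for _ in range(num_candidates)]
        seq = select_lowest_energy(candidates, energy_fn)
        trajectory.append(seq)
    return trajectory


if __name__ == "__main__":
    blind = build_trajectory(8, 4, 8, num_candidates=1, rng=random.Random(0))
    shaped = build_trajectory(8, 4, 8, num_candidates=8, rng=random.Random(0))
    print("blind endpoint energy: ", toy_energy(blind[-1]))
    print("shaped endpoint energy:", toy_energy(shaped[-1]))
```

In this sketch the extra work (sampling and scoring `num_candidates` continuations) happens only while the trajectory is being built, which mirrors the abstract's statement that all shaping is training-only. A student distilled on the resulting trajectory would be trained and run exactly as it would be on a blind one.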
