Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation

Amin Karimi Monsefi; Dominic Culver; Nikhil Bhendawade; Manuel R. Ciosici; Yizhe Zhang; Irina Belousova

arXiv:2605.07924·cs.LG·May 11, 2026

TL;DR

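This paper introduces Trajectory-Shaped Discrete Flow Matching (TS-DFM), which improves the distillation of discrete flow matching models into few-step students. While training trajectories are built, an energy function scores candidate continuations at each midpoint and the most coherent one is kept, so the student learns from better trajectories and generates higher-quality text in fewer steps.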

Contribution

TS-DFM replaces blind stochastic jumps with guided energy-based navigation, significantly enhancing discrete flow matching efficiency and performance without increasing inference cost.

Findings

01  The shaped student model achieves 32% lower perplexity than the teacher.

02  TS-DFM outperforms all compared discrete-generation baselines.

03  A 170M-parameter model achieves state-of-the-art perplexity.

Abstract

Discrete flow matching generates text by iteratively transforming noise tokens into coherent language, but may require hundreds of forward passes. Distillation uses the multi-step trajectory to train a student to reproduce the process in a few steps. When the student underperforms, the usual explanation is insufficient capacity. We argue the opposite: the trajectory is the bottleneck, not the student. Each training trajectory is built through a chain of blind stochastic jumps with no evaluation of sequence quality; a single bad decision at an early midpoint propagates through subsequent steps, yet the student must imitate the result. Trajectory-Shaped Discrete Flow Matching (TS-DFM) replaces these blind jumps with guided navigation: a lightweight energy compass evaluates candidate continuations at each midpoint, selecting the most coherent. All shaping is training-only; inference cost…
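The contrast between blind jumps and guided navigation can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the energy function, the number of candidates, or the noise process, so the mask-token corruption, the `toy_energy` scorer (which simply penalizes adjacent repeated tokens), and every function name and parameter below are hypothetical placeholders. The sketch covers only trajectory construction and omits training the student.

```python
import random

MASK = "<mask>"  # assumption: noise is modelled as mask tokens


def propose_jump(tokens, vocab, unmask_prob, rng):
    """One blind stochastic jump: each masked position is filled with a
    uniformly random vocabulary token with probability unmask_prob."""
    return [
        rng.choice(vocab) if (t == MASK and rng.random() < unmask_prob) else t
        for t in tokens
    ]


def toy_energy(tokens):
    """Hypothetical energy (lower is better): counts adjacent repeated
    tokens among unmasked positions as a crude proxy for incoherence."""
    return sum(1 for a, b in zip(tokens, tokens[1:]) if a != MASK and a == b)


def guided_step(tokens, vocab, unmask_prob, num_candidates, rng, energy=toy_energy):
    """Guided navigation: sample several candidate jumps from the same
    state and keep the one with the lowest energy."""
    candidates = [
        propose_jump(tokens, vocab, unmask_prob, rng)
        for _ in range(num_candidates)
    ]
    return min(candidates, key=energy)


def build_trajectory(length, vocab, num_steps, num_candidates, seed=0):
    """Build a training trajectory from all-mask noise to a full sequence.
    With num_candidates=1 this reduces to a chain of blind jumps."""
    rng = random.Random(seed)
    state = [MASK] * length
    trajectory = [state]
    for step in range(num_steps):
        # The unmasking probability rises to 1.0 at the final step,
        # so no mask tokens remain at the end of the trajectory.
        unmask_prob = 1.0 / (num_steps - step)
        state = guided_step(state, vocab, unmask_prob, num_candidates, rng)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    for state in build_trajectory(8, ["a", "b", "c"], num_steps=4, num_candidates=5):
        print(" ".join(state))
```

Because selection happens only while the trajectory is being built, a student trained on these states would run exactly as it would without shaping; the extra candidate evaluations are a training-time cost in this sketch.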
