TRIMS: Trajectory-Ranked Instruction Masked Supervision for Diffusion Language Models
Lingjie Chen, Ruizhong Qiu, Yuyu Fan, Yanjun Zhao, Hanghang Tong

TL;DR
TRIMS introduces a trajectory-guided supervised fine-tuning method for diffusion language models, improving decoding efficiency and accuracy by explicitly supervising token reveal order with minimal overhead.
Contribution
It presents a simple, low-cost trajectory supervision framework that improves the decoding trajectories of diffusion language models without relying on expensive distillation.
Findings
TRIMS improves the accuracy-parallelism trade-off on math and coding benchmarks.
It achieves competitive performance at lower training cost than distillation-based methods.
TRIMS produces notably better decoding trajectories, validating the trajectory-guided supervision approach.
Abstract
Diffusion language models (DLMs) offer a promising path toward low-latency generation through parallel decoding, but their practical efficiency depends heavily on the decoding trajectory. In practice, this advantage often fails to fully materialize because standard training does not provide explicit supervision over token reveal order, creating a train-inference mismatch that leads to suboptimal decoding behavior. We propose Trajectory-Ranked Instruction Masked Supervision (TRIMS), a simple trajectory-guided supervised fine-tuning framework that injects trajectory supervision into standard Masked Diffusion Language Model (MDLM) training with minimal overhead. Instead of relying on costly DLM-based distillation, TRIMS uses lightweight signals from an autoregressive teacher to guide a trajectory-aware masking strategy, encouraging the model to learn more effective decoding orders.…
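The abstract does not spell out how the teacher signal becomes a masking pattern. As a minimal sketch under stated assumptions, the Python below contrasts standard MDLM corruption (each token masked independently with probability t) with one possible trajectory-ranked variant: positions are sorted by a per-token autoregressive-teacher score, and at masking ratio t the lowest-scoring positions are masked, so each training input resembles an intermediate state of an easy-to-hard reveal order. The names `MASK`, `teacher_scores`, `trajectory_ranked_mask`, and `mdlm_loss`, and the use of teacher log-probabilities as the ranking signal, are hypothetical illustrations, not the TRIMS implementation. The loss helper uses the 1/t-weighted masked cross-entropy of MDLM-style training under a linear noise schedule.

```python
import random

MASK = -1  # hypothetical mask-token id used only in this sketch


def random_mask(tokens, t, rng):
    """Standard MDLM-style corruption: mask each token independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def trajectory_ranked_mask(tokens, teacher_scores, t):
    """Hypothetical trajectory-aware corruption (an illustration, not the TRIMS algorithm).

    teacher_scores[i] is an ASSUMED per-token signal from an autoregressive teacher,
    e.g. its log-probability of tokens[i]. Higher means "easier", i.e. revealed earlier.
    At masking ratio t (assumed to lie in [0, 1]), the round(t * n) lowest-scoring
    positions are masked.
    """
    n = len(tokens)
    k = int(t * n + 0.5)  # round half up, valid because t * n >= 0
    # Positions ordered from hardest (lowest score) to easiest (highest score).
    hardest_first = sorted(range(n), key=lambda i: teacher_scores[i])
    masked = set(hardest_first[:k])
    return [MASK if i in masked else tok for i, tok in enumerate(tokens)]


def mdlm_loss(target_logprobs, corrupted, t):
    """1/t-weighted negative log-likelihood over masked positions (linear schedule).

    target_logprobs[i] is the model's log p_theta(x0_i | x_t) for the true token at i.
    Requires t in (0, 1]; only masked positions contribute to the loss.
    """
    nll = sum(-lp for lp, tok in zip(target_logprobs, corrupted) if tok == MASK)
    return nll / t


if __name__ == "__main__":
    tokens = [10, 11, 12, 13]
    teacher_scores = [-0.1, -2.0, -0.5, -3.0]  # made-up illustrative values
    for t in (0.25, 0.5, 0.75):
        print(t, trajectory_ranked_mask(tokens, teacher_scores, t))
    print("random:", random_mask(tokens, 0.5, random.Random(0)))
    corrupted = trajectory_ranked_mask(tokens, teacher_scores, 0.5)
    print("loss:", mdlm_loss([-0.2, -1.0, -0.3, -2.0], corrupted, 0.5))
```

Because the ranking is fixed per sequence, the masked sets are nested as t grows, so every noise level supervises the same reveal order; independent random masking offers no such consistency across noise levels.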
