Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models

Saba Ahmadi; Prasanna Parthasarathi; Yufei Cui

arXiv:2605.13935·cs.LG·May 15, 2026

TL;DR

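TraFL (Trajectory Flow baLancing) is a trajectory-balance post-training method for diffusion language models. It targets trajectory locking, in which reward-driven updates concentrate probability mass on a narrow set of denoising paths, and it improves over the base model on mathematical reasoning and code generation benchmarks.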

Contribution

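TraFL is a trajectory-balance objective that trains the policy toward a reward-tilted target distribution anchored to a frozen reference model. It is made practical for diffusion language models through a diffusion-compatible sequence-level surrogate and a learned prompt-dependent normalization.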

Findings

01

TraFL outperforms the base model on all evaluated benchmarks.

02

TraFL maintains its improvements as the sampling budget increases.

03

TraFL surpasses the base model on Minerva Math and LiveCodeBench evaluations.
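
The findings above concern how performance behaves as more samples are drawn per problem. A common way to measure this is pass@k, the probability that at least one of k samples is correct. The sketch below uses the standard unbiased pass@k estimator; this summary does not state which metric the paper reports, so the choice of pass@k is an assumption here, and the per-problem correct counts are made-up toy values, not results from the paper.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(correct_counts, n, k):
    """Average pass@k over problems, given correct counts per problem."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)

# Hypothetical toy counts (NOT from the paper): 10 samples on each of 2 problems.
n = 10
locked_counts = [10, 0]   # all mass on one problem's solution, none on the other
diverse_counts = [5, 5]   # same total number of correct samples, spread out

for k in (1, 5):
    locked = mean_pass_at_k(locked_counts, n, k)
    diverse = mean_pass_at_k(diverse_counts, n, k)
    print(f"k={k}: locked={locked:.3f} diverse={diverse:.3f}")
```

In this toy setup both models tie at k=1, but only the one whose correct samples are spread across problems gains from a larger sampling budget. That is the kind of coverage effect the second finding refers to.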

Abstract

Diffusion language models are a promising alternative to autoregressive models, yet post-training methods for them largely adapt reward-maximizing objectives. We identify a central failure mode in this setting we call trajectory locking: sampled reward-driven updates over-concentrate probability mass onto a narrow set of denoising paths, reducing coverage of alternative correct solutions under repeated sampling. To address this, we propose TraFL (Trajectory Flow baLancing), a trajectory-balance objective that trains the policy toward a reward-tilted target distribution anchored to a frozen reference model. We make this practical for diffusion language models with a diffusion-compatible sequence-level surrogate and a learned prompt-dependent normalization. Across mathematical reasoning and code generation benchmarks, TraFL is the only evaluated post-training method that improves over the…
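
To make the idea of a reward-tilted target anchored to a reference model concrete, here is a minimal sketch of a generic trajectory-balance residual. It assumes the target has the common form p*(y|x) ∝ π_ref(y|x) · exp(r(x, y) / β) with a per-prompt log-normalizer log Z(x). The abstract does not give the paper's exact loss, its diffusion-compatible sequence-level surrogate, or its normalization model, so the function names, the temperature `beta`, and the toy numbers are all illustrative assumptions, and log-probabilities are passed in directly rather than computed from a diffusion model.

```python
import math

def tilted_target(ref_probs, rewards, beta):
    """Exact reward-tilted distribution over a small, enumerable set of outputs."""
    weights = [p * math.exp(r / beta) for p, r in zip(ref_probs, rewards)]
    z = sum(weights)
    return [w / z for w in weights], math.log(z)

def tb_loss(log_z, logp_policy, logp_ref, reward, beta):
    """Squared trajectory-balance residual for one (prompt, sample) pair.

    The residual is zero exactly when
    policy(y|x) = ref(y|x) * exp(reward / beta) / Z(x).
    """
    residual = log_z + logp_policy - logp_ref - reward / beta
    return residual ** 2

# Hypothetical toy problem: two candidate outputs for one prompt.
ref_probs = [0.5, 0.5]
rewards = [1.0, 0.0]
beta = 1.0

target, log_z = tilted_target(ref_probs, rewards, beta)

def losses(policy_probs):
    return [
        tb_loss(log_z, math.log(q), math.log(p), r, beta)
        for q, p, r in zip(policy_probs, ref_probs, rewards)
    ]

losses_at_target = losses(target)         # policy equals the tilted target
losses_locked = losses([0.999, 0.001])    # policy collapsed onto one output

print("target distribution:", target)
print("loss at target:", losses_at_target)
print("loss when collapsed:", losses_locked)
```

In this sketch a policy that collapses onto the highest-reward output has a nonzero residual, while the tilted target, which keeps some mass on the alternative, drives it to zero. In practice log Z(x) is unknown and would be learned per prompt, which corresponds to the learned prompt-dependent normalization mentioned in the abstract.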
