TMPO: Trajectory Matching Policy Optimization for Diverse and Efficient Diffusion Alignment

Jiaming Li; Chenyu Zhu; Nanxi Yi; Youjun Bao; Li Sun; Quanying Lv; Xiang Fang; Daizong Liu; Jianjun Li; Kun He; Bowen Zhou; Zhiyuan Ma

arXiv:2605.10983·cs.LG·May 14, 2026

TL;DR

TMPO introduces trajectory-level reward distribution matching with a Softmax Trajectory Balance (Softmax-TB) objective, improving diversity and efficiency in diffusion model alignment compared to reward-maximization methods.

Contribution

The paper proposes TMPO, a novel trajectory matching policy optimization method that improves diversity and reduces reward hacking in diffusion model alignment.

Findings

01

TMPO improves generative diversity by 9.1% over state-of-the-art methods.

02

TMPO achieves a better trade-off between reward and diversity.

03

Dynamic Stochastic Tree Sampling reduces training time while maintaining performance.

Abstract

Reinforcement learning (RL) has shown extraordinary potential in aligning diffusion models to downstream tasks, yet most RL-based alignment methods still suffer from significant reward hacking, which degrades generative diversity and quality by inducing visual mode collapse and amplifying unreliable rewards. We identify the root cause as the mode-seeking nature of these methods: they maximize expected reward without effectively constraining the probability distribution over acceptable trajectories, so the policy concentrates on a few high-reward paths. In contrast, we propose Trajectory Matching Policy Optimization (TMPO), which replaces scalar reward maximization with trajectory-level reward distribution matching. Specifically, TMPO introduces a Softmax Trajectory Balance (Softmax-TB) objective to match the policy probabilities of K trajectories to a reward-induced Boltzmann distribution. We prove that this…
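To make the matching idea concrete, here is a minimal sketch of one possible Softmax-TB-style loss. The abstract does not give the exact objective, so the squared log-ratio form below is an assumption, and the names `softmax_tb_loss`, `boltzmann_target`, and the temperature `beta` are hypothetical. The sketch renormalizes the policy's log-probabilities over the K sampled trajectories with a softmax and compares them to the Boltzmann target p(i) ∝ exp(r_i / beta).

```python
import math
from typing import List, Sequence


def log_softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]


def boltzmann_target(rewards: Sequence[float], beta: float) -> List[float]:
    """Log-probabilities of the reward-induced Boltzmann distribution
    p(i) proportional to exp(r_i / beta) over the K sampled trajectories."""
    return log_softmax([r / beta for r in rewards])


def softmax_tb_loss(traj_logps: Sequence[float],
                    rewards: Sequence[float],
                    beta: float = 1.0) -> float:
    """Hypothetical Softmax-TB-style loss (an assumption, not the paper's formula):
    mean squared gap between the policy's log-probabilities, renormalized over
    the K trajectories, and the Boltzmann target log-probabilities."""
    if len(traj_logps) != len(rewards):
        raise ValueError("need exactly one log-probability per trajectory")
    policy = log_softmax(traj_logps)       # policy distribution restricted to the K samples
    target = boltzmann_target(rewards, beta)
    return sum((p - t) ** 2 for p, t in zip(policy, target)) / len(policy)


rewards = [1.0, 0.5, 0.0]
# Policy log-probs proportional to reward (up to a constant): distribution matched, loss ~ 0.
matched = softmax_tb_loss([-10.0, -10.5, -11.0], rewards)
# Mode-collapsed policy: almost all mass on the single best trajectory, loss is large.
collapsed = softmax_tb_loss([0.0, -20.0, -20.0], rewards)
```

Because the softmax is taken over the K samples, adding a constant to every trajectory log-probability leaves the loss unchanged, so no global partition function has to be estimated; in this sketch that plays a role similar to the learned log Z term in standard Trajectory Balance. A pure reward maximizer would drive the policy toward the collapsed case, which this loss penalizes.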
