Diffusion-APO: Trajectory-Aware Direct Preference Alignment for Video Diffusion Transformers
Jingyuan Zhu, Biaolong Chen, Le Zhang, Aixi Zhang, Hao Jiang, Pipei Huang

TL;DR
Diffusion-APO introduces a trajectory-aware preference alignment method for video diffusion models, improving visual quality and instruction following without relying on scalar rewards.
Contribution
It presents a novel trajectory-aware alignment algorithm and a modular RLHF framework that together make preference alignment of video diffusion transformers more scalable.
Findings
Outperforms standard baselines in visual quality and instruction following.
Effectively preserves generative fidelity during model acceleration.
Provides a scalable, end-to-end pathway for video diffusion alignment.
Abstract
Efficiently aligning large-scale video diffusion models with human intent requires a scalable and trajectory-aware pathway that bridges the inherent discrepancy between training noise distributions and practical inference trajectories. While existing paradigms such as Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO) attempt to address this, they are often hindered either by reliance on complex, bias-prone reward models or by suboptimal timestep sampling. In this paper, we propose Diffusion-APO (Aligned Preference Optimization), a trajectory-aware algorithm that resolves this misalignment by synchronizing training noise with inference-time denoising paths to maximize gradient signal efficacy. To translate this algorithmic innovation into a practical solution, we introduce a unified and modular RLHF framework that integrates online ranking, half-online…
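The abstract does not spell out the Diffusion-APO objective, so the following is only a reference sketch under stated assumptions, not the authors' method. It combines the standard Diffusion-DPO pairwise loss (the DPO baseline the abstract contrasts against) with one possible reading of "synchronizing training noise with inference-time denoising paths": drawing training timesteps from the discrete schedule the sampler actually visits, rather than uniformly from (0, 1]. The helper names `inference_timesteps`, `sample_training_timestep` and `dpo_diffusion_loss`, the flow-matching-style `shift` parameter, and the folding of the timestep weighting into `beta` are all hypothetical choices made for illustration.

```python
import math
import random


def inference_timesteps(num_steps, shift=1.0):
    """Hypothetical: timesteps in (0, 1] visited by a num_steps sampler.

    Uses a flow-matching-style shifted schedule t' = s*t / (1 + (s-1)*t);
    shift=1.0 gives uniform spacing. Ordered from noisiest (1.0) downward.
    """
    ts = []
    for i in range(num_steps):
        t = 1.0 - i / num_steps  # 1.0, ..., 1/num_steps
        ts.append(shift * t / (1.0 + (shift - 1.0) * t))
    return ts


def sample_training_timestep(rng, schedule):
    """Assumption: draw a training timestep from the inference trajectory
    instead of from a continuous uniform distribution."""
    return rng.choice(schedule)


def dpo_diffusion_loss(err_w, err_w_ref, err_l, err_l_ref, beta):
    """Diffusion-DPO-style loss for one preference pair at one timestep.

    err_* are squared prediction errors ||target - model(x_t, t)||^2 for the
    preferred (w) and dispreferred (l) sample, under the trainable policy and
    a frozen reference model. Any timestep weighting is folded into beta.
    """
    # Negative margin means the policy improved more on the preferred sample.
    margin = (err_w - err_w_ref) - (err_l - err_l_ref)
    z = beta * margin
    # -log(sigmoid(-z)) == softplus(z), computed in a numerically stable form.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    rng = random.Random(0)
    schedule = inference_timesteps(num_steps=8, shift=3.0)
    t = sample_training_timestep(rng, schedule)
    loss = dpo_diffusion_loss(0.10, 0.12, 0.30, 0.25, beta=50.0)
    print(f"t={t:.3f} loss={loss:.4f}")
```

In this sketch both samples of a pair would share the same timestep and noise draw, as in Diffusion-DPO; only the distribution of that timestep changes. A loss below log 2 indicates that, relative to the reference, the policy fits the preferred sample better than the dispreferred one.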
