SHIFT: Motion Alignment in Video Diffusion Models with Adversarial Hybrid Fine-Tuning
Xi Ye, Wenjia Yang, Yangyang Xu, Xiaoyang Liu, Duo Su, Mengfei Xia, Jun Zhu

TL;DR
This paper introduces SHIFT, a novel fine-tuning framework for video diffusion models that enhances motion fidelity and temporal coherence using pixel-motion rewards and adversarial techniques.
Contribution
The paper proposes a scalable, reward-driven fine-tuning method that combines supervised fine-tuning and advantage-weighted fine-tuning, using adversarial advantages to improve motion alignment.
Findings
Improves motion fidelity in video diffusion models
Enhances convergence speed and reduces reward hacking
Resolves dynamic-degree collapse in fine-tuned models
Abstract
Image-conditioned video diffusion models achieve impressive visual realism but often suffer from weakened motion fidelity, e.g., reduced motion dynamics or degraded long-term temporal coherence, especially after fine-tuning. We study the problem of motion alignment in video diffusion models post-training. To address this, we introduce pixel-motion rewards based on pixel flux dynamics, capturing both instantaneous and long-term motion consistency. We further propose Smooth Hybrid Fine-tuning (SHIFT), a scalable reward-driven fine-tuning framework for video diffusion models. SHIFT fuses standard supervised fine-tuning and advantage-weighted fine-tuning into a unified framework. Benefiting from novel adversarial advantages, SHIFT improves convergence speed and mitigates reward hacking. Experiments show that our approach efficiently resolves dynamic-degree collapse in modern video…
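The abstract does not state SHIFT's objective, so the following is only a minimal sketch of the general idea of blending supervised and advantage-weighted fine-tuning, not the authors' formulation. It assumes per-sample losses and scalar rewards are already computed. The names `advantage_weights` and `hybrid_loss`, the mixing coefficient `lam`, and the temperature `beta` are all hypothetical. The adversarial advantages and the pixel-motion rewards described above are not modelled here.

```python
import math

def advantage_weights(rewards, beta=1.0):
    """Exponential advantage weights, normalised so that their mean is 1.

    Assumption: the advantage is the reward minus the batch-mean reward.
    """
    mean_r = sum(rewards) / len(rewards)
    advantages = [r - mean_r for r in rewards]
    # Subtract the max before exponentiating for numerical stability.
    max_adv = max(advantages)
    exps = [math.exp((a - max_adv) / beta) for a in advantages]
    total = sum(exps)
    n = len(rewards)
    return [n * e / total for e in exps]

def hybrid_loss(per_sample_losses, rewards, lam=0.5, beta=1.0):
    """Blend uniform (supervised) weights with advantage weights.

    lam = 0 recovers the plain mean loss (supervised fine-tuning);
    lam = 1 uses only the advantage weights (hypothetical blend).
    """
    w_adv = advantage_weights(rewards, beta)
    weights = [(1.0 - lam) + lam * w for w in w_adv]
    weighted = [w * l for w, l in zip(weights, per_sample_losses)]
    return sum(weighted) / len(per_sample_losses)
```

In this sketch, samples with above-average reward contribute more to the loss as `lam` grows, while `lam = 0` falls back to ordinary supervised fine-tuning.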
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Advanced Vision and Imaging
