Mind the Generative Details: Direct Localized Detail Preference Optimization for Video Diffusion Models

Zitong Huang; Kaidong Zhang; Yukang Ding; Chao Gao; Rui Ding; Ying Chen; Wangmeng Zuo

arXiv:2601.04068·cs.CV·May 21, 2026

TL;DR

This paper introduces LocalDPO, a post-training framework for aligning text-to-video diffusion models with human preferences by optimizing at the spatio-temporal region level using localized preference pairs.

Contribution

LocalDPO constructs localized preference pairs from real videos, eliminating the need for external critics and manual annotations, and improves video quality and coherence.

Findings

01 LocalDPO enhances video fidelity and temporal coherence.

02 It outperforms other post-training methods in human preference scores.

03 The approach converges rapidly due to region-aware loss.
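To make the idea of a region-aware loss concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: it assumes the loss takes the general form of the Diffusion-DPO objective, with denoising errors averaged only over a masked spatio-temporal region. The function names `masked_mse` and `region_dpo_loss`, the `beta` parameter, and the (T, H, W) mask layout are all hypothetical, and any timestep weighting is assumed to be folded into `beta`.

```python
import numpy as np

def masked_mse(pred, target, mask):
    """Squared error averaged over masked elements only.

    pred, target: arrays of shape (T, H, W, C).
    mask: array of shape (T, H, W), 1 inside the region, 0 outside.
    (Shapes and names are assumptions made for this sketch.)
    """
    m = np.broadcast_to(mask[..., None], pred.shape)
    # Guard against an empty mask to avoid division by zero.
    return float(((pred - target) ** 2 * m).sum() / max(m.sum(), 1.0))

def region_dpo_loss(err_w, err_l, ref_err_w, ref_err_l, beta=1.0):
    """DPO-style loss on region-restricted denoising errors.

    err_w / err_l: policy errors on the preferred / rejected sample.
    ref_err_w / ref_err_l: the same errors under a frozen reference model.
    Computes -log(sigmoid(-beta * margin)) in a numerically stable way.
    """
    margin = (err_w - ref_err_w) - (err_l - ref_err_l)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); here x = -beta * margin.
    return float(np.logaddexp(0.0, beta * margin))
```

When the policy and the reference agree, the margin is zero and the loss equals log 2. The loss falls below that value as the policy fits the preferred sample better than the reference does inside the masked region.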

Abstract

Aligning text-to-video diffusion models with human preferences is crucial for generating high-quality videos. Existing Direct Preference Optimization (DPO) methods rely on multi-sample ranking and task-specific critic models, which is inefficient and often yields ambiguous global supervision. To address these limitations, we propose LocalDPO, a novel post-training framework that constructs localized preference pairs from real videos and optimizes alignment at the spatio-temporal region level. We design an automated pipeline to efficiently collect preference pair data that generates preference pairs with a single inference per prompt, eliminating the need for external critic models or manual annotation. Specifically, we treat high-quality real videos as positive samples and generate corresponding negatives by locally corrupting them with random spatio-temporal masks and restoring only the…
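The local corruption step described in the abstract can be illustrated with a small Python/NumPy sketch. The abstract is cut off before it describes how the masked regions are restored, so this sketch stops at corruption and does not attempt the restoration step. The single-box mask shape, the size fractions, the Gaussian noise, and the function names `random_spatiotemporal_mask` and `corrupt_locally` are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def random_spatiotemporal_mask(shape, rng, min_frac=0.25, max_frac=0.5):
    """Return a (T, H, W) mask with one random box set to 1.

    A single box is an assumption; the source only says the masks are
    random and spatio-temporal.
    """
    dims = tuple(shape)
    # Box extent along each axis, as a fraction of that axis (hypothetical range).
    sizes = [max(1, int(round(n * rng.uniform(min_frac, max_frac)))) for n in dims]
    # Random start positions that keep the box inside the volume.
    starts = [int(rng.integers(0, n - s + 1)) for n, s in zip(dims, sizes)]
    mask = np.zeros(dims, dtype=np.float32)
    (t0, y0, x0), (dt, dy, dx) = starts, sizes
    mask[t0:t0 + dt, y0:y0 + dy, x0:x0 + dx] = 1.0
    return mask

def corrupt_locally(video, mask, rng, noise_std=1.0):
    """Add Gaussian noise inside the mask only; video has shape (T, H, W, C)."""
    noise = rng.normal(0.0, noise_std, size=video.shape).astype(video.dtype)
    # Broadcasting the mask over channels leaves pixels outside it untouched.
    return video + mask[..., None] * noise

rng = np.random.default_rng(0)
video = np.zeros((8, 16, 16, 3), dtype=np.float32)  # stand-in for a real clip
mask = random_spatiotemporal_mask(video.shape[:3], rng)
corrupted = corrupt_locally(video, mask, rng)
```

In this sketch the real clip would play the role of the positive sample, and a negative would be derived from `corrupted`. Because the mask is kept alongside the pair, a loss can later be restricted to the region that actually differs.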

Code & Models

Repository (GitHub): 1170300714/Local-DPO