Harness Local Rewards for Global Benefits: Effective Text-to-Video Generation Alignment with Patch-level Reward Models

Shuting Wang; Haihong Tang; Zhicheng Dou; Chenyan Xiong

arXiv:2502.06812 · cs.LG · February 19, 2025

TL;DR

This paper introduces HALO, a post-training strategy for text-to-video generation models that uses patch-level rewards from a reward model distilled from GPT-4o, improving both the correction of localized errors and overall video quality.

Contribution

The paper proposes a patch reward model and a granular DPO (Gran-DPO) algorithm that incorporate local feedback into the optimization of text-to-video generation models (VGMs), improving alignment with human preferences.

Findings

01. HALO outperforms baseline models in evaluation metrics.

02. The patch reward model aligns well with human annotations.

03. The method effectively reduces patch defects in generated videos.

Abstract

The emergence of diffusion models (DMs) has significantly improved the quality of text-to-video generation models (VGMs). However, current VGM optimization primarily emphasizes the global quality of videos, overlooking localized errors, which leads to suboptimal generation capabilities. To address this issue, we propose a post-training strategy for VGMs, HALO, which explicitly incorporates local feedback from a patch reward model, providing detailed and comprehensive training signals with the video reward model for advanced VGM optimization. To develop an effective patch reward model, we distill GPT-4o to continuously train our video reward model, which enhances training efficiency and ensures consistency between video and patch reward distributions. Furthermore, to harmoniously integrate patch rewards into VGM optimization, we introduce a granular DPO (Gran-DPO) algorithm for DMs,…
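The abstract is cut off before it describes Gran-DPO, so the details of that algorithm are not reproduced here. As background only, the sketch below shows the standard DPO loss for a single preference pair, which is the general objective that DPO-style methods build on. It is not the paper's Gran-DPO, and the function name, argument names, and the `beta` default are illustrative assumptions.

```python
import math


def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss for one preference pair (generic sketch, not Gran-DPO).

    logp_*     : log-probability of the preferred / rejected sample under the
                 model being trained (hypothetical inputs).
    ref_logp_* : the same quantities under the frozen reference model.
    beta       : strength of the implicit KL regularisation (assumed value).
    """
    # Implicit reward margin between the preferred and rejected samples.
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # The trained model favours the preferred sample more than the reference
    # does, so the loss falls below log(2), the value at zero margin.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss equals log(2) when the trained model and the reference agree, and it decreases as the trained model raises the preferred sample relative to the rejected one. How patch-level rewards enter this objective in Gran-DPO is specified in the full paper, not in the truncated abstract above.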

Taxonomy

Topics: Computational and Text Analysis Methods · Topic Modeling · Wikis in Education and Collaboration

Methods: Direct Preference Optimization · Diffusion