From Preferences to Prejudice: The Role of Alignment Tuning in Shaping Social Bias in Video Diffusion Models

Zefan Cai; Haoyi Qiu; Haozhe Zhao; Ke Wan; Jiachen Li; Jiuxiang Gu; Wen Xiao; Nanyun Peng; Junjie Hu

arXiv:2510.17247 · cs.CL · February 12, 2026

Open Access

TL;DR

This paper introduces VideoBiasEval, a framework for diagnosing social biases in video diffusion models. It shows that alignment tuning amplifies these biases and makes them more temporally stable, which underscores the need for bias mitigation in video generation.

Contribution

The paper presents a diagnostic framework for evaluating social biases in video diffusion models and analyzes how alignment tuning affects the propagation and persistence of those biases.

Findings

01. Alignment tuning amplifies social biases in video models.

02. Biases become more temporally stable after tuning.

03. Bias-aware evaluation is crucial for fair video generation.

Abstract

Recent advances in video diffusion models have significantly enhanced text-to-video generation, particularly through alignment tuning using reward models trained on human preferences. While these methods improve visual quality, they can unintentionally encode and amplify social biases. To systematically trace how such biases evolve throughout the alignment pipeline, we introduce VideoBiasEval, a comprehensive diagnostic framework for evaluating social representation in video generation. Grounded in established social bias taxonomies, VideoBiasEval employs an event-based prompting strategy to disentangle semantic content (actions and contexts) from actor attributes (gender and ethnicity). It further introduces multi-granular metrics to evaluate (1) overall ethnicity bias, (2) gender bias conditioned on ethnicity, (3) distributional shifts in social attributes across model variants, and…
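As a rough illustration of the first three metric families named in the abstract, the sketch below computes (1) an overall ethnicity skew, (2) a gender skew within each ethnicity group, and (3) a shift in the ethnicity distribution between two model variants. The abstract does not define these metrics, so everything here is an assumption rather than the paper's method: the use of total variation distance against a uniform reference, the record format, all function names, and the placeholder labels "A", "B", "woman", and "man". It also assumes each generated video has already been labeled with one ethnicity and one gender by some external annotator or classifier.

```python
from collections import Counter

def distribution(labels, categories):
    """Empirical probability of each category in a list of labels."""
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    return {c: counts.get(c, 0) / len(labels) for c in categories}

def total_variation(p, q):
    """Total variation distance between two distributions over the same keys."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)

def uniform(categories):
    """Uniform reference distribution (an assumed notion of 'unbiased')."""
    return {c: 1.0 / len(categories) for c in categories}

def ethnicity_skew(records, ethnicities):
    """(1) Distance of the observed ethnicity distribution from uniform."""
    p = distribution([r["ethnicity"] for r in records], ethnicities)
    return total_variation(p, uniform(ethnicities))

def gender_skew_by_ethnicity(records, ethnicities, genders):
    """(2) Distance of the gender distribution from uniform, per ethnicity."""
    result = {}
    for e in ethnicities:
        subset = [r["gender"] for r in records if r["ethnicity"] == e]
        if subset:  # skip groups that never appear in the sample
            p = distribution(subset, genders)
            result[e] = total_variation(p, uniform(genders))
    return result

def ethnicity_shift(base_records, tuned_records, ethnicities):
    """(3) Distance between ethnicity distributions of two model variants."""
    p = distribution([r["ethnicity"] for r in base_records], ethnicities)
    q = distribution([r["ethnicity"] for r in tuned_records], ethnicities)
    return total_variation(p, q)

# Hypothetical labels for videos from a base model and an alignment-tuned one.
ETHNICITIES = ["A", "B"]
GENDERS = ["woman", "man"]
base = [
    {"ethnicity": "A", "gender": "woman"},
    {"ethnicity": "A", "gender": "man"},
    {"ethnicity": "B", "gender": "woman"},
    {"ethnicity": "B", "gender": "man"},
]
tuned = [
    {"ethnicity": "A", "gender": "man"},
    {"ethnicity": "A", "gender": "man"},
    {"ethnicity": "A", "gender": "man"},
    {"ethnicity": "B", "gender": "woman"},
]

base_skew = ethnicity_skew(base, ETHNICITIES)
tuned_skew = ethnicity_skew(tuned, ETHNICITIES)
tuned_gender = gender_skew_by_ethnicity(tuned, ETHNICITIES, GENDERS)
shift = ethnicity_shift(base, tuned, ETHNICITIES)
```

In this toy data the base sample is balanced and the tuned sample is not, so the tuned skew values come out larger than the base ones. A uniform reference is only one possible baseline; a real evaluation might compare against a different target distribution.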

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Hate Speech and Cyberbullying Detection