VEAT Quantifies Implicit Associations in Text-to-Video Generator Sora and Reveals Challenges in Bias Mitigation
Yongxu Sun, Michael Saxon, Ian Yang, Anna-Maria Gueorguieva, Aylin Caliskan

TL;DR
This paper introduces VEAT, a new method to quantify societal biases in text-to-video generators like Sora, revealing that these models often reflect and can amplify harmful stereotypes despite debiasing efforts.
Contribution
The paper develops VEAT and SC-VEAT to measure biases in video generation, demonstrating their effectiveness and exposing challenges in bias mitigation for T2V models.
Findings
Sora videos associate European Americans and women more strongly with pleasantness than African Americans and men (both d>0.8).
Debiasing prompts can sometimes increase biases instead of reducing them.
Effect sizes correlate with real-world demographic distributions, such as the percentage of men and of White workers in each occupation (r=0.93, r=0.83).
Abstract
Text-to-Video (T2V) generators such as Sora raise concerns about whether generated content reflects societal bias. We extend embedding-association tests from words and images to video by introducing the Video Embedding Association Test (VEAT) and Single-Category VEAT (SC-VEAT). We validate these methods by reproducing the direction and magnitude of associations from widely used baselines, including Implicit Association Test (IAT) scenarios and OASIS image categories. We then quantify race (African American vs. European American) and gender (women vs. men) associations with valence (pleasant vs. unpleasant) across 17 occupations and 7 awards. Sora videos associate European Americans and women more with pleasantness (both d>0.8). Effect sizes correlate with real-world demographic distributions: percent men and White in occupations (r=0.93, r=0.83) and percent male and non-Black among…
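To make the effect size d reported above concrete, here is a minimal sketch of the standard embedding-association effect size used in word and image association tests, together with a single-category variant. It assumes VEAT and SC-VEAT keep the same form over video embeddings, which the abstract implies but does not spell out. The function names, the toy 2-D vectors, and the use of the sample standard deviation are illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus attribute set B."""
    return mean([cosine(w, a) for a in A]) - mean([cosine(w, b) for b in B])


def effect_size(X, Y, A, B):
    """Two-target effect size d (hypothetical helper, WEAT-style).

    X, Y: embeddings of the two target groups (e.g. videos of two demographics).
    A, B: embeddings of the two attribute sets (e.g. pleasant vs. unpleasant).
    Assumption: sample standard deviation over the pooled association scores.
    """
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)


def sc_effect_size(w, A, B):
    """Single-category effect size for one target embedding w (hypothetical helper)."""
    cos_a = [cosine(w, a) for a in A]
    cos_b = [cosine(w, b) for b in B]
    return (mean(cos_a) - mean(cos_b)) / stdev(cos_a + cos_b)


# Toy 2-D "embeddings", made up for illustration only (not data from the paper).
X = [[1.0, 0.1], [0.9, 0.2]]    # target group 1
Y = [[0.1, 1.0], [0.2, 0.9]]    # target group 2
A = [[1.0, 0.0], [0.95, 0.05]]  # attribute set 1 (e.g. pleasant)
B = [[0.0, 1.0], [0.05, 0.95]]  # attribute set 2 (e.g. unpleasant)

d = effect_size(X, Y, A, B)
d_single = sc_effect_size(X[0], A, B)
print(f"d = {d:.3f}, single-category d = {d_single:.3f}")
```

A positive d means the first target group sits closer to the first attribute set than the second group does. By the usual Cohen's d convention, values above 0.8 are read as large effects, which is the threshold the abstract cites.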
Taxonomy
Topics: Face Recognition and Perception · Social and Intergroup Psychology · Psychology of Moral and Emotional Judgment
