BBQ-V: Benchmarking Visual Stereotype Bias in Large Multimodal Models

Vishal Narnaware; Ashmal Vayani; Rohit Gupta; Sirnam Swetha; and Mubarak Shah

arXiv:2502.08779·cs.CV·January 19, 2026

TL;DR

BBQ-V is a comprehensive benchmark designed to evaluate stereotype biases in large multimodal models using real-world images across diverse categories, revealing biases in current models and aiding fairness improvements.

Contribution

Introduces BBQ-V, the most extensive real-image benchmark for assessing stereotype biases in LMMs across multiple categories and sub-categories.

Findings

01. Top models exhibit significant stereotype biases.
02. Bias increases in reasoning chains.
03. Real-world images reveal more nuanced biases.

Abstract

Stereotype biases in Large Multimodal Models (LMMs) perpetuate harmful societal prejudices, undermining the fairness and equity of AI applications. As LMMs grow increasingly influential, addressing and mitigating inherent biases related to stereotypes, harmful generations, and ambiguous assumptions in real-world scenarios has become essential. However, existing datasets for evaluating stereotype biases in LMMs often lack diversity, rely on synthetic images, and feature only single-actor images, leaving a gap in bias evaluation for real-world visual contexts. To address this gap, we introduce BBQ-Vision (BBQ-V), the most comprehensive framework for assessing stereotype biases across nine diverse categories and 50 sub-categories with real, multi-actor images. The BBQ-V benchmark contains 14,144 image-question pairs and rigorously evaluates LMMs through…

Code & Models

Datasets

ucf-crcv/SB-Bench
dataset · 203 dl

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems