BBQ-V: Benchmarking Visual Stereotype Bias in Large Multimodal Models
Vishal Narnaware, Ashmal Vayani, Rohit Gupta, Sirnam Swetha, and Mubarak Shah

TL;DR
BBQ-V is a comprehensive benchmark designed to evaluate stereotype biases in large multimodal models using real-world images across diverse categories, revealing biases in current models and aiding fairness improvements.
Contribution
Introduces BBQ-V, the most extensive real-image benchmark for assessing stereotype biases in LMMs across multiple categories and sub-categories.
Findings
Top models exhibit significant stereotype biases.
Bias increases in reasoning chains.
Real-world images reveal more nuanced biases.
Abstract
Stereotype biases in Large Multimodal Models (LMMs) perpetuate harmful societal prejudices, undermining the fairness and equity of AI applications. As LMMs grow increasingly influential, addressing and mitigating inherent biases related to stereotypes, harmful generations, and ambiguous assumptions in real-world scenarios has become essential. However, existing datasets for evaluating stereotype biases in LMMs often lack diversity, rely on synthetic images, and contain mostly single-actor images, leaving a gap in bias evaluation for real-world visual contexts. To address this gap, we introduce BBQ-Vision (BBQ-V), the most comprehensive framework for assessing stereotype biases across nine diverse categories and 50 sub-categories with real, multi-actor images. The BBQ-V benchmark contains 14,144 image-question pairs and rigorously evaluates LMMs through…
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems
