A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models
Ashutosh Sathe, Prachi Jain, Sunayana Sitaram

TL;DR
This paper introduces a comprehensive framework and dataset for evaluating societal biases in vision-language models across multiple inference modes, aiming to guide future development of less biased AI systems.
Contribution
The paper presents a unified evaluation framework and a synthetic dataset for systematically assessing gender, race, and age biases in VLMs across input-output modalities.
Findings
Bias varies with input-output modalities.
Models show distinct biases across attributes.
Synthetic dataset enables bias benchmarking.
Abstract
Vision-language models (VLMs) have gained widespread adoption in both industry and academia. In this study, we propose a unified framework for systematically evaluating gender, race, and age biases in VLMs with respect to professions. Our evaluation encompasses all inference modes supported by recent VLMs: image-to-text, text-to-text, text-to-image, and image-to-image. We also propose an automated pipeline for generating high-quality synthetic datasets that intentionally conceal gender, race, and age information across professional domains, in both the generated text and the images. The dataset includes action-based descriptions of each profession and serves as a benchmark for evaluating societal biases in VLMs. In our comparative analysis of widely used VLMs, we have identified that varying input-output modalities lead to discernible…
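To make the evaluation idea concrete, here is a minimal sketch of how skew in a model's attribute assignments for a profession could be quantified. The abstract does not state the paper's actual metric, so the entropy-based score, the function names, and the sample data below are all assumptions chosen for illustration, not the authors' method.

```python
from collections import Counter
from math import log2

# Hypothetical outputs: the gender a model assigned when given a
# gender-neutral description of each profession (illustrative data only).
predictions = {
    "nurse": ["female", "female", "female", "male"],
    "pilot": ["male", "male", "male", "male"],
}
GROUPS = ["female", "male"]


def normalized_entropy(labels, groups):
    """Return 1.0 for an even spread over groups, 0.0 if one group takes all."""
    counts = Counter(labels)
    total = len(labels)  # assumes a non-empty list of labels
    entropy = 0.0
    for group in groups:
        p = counts.get(group, 0) / total
        if p > 0:
            entropy -= p * log2(p)
    return entropy / log2(len(groups))


def bias_score(labels, groups):
    """Assumed score: 0.0 means balanced outputs, 1.0 means fully skewed."""
    return 1.0 - normalized_entropy(labels, groups)


scores = {job: bias_score(labels, GROUPS) for job, labels in predictions.items()}
for job, score in scores.items():
    print(f"{job}: {score:.3f}")
```

Under these assumptions, the same score could be computed separately for each inference mode (for example image-to-text versus text-to-image) and compared per profession, which is one way to surface the modality-dependent differences the abstract mentions.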
Taxonomy
Topics: Religion and Sociopolitical Dynamics in Nigeria
