From Individuals to Interactions: Benchmarking Gender Bias in Multimodal Large Language Models from the Lens of Social Relationship

Yue Xu; Wenjie Wang

arXiv:2506.23101 · cs.CL · July 8, 2025

Open Access

TL;DR

This paper introduces Genres, a new benchmark to evaluate gender bias in multimodal large language models by analyzing social relationship narratives, revealing persistent biases in interactions that previous benchmarks overlooked.

Contribution

The paper presents Genres, a novel benchmark for assessing relational gender bias in MLLMs through social relationship narratives, highlighting subtle biases in interpersonal contexts.

Findings

01. MLLMs exhibit persistent, context-sensitive gender biases.

02. Biases are more evident in relational interactions than in isolated scenarios.

03. The benchmark reveals subtle, interaction-driven gender biases in both open- and closed-source models.

Abstract

Multimodal large language models (MLLMs) have shown impressive capabilities across tasks involving both visual and textual modalities. However, growing concerns remain about their potential to encode and amplify gender bias, particularly in socially sensitive applications. Existing benchmarks predominantly evaluate bias in isolated scenarios, overlooking how bias may emerge subtly through interpersonal interactions. We fill this gap by going beyond single-entity evaluation and instead focusing on a deeper examination of relational and contextual gender bias in dual-individual interactions. We introduce Genres, a novel benchmark designed to evaluate gender bias in MLLMs through the lens of social relationships in generated narratives. Genres assesses gender bias through a dual-character profile and narrative generation task that captures rich interpersonal dynamics and supports a…

Taxonomy

Topics: Natural Language Processing Techniques · Interpreting and Communication in Healthcare · Computational and Text Analysis Methods