Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions
Angana Borah, Rada Mihalcea

TL;DR
This paper investigates implicit gender biases in multi-agent LLM interactions, develops a dataset and a metric to assess them, and proposes two mitigation strategies (self-reflection with in-context examples and supervised fine-tuning) that effectively reduce these biases.
Contribution
The paper introduces a novel dataset and metric for detecting implicit biases in multi-agent LLM interactions and evaluates two mitigation strategies, demonstrating their effectiveness.
Findings
LLMs generate outputs with strong implicit gender bias associations at least 50% of the time.
Biases tend to increase after multi-agent interactions.
Combining fine-tuning with self-reflection best reduces biases.
Abstract
As Large Language Models (LLMs) continue to evolve, they are increasingly being employed in numerous studies to simulate societies and execute diverse social tasks. However, LLMs are susceptible to societal biases due to their exposure to human-generated data. Given that LLMs are being used to gain insights into various societal aspects, it is essential to mitigate these biases. To that end, our study investigates the presence of implicit gender biases in multi-agent LLM interactions and proposes two strategies to mitigate these biases. We begin by creating a dataset of scenarios where implicit gender biases might arise, and subsequently develop a metric to assess the presence of biases. Our empirical analysis reveals that LLMs generate outputs characterized by strong implicit bias associations (>= 50% of the time). Furthermore, these biases tend to escalate following multi-agent…
Taxonomy
Topics: Multi-Agent Systems and Negotiation
