Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions

Angana Borah; Rada Mihalcea

arXiv:2410.02584·cs.CL·October 4, 2024

TL;DR

This paper investigates implicit gender biases in multi-agent LLM interactions, develops a dataset and a metric to assess those biases, and proposes two mitigation strategies that reduce them: self-reflection with in-context examples, and supervised fine-tuning.

Contribution

It introduces a novel dataset and metric for detecting implicit biases in multi-agent LLM interactions and evaluates two mitigation strategies, demonstrating their effectiveness.

Findings

01. LLMs produce outputs with strong implicit gender bias associations at least 50% of the time.

02. Biases tend to increase after multi-agent interactions.

03. Combining fine-tuning with self-reflection reduces biases the most.
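Findings 01 and 02 are stated as rates, so a small worked example may help make them concrete. The sketch below is not the paper's metric, which this page does not define. It only assumes that each model output has already been labeled as showing a stereotypical association or not, and compares the share of such outputs before and after an interaction. The function name `bias_rate` and the label lists are hypothetical and invented for illustration.

```python
from typing import Sequence


def bias_rate(labels: Sequence[bool]) -> float:
    """Return the fraction of outputs labeled as stereotypical.

    Assumption: True means an annotator or classifier flagged the output
    as showing a stereotypical association. This is an illustrative
    stand-in, not the metric defined in the paper.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(labels) / len(labels)


# Hypothetical labels, invented for illustration (not data from the paper).
before_interaction = [True, False, True, True, False, False, True, False]
after_interaction = [True, True, True, True, False, False, True, False]

rate_before = bias_rate(before_interaction)  # 4 of 8 outputs flagged
rate_after = bias_rate(after_interaction)    # 5 of 8 outputs flagged

# Finding 02 corresponds to the rate going up after agents interact.
escalated = rate_after > rate_before
print(f"before={rate_before:.3f} after={rate_after:.3f} escalated={escalated}")
```

With these made-up labels the rate moves from 0.5 to 0.625, which is the kind of shift finding 02 describes. The same comparison could be run after a mitigation step to check whether the rate drops.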

Abstract

As Large Language Models (LLMs) continue to evolve, they are increasingly being employed in numerous studies to simulate societies and execute diverse social tasks. However, LLMs are susceptible to societal biases due to their exposure to human-generated data. Given that LLMs are being used to gain insights into various societal aspects, it is essential to mitigate these biases. To that end, our study investigates the presence of implicit gender biases in multi-agent LLM interactions and proposes two strategies to mitigate these biases. We begin by creating a dataset of scenarios where implicit gender biases might arise, and subsequently develop a metric to assess the presence of biases. Our empirical analysis reveals that LLMs generate outputs characterized by strong implicit bias associations (>= 50% of the time). Furthermore, these biases tend to escalate following multi-agent…

Code & Models

Repositories

MichiganNLP/MultiAgent_ImplicitBias
PyTorch · Official

Videos

Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions · underline

Taxonomy

Topics: Multi-Agent Systems and Negotiation