Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs

Ruoxi Cheng; Haoxuan Ma; Shuirong Cao; Jiaqi Li; Aihua Pei; Zhiqiang Wang; Pengliang Ji; Haoyu Wang; Jiaqi Huo

arXiv:2404.10160 · cs.AI · August 19, 2024 · 1 citation

TL;DR

This paper introduces RLDF, a novel bias mitigation method for LLMs that replaces human feedback with multi-role debates among the LLMs themselves and improves bias reduction.

Contribution

We propose RLDF, a new reinforcement learning approach that uses multi-role debates among LLMs for bias mitigation, eliminating the need for human feedback.

Findings

01

RLDF effectively reduces bias in LLMs.

02

Multi-role debates enhance bias recognition and mitigation.

03

The approach outperforms traditional bias mitigation methods.

Abstract

Bias in LLMs can harm user experience and societal outcomes. However, current bias mitigation methods often require intensive human feedback, lack transferability to other topics, or yield overconfident and random outputs. We find that involving LLMs in role-playing scenarios boosts their ability to recognize and mitigate biases. Based on this, we propose Reinforcement Learning from Multi-role Debates as Feedback (RLDF), a novel bias mitigation approach that replaces the human feedback used in traditional RLHF. We use LLMs in multi-role debates to create a dataset that includes both high-bias and low-bias instances for training the reward model in reinforcement learning. Our approach comprises two modes: (1) self-reflection, where the same LLM participates in multi-role debates, and (2) teacher-student, where a more advanced LLM such as GPT-3.5-turbo guides the LLM to perform this task.…
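The abstract says the debates yield high-bias and low-bias instances that are used to train the reward model. The sketch below shows one way such instances could feed a reward model, assuming they are paired per prompt and trained with the standard pairwise (Bradley-Terry) loss common in RLHF reward modeling. The `DebateSample` record and all function names are hypothetical, and the paper's actual data format, labeling procedure, and loss may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class DebateSample:
    # Hypothetical record: one prompt plus two debate-produced responses,
    # already labeled low-bias / high-bias (the labeling step is assumed).
    prompt: str
    low_bias: str
    high_bias: str


def to_preference_pairs(samples):
    """Turn labeled debate outputs into (prompt, chosen, rejected) triples,
    treating the low-bias response as the preferred one."""
    return [(s.prompt, s.low_bias, s.high_bias) for s in samples]


def pairwise_reward_loss(r_chosen, r_rejected):
    """Bradley-Terry loss -log(sigmoid(r_chosen - r_rejected)),
    written as log(1 + exp(-margin)) in a numerically stable form."""
    margin = r_chosen - r_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    # For negative margins, factor out exp(-margin) to avoid overflow.
    return -margin + math.log1p(math.exp(margin))


def mean_loss(scores):
    """Average the loss over a batch of (r_chosen, r_rejected) scores
    produced by some reward model (not shown here)."""
    losses = [pairwise_reward_loss(c, r) for c, r in scores]
    return sum(losses) / len(losses)
```

The loss falls as the reward model scores the low-bias response above the high-bias one. This is the signal that, in an RLHF-style pipeline, later guides policy optimization.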

Taxonomy

Topics: Fault Detection and Control Systems · Software Reliability and Analysis Research

Methods: Attention Is All You Need · Cosine Annealing · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Adam