Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement

Wenda Xu; Guanglei Zhu; Xuandong Zhao; Liangming Pan; Lei Li; William Yang Wang

arXiv:2402.11436·cs.CL·June 19, 2024·1 cites

TL;DR

This paper investigates self-bias in LLM self-refinement: the tendency of a model to favor its own generations. This bias explains why self-feedback improves performance on some tasks and degrades it on others, and the paper identifies ways to mitigate it.

Contribution

The paper formally defines LLM self-bias using two statistics, analyzes its prevalence across six LLMs and three task types, and identifies larger model size and accurate external feedback as ways to reduce it.

Findings

01

Self-bias is prevalent across all examined LLMs and tasks.

02

Self-refinement improves the fluency and understandability of outputs but further amplifies self-bias.

03

Larger model size and external feedback with accurate assessment can significantly reduce self-bias.

Abstract

Recent studies show that large language models (LLMs) improve their performance through self-feedback on certain tasks while degrading on others. We discovered that this contrast is due to LLMs' bias in evaluating their own output. In this paper, we formally define an LLM's self-bias - the tendency to favor its own generation - using two statistics. We analyze six LLMs (GPT-4, GPT-3.5, Gemini, LLaMA2, Mixtral and DeepSeek) on translation, constrained text generation, and mathematical reasoning tasks. We find that self-bias is prevalent in all examined LLMs across multiple languages and tasks. Our analysis reveals that while the self-refine pipeline improves the fluency and understandability of model outputs, it further amplifies self-bias. To mitigate such biases, we discover that larger model size and external feedback with accurate assessment can significantly reduce bias in the…

Code & Models

Repositories

xu1998hz/llm_self_bias
PyTorch · Official

Taxonomy

Topics

Topic Modeling

Methods

Attention Is All You Need · Cosine Annealing · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Adam · Attention Dropout · Weight Decay