LLMs are Biased Evaluators But Not Biased for Retrieval Augmented Generation

Yen-Shan Chen; Jing Jin; Peng-Ting Kuo; Chao-Wei Huang; Yun-Nung Chen

arXiv:2410.20833·cs.CL·December 9, 2025

TL;DR

This study investigates whether large language models exhibit self-preference bias in retrieval-augmented generation (RAG) tasks. Across multiple datasets and models, it finds that LLMs do not favor self-generated content; their behavior is instead shaped by factual accuracy.

Contribution

The paper demonstrates that LLMs do not show self-preference bias in RAG frameworks and highlights the role of factual accuracy in their evaluations, in contrast to prior findings of self-preference in rating tasks.

Findings

01

No significant self-preference bias in RAG evaluations

02

Factual accuracy influences LLM outputs

03

Consistent results across datasets and models

Abstract

Recent studies have demonstrated that large language models (LLMs) exhibit significant biases in evaluation tasks, particularly in preferentially rating and favoring self-generated content. However, the extent to which this bias manifests in fact-oriented tasks, especially within retrieval-augmented generation (RAG) frameworks, where keyword extraction and factual accuracy take precedence over stylistic elements, remains unclear. Our study addresses this knowledge gap by simulating two critical phases of the RAG framework. In the first phase, LLMs evaluate human-authored and model-generated passages, emulating the pointwise reranking phase. In the second phase, pairwise reading comprehension tests simulate the generation phase. Contrary to previous findings indicating a self-preference in rating tasks, our results reveal no significant…
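
The two simulated phases can be illustrated with a small sketch. This is not the authors' code (the official repository is listed under Code & Models); the function names, the source labels "self" and "human", the numeric score scale, and the toy data are all assumptions made for illustration. The first function summarizes pointwise scores as a gap between self-generated and other passages, and the second reports how often a pairwise answer follows the self-generated passage.

```python
from statistics import mean


def self_preference_gap(scores):
    """Mean pointwise score of self-generated passages minus the mean of all others.

    `scores` is a list of (source, score) pairs. The label "self" marks passages
    written by the evaluating model (a hypothetical labeling scheme).
    A value near zero suggests no self-preference in the reranking phase.
    """
    own = [s for src, s in scores if src == "self"]
    others = [s for src, s in scores if src != "self"]
    if not own or not others:
        raise ValueError("need at least one self-generated and one other passage")
    return mean(own) - mean(others)


def pairwise_self_rate(followed_sources):
    """Fraction of pairwise trials in which the answer followed the "self" passage.

    `followed_sources` holds, per trial, the source label of the passage the
    model's answer agreed with. A rate near 0.5 suggests no preference.
    """
    if not followed_sources:
        raise ValueError("no trials given")
    return sum(1 for src in followed_sources if src == "self") / len(followed_sources)


# Toy data, invented for illustration only (not results from the paper).
pointwise = [("self", 4), ("human", 4), ("self", 3), ("human", 5)]
pairwise = ["self", "human", "human", "self"]

gap = self_preference_gap(pointwise)
rate = pairwise_self_rate(pairwise)
print(f"pointwise gap: {gap:+.2f}, pairwise self rate: {rate:.2f}")
```

In a real study the raw gap and rate would be paired with a significance test and with controls for factual correctness, since the abstract indicates that factual accuracy matters in these tasks.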

Code & Models

Repositories

MiuLab/RAG-Self-Preference (Official)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Adam · Linear Layer · Attention Dropout · Dropout · Weight Decay · Dense Connections · Byte Pair Encoding · BART