Data, Not Model: Explaining Bias toward LLM Texts in Neural Retrievers
Wei Huang, Keping Bi, Yinqiong Cai, Wei Chen, Jiafeng Guo, Xueqi Cheng

TL;DR
This paper investigates source bias, the tendency of neural retrievers to favor LLM-generated texts. It shows that the bias originates from artifacts in the training data rather than from the models, and proposes methods to mitigate it.
Contribution
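The study identifies dataset artifacts as the root cause of source bias and introduces two strategies that reduce bias in neural retrievers: reducing artifact differences in the training data, and removing the bias projection from LLM-text embeddings.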
Findings
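In embedding space, the bias direction (from negative to positive documents) aligns with the direction from human-written to LLM-generated texts.
Reducing artifact differences between positive and negative documents in the training data decreases bias.
Adjusting LLM-text embeddings by removing their projection onto the bias direction also reduces bias.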
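The projection-removal finding can be illustrated with a minimal NumPy sketch. It assumes, hypothetically, that the bias direction is estimated as the difference between the mean LLM-text embedding and the mean human-text embedding; this summary does not state the paper's exact estimator, and the helper names and toy data below are illustrative only.

```python
import numpy as np

def bias_direction(llm_embs: np.ndarray, human_embs: np.ndarray) -> np.ndarray:
    """Unit vector from the mean human-written embedding to the mean LLM one.

    Assumption (hypothetical): a difference of means is used as the bias
    direction; the paper's actual estimator may differ.
    """
    d = llm_embs.mean(axis=0) - human_embs.mean(axis=0)
    return d / np.linalg.norm(d)

def remove_bias_projection(vecs: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Subtract each row's component along a unit-norm direction."""
    coeffs = vecs @ direction                 # projection length of each row
    return vecs - np.outer(coeffs, direction)

# Synthetic 4-dim embeddings (made up for illustration): the "LLM" vectors
# equal the "human" vectors shifted along the last axis.
rng = np.random.default_rng(0)
human = rng.normal(size=(100, 4))
llm = human + np.array([0.0, 0.0, 0.0, 1.0])

d = bias_direction(llm, human)                # approximately [0, 0, 0, 1]
llm_debiased = remove_bias_projection(llm, d) # LLM vectors with that component removed
```

After the adjustment, every debiased LLM vector has (numerically) zero component along `d`, so a query can no longer gain similarity from that direction alone. Only LLM-text vectors are adjusted here, following the wording of the finding.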
Abstract
Recent studies show that neural retrievers often display source bias, favoring passages generated by LLMs over human-written ones, even when both are semantically similar. This bias has been considered an inherent flaw of retrievers, raising concerns about the fairness and reliability of modern information access systems. Our work challenges this view by showing that source bias stems from supervision in retrieval datasets rather than the models themselves. We found that non-semantic differences, like fluency and term specificity, exist between positive and negative documents, mirroring differences between LLM and human texts. In the embedding space, the bias direction from negatives to positives aligns with the direction from human-written to LLM-generated texts. We theoretically show that retrievers inevitably absorb the artifact imbalances in the training data during contrastive…
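One way to quantify the alignment described in the abstract is sketched below. It assumes, hypothetically, that each direction is a difference of mean embeddings and that alignment is measured by cosine similarity; the summary does not give the paper's exact procedure. The data are synthetic: a shared "artifact" axis is planted so that it separates positives from negatives and LLM texts from human-written texts.

```python
import numpy as np

def mean_diff_direction(to_embs: np.ndarray, from_embs: np.ndarray) -> np.ndarray:
    """Difference of mean embeddings, pointing from `from_embs` toward `to_embs`."""
    return to_embs.mean(axis=0) - from_embs.mean(axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy data: dimension 0 acts as a non-semantic "artifact" axis
# shared by positive documents and LLM-generated texts.
rng = np.random.default_rng(42)
artifact = np.array([1.0, 0.0, 0.0])
negatives = rng.normal(size=(200, 3))
positives = rng.normal(size=(200, 3)) + 2.0 * artifact
human = rng.normal(size=(200, 3))
llm = rng.normal(size=(200, 3)) + 2.0 * artifact

neg_to_pos = mean_diff_direction(positives, negatives)
human_to_llm = mean_diff_direction(llm, human)
alignment = cosine(neg_to_pos, human_to_llm)  # near 1 when the two directions align
```

In this toy setup the cosine is close to 1 because the same planted axis drives both differences; on real retriever embeddings the value would have to be measured.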
