Who's Who: Large Language Models Meet Knowledge Conflicts in Practice

Quang Hieu Pham; Hoang Ngo; Anh Tuan Luu; Dat Quoc Nguyen

arXiv:2410.15737·cs.CL·June 5, 2025

TL;DR

This paper introduces WhoQA, a benchmark dataset for evaluating how large language models handle conflicting information in the retrieved context, and finds that such conflicts significantly impair model performance.

Contribution

The paper presents WhoQA, a new dataset for analyzing LLM behavior under knowledge conflicts, and highlights the impact of those conflicts on retrieval-augmented generation.

Findings

01 Knowledge conflicts degrade LLM performance in RAG.

02 Current LLMs lack effective conflict resolution strategies.

03 WhoQA provides a standardized way to evaluate conflict handling.

Abstract

Retrieval-augmented generation (RAG) methods are viable solutions for addressing the static memory limits of pre-trained language models. Nevertheless, encountering conflicting sources of information within the retrieval context is an inevitable practical challenge. In such situations, language models are recommended to transparently inform users about the conflicts rather than autonomously deciding what to present based on their inherent biases. To analyze how current large language models (LLMs) align with our recommendation, we introduce WhoQA, a public benchmark dataset to examine models' behavior in knowledge conflict situations. We induce conflicts by asking about a common property among entities having the same name, resulting in questions with up to 8 distinct answers. The WhoQA evaluation set includes 5K questions across 13 Wikidata property types and 150K Wikipedia…
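
The conflict-induction idea in the abstract (asking about one property shared by several entities with the same name) can be illustrated with a minimal sketch. This is not the authors' pipeline: the records, the `build_conflict_questions` helper, and the question template are all hypothetical, chosen only to show how same-name entities can yield one question with several distinct answers.

```python
from collections import defaultdict

# Hypothetical toy records: (entity_id, name, property, value).
# Invented placeholders for illustration, not WhoQA data.
RECORDS = [
    ("e1", "Alex Example", "occupation", "painter"),
    ("e2", "Alex Example", "occupation", "physicist"),
    ("e3", "Alex Example", "occupation", "painter"),
    ("e4", "Sam Sample", "occupation", "chef"),
]


def build_conflict_questions(records):
    """Group same-name entities by (name, property) and keep only the
    groups that have more than one distinct answer."""
    answers = defaultdict(set)
    for _entity_id, name, prop, value in records:
        answers[(name, prop)].add(value)

    questions = []
    for (name, prop), values in sorted(answers.items()):
        if len(values) > 1:  # a conflict needs at least two distinct answers
            questions.append({
                "question": f"What is the {prop} of {name}?",  # assumed template
                "answers": sorted(values),
            })
    return questions


questions = build_conflict_questions(RECORDS)
for q in questions:
    print(q["question"], q["answers"])
```

In this toy setup only the shared name "Alex Example" produces a question, because its entities disagree on the property value. A model following the paper's recommendation would be expected to surface all listed answers rather than pick one.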

Code & Models

Repositories

vinairesearch/whoqa (Official)

Taxonomy

Topics: Semantic Web and Ontologies

Methods: Adam · Linear Layer · Dropout · Byte Pair Encoding · Layer Normalization · Residual Connection · Linear Warmup With Linear Decay · Attention Is All You Need · Dense Connections