Exploring Knowledge Conflicts for Faithful LLM Reasoning: Benchmark and Method
Tianzhe Zhao, Jiaoyan Chen, Shuxiu Zhang, Haiping Zhu, Qika Lin, Jun Liu

TL;DR
This paper introduces ConflictQA, a benchmark for evaluating LLM reasoning with conflicting knowledge sources, and proposes XoT, a framework to improve reasoning accuracy over heterogeneous evidence.
Contribution
It presents a new benchmark for cross-source knowledge conflicts and a novel explanation-based reasoning framework to enhance LLM faithfulness.
Findings
LLMs often fail to identify reliable evidence in conflicting knowledge scenarios.
LLMs tend to over-rely on a single source, either KG or textual evidence, which leads to incorrect answers.
XoT improves reasoning accuracy over conflicting evidence sources.
Abstract
Large language models (LLMs) have achieved remarkable success across a wide range of applications, especially when augmented with external knowledge through retrieval-augmented generation (RAG). Despite their widespread adoption, recent studies have shown that LLMs often struggle to reason faithfully when conflicting knowledge is retrieved. However, existing work primarily focuses on conflicts between external knowledge and the parametric knowledge of LLMs, leaving conflicts across external knowledge sources largely unexplored. Meanwhile, modern RAG systems increasingly emphasize the integration of unstructured text and (semi-)structured data such as knowledge graphs (KGs) to improve knowledge completeness and reasoning faithfulness. To address this gap, we introduce ConflictQA, a novel benchmark that systematically instantiates conflicts between textual evidence and KG evidence. Extensive…
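To make the idea of a conflict between textual evidence and KG evidence concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the ConflictQA data format: the `Triple` and `ConflictExample` classes, the `text_answer` field, the `sources_conflict` helper, and the toy question are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One KG fact (hypothetical schema, not the benchmark's)."""
    head: str
    relation: str
    tail: str


@dataclass(frozen=True)
class ConflictExample:
    """A question paired with one KG fact and one text passage (assumed layout)."""
    question: str
    kg_evidence: Triple
    text_evidence: str
    text_answer: str  # answer asserted by the passage (assumed annotation)


def sources_conflict(ex: ConflictExample) -> bool:
    """Return True when the KG tail entity and the passage's answer disagree."""
    return ex.kg_evidence.tail.strip().lower() != ex.text_answer.strip().lower()


# Toy instance invented for illustration; it is not taken from the benchmark.
example = ConflictExample(
    question="Which city is the capital of Australia?",
    kg_evidence=Triple("Australia", "capital", "Canberra"),
    text_evidence="Sydney is the capital of Australia.",
    text_answer="Sydney",
)

print(sources_conflict(example))
```

In this toy setup, a model that trusts only one of the two sources answers correctly or incorrectly depending on which source happens to be right, which is the failure pattern listed under Findings.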
