ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM

Zhaochen Su; Jun Zhang; Xiaoye Qu; Tong Zhu; Yanshu Li; Jiashuo Sun; Juntao Li; Min Zhang; Yu Cheng

arXiv:2408.12076·cs.CL·August 23, 2024

TL;DR

ConflictBank is a comprehensive benchmark designed to evaluate knowledge conflicts in large language models, addressing a critical source of hallucinations by analyzing conflicts in retrieved and encoded knowledge across multiple models.

Contribution

This paper introduces ConflictBank, the first large-scale benchmark for systematically assessing knowledge conflicts in LLMs from various aspects and sources.

Findings

01 Identified key conflict types and causes in LLMs
02 Analyzed conflict patterns across different model scales
03 Created over 7 million claim-evidence pairs for evaluation

Abstract

Large language models (LLMs) have achieved impressive advancements across numerous disciplines, yet the critical issue of knowledge conflicts, a major source of hallucinations, has rarely been studied. Only a few studies have explored the conflicts between the inherent knowledge of LLMs and the retrieved contextual knowledge. However, a thorough assessment of knowledge conflict in LLMs is still missing. Motivated by this research gap, we present ConflictBank, the first comprehensive benchmark developed to systematically evaluate knowledge conflicts from three aspects: (i) conflicts encountered in retrieved knowledge, (ii) conflicts within the models' encoded knowledge, and (iii) the interplay between these conflict forms. Our investigation delves into four model families and twelve LLM instances, meticulously analyzing conflicts stemming from misinformation, temporal discrepancies, and…

Code & Models

Repositories

zhaochen0110/conflictbank
PyTorch (official)

Taxonomy

Topics: Artificial Intelligence in Law