Resolving Knowledge Conflicts in Large Language Models
Yike Wang, Shangbin Feng, Heng Wang, Weijia Shi, Vidhisha Balachandran, Tianxing He, Yulia Tsvetkov

TL;DR
This paper evaluates how well large language models handle knowledge conflicts. It introduces a framework to assess their ability to identify, locate, and respond to conflicting information, and proposes instruction-based improvements.
Contribution
It provides a comprehensive evaluation framework for knowledge conflicts in LLMs and introduces new instruction-based methods to improve their conflict resolution capabilities.
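This page does not spell out the instruction-based methods. As a rough illustration only, the sketch below builds one instruction prompt per desideratum: identify a conflict, pinpoint the conflicting segment, and give distinct answers. The template wording, the `TEMPLATES` dict, and `build_prompt` are hypothetical and are not the authors' prompts.

```python
from typing import Optional

# Hypothetical instruction templates, one per desideratum named in the paper:
# identify a conflict, pinpoint the conflicting segment, give distinct answers.
# The wording is illustrative and NOT taken from the paper.
TEMPLATES = {
    "identify": (
        "Does the context below contain information that conflicts with "
        "your own knowledge? Answer Yes or No."
    ),
    "pinpoint": (
        "Quote the exact sentences in the context below that conflict with "
        "your own knowledge."
    ),
    "distinct_answers": (
        "Answer the question twice: once based only on the context below, "
        "and once based only on your own knowledge."
    ),
}


def build_prompt(task: str, context: str, question: Optional[str] = None) -> str:
    """Assemble a hypothetical instruction prompt for one conflict-handling task."""
    if task not in TEMPLATES:
        raise ValueError(f"unknown task: {task!r}")
    parts = [TEMPLATES[task], f"Context: {context}"]
    if question is not None:
        parts.append(f"Question: {question}")
    # Blank lines separate the instruction, the context, and the question.
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        "distinct_answers",
        context="The Eiffel Tower is located in Rome.",
        question="Where is the Eiffel Tower located?",
    ))
```

Keeping the templates in one table makes it easy to run all three tasks on the same context and compare the model's behavior across them.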
Findings
LLMs excel at detecting conflicts but struggle to pinpoint specific conflicting information.
LLMs often produce inconsistent responses in conflicting scenarios.
The knowledge domain significantly affects LLMs' ability to resolve conflicts.
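The gap between detecting and pinpointing conflicts can only be seen if each step is scored separately. A minimal sketch, assuming yes/no accuracy for detection, set-level F1 over quoted segments for pinpointing, and normalized exact match for the two distinct answers. These metric choices and all function names are hypothetical and are not necessarily those used in the paper.

```python
from typing import Sequence, Set


def detection_accuracy(predicted: Sequence[bool], gold: Sequence[bool]) -> float:
    """Share of examples where the yes/no conflict judgment matches the gold label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def segment_f1(predicted: Set[str], gold: Set[str]) -> float:
    """Set-level F1 between predicted and gold conflicting segments."""
    if not predicted and not gold:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace before comparing answers.
    return " ".join(text.lower().split())


def distinct_answers_correct(
    context_answer: str,
    parametric_answer: str,
    context_gold: str,
    parametric_gold: str,
) -> bool:
    """True only if BOTH the context-based and the memory-based answers match."""
    return (_normalize(context_answer) == _normalize(context_gold)
            and _normalize(parametric_answer) == _normalize(parametric_gold))
```

With these hypothetical numbers, a model that gets 3 of 4 yes/no judgments right but quotes one correct and one wrong segment against two gold segments scores 0.75 on detection and 0.5 on segment F1. That is the kind of gap the first finding describes.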
Abstract
Large language models (LLMs) often encounter knowledge conflicts, scenarios where a discrepancy arises between the internal parametric knowledge of LLMs and the non-parametric information provided in the prompt context. In this work, we ask what the desiderata for LLMs are when a knowledge conflict arises and whether existing LLMs fulfill them. We posit that LLMs should 1) identify knowledge conflicts, 2) pinpoint conflicting information segments, and 3) provide distinct answers or viewpoints in conflicting scenarios. To this end, we introduce an evaluation framework for simulating contextual knowledge conflicts and quantitatively evaluating to what extent LLMs achieve these goals. It includes diverse and complex situations of knowledge conflict, knowledge from diverse entities and domains, two synthetic conflict creation methods, and settings with progressively increasing difficulty to…
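The abstract mentions two synthetic conflict creation methods, and one review describes the data as generated by rules such as entity substitution and shuffling. The sketch below illustrates only the substitution idea, under stated assumptions: a factual passage, a target entity, and a pool of same-type entities to swap in. `substitute_entity` and its parameters are hypothetical, and the paper's actual procedure may differ.

```python
import random
import re
from typing import List, Tuple


def substitute_entity(
    passage: str,
    entity: str,
    same_type_pool: List[str],
    rng: random.Random,
) -> Tuple[str, str]:
    """Swap every whole-word mention of `entity` for another same-type entity.

    Returns the edited (conflicting) passage and the substitute that was used.
    Hypothetical helper for illustration, not the paper's pipeline.
    """
    # Never "substitute" the entity with itself, or no conflict is created.
    candidates = [e for e in same_type_pool if e != entity]
    if not candidates:
        raise ValueError("need at least one alternative entity of the same type")
    substitute = rng.choice(candidates)
    # Whole-word match; assumes `entity` starts and ends with word characters.
    pattern = re.compile(rf"\b{re.escape(entity)}\b")
    # A function replacement avoids backslash escapes in `substitute`.
    edited = pattern.sub(lambda _match: substitute, passage)
    return edited, substitute


if __name__ == "__main__":
    rng = random.Random(0)
    original = "Paris is the capital of France. Parisians love Paris."
    conflicting, new_entity = substitute_entity(
        original, "Paris", ["Paris", "Rome", "Madrid"], rng
    )
    print(new_entity)
    print(conflicting)
```

Passing a seeded `random.Random` keeps the generated conflicts reproducible. The whole-word pattern leaves words that merely contain the entity (such as "Parisians") untouched.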
Peer Reviews
Decision: Submitted to ICLR 2024
1. This article breaks down the evaluation of knowledge conflicts in a fine-grained manner and proposes a reasonable idea: LLMs should not rely solely on either parametric or non-parametric information, but should grant users the agency to make informed decisions based on distinct answers. 2. For the three proposed tasks, the paper designs extensive experiments for verification. The motivation is clear and the prompt templates are straightforward.
1. The experimental settings are not rigorous. The datasets for the three knowledge conflict tasks are generated by a few rules (entity substitution and shuffling), and the proposed approaches (prompt templates) are closely tied to these artificial rules. That is my main concern: under these settings, the experiments in the paper may have limited value and offer limited insight. Besides, this paper seems to lack a connection to previous works in the field o
- The paper addresses an important open problem: handling knowledge conflicts in LLMs. - The writing is clear and well presented. - It introduces a comprehensive evaluation framework with diverse, complex test cases.
- The framework is limited to word-level knowledge edits; more complex conflicts may be harder to handle. - Hallucination is possible in the LLM's answers, and this does not seem to be well addressed in the paper.
1) The authors design a series of knowledge conflict tasks to measure how well existing LLMs generate responses based on conflicting knowledge. 2) The developed framework is technically sound and easy to follow.
The main shortcoming is the organization of the whole article; detailed questions can be found in the following part.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
