Can Knowledge Editing Really Correct Hallucinations?

Baixiang Huang; Canyu Chen; Xiongxiao Xu; Ali Payani; Kai Shu

arXiv:2410.16251·cs.CL·March 4, 2025

TL;DR

This paper introduces HalluEditBench, a comprehensive benchmark for evaluating knowledge editing methods in correcting hallucinations in large language models, highlighting their strengths and limitations across multiple dimensions.

Contribution

It constructs a large-scale hallucination dataset and provides a holistic evaluation framework for knowledge editing methods, addressing a key gap in assessing their effectiveness.

Findings

01

Knowledge editing methods vary in efficacy across different dimensions.

02

The benchmark reveals limitations in current methods' generalization and robustness.

03

Insights from the evaluation can guide future improvements in knowledge editing.

Abstract

Large Language Models (LLMs) suffer from hallucinations, referring to the non-factual information in generated content, despite their superior capacities across tasks. Meanwhile, knowledge editing has been developed as a new popular paradigm to correct erroneous factual knowledge encoded in LLMs with the advantage of avoiding retraining from scratch. However, a common issue of existing evaluation datasets for knowledge editing is that they do not ensure that LLMs actually generate hallucinated answers to the evaluation questions before editing. When LLMs are evaluated on such datasets after being edited by different techniques, it is hard to directly adopt the performance to assess the effectiveness of different knowledge editing methods in correcting hallucinations. Thus, the fundamental question remains insufficiently validated: Can knowledge editing really correct hallucinations in…
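
The evaluation concern raised in the abstract can be illustrated with a minimal sketch. It is not the HalluEditBench implementation: the `Record` fields, the exact-match check in `is_correct`, and the toy data are all assumptions made for illustration. The sketch compares post-edit accuracy over all questions with accuracy restricted to questions the model answered incorrectly before editing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    # All field names are hypothetical; they are not taken from the benchmark.
    question: str
    ground_truth: str
    pre_edit_answer: str   # model output before any editing
    post_edit_answer: str  # model output after editing


def is_correct(answer: str, truth: str) -> bool:
    # Simplifying assumption: case-insensitive exact match.
    return answer.strip().lower() == truth.strip().lower()


def naive_accuracy(records: List[Record]) -> Optional[float]:
    # Post-edit accuracy over every question, regardless of whether
    # the model was wrong before editing.
    if not records:
        return None
    hits = sum(is_correct(r.post_edit_answer, r.ground_truth) for r in records)
    return hits / len(records)


def efficacy_on_hallucinations(records: List[Record]) -> Optional[float]:
    # Keep only questions the pre-edit model answered incorrectly,
    # then measure how many of those the edit corrected.
    hallucinated = [
        r for r in records if not is_correct(r.pre_edit_answer, r.ground_truth)
    ]
    if not hallucinated:
        return None
    fixed = sum(is_correct(r.post_edit_answer, r.ground_truth) for r in hallucinated)
    return fixed / len(hallucinated)


# Toy data (invented for illustration only).
toy = [
    Record("Capital of France?", "Paris", "Paris", "Paris"),  # already correct
    Record("Capital of Japan?", "Tokyo", "Tokyo", "Tokyo"),   # already correct
    Record("Capital of Australia?", "Canberra", "Sydney", "Canberra"),  # corrected
    Record("Capital of Canada?", "Ottawa", "Toronto", "Toronto"),       # not corrected
]

print(naive_accuracy(toy))              # 0.75
print(efficacy_on_hallucinations(toy))  # 0.5
```

On this toy data the unfiltered score is higher than the filtered one, because questions the model already answered correctly count as successes even though no hallucination was corrected.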

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 5 · Confidence 5

Strengths

The paper offers a foundational contribution to the study of hallucination correction in LLMs, posing a critical question that has been overlooked in the field: Can knowledge editing effectively address LLM hallucinations? The authors’ approach demonstrates originality by establishing five distinct evaluation facets (Efficacy, Generalization, Portability, Locality, and Robustness), each of which extends beyond conventional measures to comprehensively assess knowledge editing impacts. These dime

Weaknesses

The paper could benefit from providing a more comprehensive description of the dataset construction process, which currently lacks sufficient detail. There is limited information on how domains and topics were sampled or selected from the knowledge source. The paper does not indicate whether the dataset predominantly features popular entities or long-tail entities (e.g., [Sun et al., 2023](https://arxiv.org/abs/2308.10168)), which may impact the generalizability of findings across different dist

Reviewer 02 · Rating 8 · Confidence 3

Strengths

A major strength of the paper is clearly that the results presented show that the effectiveness of knowledge editing techniques can vary greatly and is not necessarily as great on actual hallucinations of the LLMs as the experiments on the previous existing data sets show. The fact that the domain of the fact and the model itself have a decisive impact on the efficiency (and other scores) is an important point and should be taken into account. The article convincingly and clearly demonstrates

Weaknesses

I missed the authors answering the question from the title of the article more conclusively or at least clearly addressing it again in their conclusion, as this question can basically be answered in the negative (at least in part) based on the results presented. There is also no corresponding outlook. The summary is therefore somewhat abbreviated. Although the structure of the paper is good and clearly laid out, the section on related work seems like an appendix. It could be moved to before the

Reviewer 03 · Rating 6 · Confidence 3

Strengths

S1. This paper presents HalluEditBench, a benchmarking framework that evaluates knowledge editing methods against a dataset of over 6,000 hallucinations across 9 domains and 26 topics. This facilitates future research on leveraging knowledge editing techniques to mitigate hallucinations in LLMs.
S2. This paper introduces some novel insights, such as: The current assessment of knowledge editing could be unreliable; The manifestation of hallucination depends on question design; Editing methods ma

Weaknesses

W1. The paper mentions that existing datasets for knowledge editing fail to verify whether knowledge editing methods can effectively correct hallucinations in large language models. The paper should provide at least one example to illustrate the shortcomings of other benchmarks in this regard. For instance, it could include statistics from other datasets or highlight differences in dataset construction methods compared to the approach proposed here, underscoring the innovations introduced by thi

Code & Models

Repositories

llm-editing/HalluEditBench
pytorch · Official

Datasets

llm-editing/HalluEditBench
dataset· 153 dl

Videos

Can Knowledge Editing Really Correct Hallucinations? · slideslive

Taxonomy

Topics: Mental Health and Psychiatry · Hallucinations in medical conditions · Functional Brain Connectivity Studies