Identifying Knowledge Editing Types in Large Language Models

Xiaopeng Li; Shasha Li; Shangwen Wang; Shezheng Song; Bin Ji; Huijun Liu; Jun Ma; Jie Yu

arXiv:2409.19663·cs.CL·May 27, 2025

TL;DR

This paper introduces KETI, a new task for identifying different types of knowledge edits in large language models, aiming to detect malicious modifications and prevent harmful content generation.

Contribution

The paper proposes KETIBench, a benchmark covering five harmful edit types and one benign factual edit type, and develops baseline models that identify malicious LLM edits.

Findings

01. Baseline models achieve decent performance in identifying malicious edits.

02. Identification performance is independent of editing method reliability.

03. Models generalize across domains and unknown sources.

Abstract

Knowledge editing has emerged as an efficient technique for updating the knowledge of large language models (LLMs), attracting increasing attention in recent years. However, there is a lack of effective measures to prevent the malicious misuse of this technique, which could lead to harmful edits in LLMs. These malicious modifications could cause LLMs to generate toxic content and mislead users into inappropriate actions. To address this risk, we introduce a new task, Knowledge Editing Type Identification (KETI), which aims to identify the different types of edits in LLMs and thereby provide timely alerts to users when they encounter illicit edits. As part of this task, we propose KETIBench, which includes five types of harmful edits covering the most popular toxic types, as well as one benign factual edit. We develop five classical classification…

Code & Models

Repositories

xpq-tech/keti · PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling

Methods: Softmax · Attention Is All You Need