Position: Editing Large Language Models Poses Serious Safety Risks

Paul Youssef; Zhixue Zhao; Daniel Braun; Jörg Schlötterer; Christin Seifert

arXiv:2502.02958 · cs.CL · June 18, 2025

TL;DR

This paper highlights the overlooked safety risks of knowledge editing in large language models, emphasizing potential malicious uses, vulnerabilities in the AI ecosystem, and the need for countermeasures.

Contribution

The paper identifies safety risks of knowledge editing in LLMs, discusses malicious use cases and vulnerabilities in the AI ecosystem, and calls for research on tamper-resistant models and on securing that ecosystem.

Findings

01

Knowledge editing tools are accessible and efficient, enabling malicious use.

02

Malicious actors can easily adapt KE techniques for harmful purposes.

03

The current AI ecosystem lacks safeguards for verifying updated models when they are uploaded or downloaded.

Abstract

Large Language Models (LLMs) contain large amounts of facts about the world. These facts can become outdated over time, which has led to the development of knowledge editing methods (KEs) that can change specific facts in LLMs with limited side effects. This position paper argues that editing LLMs poses serious safety risks that have been largely overlooked. First, we note that KEs are widely available, computationally inexpensive, highly performant, and stealthy, which makes them an attractive tool for malicious actors. Second, we discuss malicious use cases of KEs, showing how KEs can be easily adapted for a variety of malicious purposes. Third, we highlight vulnerabilities in the AI ecosystem that allow unrestricted uploading and downloading of updated models without verification. Fourth, we argue that a lack of social and institutional awareness exacerbates this risk, and discuss…

Videos

Position: Editing Large Language Models Poses Serious Safety Risks · SlidesLive

Taxonomy

Topics: Software Reliability and Analysis Research · Topic Modeling · Natural Language Processing Techniques