Reverse-Engineering Model Editing on Language Models

Zhiyu Sun; Minrui Luo; Yu Wang; Zhili Chen; Tianxing He

arXiv:2602.10134·cs.CR·May 19, 2026

TL;DR

This paper uncovers a vulnerability in model editing methods for large language models: the parameter updates can leak the sensitive data that was edited. It presents an attack that exploits this leak and a defense that mitigates it.

Contribution

It introduces KSTER, a reverse-engineering attack exploiting the low-rank structure of updates, and proposes subspace camouflage as a mitigation strategy.

Findings

01. High success rate in recovering edited data using KSTER

02. Theoretical analysis of update matrix row space as a fingerprint

03. Subspace camouflage reduces reconstruction risk effectively

Abstract

Large language models (LLMs) are pretrained on corpora containing trillions of tokens and, therefore, inevitably memorize sensitive information. Locate-then-edit methods, as a mainstream paradigm of model editing, offer a promising solution by modifying model parameters without retraining. However, in this work, we reveal a critical vulnerability of this paradigm: the parameter updates inadvertently serve as a side channel, enabling attackers to recover the edited data. We propose a two-stage reverse-engineering attack named KSTER (KeySpace ReconsTruction-then-Entropy Reduction) that leverages the low-rank structure of these updates. First, we theoretically show that the row space of the update matrix encodes a "fingerprint" of the edited subjects, enabling accurate subject recovery via spectral analysis. Second, we introduce an…
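The row-space claim in the abstract can be illustrated with a toy example. The sketch below is not the paper's KSTER attack. It assumes a simplified update of the form `delta_W = R @ K.T`, where the columns of `K` are key vectors for the edited subjects; real locate-then-edit updates typically include additional factors (for example a key-covariance term) that this toy omits. All names, dimensions, and the random "subject" keys are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 64, 48

# Hypothetical key vectors, one per candidate subject (random stand-ins for
# the hidden representations a real editing method would compute).
candidates = {f"subject_{i}": rng.standard_normal(d_in) for i in range(10)}
edited = ["subject_2", "subject_5", "subject_7"]

# Simplified low-rank update: each edit contributes one key direction.
K = np.stack([candidates[s] for s in edited], axis=1)  # (d_in, n_edits)
R = rng.standard_normal((d_out, len(edited)))          # (d_out, n_edits)
delta_W = R @ K.T                                      # rank <= n_edits


def row_space_basis(update, tol=1e-8):
    """Orthonormal basis of the row space, taken from the SVD of the update."""
    _, s, vt = np.linalg.svd(update, full_matrices=False)
    rank = int((s > tol * s[0]).sum())
    return vt[:rank]  # (rank, d_in)


def residual(key, basis):
    """Relative norm of the part of `key` lying outside the row space."""
    proj = basis.T @ (basis @ key)
    return np.linalg.norm(key - proj) / np.linalg.norm(key)


basis = row_space_basis(delta_W)
scores = {name: residual(k, basis) for name, k in candidates.items()}

# Candidates whose keys lie (almost) entirely inside the row space are
# flagged as likely edited subjects.
recovered = sorted(scores, key=scores.get)[: basis.shape[0]]
print(sorted(recovered))
```

In this toy setting the keys of the edited subjects have near-zero residual, while unrelated keys keep most of their norm outside the row space, which is the sense in which the row space acts as a fingerprint.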

Code & Models

Repositories

reanatom/EditingAttack (GitHub)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Graph Neural Networks