Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing
Kento Nishi, Rahul Ramesh, Maya Okawa, Mikail Khona, Hidenori Tanaka, Ekdeep Singh Lubana

TL;DR
This paper introduces a synthetic task for studying how knowledge editing in Transformers disrupts representations well beyond the edited fact, a phenomenon the authors call "representation shattering." Shattering impairs factual recall and reasoning, and experiments on both synthetic and natural models support this account.
Contribution
It presents a novel synthetic framework to analyze the mechanistic effects of knowledge editing, revealing how targeted edits can cause broad structural distortions in model representations.
Findings
Knowledge editing can cause widespread representation distortions.
Representation shattering degrades factual recall and reasoning.
Findings are validated on both synthetic and natural models.
Abstract
Knowledge Editing (KE) algorithms alter models' weights to perform targeted updates to incorrect, outdated, or otherwise unwanted factual associations. However, recent work has shown that applying KE can adversely affect models' broader factual recall accuracy and diminish their reasoning abilities. Although these studies give insights into the potential harms of KE algorithms, e.g., performance evaluations on benchmarks, little is understood about why such destructive failures occur. Motivated by this, we define a novel synthetic task in which a Transformer is trained from scratch to internalize a "structured" knowledge graph. The structure enforces relationships between entities of the graph, such that editing a factual association has "trickling effects" on other entities (e.g., altering X's parent is Y to Z affects who X's siblings' parent is). Through evaluations of edited models…
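The "trickling effects" example in the abstract (changing X's parent from Y to Z also changes who the parent of X's siblings is) can be made concrete with a toy sketch. This is not the paper's actual task or data format. The entity names, the `parent_of` dictionary, and the helper functions below are all hypothetical and chosen only to illustrate how one edit in a structured graph implies several other changed facts.

```python
from typing import Dict, Set

# Toy illustration only: a hypothetical "parent" relation, not the paper's task.

def siblings(parent_of: Dict[str, str], x: str) -> Set[str]:
    """Entities other than x that currently share x's parent."""
    return {e for e, p in parent_of.items() if p == parent_of[x] and e != x}

def consistent_edit(parent_of: Dict[str, str], x: str, new_parent: str) -> Dict[str, str]:
    """Return a new graph in which x and its current siblings all get new_parent.

    Keeping the sibling structure intact is what forces the edit to
    propagate beyond the single fact being changed.
    """
    affected = siblings(parent_of, x) | {x}
    return {e: (new_parent if e in affected else p) for e, p in parent_of.items()}

def changed_facts(before: Dict[str, str], after: Dict[str, str]) -> Set[str]:
    """Entities whose parent differs between the two graphs."""
    return {e for e in before if before[e] != after[e]}

# X, A, and B are siblings under Y; C is unrelated.
parent_of = {"X": "Y", "A": "Y", "B": "Y", "C": "W"}
edited = consistent_edit(parent_of, "X", "Z")
print(sorted(changed_facts(parent_of, edited)))  # ['A', 'B', 'X']
```

In this sketch a single requested edit touches three facts while leaving the unrelated entity C alone, which is the kind of dependency a structured knowledge graph imposes on an edited model.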
Taxonomy
Topics: Semantic Web and Ontologies · Reinforcement Learning in Robotics · Image Processing and 3D Reconstruction
Methods: Attention Is All You Need · LLaMA · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Linear Layer · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Multi-Head Attention
