Why Does New Knowledge Create Messy Ripple Effects in LLMs?
Jiaxin Qin, Zixuan Zhang, Manling Li, Pengfei Yu, Heng Ji

TL;DR
This paper investigates why knowledge editing in language models often causes unpredictable ripple effects, introducing GradSim as an effective indicator to predict and understand these ripple phenomena.
Contribution
The paper identifies GradSim as a key indicator for understanding and predicting ripple effects in knowledge editing of language models, supported by extensive analysis.
Findings
GradSim correlates strongly with ripple effect performance.
Low GradSim is associated with failure cases like Negation and Multi-Lingual.
GradSim effectively predicts when knowledge ripples occur.
Abstract
Extensive previous research has focused on post-training knowledge editing (KE) for language models (LMs) to ensure that knowledge remains accurate and up-to-date. One desired property, and an open question, in KE is for edited LMs to handle ripple effects correctly, meaning that the LM is expected to answer logically related knowledge accurately after an edit. In this paper, we answer the question of why most KE methods still create messy ripple effects. We conduct extensive analysis and identify a salient indicator, GradSim, that effectively reveals when and why updated knowledge ripples in LMs. GradSim is computed as the cosine similarity between the gradients of the original fact and those of its related knowledge. We observe a strong positive correlation between ripple effect performance and GradSim across different LMs, KE methods, and evaluation metrics. Further investigations into three counter-intuitive failure…
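The abstract defines GradSim as the cosine similarity between the gradients of the original fact and those of its related knowledge. The sketch below illustrates that computation under stated assumptions: the helper names (`flatten`, `grad_sim`, `toy_loss_grad`) are hypothetical, and a tiny linear softmax classifier stands in for a language model. Which parameters and which loss the paper actually uses are not given in this excerpt, so this should be read as an illustration of the definition rather than the authors' implementation.

```python
import numpy as np


def flatten(grads):
    """Concatenate a list of per-parameter gradient arrays into one vector."""
    return np.concatenate([np.asarray(g, dtype=float).ravel() for g in grads])


def grad_sim(grads_a, grads_b, eps=1e-12):
    """Cosine similarity between two sets of gradients (hypothetical helper).

    grads_a: gradients of the loss on the original (edited) fact.
    grads_b: gradients of the loss on a logically related fact.
    """
    a, b = flatten(grads_a), flatten(grads_b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # eps guards against division by zero when a gradient vanishes.
    return float(a @ b / max(denom, eps))


def toy_loss_grad(W, x, y):
    """Gradient of cross-entropy loss w.r.t. W for a linear softmax model.

    Assumption: this toy model only stands in for an LM; x plays the role
    of a prompt representation and y the index of the target answer.
    """
    logits = W @ x
    logits = logits - logits.max()          # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    p[y] -= 1.0                             # softmax(Wx) - onehot(y)
    return [np.outer(p, x)]                 # d(loss)/dW


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(5, 8))
    x_fact = rng.normal(size=8)                        # original fact
    x_related = x_fact + 0.1 * rng.normal(size=8)      # closely related query
    x_unrelated = rng.normal(size=8)                   # unrelated query

    g_fact = toy_loss_grad(W, x_fact, y=2)
    print("related:  ", grad_sim(g_fact, toy_loss_grad(W, x_related, y=2)))
    print("unrelated:", grad_sim(g_fact, toy_loss_grad(W, x_unrelated, y=4)))
```

With a real LM, the same `grad_sim` function could be applied to per-parameter gradients obtained from an autodiff framework. The value ranges from -1 to 1, and the abstract reports that higher values go with better ripple effect performance.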
Taxonomy
Topics: Financial Distress and Bankruptcy Prediction · Intellectual Capital and Performance Analysis
