CLaRE-ty Amid Chaos: Quantifying Representational Entanglement to Predict Ripple Effects in LLM Editing

Manit Baser; Alperen Yildiz; Dinil Mon Divakaran; Mohan Gurusamy

arXiv:2603.19297 · cs.LG · April 8, 2026

TL;DR

CLaRE is a lightweight, efficient technique that quantifies fact entanglement in LLMs to predict and analyze ripple effects of model edits, improving post-edit evaluation and safety.

Contribution

Introduces CLaRE, a novel, fast, and resource-efficient method to identify potential ripple effects in LLMs using representation-level entanglement analysis.

Findings

01. CLaRE achieves 62.2% better correlation with ripple effects than baselines.

02. CLaRE is 2.74 times faster and uses 2.85 times less GPU memory than previous methods.

03. The approach enables scalable analysis and improved preservation of factual knowledge in LLMs.

Abstract

The static knowledge representations of large language models (LLMs) inevitably become outdated or incorrect over time. While model-editing techniques offer a promising solution by modifying a model's factual associations, they often produce unpredictable ripple effects, which are unintended behavioral changes that propagate even to the hidden space. In this work, we introduce CLaRE, a lightweight representation-level technique to identify where these ripple effects may occur. Unlike prior gradient-based methods, CLaRE quantifies entanglement between facts using forward activations from a single intermediate layer, avoiding costly backward passes. To enable systematic study, we prepare and analyse a corpus of 11,427 facts drawn from three existing datasets. Using CLaRE, we compute large-scale entanglement graphs of this corpus for multiple models, capturing how local edits propagate…
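
The idea of scoring entanglement from forward activations and collecting the scores into a graph can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: cosine similarity stands in for the entanglement score, the 0.8 threshold is arbitrary, the function names (`cosine`, `entanglement_graph`, `ripple_candidates`) are hypothetical, and the toy vectors stand in for hidden states that would be read from one intermediate layer of a real model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def entanglement_graph(activations, threshold=0.8):
    """Build an undirected weighted graph over facts.

    activations: dict mapping a fact id to its activation vector, assumed to
    come from a single forward pass at one intermediate layer (no gradients).
    An edge is kept when the similarity score reaches the (assumed) threshold.
    """
    ids = sorted(activations)
    graph = {fact: {} for fact in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = cosine(activations[a], activations[b])
            if score >= threshold:
                graph[a][b] = score
                graph[b][a] = score
    return graph


def ripple_candidates(graph, edited_fact):
    """Facts linked to the edited fact, most strongly entangled first."""
    neighbours = graph[edited_fact]
    return sorted(neighbours, key=neighbours.get, reverse=True)


# Toy stand-in activations (hypothetical values, not model outputs).
toy = {
    "eiffel_city": [1.0, 0.0, 0.0],
    "eiffel_country": [0.9, 0.1, 0.0],
    "python_creator": [0.0, 0.0, 1.0],
}
graph = entanglement_graph(toy, threshold=0.8)
candidates = ripple_candidates(graph, "eiffel_city")
```

In this toy setup, editing `eiffel_city` flags `eiffel_country` as a fact to re-check after the edit, while `python_creator` has no edges. Because only forward activations are compared, no backward pass is needed, which is the property the abstract contrasts with gradient-based methods.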

Code & Models

Repositories

manitbaser/CLaRE (GitHub)
