TL;DR
This paper investigates how memorization manifests in transformer models through loss landscape curvature, proposing a weight editing method to reduce memorization while preserving core reasoning abilities.
Contribution
It introduces a curvature-based decomposition to identify and suppress memorized data in transformer weights, improving unlearning methods and analyzing effects on downstream tasks.
Findings
Weight editing reduces memorization more effectively than a recent unlearning method (BalancedSubnet).
Fact retrieval and arithmetic tasks are negatively impacted by weight editing.
Open book reasoning remains largely unaffected.
Abstract
We characterize how memorization is represented in transformer models and show that it can be disentangled in the weights of both language models (LMs) and vision transformers (ViTs) using a decomposition based on the loss landscape curvature. This insight is based on prior theoretical and empirical work showing that the curvature for memorized training points is much sharper than for non-memorized points, meaning that ordering weight components from high to low curvature can reveal a distinction without explicit labels. This motivates a weight editing procedure that suppresses far more recitation of untargeted memorized data than a recent unlearning method (BalancedSubnet), while maintaining lower perplexity. Since the basis of curvature has a natural interpretation for shared structure in model weights, we analyze the editing procedure extensively on its effect on downstream tasks in…
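The decomposition and edit described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a K-FAC-style factorisation for a single linear layer (the reviews mention K-FAC), where the curvature block is approximated by the Kronecker product of an input-activation covariance `A` and an output-gradient covariance `G`. The function name `kfac_edit`, the `keep_fraction` parameter, and the choice to keep the highest-curvature components and zero the rest are all assumptions made for illustration.

```python
import numpy as np


def kfac_edit(W, A, G, keep_fraction=0.5):
    """Project W onto its highest-curvature components (illustrative sketch).

    W: (out, in) weight matrix of one linear layer.
    A: (in, in) covariance of layer inputs (hypothetical K-FAC factor).
    G: (out, out) covariance of output gradients (hypothetical K-FAC factor).
    keep_fraction: share of components to keep, ranked by curvature.
    """
    # Eigenbases of the two symmetric Kronecker factors.
    lam_a, U_a = np.linalg.eigh(A)
    lam_g, U_g = np.linalg.eigh(G)

    # Coefficients of W in the Kronecker eigenbasis.
    C = U_g.T @ W @ U_a

    # Approximate curvature of component (i, j) is lam_g[i] * lam_a[j].
    curv = np.outer(lam_g, lam_a)

    # Keep the k components with the largest curvature; zero the others.
    k = int(round(keep_fraction * curv.size))
    mask = np.zeros(curv.size, dtype=bool)
    if k > 0:
        order = np.argsort(curv, axis=None)[::-1]
        mask[order[:k]] = True
    mask = mask.reshape(curv.shape)

    # Map the masked coefficients back to the original weight space.
    return U_g @ (C * mask) @ U_a.T


# Toy usage with random stand-ins for activations and gradients.
rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 8))    # hypothetical layer inputs
grads = rng.normal(size=(256, 4))   # hypothetical output gradients
A = acts.T @ acts / len(acts)
G = grads.T @ grads / len(grads)
W = rng.normal(size=(4, 8))
W_edited = kfac_edit(W, A, G, keep_fraction=0.5)
```

Because both eigenbases are orthonormal, the edit is an orthogonal projection: applying it twice gives the same result as applying it once, and it can only shrink the Frobenius norm of `W`. Ranking by a fraction of components is a simplification; the paper may select components by a different criterion.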
Peer Reviews
Decision·Submitted to ICLR 2026
1. The paper raises an important question about how memorization and reasoning differ in weight-space geometry. 2. The use of curvature information (via K-FAC) for weight editing is technically sound and computationally efficient. 3. The idea of connecting curvature spectra with reasoning robustness is conceptually appealing. 4. The study is comprehensive, covering both vision and language domains, and connects curvature geometry to downstream behavioral changes. 5. The analysis of task brittlen
1. The central claim is correlational: curvature and memorization are shown to co-vary. No causal analysis shows that curvature drives memorization. 2. The interpretation of eigenvectors as memorization vs. generalization directions is speculative. No visualization or qualitative evidence shows that low-curvature directions actually encode memorized samples. 3. Only one LM and one ViT are tested, and the memorization benchmarks are simplistic (mainly factual recall). Broader datasets or real-world
- The work connects K-FAC to the semantic decomposition of model weights, a novel application beyond its usual use in optimisation. The semantic decomposition itself is shown to capture similar curvature-memorisation patterns in both language and vision, suggesting a domain-agnostic property of deep networks. - Reframes memorization vs generalization as spectral structure in curvature, bridging sharpness-aware optimization and localization studies. - Unlike existing approaches like BSN which r
- Since the layer-wise K-FAC blocks ignore cross-layer effects, can you comment on how relevant such cross-layer correlations and the coupling between attention weights and the weights of a layer's MLP would be? Is this a limitation of the proposed approach, or is it not relevant? - How do we know that the Fisher curvature will be approximately the same as the loss curvature? - I am unclear about the strength of the intriguing link between low-curvature directions and memorisation; what is the t
- The paper is clearly written and easy to read. - The use of K-FAC to analyze memorization and generalization is novel to me. - The result that memorization, aggregated at the population level, is actually flatter than generalization directions is interesting.
1. The limitations are not well discussed. For example, in Table 5, BSN is able to preserve higher accuracy than the K-FAC edit. Also, the method requires a sweep to find which layers to edit, which I believe is not well acknowledged as a limitation. Are there other relevant limitations? For example, how does the computational cost of the proposed method compare with BSN? 2. While Lines 143-149 provide a discussion, there could have been more analysis on per-example curvature analyses and population
