Analyzing Memorization in Large Language Models through the Lens of Model Attribution

Tarun Ram Menta; Susmit Agrawal; Chirag Agarwal

arXiv:2501.05078·cs.LG·January 10, 2025

TL;DR

This paper investigates how attention modules at different layers of large language models influence memorization and generalization, and supports its attention-bypassing intervention with theoretical analysis and empirical validation aimed at mitigating memorization risks.

Contribution

It introduces an architectural analysis of memorization in LLMs using attribution techniques, highlighting the role of deeper attention modules in memorization and offering mitigation strategies.

Findings

1. Deeper attention modules are mainly responsible for memorization.
2. Earlier layers are crucial for generalization and reasoning.
3. Interventions can reduce memorization while maintaining performance.

Abstract

Large Language Models (LLMs) are prevalent in modern applications but often memorize training data, leading to privacy breaches and copyright issues. Existing research has mainly focused on post hoc analyses, such as extracting memorized content or developing memorization metrics, without exploring the underlying architectural factors that contribute to memorization. In this work, we investigate memorization from an architectural lens by analyzing how attention modules at different layers affect the model's memorization and generalization performance. Using attribution techniques, we systematically intervene in the LLM architecture by bypassing attention modules at specific blocks while keeping other components, such as layer normalization and MLP transformations, intact. We provide theorems analyzing our intervention mechanism from a mathematical view, bounding the difference in layer outputs with…
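
The intervention described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a pre-norm transformer block and assumes that "bypassing" means dropping the attention sublayer's contribution so the residual stream passes straight to the MLP; the paper's exact mechanism may differ. All names here (`init_block`, `block_forward`, `forward`, `bypass_layers`) are hypothetical and chosen only for this example.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token's features (no learned scale/shift, for brevity).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def causal_self_attention(h, p):
    # Single-head causal attention: token i may attend only to tokens <= i.
    seq_len, d_model = h.shape
    q, k, v = h @ p["w_q"], h @ p["w_k"], h @ p["w_v"]
    scores = q @ k.T / np.sqrt(d_model)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return (weights @ v) @ p["w_o"]

def mlp(h, p):
    # Position-wise two-layer MLP with ReLU.
    return np.maximum(h @ p["w_in"], 0.0) @ p["w_out"]

def init_block(rng, d_model, d_hidden):
    # Random weights scaled by 1/sqrt(fan_in); purely illustrative.
    def w(n_in, n_out):
        return rng.normal(size=(n_in, n_out)) / np.sqrt(n_in)
    return {
        "w_q": w(d_model, d_model), "w_k": w(d_model, d_model),
        "w_v": w(d_model, d_model), "w_o": w(d_model, d_model),
        "w_in": w(d_model, d_hidden), "w_out": w(d_hidden, d_model),
    }

def block_forward(x, p, bypass_attention=False):
    # ASSUMPTION: bypassing removes the attention update entirely, while
    # layer normalization and the MLP sublayer are left untouched.
    if not bypass_attention:
        x = x + causal_self_attention(layer_norm(x), p)
    x = x + mlp(layer_norm(x), p)
    return x

def forward(x, blocks, bypass_layers=frozenset()):
    # Run the stack, skipping attention in the block indices listed.
    for i, p in enumerate(blocks):
        x = block_forward(x, p, bypass_attention=(i in bypass_layers))
    return x
```

For example, with `blocks = [init_block(np.random.default_rng(0), 8, 16) for _ in range(4)]`, calling `forward(x, blocks, bypass_layers={2, 3})` bypasses attention only in the two deepest blocks, which mirrors the kind of layer-wise comparison the abstract describes. Under this assumed form of bypassing, a block with attention skipped performs no token mixing at all, so each position's output depends only on its own input.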

Code & Models

Repositories

aikyamlab/llm-memorization · jax · Official

Videos

Analyzing Memorization in Large Language Models through the Lens of Model Attribution · underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Softmax · Attention Is All You Need · Layer Normalization