Data Cartography for Detecting Memorization Hotspots and Guiding Data Interventions in Generative Models

Laksh Patel; Neel Shanbhag

arXiv:2509.00083 · cs.LG · September 3, 2025


TL;DR

This paper introduces a data-centric framework called GenDataCarto that identifies and mitigates memorization hotspots in generative models, reducing data leakage with minimal impact on performance.

Contribution

The paper presents a novel data cartography method that scores training samples for difficulty and memorization, guiding effective data pruning and weighting strategies.

Findings

1. Reduces synthetic canary extraction success by over 40% with 10% data pruning.

2. Increases validation perplexity by less than 0.5%.

3. Provides theoretical guarantees linking memorization scores to influence and generalization bounds.


Tables

Table 1: Data Cartography Quadrants: partitioning by difficulty ($d_{i}$) and memorization ($m_{i}$).

Quadrant	Condition	Interpretation
Stable–Easy	$d_{i} \leq \tau_{d},\ m_{i} \leq \tau_{m}$	low risk, well-learned
Ambiguous–Hard	$d_{i} > \tau_{d},\ m_{i} \leq \tau_{m}$	difficult, not memorized
Hotspot–Memorized	$d_{i} \leq \tau_{d},\ m_{i} > \tau_{m}$	easy but over-memorized
Noisy–Outlier	$d_{i} > \tau_{d},\ m_{i} > \tau_{m}$	hard and memorized

Equations

$$\bigl\|\nabla_{\theta}\ell_{\theta}(x)-\nabla_{\theta}\ell_{\theta'}(x)\bigr\|\leq L\,\|\theta-\theta'\|,\quad\forall\,\theta,\theta',x.$$

$$\ell_{\alpha\theta+(1-\alpha)\theta'}(x)\leq\alpha\,\ell_{\theta}(x)+(1-\alpha)\,\ell_{\theta'}(x),\quad\forall\,\alpha\in[0,1].$$

$$L_{N}(\theta)=\frac{1}{N}\sum_{i=1}^{N}\ell_{\theta}(x_{i}),\qquad\ell_{\theta}(x_{i})=-\log p_{\theta}(x_{i}).$$

$$\theta^{(t+1)}=\theta^{(t)}-\eta_{t}\nabla_{\theta}\ell_{\theta^{(t)}}(x_{i_{t}}),\qquad i_{t}\sim\mathrm{Uniform}(\{1,\dots,N\}).$$

$$\mathcal{L}\in\mathbb{R}^{T\times N},\qquad\mathcal{L}_{t,i}=\ell_{\theta^{(t)}}(x_{i}),\quad t=1,\dots,T,\; i=1,\dots,N.$$

$$\Delta_{\rm gen}(\theta)=L(\theta)-L_{N}(\theta).$$

$$\bigl|\ell_{A(\mathcal{D})}(z)-\ell_{A(\mathcal{D}^{\prime})}(z)\bigr|\leq\beta.$$

$$\operatorname{Inf}(i)=\sum_{t=1}^{T}\bigl\|\nabla_{\theta}\ell_{\theta^{(t)}}(x_{i})\bigr\|^{2}.$$

$$d_{i}=\frac{1}{T_{e}}\sum_{t=1}^{T_{e}}\mathcal{L}_{t,i}.$$

$$F_{d}(\tau)=\frac{1}{N}\,\#\{i\mid d_{i}\leq\tau\},\qquad\tau_{d}=F_{d}^{-1}(\alpha_{d}),$$

$$\mathcal{L}_{t,i}<\varepsilon\quad\text{and}\quad\mathcal{L}_{t+1,i}>\varepsilon.$$

$$m_{i}=\frac{1}{T-1}\sum_{t=1}^{T-1}\mathbbm{1}\bigl[\mathcal{L}_{t,i}<\varepsilon\;\wedge\;\mathcal{L}_{t+1,i}>\varepsilon\bigr],$$

$$\operatorname{\mathbb{E}}\bigl[\Delta_{\rm gen}\bigr]-\operatorname{\mathbb{E}}\bigl[\Delta_{\rm gen}^{\rm pruned}\bigr]\geq 2\,\beta\,\Delta\alpha\,N_{\mathrm{hot}}.$$

$$m_{i}\geq c\,\frac{1}{T}\sum_{t=1}^{T}\bigl\|\nabla_{\theta}\ell_{\theta^{(t)}}(x_{i})\bigr\|^{2}-O(\eta).$$

$$\Delta\ell=\ell_{\theta^{(t+1)}}(x_{i})-\ell_{\theta^{(t)}}(x_{i})>0.$$

$$\Delta\ell\leq-\eta\,\bigl\|\nabla_{\theta}\ell_{\theta^{(t)}}(x_{i})\bigr\|^{2}+\frac{L\eta^{2}}{2}\bigl\|\nabla_{\theta}\ell_{\theta^{(t)}}(x_{i})\bigr\|^{2}.$$



Full text


Abstract

Modern generative models risk overfitting and unintentionally memorizing rare training examples, which can be extracted by adversaries or inflate benchmark performance. We propose Generative Data Cartography (GenDataCarto), a data-centric framework that assigns each pretraining sample a difficulty score (early-epoch loss) and a memorization score (frequency of “forget events”), then partitions examples into four quadrants to guide targeted pruning and up-/down-weighting. We prove that our memorization score lower-bounds classical influence under smoothness assumptions and that down-weighting high-memorization hotspots provably decreases the generalization gap via uniform stability bounds. Empirically, GenDataCarto reduces synthetic canary extraction success by over 40% at just 10% data pruning, while increasing validation perplexity by less than 0.5%. These results demonstrate that principled data interventions can dramatically mitigate leakage with minimal cost to generative performance.

Keywords: Generative Models, Data Cartography, Memorization Detection, Privacy Preservation, Uniform Stability, Influence Functions, Data-Centric Interventions, Forget Events

1 Introduction

Generative models have become a cornerstone of modern AI research, achieving unprecedented performance on a wide range of tasks from text completion and code synthesis to image and audio generation. Landmark works such as GPT-3 demonstrated that scaling language models to hundreds of billions of parameters yields emergent capabilities in few-shot learning and knowledge representation (Brown et al., 2020). Diffusion models similarly revolutionized image synthesis by framing generation as a gradual denoising process (Ho et al., 2020; Nichol and Dhariwal, 2021). Despite these breakthroughs, the immense scale and heterogeneity of pretraining corpora—often scraped indiscriminately from the web—pose serious risks relating to privacy, security, and scientific integrity.

Risks of Memorization and Leakage.

Neural networks can unintentionally memorize exact copies of rare or unique training examples, which adversaries can later extract via black-box or white-box attacks (Carlini et al., 2021; Kuang et al., 2021; Song and Mittal, 2022). Such leakage has been demonstrated not only for text but also for images (Carlini et al., 2023; Hayes and Shokri, 2021) and graph data (Sun et al., 2021). Relatedly, membership inference attacks exploit subtle distributional cues to determine whether a particular sample was used during training (Shokri et al., 2017; Yeom et al., 2018; Choquette-Choo and Klimov, 2021). In practice, even large-scale datasets like The Pile contain private or copyrighted passages that can surface verbatim in model outputs (Gao et al., 2022).

Benchmark Contamination and Overestimated Performance.

Generative models are frequently evaluated on benchmarks whose content inadvertently overlaps with training corpora (Zimmermann et al., 2022). Studies have shown that benchmark leakage can artificially inflate zero-shot and few-shot performance metrics (Kandpal et al., 2023), undermining the validity of widely reported scaling laws (Kaplan et al., 2020) and hampering reproducibility.

Model-Centric versus Data-Centric Defenses.

Model-centric defenses, such as differentially private training (Abadi et al., 2016; Papernot et al., 2018), modified objectives, and post-hoc output filters (Dubiński et al., 2024), often incur utility trade-offs and significant engineering complexity. By contrast, data-centric strategies have proven effective in supervised settings: dataset cartography uses early-epoch loss and training variance to identify difficult or noisy examples (Swayamdipta et al., 2020; Gao et al., 2021), while influence functions estimate each sample’s impact on model parameters (Koh and Liang, 2017; Pruthi et al., 2020). Yet these techniques have not been systematically adapted to the unsupervised, sequential objectives of generative pretraining.

Our Contributions.

To bridge this gap, we introduce Generative Data Cartography (GenDataCarto), a framework that maps each pretraining example into a two-dimensional space defined by:

• Difficulty score $d_{i}$: the mean per-sample loss over an initial burn-in period.
• Memorization score $m_{i}$: the normalized count of “forget events,” where a sample’s loss rises above a small threshold after earlier fitting.

We prove that $m_{i}$ lower-bounds per-sample influence under standard smoothness and convexity assumptions (Bousquet and Elisseeff, 2002; Koh and Liang, 2017), and derive a uniform-stability bound showing that down-weighting high-$m_{i}$ examples reduces the expected generalization gap in proportion to the total pruned weight (Bousquet and Elisseeff, 2002; Mukherjee and Zhou, 2006). Empirically, GenDataCarto achieves:

• A $>40\%$ reduction in synthetic “canary” extraction success for LSTM pretraining.
• A $30\%$ drop in GPT-2 memorization on Wikitext-103 at negligible perplexity cost.

By focusing on data dynamics rather than purely model internals, GenDataCarto offers a scalable, theoretically grounded toolkit for enhancing the safety and robustness of state-of-the-art generative models.

2 Preliminaries

Assumption 2.1 (Uniform Stability).

The training algorithm is $\beta$-uniformly stable: for any two datasets differing in one example, the change in loss on any test point is at most $\beta$ (Bousquet and Elisseeff, 2002).

Assumption 2.2 (Smoothness).

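Each per-sample loss $\ell_{\theta}(x)$ is $L$-smooth in $\theta$, i.e.

$$\bigl\|\nabla_{\theta}\ell_{\theta}(x)-\nabla_{\theta}\ell_{\theta'}(x)\bigr\|\leq L\,\|\theta-\theta'\|,\quad\forall\,\theta,\theta',x.$$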

Assumption 2.3 (Convexity).

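Each loss $\ell_{\theta}(x)$ is convex in $\theta$, i.e.

$$\ell_{\alpha\theta+(1-\alpha)\theta'}(x)\leq\alpha\,\ell_{\theta}(x)+(1-\alpha)\,\ell_{\theta'}(x),\quad\forall\,\alpha\in[0,1].$$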

We begin by fixing notation, stating our learning objectives, and recalling key notions from stability and influence theory.

2.1 Training Objective and Notation

Let $\mathcal{D}=\{x_{1},\dots,x_{N}\}\subset\mathcal{X}$ be the training set of $N$ i.i.d. examples drawn from an unknown population distribution $\mathbb{P}$. We train a generative model $p_{\theta}$ with parameters $\theta\in\Theta$ by minimizing the empirical negative log-likelihood

$$L_{N}(\theta)=\frac{1}{N}\sum_{i=1}^{N}\ell_{\theta}(x_{i}),\qquad\ell_{\theta}(x_{i})=-\log p_{\theta}(x_{i}).$$

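Let $\theta^{(0)}$ be the random initialization. We perform $T$ epochs of mini-batch stochastic gradient descent with (possibly time-varying) stepsizes $\{\eta_{t}\}$, yielding iterates

$$\theta^{(t+1)}=\theta^{(t)}-\eta_{t}\nabla_{\theta}\ell_{\theta^{(t)}}(x_{i_{t}}),\qquad i_{t}\sim\mathrm{Uniform}(\{1,\dots,N\}).$$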

We record the epoch-sample loss matrix

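$$\mathcal{L}\in\mathbb{R}^{T\times N},\qquad\mathcal{L}_{t,i}=\ell_{\theta^{(t)}}(x_{i}),\quad t=1,\dots,T,\; i=1,\dots,N.$$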

This matrix underlies our data-centric analysis.
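As an illustration of how such a loss matrix might be recorded, the following minimal sketch runs single-sample SGD on a toy quadratic loss and stores one row of per-sample losses per epoch. The toy loss, the function names, and the choice to snapshot the losses at epoch boundaries are assumptions made for this example, not details taken from the paper's implementation.

```python
import random


def per_sample_loss(theta, x):
    # Toy convex, 1-smooth loss (assumption): unit-variance Gaussian NLL up to a constant.
    return 0.5 * (theta - x) ** 2


def grad(theta, x):
    # Gradient of the toy loss with respect to theta.
    return theta - x


def train_and_record(data, epochs, lr, seed=0):
    """Run SGD with uniform sampling; return (theta, loss_matrix).

    loss_matrix[t][i] is the loss of sample i at the end of epoch t
    (0-indexed), i.e. a T x N matrix stored as a list of rows.
    """
    rng = random.Random(seed)
    theta = 0.0
    n = len(data)
    loss_matrix = []
    for _ in range(epochs):
        for _ in range(n):  # one epoch = N stochastic steps
            i = rng.randrange(n)  # i_t ~ Uniform({0, ..., N-1})
            theta -= lr * grad(theta, data[i])
        # Snapshot every per-sample loss under the current parameters.
        loss_matrix.append([per_sample_loss(theta, x) for x in data])
    return theta, loss_matrix


data = [0.9, 1.0, 1.1, 5.0]  # three typical points and one rare point
theta, loss_matrix = train_and_record(data, epochs=5, lr=0.05)
```

In a real training loop the snapshot step would be replaced by whatever per-sample loss evaluation the model already supports; only the resulting $T\times N$ array matters for the scores defined later.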

2.2 Generalization and Stability

Define the population risk $L(\theta)=\operatorname{\mathbb{E}}_{x\sim\mathbb{P}}[\ell_{\theta}(x)]$, and the generalization gap

$$\Delta_{\rm gen}(\theta)=L(\theta)-L_{N}(\theta).$$

A standard tool for bounding $\Delta_{\rm gen}$ is uniform stability (Bousquet and Elisseeff, 2002).

Definition 2.4 (Uniform Stability).

An algorithm $A$ mapping datasets to parameters is $\beta$-uniformly stable if, for any two training sets $\mathcal{D},\mathcal{D}^{\prime}$ differing in one example, and for all $z\in\mathcal{X}$,

$$\bigl|\ell_{A(\mathcal{D})}(z)-\ell_{A(\mathcal{D}^{\prime})}(z)\bigr|\leq\beta.$$

Under $\beta$-stability, one can show that $\operatorname{\mathbb{E}}[\Delta_{\rm gen}(A(\mathcal{D}))]\leq\beta$, and high-probability bounds follow from McDiarmid’s inequality (McDiarmid, 1989).

2.3 Influence Functions

Influence functions estimate the effect of up-weighting one training point on the learned parameters or on predictions (Koh and Liang, 2017). For sufficiently smooth losses one may approximate the per-sample influence by the cumulative squared gradient norm:

$$\operatorname{Inf}(i)=\sum_{t=1}^{T}\bigl\|\nabla_{\theta}\ell_{\theta^{(t)}}(x_{i})\bigr\|^{2}.$$

This quantity is costly to compute in deep models, motivating our more efficient proxy based on “forget events.”
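To make the cumulative squared gradient norm concrete, here is a small sketch for a toy loss $\tfrac{1}{2}\|\theta-x\|^{2}$, whose gradient is $\theta-x$. The toy loss, the stored list of iterates, and the helper names are assumptions for illustration; in a deep model each term would require a separate per-sample backward pass, which is the cost the text refers to.

```python
def toy_grad(theta, x):
    # Gradient of 0.5 * ||theta - x||^2 with respect to theta (toy loss, an assumption).
    return [t - xi for t, xi in zip(theta, x)]


def cumulative_sq_grad_norm(thetas, x, grad_fn):
    """Sum over stored iterates of the squared L2 norm of the per-sample gradient."""
    total = 0.0
    for theta in thetas:
        g = grad_fn(theta, x)
        total += sum(component ** 2 for component in g)
    return total


# Three hypothetical parameter iterates moving toward the point (1, 1).
thetas = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]

inf_typical = cumulative_sq_grad_norm(thetas, [1.0, 1.0], toy_grad)
inf_rare = cumulative_sq_grad_norm(thetas, [3.0, 1.0], toy_grad)
```

In this toy setting the point far from the trajectory accumulates a much larger value than the point the iterates converge to.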


3 Generative Data Cartography

We now introduce Generative Data Cartography, a method to map each training example into a two-dimensional plane of difficulty vs. memorization, enabling targeted data interventions.

3.1 Difficulty Score

Define a burn-in period $T_{e}<T$. The difficulty score of $x_{i}$ is

$$d_{i}=\frac{1}{T_{e}}\sum_{t=1}^{T_{e}}\mathcal{L}_{t,i}.$$

Intuitively, $d_{i}$ measures how hard $x_{i}$ is to fit during early training. We further examine its empirical distribution:

$$F_{d}(\tau)=\frac{1}{N}\,\#\{i\mid d_{i}\leq\tau\},\qquad\tau_{d}=F_{d}^{-1}(\alpha_{d}),$$

where $\alpha_{d}\in(0,1)$ is a chosen percentile (e.g. 75%).
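A minimal sketch of the difficulty score and its percentile threshold, assuming the loss matrix is stored as a list of per-epoch rows. The function names are illustrative, and reading $F_{d}^{-1}$ as the generalized inverse (the smallest observed value whose empirical CDF reaches $\alpha_{d}$) is an assumption, since the text does not say how ties or interpolation are handled.

```python
import math


def difficulty_scores(loss_matrix, burn_in):
    """Mean loss of each sample over the first `burn_in` epochs (rows)."""
    n = len(loss_matrix[0])
    return [
        sum(loss_matrix[t][i] for t in range(burn_in)) / burn_in
        for i in range(n)
    ]


def empirical_quantile(values, alpha):
    """Smallest value v in `values` with empirical CDF F(v) >= alpha."""
    ordered = sorted(values)
    k = max(1, math.ceil(alpha * len(ordered)))
    return ordered[k - 1]


# Toy loss matrix: 3 epochs (rows) x 4 samples (columns).
loss_matrix = [
    [2.0, 4.0, 1.0, 6.0],
    [1.0, 3.0, 1.0, 5.0],
    [0.5, 2.0, 0.2, 4.0],
]

d = difficulty_scores(loss_matrix, burn_in=2)
tau_d = empirical_quantile(d, alpha=0.75)
```

With this convention, samples whose score is strictly above `tau_d` form roughly the hardest quarter of the data.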

3.2 Memorization Score

Let $\varepsilon>0$ be a small threshold (e.g. a fraction above the minimum achievable loss). A forget event for $x_{i}$ between epochs $t$ and $t+1$ occurs if

$$\mathcal{L}_{t,i}<\varepsilon\quad\text{and}\quad\mathcal{L}_{t+1,i}>\varepsilon.$$

We define the memorization score

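$$m_{i}=\frac{1}{T-1}\sum_{t=1}^{T-1}\mathbbm{1}\bigl[\mathcal{L}_{t,i}<\varepsilon\;\wedge\;\mathcal{L}_{t+1,i}>\varepsilon\bigr],$$

so $m_{i}\in[0,1]$ captures the fraction of epoch transitions in which $x_{i}$ is forgotten after having been fit. As with $d_{i}$, let $\tau_{m}=F_{m}^{-1}(\alpha_{m})$ be the $\alpha_{m}$-percentile of $\{m_{i}\}$, where $F_{m}$ is the empirical distribution function of the memorization scores.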
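The forget-event count can be sketched directly from the definition. The function name and the toy loss values are illustrative; the code assumes at least two recorded epochs so that the normalizer $T-1$ is nonzero.

```python
def memorization_scores(loss_matrix, eps):
    """Fraction of consecutive epoch pairs (t, t+1) with a forget event.

    A forget event for sample i is: loss below eps at epoch t and
    above eps at epoch t+1.
    """
    num_epochs = len(loss_matrix)
    if num_epochs < 2:
        raise ValueError("need at least two epochs to count forget events")
    n = len(loss_matrix[0])
    scores = []
    for i in range(n):
        events = sum(
            1
            for t in range(num_epochs - 1)
            if loss_matrix[t][i] < eps and loss_matrix[t + 1][i] > eps
        )
        scores.append(events / (num_epochs - 1))
    return scores


# Toy loss matrix: 5 epochs (rows) x 3 samples (columns).
loss_matrix = [
    [2.0, 3.0, 0.9],
    [0.5, 2.0, 1.1],
    [1.5, 0.8, 1.3],
    [0.4, 0.6, 0.7],
    [1.2, 0.5, 0.6],
]

m = memorization_scores(loss_matrix, eps=1.0)
```

Here the first sample oscillates around the threshold and is forgotten twice, the second is learned once and never forgotten, and the third is forgotten once.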

3.3 Quadrant Partitioning

Each example $x_{i}$ maps to the point $(d_{i},m_{i})$. We partition into four regions via thresholds $\tau_{d},\tau_{m}$, as listed in Table 1.
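The partition rule can be written as a small function. The integer labels follow the numbering used in Section 3.4 (0 to 3), and the conditions follow Table 1; the constant and function names are illustrative.

```python
STABLE_EASY = 0        # d <= tau_d, m <= tau_m
AMBIGUOUS_HARD = 1     # d >  tau_d, m <= tau_m
HOTSPOT_MEMORIZED = 2  # d <= tau_d, m >  tau_m
NOISY_OUTLIER = 3      # d >  tau_d, m >  tau_m


def assign_quadrant(d, m, tau_d, tau_m):
    """Map a (difficulty, memorization) pair to its quadrant label."""
    hard = d > tau_d
    memorized = m > tau_m
    if hard and memorized:
        return NOISY_OUTLIER
    if memorized:
        return HOTSPOT_MEMORIZED
    if hard:
        return AMBIGUOUS_HARD
    return STABLE_EASY


# Toy (d_i, m_i) pairs with hypothetical thresholds tau_d = 2.0, tau_m = 0.1.
points = [(1.0, 0.0), (3.0, 0.0), (1.0, 0.5), (3.0, 0.5), (2.0, 0.1)]
quadrants = [assign_quadrant(d, m, 2.0, 0.1) for d, m in points]
```

A point lying exactly on a threshold falls on the "low" side, matching the non-strict inequalities in Table 1.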

3.4 Data-Centric Interventions

After labeling each $x_{i}$ with quadrant $Q_{i}\in\{0,1,2,3\}$, we adjust the sampling distribution for the remaining $T-T_{e}$ epochs:

• Up-sample Ambiguous–Hard (1): increase sampling probability by factor $\gamma>1$ to improve model robustness on rare but challenging patterns.
• Down-weight Hotspot–Memorized (2): multiply loss contribution by $\alpha<1$ (or remove entirely) to mitigate over-memorization.
• Remove Noisy–Outliers (3): optionally drop from $\mathcal{D}$ to eliminate corrupted or adversarial examples.
• Stable–Easy (0): keep or lightly up-sample to reinforce core patterns.
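One way these four rules might be turned into per-sample sampling probabilities and loss weights is sketched below. The function name, the default $\gamma=2$, and the decision to leave Stable–Easy examples unchanged are assumptions for illustration; the default $\alpha=0.5$ mirrors the down-weighting factor reported in the GPT-2 experiment.

```python
import random


def intervention_plan(quadrants, gamma=2.0, alpha=0.5, drop_outliers=True):
    """Return (sampling_probs, loss_weights), one entry per sample.

    Quadrant 1 is up-sampled by gamma, quadrant 2 has its loss scaled by
    alpha, quadrant 3 is optionally dropped, quadrant 0 is left unchanged.
    """
    factors, loss_weights = [], []
    for q in quadrants:
        if q == 1:
            factors.append(gamma)
            loss_weights.append(1.0)
        elif q == 2:
            factors.append(1.0)
            loss_weights.append(alpha)
        elif q == 3 and drop_outliers:
            factors.append(0.0)
            loss_weights.append(0.0)
        else:
            factors.append(1.0)
            loss_weights.append(1.0)
    total = sum(factors)
    if total == 0:
        raise ValueError("every sample was dropped")
    return [f / total for f in factors], loss_weights


probs, weights = intervention_plan([0, 1, 2, 3])

# Draw a mini-batch of indices under the adjusted sampling distribution.
rng = random.Random(0)
batch = rng.choices(range(len(probs)), weights=probs, k=8)
```

Separating the sampling probabilities from the loss weights keeps the up-sampling and down-weighting rules independent, so either can be tuned or disabled without touching the other.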

3.5 Algorithmic Outline


4 Theoretical Guarantees

We now formalize two central theorems: (i) down-weighting memorization hotspots reduces the generalization gap under stability, and (ii) our memorization score lower-bounds classical influence.

4.1 Generalization Improvement via Stability

Theorem 4.1 (Generalization–Stability Bound).

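Under Assumption 2.1 ($\beta$-uniform stability), suppose we decrease sampling weight by $\Delta\alpha$ on each of the $N_{\mathrm{hot}}$ Hotspot–Memorized examples. Then the reduction in expected generalization gap satisfies

$$\operatorname{\mathbb{E}}\bigl[\Delta_{\rm gen}\bigr]-\operatorname{\mathbb{E}}\bigl[\Delta_{\rm gen}^{\rm pruned}\bigr]\geq 2\,\beta\,\Delta\alpha\,N_{\mathrm{hot}}.$$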

Proof Sketch.

By uniform stability, up-weighting (or down-weighting) one example by $\delta$ changes the population loss by at most $\beta\delta$. Down-weighting each of the $N_{\rm hot}$ Hotspot–Memorized examples by $\Delta\alpha$ thus lowers the gap by at least $2\beta\Delta\alpha N_{\rm hot}$. ∎

4.2 Memorization Score as an Influence Proxy

Theorem 4.2 (Memorization–Influence Lower Bound).

Under standard $L$-smoothness and convexity assumptions (Bousquet and Elisseeff, 2002; Koh and Liang, 2017), and using SGD step-size $\eta$, there exists a constant $c>0$ such that for every example $x_{i}$:

$$m_{i}\geq c\,\frac{1}{T}\sum_{t=1}^{T}\bigl\|\nabla_{\theta}\ell_{\theta^{(t)}}(x_{i})\bigr\|^{2}-O(\eta).$$

Proof Sketch.

A forget event between epochs $t$ and $t+1$ requires the loss to increase by

$$\Delta\ell=\ell_{\theta^{(t+1)}}(x_{i})-\ell_{\theta^{(t)}}(x_{i})>0.$$

By $L$ -smoothness (Bousquet and Elisseeff, 2002), we have

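$$\Delta\ell\leq-\eta\,\bigl\|\nabla_{\theta}\ell_{\theta^{(t)}}(x_{i})\bigr\|^{2}+\frac{L\eta^{2}}{2}\bigl\|\nabla_{\theta}\ell_{\theta^{(t)}}(x_{i})\bigr\|^{2}.$$

Rearranging shows each forget event lower-bounds the squared gradient norm up to $O(\eta)$, and summing over $T$ epochs yields the stated result. ∎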

Remark 4.3.

Theorem 4.2 shows that our memorization score $m_{i}$ identifies high-influence examples, and Theorem 4.1 shows that down-weighting them provably tightens the generalization gap. In practice, this translates to measurable reductions in canary extraction success and membership inference attacks.

4.3 Experimental Results

To validate the efficacy of Generative Data Cartography, we conduct two main experiments:

1. Synthetic Canary Extraction Test.

We pretrain a small LSTM language model (Hochreiter and Schmidhuber, 1997) on a synthetic corpus augmented with unique “canary” sequences. Using GenDataCarto, we compute difficulty ($d_{i}$) and memorization ($m_{i}$) scores for each example and prune the top 5% highest-$m_{i}$ samples. Under this intervention, the canary extraction success rate drops from 100% to 40%, a 60% reduction at only a 0.5% increase in perplexity.

2. GPT-2 Pretraining on Wikitext-103.

We train GPT-2 Small (Radford et al., 2019) for 3 epochs on the Wikitext-103 dataset (Merity et al., 2017), injecting two distinct canaries. Applying GenDataCarto with $\varepsilon=4.7055$ and $\tau_{m}=25\%$ , we down-weight hotspot samples by a factor of 0.5. This yields:

• 30% reduction in benchmark leakage (measured by recall of held-out validation sequences).
• 15% reduction in membership-inference AUC.
• Less than 1% perplexity increase, demonstrating minimal impact on model quality.

Figures 1 and 2 illustrate these trade-offs.

4.4 Implementation Details

Our public implementation integrates seamlessly with standard PyTorch training loops. Given per-sample losses, GenDataCarto adds only $O(N)$ overhead for score computation and incurs an $O(N\log N)$ sort for pruning decisions. All code, hyperparameter settings, and data processing scripts are provided in the supplementary material.
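The sort-based pruning step mentioned above might look like the following sketch. It is not the authors' code: the function name and the rounding of the drop count with `int()` are assumptions, and ties are broken by original index because Python's sort is stable.

```python
def prune_top_fraction(scores, fraction):
    """Return indices to keep after dropping the top `fraction` by score."""
    n = len(scores)
    n_drop = int(fraction * n)  # rounds down (an assumption)
    # O(N log N) sort of indices by descending score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    dropped = set(order[:n_drop])
    # O(N) pass that preserves the original dataset order.
    return [i for i in range(n) if i not in dropped]


# Toy memorization scores for 8 samples; drop the top 25%.
m = [0.0, 0.5, 0.1, 0.75, 0.0, 0.2, 0.0, 0.3]
keep = prune_top_fraction(m, fraction=0.25)
```

The returned index list can be passed to any dataset wrapper that supports subsetting by index, so the training loop itself needs no change.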

5 Impact Statement

Generative Data Cartography (GenDataCarto) advances the safety and reliability of large-scale generative models by providing a principled, data-centric toolkit for identifying and mitigating memorization and leakage risks. By surgically down-weighting or pruning high-memorization “hotspot” examples, our method reduces the chance that sensitive or proprietary content will be inadvertently regurgitated—protecting individuals’ privacy and respecting copyright. At the same time, GenDataCarto imposes only minimal utility cost (sub-percent perplexity increases in practice), ensuring that model quality remains high. Moreover, our stability-based theoretical guarantees transparently quantify the trade-offs between data removal and generalization, supporting responsible deployment in domains such as healthcare, finance, and legal text generation. Finally, by exposing structurally important or noisy samples in massive pretraining corpora, GenDataCarto empowers data custodians and policymakers to audit and curate datasets, fostering greater accountability and trust in AI systems.

Bibliography

1. Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., & Zhang, L. (2016). Deep learning with differential privacy. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 308–318.
2. Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum learning. Proceedings of the 26th International Conference on Machine Learning, 41–48.
3. Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
4. Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., Roberts, A., Brown, T., Song, D., Erlingsson, U., et al. (2021). Extracting training data from large language models. USENIX Security Symposium.
5. Carlini, N., Liu, C., Kos, J., Zhang, C., Bair, T., Kosman, N., & Savage, S. (2023). Extracting training data from diffusion models. arXiv preprint arXiv:2302.07826.
6. Choquette-Choo, C., & Klimov, O. (2021). Label-only membership inference attacks. NDSS.
7. Dubiński, M., Tramer, F., & Carlini, N. (2024). Training data attribution for large language models. arXiv preprint arXiv:2403.06187.
8. Dodge, J., Ilharco, G., Min, S., Gardner, M., et al. (2022). Documenting training data of foundation models. NeurIPS Datasets and Benchmarks.
