Is Memorization Helpful or Harmful? Prior Information Sets the Threshold

Chen Cheng; Rina Foygel Barber

arXiv:2602.09405 · stat.ML · February 11, 2026

TL;DR

This paper studies how the prior distribution shapes the relationship between training error and generalization error in overparameterized linear models. It gives explicit conditions under which memorization is necessary or overfitting is harmful, depending on where the noise falls relative to thresholds set by the prior.

Contribution

It provides explicit conditions linking prior distributions to the necessity of memorization for optimal generalization in Bayesian linear models.

Findings

01

In one regime, optimal generalization requires the training error to be near interpolating relative to the noise size, so memorization is necessary.

02

In the other regime, optimal generalization requires the training error to stay close to the noise level, so overfitting is harmful.

03

These regimes occur when the noise reaches thresholds determined by the Fisher information and the variance parameters of the prior.

Abstract

We examine the connection between training error and generalization error for arbitrary estimating procedures, working in an overparameterized linear model under general priors in a Bayesian setup. We find determining factors inherent to the prior distribution $\pi$, giving explicit conditions under which optimal generalization necessitates that the training error be (i) near interpolating relative to the noise size (i.e., memorization is necessary), or (ii) close to the noise level (i.e., overfitting is harmful). Remarkably, these phenomena occur when the noise reaches thresholds determined by the Fisher information and the variance parameters of the prior $\pi$.
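
The quantity the abstract compares, training error relative to the noise size, can be illustrated with a small simulation. The sketch below is not the paper's construction. It assumes one special case, an isotropic Gaussian prior, for which the posterior mean (ridge regression with penalty $\sigma^2/\tau^2$) is the Bayes-optimal predictor under squared loss. The function name `train_error_ratio`, the dimensions, and the design scaling are all hypothetical choices made for illustration.

```python
import numpy as np

def train_error_ratio(sigma, n=50, d=200, tau=1.0, trials=20, seed=0):
    """Average (training MSE) / sigma^2 for the posterior-mean estimator.

    Assumed toy model (not taken from the paper):
      beta ~ N(0, tau^2 I_d),  y = X beta + sigma * eps,  d > n.
    """
    rng = np.random.default_rng(seed)
    ratios = []
    for _ in range(trials):
        # Design scaled so each row has squared norm of roughly 1.
        X = rng.normal(size=(n, d)) / np.sqrt(d)
        beta = rng.normal(scale=tau, size=d)
        y = X @ beta + sigma * rng.normal(size=n)

        # Posterior mean under the Gaussian prior is ridge with
        # lam = sigma^2 / tau^2, written in its n x n (dual) form.
        lam = sigma**2 / tau**2
        K = X @ X.T
        alpha = np.linalg.solve(K + lam * np.eye(n), y)
        beta_hat = X.T @ alpha

        resid = y - X @ beta_hat
        ratios.append(np.mean(resid**2) / sigma**2)
    return float(np.mean(ratios))

if __name__ == "__main__":
    for sigma in (0.1, 1.0, 10.0):
        print(f"sigma={sigma:5.1f}  train MSE / sigma^2 = {train_error_ratio(sigma):.3f}")
```

In this Gaussian toy case the ratio should be small at low noise (the optimal predictor nearly interpolates) and close to 1 at high noise (the training error sits near the noise level). The transition here is smooth and says nothing about the thresholds the paper derives for general priors $\pi$.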

Taxonomy

Topics: Statistical Methods and Inference · Stochastic Gradient Optimization Techniques · Bayesian Methods and Mixture Models