Harmful Overfitting in Sobolev Spaces

Kedar Karhadkar; Alexander Sietsema; Deanna Needell; Guido Montufar

arXiv:2602.00825 · stat.ML · February 3, 2026

Open Access

TL;DR

This paper investigates how functions in Sobolev spaces that perfectly fit noisy data can still generalize poorly, showing that norm-minimizing interpolators suffer from harmful overfitting even with large sample sizes.

Contribution

The paper extends the analysis of harmful overfitting to Sobolev spaces $W^{k,p}$ for all $p \in [1, \infty)$, going beyond the Hilbert space case ($p = 2$) by means of a geometric argument based on Sobolev inequalities.
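
For orientation, the objects involved can be written out as follows. This is a sketch using one standard convention for the Sobolev norm; the constant $C \ge 1$ and the notation $(x_i, y_i)_{i=1}^n$ for the training sample are assumptions introduced here for illustration, and the paper's exact definition of an approximately norm-minimizing interpolator may differ.

$$
\|f\|_{W^{k,p}(\mathbb{R}^d)} = \Big( \sum_{|\alpha| \le k} \|D^{\alpha} f\|_{L^p(\mathbb{R}^d)}^{p} \Big)^{1/p},
\qquad
f(x_i) = y_i \ \ (i = 1, \dots, n),
\qquad
\|f\|_{W^{k,p}} \le C \inf \big\{ \|g\|_{W^{k,p}} : g(x_i) = y_i \ \text{for all } i \big\}.
$$

For $p = 2$ the norm comes from an inner product, which is why kernel methods apply in that case; for other values of $p$ that structure is not available.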

Findings

01 Norm-minimizing interpolators exhibit persistent generalization error.

02 Harmful overfitting occurs despite increasing sample size.

03 Results apply to a broad range of Sobolev spaces, not just Hilbert spaces.

Abstract

Motivated by recent work on benign overfitting in overparameterized machine learning, we study the generalization behavior of functions in Sobolev spaces $W^{k,p}(\mathbb{R}^d)$ that perfectly fit a noisy training data set. Under assumptions of label noise and sufficient regularity in the data distribution, we show that approximately norm-minimizing interpolators, which are canonical solutions selected by smoothness bias, exhibit harmful overfitting: even as the training sample size $n \to \infty$, the generalization error remains bounded below by a positive constant with high probability. Our results hold for arbitrary values of $p \in [1, \infty)$, in contrast to prior results studying the Hilbert space case ($p = 2$) using kernel methods. Our proof uses a geometric argument which identifies harmful neighborhoods of the training data using Sobolev inequalities.
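
The Hilbert space case mentioned in the abstract can be explored numerically. The following is a minimal sketch, not the paper's method or experiments: it assumes $d = 1$, $k = 1$, $p = 2$, and uses the fact that $e^{-|x - y|}$ is, up to a constant factor, the reproducing kernel of $W^{1,2}(\mathbb{R})$, so the exact minimum-norm interpolant is a kernel interpolant. The function names, the sine target, the uniform inputs, and the Gaussian label noise are all illustrative assumptions.

```python
import numpy as np


def laplace_kernel(a, b):
    # exp(-|x - y|): up to a constant factor, the reproducing kernel of
    # W^{1,2}(R). The constant does not change the interpolant.
    return np.exp(-np.abs(a[:, None] - b[None, :]))


def min_norm_interpolant(x_train, y_train):
    # Minimum-norm interpolation in an RKHS: solve K c = y, then
    # f(x) = sum_i c_i k(x, x_i).
    gram = laplace_kernel(x_train, x_train)
    coef = np.linalg.solve(gram, y_train)
    return lambda x: laplace_kernel(x, x_train) @ coef


def excess_risk(n, noise_std=0.5, n_test=2000, seed=0):
    # Hypothetical setup: uniform inputs on [0, 1], sine target,
    # Gaussian label noise. Returns the mean squared distance between
    # the interpolant and the noiseless target on fresh test points.
    rng = np.random.default_rng(seed)
    target = lambda x: np.sin(2 * np.pi * x)
    x_train = np.sort(rng.uniform(0.0, 1.0, n))
    y_train = target(x_train) + noise_std * rng.normal(size=n)
    f_hat = min_norm_interpolant(x_train, y_train)
    x_test = rng.uniform(0.0, 1.0, n_test)
    return float(np.mean((f_hat(x_test) - target(x_test)) ** 2))


if __name__ == "__main__":
    for n in (100, 200, 400, 800, 1600):
        print(n, excess_risk(n))
```

In this toy setting the printed excess risk would be expected to stay away from zero as `n` grows, which is the qualitative behavior the abstract describes. The sketch says nothing about $p \ne 2$, where no kernel representation is available.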

Taxonomy

Topics: Machine Learning and Data Classification · Stochastic Gradient Optimization Techniques · Statistical Methods and Inference