Characterizing the Generalization Error of Random Feature Regression with Arbitrary Data-Augmentation

Lucas Morisset; Alain Durmus; Adrien Hardy

arXiv:2605.10290·stat.ML·May 12, 2026

TL;DR

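This paper gives a precise analysis of how data augmentation affects the generalization error of supervised regression in the proportional regime, where the number of covariates grows proportionally to the number of samples, for any network architecture in which only the last readout layer is trained.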

Contribution

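It gives a tight characterization of the test error, measured in mean squared error, under arbitrary data-augmentation schemes. The characterization depends only on population quantities of the true data and on first- and second-order statistics of the augmentation, and it remains valid for misspecified feature maps.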

Findings

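01. The test error is characterized in terms of population quantities of the true data and first- and second-order statistics of the augmentation scheme.

02. The results hold for misspecified feature maps and for any network in which only the last readout layer is trained, while the rest of the network is either frozen or randomly initialized.

03. The asymptotic characterization is tight for Gaussian data.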

Abstract

This paper analyzes the regularization effect that data augmentation induces on supervised regression methods in the proportional regime, where the number of covariates grows proportionally to the number of samples. We provide a tight characterization of the test error, measured in mean squared error, in terms only of population quantities of the true data and of first- and second-order statistics of the augmentation scheme. Our results are valid under misspecified feature maps, and for any network architecture where only the last readout layer is trained and the rest of the network is either frozen or randomly initialized. We specialize our results to the case of Gaussian data, and show that our asymptotic characterization is tight in this setting.
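To make the setting concrete, the following is a minimal simulation sketch of random feature regression with data augmentation, not the authors' method or code. Everything specific in it is an assumption chosen for illustration: the ReLU feature map, the additive Gaussian input-noise augmentation, the linear target, the dimensions, and the helper names (`random_features`, `augment`, `fit_readout`, `simulate_test_mse`). The first layer is drawn at random and kept frozen, and only the readout is fitted on the augmented sample. The tiny ridge term is there only for numerical stability.

```python
import numpy as np


def random_features(X, W):
    """Frozen random first layer followed by a ReLU (illustrative choice)."""
    return np.maximum(X @ W, 0.0)


def augment(X, y, copies, noise_std, rng):
    """Replicate each sample `copies` times and add Gaussian noise to the inputs.

    Additive Gaussian noise is only one example of an augmentation scheme;
    it is an assumption of this sketch.
    """
    X_aug = np.repeat(X, copies, axis=0)
    X_aug = X_aug + noise_std * rng.standard_normal(X_aug.shape)
    y_aug = np.repeat(y, copies, axis=0)
    return X_aug, y_aug


def fit_readout(Phi, y, ridge=1e-8):
    """Least-squares fit of the readout layer (tiny ridge for stability)."""
    p = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + ridge * np.eye(p), Phi.T @ y)


def simulate_test_mse(n=200, d=100, p=150, copies=5, noise_std=0.5, seed=0):
    """Empirical test MSE with Gaussian data; d/n and p/n are held fixed."""
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d) / np.sqrt(d)    # hypothetical linear target
    W = rng.standard_normal((d, p)) / np.sqrt(d)  # frozen random first layer

    X_train = rng.standard_normal((n, d))
    y_train = X_train @ beta + 0.1 * rng.standard_normal(n)
    X_test = rng.standard_normal((2000, d))
    y_test = X_test @ beta

    # Only the readout weights are trained, on the augmented sample.
    X_aug, y_aug = augment(X_train, y_train, copies, noise_std, rng)
    a = fit_readout(random_features(X_aug, W), y_aug)

    pred = random_features(X_test, W) @ a
    return float(np.mean((pred - y_test) ** 2))


if __name__ == "__main__":
    for s in (0.0, 0.25, 0.5, 1.0):
        print(f"noise_std={s:.2f}  test MSE={simulate_test_mse(noise_std=s):.4f}")
```

Sweeping `noise_std` while keeping `n`, `d`, and `p` in fixed ratios gives an empirical test-error curve of the kind that an asymptotic characterization would be compared against. Because the target is linear and the features are ReLU, the feature map in this sketch is misspecified.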
