The Slow Deterioration of the Generalization Error of the Random Feature Model
Chao Ma, Lei Wu, Weinan E

TL;DR
This paper investigates how the generalization error in the random feature model slowly worsens near the critical parameter regime, revealing a self-correction mechanism that allows early stopping for better generalization.
Contribution
It provides a theoretical and experimental analysis of the dynamic behavior of gradient descent in the random feature model, highlighting a self-correction mechanism for generalization error.
Findings
A large generalization gap occurs near the critical regime, where the number of parameters is close to the training sample size.
Small eigenvalues slow down the development of the generalization gap.
Early stopping can exploit the self-correction to improve generalization.
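The link between small eigenvalues and slow development can be made concrete with the standard calculation for gradient descent on a least-squares loss. This is a sketch under assumed notation, not the paper's own theorem: $\Phi$ (the $n \times m$ feature matrix), $a_t$ (the trainable coefficients after $t$ steps, with $a_0 = 0$), $\eta$ (the step size), and $(\lambda_i, v_i)$ (eigenpairs of the Gram matrix $G = \Phi^\top \Phi / n$) are all symbols introduced here for illustration.

```latex
% Assumed setup: loss L(a) = \frac{1}{2n}\|\Phi a - y\|^2, gradient descent from a_0 = 0.
\begin{align*}
a_{t+1} &= a_t - \eta\,(G a_t - c), \qquad c = \tfrac{1}{n}\Phi^\top y, \\
v_i^\top a_t &= \frac{1 - (1 - \eta\lambda_i)^t}{\lambda_i}\; v_i^\top c .
\end{align*}
% For a small eigenvalue \lambda_i the limiting factor 1/\lambda_i is large,
% but (1 - \eta\lambda_i)^t \approx e^{-\eta\lambda_i t} only decays after
% t \sim 1/(\eta\lambda_i) steps; for t \ll 1/(\eta\lambda_i) the factor is about \eta t.
```

Under these assumptions, the same small $\lambda_i$ that makes the final coefficient large also makes it slow to grow, which is the window that early stopping uses.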
Abstract
The random feature model exhibits a kind of resonance behavior when the number of parameters is close to the training sample size. This behavior is characterized by the appearance of a large generalization gap, and is due to the occurrence of very small eigenvalues of the associated Gram matrix. In this paper, we examine the dynamic behavior of the gradient descent algorithm in this regime. We show, both theoretically and experimentally, that there is a dynamic self-correction mechanism at work: the larger the eventual generalization gap, the slower it develops, and both effects are caused by the small eigenvalues. This gives us ample time to stop the training process and obtain solutions with good generalization properties.
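The behavior described in the abstract can be explored with a small numerical sketch. The code below is not the authors' experiment: the ReLU features, the noisy linear target, the sizes, the step-size rule, and the names `relu_features`, `gd_path`, and `run_experiment` are all assumptions chosen for illustration. It computes the exact gradient-descent iterates through the eigendecomposition of the Gram matrix, so very long training times are cheap to evaluate.

```python
import numpy as np


def relu_features(X, W):
    """Random ReLU features; W is sampled once and never trained."""
    return np.maximum(X @ W.T, 0.0)


def gd_path(Phi, y, steps, eta_scale=0.5):
    """Exact GD iterates a_t (from a_0 = 0) on the loss (1/2n)||Phi a - y||^2.

    Works in the eigenbasis of G = Phi^T Phi / n, so a large t costs the same
    as a small one. Returns (eta, [a_t for t in steps]).
    """
    n = Phi.shape[0]
    lam, V = np.linalg.eigh(Phi.T @ Phi / n)
    lam = np.clip(lam, 0.0, None)        # remove tiny negative round-off
    eta = eta_scale / lam.max()          # ensures eta * lam_i <= eta_scale < 1
    b = V.T @ (Phi.T @ y) / n            # Phi^T y / n expressed in the eigenbasis
    lam_safe = np.where(lam > 0, lam, 1.0)
    path = []
    for t in steps:
        # Per-direction factor (1 - (1 - eta*lam)^t) / lam, evaluated stably;
        # it equals eta * t in the limit lam = 0.
        filt = np.where(
            lam > 0,
            -np.expm1(t * np.log1p(-eta * lam)) / lam_safe,
            eta * t,
        )
        path.append(V @ (filt * b))
    return eta, path


def run_experiment(m, n=100, d=20, n_test=1000, noise=0.3,
                   steps=(10, 10**3, 10**5, 10**7), seed=0):
    """Train m random features on n noisy samples; report errors along GD."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, d)) / np.sqrt(d)
    w_star = rng.standard_normal(d)      # hypothetical linear target
    X = rng.standard_normal((n, d))
    X_test = rng.standard_normal((n_test, d))
    y = X @ w_star + noise * rng.standard_normal(n)   # noisy training labels
    y_test = X_test @ w_star                          # clean test labels
    Phi, Phi_test = relu_features(X, W), relu_features(X_test, W)
    eta, path = gd_path(Phi, y, steps)
    train = [float(np.mean((Phi @ a - y) ** 2)) for a in path]
    test = [float(np.mean((Phi_test @ a - y_test) ** 2)) for a in path]
    lam_min = float(np.linalg.eigvalsh(Phi.T @ Phi / n).min())
    return {"steps": steps, "train": train, "test": test,
            "lam_min": lam_min, "eta": eta}


if __name__ == "__main__":
    # m < n, m = n (critical regime), m > n
    for m in (50, 100, 400):
        out = run_experiment(m)
        print(f"m={m}  smallest Gram eigenvalue={out['lam_min']:.2e}")
        for t, tr, te in zip(out["steps"], out["train"], out["test"]):
            print(f"  t={t:>9d}  train MSE={tr:.4f}  test MSE={te:.4f}")
```

Comparing the test error across the listed step counts for `m = n` against the other two settings is one way to look for the slow growth of the gap; the exact numbers depend on the assumed data model and seed.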
Taxonomy
Topics: Random Matrices and Applications · Face and Expression Recognition · Bayesian Methods and Mixture Models
