The Slow Deterioration of the Generalization Error of the Random Feature Model

Chao Ma; Lei Wu; Weinan E

arXiv:2008.05621·cs.LG·August 14, 2020·5 cites

Open Access

TL;DR

This paper studies gradient descent training of the random feature model near the critical regime, where the number of parameters is close to the training sample size. The generalization error deteriorates only slowly there: a self-correction mechanism leaves ample time for early stopping to yield solutions that generalize well.

Contribution

It provides a theoretical and experimental analysis of the dynamic behavior of gradient descent in the random feature model, highlighting a self-correction mechanism for generalization error.

Findings

01

A large generalization gap occurs near the critical regime.

02

Small eigenvalues slow down the development of the generalization gap.

03

Early stopping can exploit the self-correction to improve generalization.
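The second and third findings can be motivated by a standard calculation for least squares under gradient flow. This is a sketch under assumed notation, not the paper's own derivation: Φ is the n × m matrix of random features on the training set, a the trainable coefficients with a(0) = 0, y the labels, and (λ_i, v_i) the eigenpairs of G = ΦᵀΦ / n.

```latex
\begin{aligned}
L(a) &= \frac{1}{2n}\,\|\Phi a - y\|^2, \qquad
\dot a(t) = -G\,a(t) + \tfrac{1}{n}\Phi^\top y, \qquad
G = \tfrac{1}{n}\Phi^\top \Phi, \\
a(t) &= \sum_{i:\,\lambda_i > 0} \frac{1 - e^{-\lambda_i t}}{\lambda_i}\,
        \Big( v_i^\top \tfrac{1}{n}\Phi^\top y \Big)\, v_i, \\
\frac{1 - e^{-\lambda_i t}}{\lambda_i} &\le \min\!\Big( t,\ \frac{1}{\lambda_i} \Big).
\end{aligned}
```

A direction with a very small λ_i is eventually amplified by 1/λ_i, which is what makes the final gap large, but its coefficient grows no faster than linearly in t and only approaches that limit at times of order 1/λ_i. Stopping well before then keeps the contribution of that direction small.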

Abstract

The random feature model exhibits a kind of resonance behavior when the number of parameters is close to the training sample size. This behavior is characterized by the appearance of a large generalization gap, and is due to the occurrence of very small eigenvalues of the associated Gram matrix. In this paper, we examine the dynamic behavior of the gradient descent algorithm in this regime. We show, both theoretically and experimentally, that there is a dynamic self-correction mechanism at work: the larger the eventual generalization gap, the slower it develops, and both effects are caused by the small eigenvalues. This gives us ample time to stop the training process and obtain solutions with good generalization properties.
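The mechanism described in the abstract can be explored numerically with a small experiment. The following is a minimal sketch, not the authors' experimental setup: the linear target, the noise level, the ReLU features, the sizes n = m = 50, and the function name `run_random_feature_gd` are all assumptions chosen for illustration. It trains only the outer coefficients by gradient descent from zero and records the training loss, test loss, and coefficient norm at a few checkpoints, alongside the minimum-norm interpolant that gradient descent approaches in the limit.

```python
import numpy as np


def run_random_feature_gd(n=50, m=50, d=10, noise=0.3, n_test=2000,
                          checkpoints=(0, 10, 100, 1000, 10000), seed=0):
    """Gradient descent on the outer coefficients of a random feature model."""
    rng = np.random.default_rng(seed)

    # Synthetic regression task (an illustrative assumption, not from the paper).
    w_star = rng.standard_normal(d)
    X = rng.standard_normal((n, d)) / np.sqrt(d)
    X_test = rng.standard_normal((n_test, d)) / np.sqrt(d)
    y = X @ w_star + noise * rng.standard_normal(n)   # noisy training labels
    y_test = X_test @ w_star                          # clean test labels

    # Fixed random inner weights; only the outer coefficients `a` are trained.
    B = rng.standard_normal((d, m))
    Phi = np.maximum(X @ B, 0.0) / np.sqrt(m)
    Phi_test = np.maximum(X_test @ B, 0.0) / np.sqrt(m)

    # Spectrum of the feature Gram matrix (ascending order).
    # A step size of 1 / lambda_max keeps gradient descent stable.
    eigvals = np.linalg.eigvalsh(Phi.T @ Phi / n)
    lr = 1.0 / eigvals[-1]

    a = np.zeros(m)
    rows = []
    for step in range(max(checkpoints) + 1):
        if step in checkpoints:
            train = 0.5 * np.mean((Phi @ a - y) ** 2)
            test = 0.5 * np.mean((Phi_test @ a - y_test) ** 2)
            rows.append((step, train, test, np.linalg.norm(a)))
        grad = Phi.T @ (Phi @ a - y) / n   # gradient of the loss (1/2n)||Phi a - y||^2
        a -= lr * grad

    # Minimum-norm least-squares solution: the limit of GD started from zero.
    a_inf = np.linalg.lstsq(Phi, y, rcond=None)[0]
    test_inf = 0.5 * np.mean((Phi_test @ a_inf - y_test) ** 2)
    return eigvals, rows, np.linalg.norm(a_inf), test_inf


if __name__ == "__main__":
    eigvals, rows, norm_inf, test_inf = run_random_feature_gd()
    print(f"smallest / largest eigenvalue: {eigvals[0]:.3e} / {eigvals[-1]:.3e}")
    for step, train, test, norm in rows:
        print(f"step {step:>6d}  train {train:.4e}  test {test:.4e}  |a| {norm:.3e}")
    print(f"limit (min-norm)  test {test_inf:.4e}  |a| {norm_inf:.3e}")
```

Comparing the checkpoint rows with the final line shows how far the iterate still is from the interpolating solution after a given number of steps. Varying `m` relative to `n` changes how small the smallest eigenvalues are, and so how slowly the late-stage growth sets in.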

Taxonomy

Topics: Random Matrices and Applications · Face and Expression Recognition · Bayesian Methods and Mixture Models