Rethinking Benign Overfitting in Two-Layer Neural Networks
Ruichen Xu, Kexin Chen

TL;DR
This paper refines the feature-noise data model to include class-dependent noise, revealing that neural networks can utilize data noise to improve classification in long-tailed distributions, supported by theoretical analysis and experiments.
Contribution
It introduces a refined heterogeneous noise model and analyzes neural network training dynamics, showing how data noise can enhance generalization in long-tailed data.
Findings
Neural networks can leverage data noise to improve classification accuracy on long-tailed data distributions.
Test loss bounds are established for the refined noise model.
Experimental results validate the theoretical insights.
Abstract
Recent theoretical studies (Kou et al., 2023; Cao et al., 2022) have revealed a sharp phase transition from benign to harmful overfitting when the noise-to-feature ratio exceeds a threshold, a situation common in long-tailed data distributions where atypical data is prevalent. However, harmful overfitting rarely happens in overparameterized neural networks. Further experimental results suggested that memorization is necessary for achieving near-optimal generalization error in long-tailed data distributions (Feldman & Zhang, 2020). We argue that this discrepancy between theoretical predictions and empirical observations arises because previous feature-noise data models overlook the heterogeneous nature of noise across different data classes. In this paper, we refine the feature-noise data model by incorporating class-dependent heterogeneous noise and re-examine the overfitting phenomenon…
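To make the idea of class-dependent noise concrete, here is a minimal sketch of sampling from a two-patch feature-noise model in which the noise scale depends on the label. It is an illustration under stated assumptions, not the paper's exact construction: the function name, the per-class scale dictionary `noise_scales`, the class-probability tuple `class_probs`, and the projection of the noise orthogonal to the signal vector `mu` are all hypothetical choices made for this example.

```python
import numpy as np

def sample_feature_noise_data(n, d, mu, noise_scales, class_probs, seed=0):
    """Draw n two-patch samples: a signal patch y * mu and a Gaussian noise patch.

    noise_scales: dict mapping label (-1 or +1) to a noise standard deviation
                  (hypothetical parameterization of class-dependent noise).
    class_probs:  (P(y = -1), P(y = +1)); unequal values give a long-tailed split.
    """
    rng = np.random.default_rng(seed)
    labels = rng.choice([-1, 1], size=n, p=class_probs)

    # Signal patch: the label times a fixed feature vector.
    signal = labels[:, None] * mu[None, :]

    # Noise patch: isotropic Gaussian whose scale depends on the class.
    sigma = np.where(labels == 1, noise_scales[1], noise_scales[-1])
    noise = rng.standard_normal((n, d)) * sigma[:, None]

    # Assumption: remove the component along mu so noise and signal are orthogonal.
    unit = mu / np.linalg.norm(mu)
    noise -= np.outer(noise @ unit, unit)
    return labels, signal, noise

mu = np.zeros(50)
mu[0] = 2.0
labels, signal, noise = sample_feature_noise_data(
    n=2000, d=50, mu=mu,
    noise_scales={1: 0.5, -1: 2.0},   # rare class +1 is less noisy in this toy setup
    class_probs=(0.9, 0.1),
)
```

Setting both entries of `noise_scales` to the same value gives a homogeneous-noise model, which makes it easy to compare the two settings on the same classifier.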
Taxonomy
Topics: Neural Networks and Applications
