Initialization Matters: On the Benign Overfitting of Two-Layer ReLU CNN with Fully Trainable Layers
Shuning Shang, Xuran Meng, Yuan Cao, Difan Zou

TL;DR
This paper investigates how initialization scaling affects the training dynamics and generalization of fully trainable two-layer ReLU CNNs, providing theoretical bounds and insights into benign overfitting in practical neural networks.
Contribution
It extends the analysis of benign overfitting to fully trainable two-layer CNNs, revealing the impact of initialization scale on training behavior and generalization.
Findings
Large output-layer initialization scales make training behave much as it does with a fixed output layer.
Small scales cause complex layer interactions and joint growth.
Sharp bounds on test errors identify conditions for benign overfitting.
Abstract
Benign overfitting refers to the phenomenon in which over-parameterized neural networks fit the training data perfectly yet still generalize well to unseen data. While this has been widely investigated theoretically, existing works are limited to two-layer networks with fixed output layers, where only the hidden weights are trained. We extend the analysis to two-layer ReLU convolutional neural networks (CNNs) with fully trainable layers, which is closer to practice. Our results show that the initialization scaling of the output layer is crucial to the training dynamics. Large scales make training behave similarly to training with a fixed output layer: the hidden layer grows rapidly while the output layer remains largely unchanged. In contrast, small scales result in more complex layer interactions: the hidden layer initially grows to a specific ratio relative to the output layer, after which both layers…
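To make the setting in the abstract concrete, here is a minimal sketch of a two-layer ReLU model in which both the hidden filters and the output weights are updated by gradient descent. It is not the paper's exact construction. The patch-based input, the logistic loss, the ±output_scale initialization, and all names (forward, init_params, gd_step, hidden_scale, output_scale) are assumptions made for illustration.

```python
import math
import random


def relu(z):
    return z if z > 0.0 else 0.0


def forward(W, v, patches):
    """f(x) = sum_j v[j] * sum_p relu(<W[j], patch_p>) (assumed model form)."""
    out = 0.0
    for w_j, v_j in zip(W, v):
        act = sum(relu(sum(a * b for a, b in zip(w_j, p))) for p in patches)
        out += v_j * act
    return out


def init_params(m, d, hidden_scale, output_scale, rng):
    """Gaussian hidden filters; output weights of magnitude output_scale
    with alternating signs (an illustrative choice, not from the paper)."""
    W = [[rng.gauss(0.0, hidden_scale) for _ in range(d)] for _ in range(m)]
    v = [output_scale if j % 2 == 0 else -output_scale for j in range(m)]
    return W, v


def gd_step(W, v, patches, y, lr):
    """One gradient step on log(1 + exp(-y * f(x))), updating BOTH layers."""
    f = forward(W, v, patches)
    g = -y / (1.0 + math.exp(y * f))  # derivative of the loss w.r.t. f
    new_W, new_v = [], []
    for w_j, v_j in zip(W, v):
        pre = [sum(a * b for a, b in zip(w_j, p)) for p in patches]
        # d f / d v_j is the summed ReLU activation of filter j
        grad_v = g * sum(relu(z) for z in pre)
        # d f / d w_j is v_j times the sum of patches on which filter j is active
        grad_w = [
            g * v_j * sum(p[k] for p, z in zip(patches, pre) if z > 0.0)
            for k in range(len(w_j))
        ]
        new_W.append([a - lr * gw for a, gw in zip(w_j, grad_w)])
        new_v.append(v_j - lr * grad_v)
    return new_W, new_v
```

The hidden-layer gradient is multiplied by v_j, while the output-layer gradient is multiplied by the hidden activations. The output initialization scale therefore directly controls how quickly the hidden filters move relative to the output weights. To explore this, one could call init_params with different output_scale values (for example with rng = random.Random(0)), repeat gd_step, and compare how far each layer drifts from its initial values.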
