Generalization Bounds of Stochastic Gradient Descent in Homogeneous Neural Networks
Wenquan Ma, Yang Sui, Jiaye Teng, Bohan Wang, Jing Xu, Jingqin Yang

TL;DR
This paper establishes new generalization bounds for homogeneous neural networks trained with stochastic gradient descent, showing that slower stepsize decay rates are possible, which better align with practical training scenarios.
Contribution
It proves that homogeneous neural networks allow for a slower stepsize decay of order $1/\sqrt{t}$, extending stability-based generalization bounds beyond previous constraints.
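The following minimal Python sketch makes the gap between the two decay rates concrete by comparing $\eta_t = \eta_0/t$ with $\eta_t = \eta_0/\sqrt{t}$. The function `step_sizes`, the initial stepsize `eta0`, and the horizon `T` are hypothetical illustrative choices, not settings taken from the paper. The cumulative stepsize $\sum_{t \le T} \eta_t$, a common heuristic for how much optimization progress a schedule permits, grows like $\ln T$ under the first schedule and like $2\sqrt{T}$ under the second.

```python
import math


def step_sizes(eta0: float, T: int, schedule: str) -> list[float]:
    """Return eta_t for t = 1..T under a hypothetical schedule.

    'inv'      -> eta0 / t        (the faster decay)
    'inv_sqrt' -> eta0 / sqrt(t)  (the slower decay)
    """
    if schedule == "inv":
        return [eta0 / t for t in range(1, T + 1)]
    if schedule == "inv_sqrt":
        return [eta0 / math.sqrt(t) for t in range(1, T + 1)]
    raise ValueError(f"unknown schedule: {schedule}")


if __name__ == "__main__":
    T = 10_000
    for name in ("inv", "inv_sqrt"):
        etas = step_sizes(0.1, T, name)
        # Report the final stepsize and the cumulative stepsize over T steps.
        print(f"{name:9s} last step {etas[-1]:.2e}  total {sum(etas):.2f}")
```

Both schedules start from the same $\eta_1 = \eta_0$; the slower one keeps later steps much larger, so its cumulative stepsize ends up more than an order of magnitude larger at this horizon.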
Findings
Slower stepsize decay of order $1/\sqrt{t}$ is sufficient for generalization.
Bounds are applicable to ReLU and LeakyReLU networks.
Theoretical extension to non-Lipschitz regimes.
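ReLU and LeakyReLU networks qualify because both activations are positively 1-homogeneous: $\sigma(cz) = c\,\sigma(z)$ for $c > 0$. As a result, scaling every weight matrix of an $L$-layer network by $c > 0$ scales its output by $c^L$. The NumPy sketch below checks this numerically. It assumes bias-free fully-connected layers, because nonzero biases break exact homogeneity. The layer sizes, scale `c`, and LeakyReLU slope are arbitrary, and the paper's precise homogeneity assumption may differ in detail.

```python
import numpy as np


def leaky_relu(z, slope=0.0):
    # slope=0.0 gives ReLU; both satisfy act(c * z) = c * act(z) for c > 0.
    return np.where(z > 0, z, slope * z)


def mlp(weights, x, slope=0.0):
    """Bias-free fully-connected network W_L act(... act(W_1 x))."""
    h = x
    for W in weights[:-1]:
        h = leaky_relu(W @ h, slope)
    return weights[-1] @ h


rng = np.random.default_rng(0)
dims = [5, 16, 16, 1]  # input, two hidden layers, scalar output
weights = [rng.standard_normal((m, n)) for n, m in zip(dims[:-1], dims[1:])]
x = rng.standard_normal(dims[0])
L = len(weights)  # depth = number of weight matrices
c = 2.5

for slope in (0.0, 0.1):
    base = mlp(weights, x, slope)
    scaled = mlp([c * W for W in weights], x, slope)
    # Homogeneity predicts scaled == c**L * base (up to floating-point error).
    print(f"slope={slope}: homogeneous = {np.allclose(scaled, c**L * base)}")
```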
Abstract
Algorithmic stability is among the most potent techniques in generalization analysis. However, its derivation usually requires a stepsize $\eta_t = \mathcal{O}(1/t)$ under non-convex training regimes, where $t$ denotes the iteration index. This rigid decay of the stepsize potentially impedes optimization and may not align with practical scenarios. In this paper, we derive generalization bounds under the homogeneous neural network regime, proving that this regime enables a slower stepsize decay of order $1/\sqrt{t}$ under mild assumptions. We further extend the theoretical results in several directions, e.g., to non-Lipschitz regimes. This finding is broadly applicable, as homogeneous neural networks encompass fully-connected and convolutional neural networks with ReLU and LeakyReLU activations.
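Stability-based bounds compare two SGD runs on neighbouring datasets $S$ and $S'$ that differ in a single example. Both runs follow the same sequence of sampled indices, and the analysis bounds how far their parameters drift apart. The Python sketch below measures that drift empirically for a small model under the $\eta_t = \eta_0/\sqrt{t}$ schedule. It is only an illustration of the coupling and is not the paper's proof or experimental setup. The two-layer bias-free ReLU network, logistic loss, synthetic data, and all hyperparameters are hypothetical choices.

```python
import numpy as np


def init_params(d, width, rng):
    # Hypothetical model f(x) = a . relu(W x): bias-free, so 2-homogeneous in (W, a).
    W = rng.standard_normal((width, d)) / np.sqrt(d)
    a = rng.standard_normal(width) / np.sqrt(width)
    return W, a


def grad(params, x, y):
    """Gradient of the logistic loss log(1 + exp(-y f(x))) w.r.t. (W, a)."""
    W, a = params
    z = W @ x
    h = np.maximum(z, 0.0)
    f = a @ h
    g = -y * np.exp(-np.logaddexp(0.0, y * f))  # dloss/df = -y * sigmoid(-y f), computed stably
    return np.outer(g * a * (z > 0), x), g * h


def sgd(X, Y, params, T, eta0, order):
    """SGD with eta_t = eta0 / sqrt(t), visiting samples in the given index order."""
    W, a = (p.copy() for p in params)
    for t in range(1, T + 1):
        i = order[t - 1]
        gW, ga = grad((W, a), X[i], Y[i])
        eta = eta0 / np.sqrt(t)
        W -= eta * gW
        a -= eta * ga
    return W, a


def param_distance(p, q):
    # Euclidean distance between two parameter tuples (W, a).
    return np.sqrt(sum(np.sum((u - v) ** 2) for u, v in zip(p, q)))


rng = np.random.default_rng(0)
n, d, width, T = 200, 10, 32, 2000
X = rng.standard_normal((n, d))
Y = np.sign(X[:, 0] + 0.1 * rng.standard_normal(n))

# Neighbouring dataset S': identical to S except for the example at index 0.
X2, Y2 = X.copy(), Y.copy()
X2[0] = rng.standard_normal(d)
Y2[0] = -Y[0]

params = init_params(d, width, rng)
order = rng.integers(0, n, size=T)  # shared sample sequence couples the two runs
w_S = sgd(X, Y, params, T, 0.5, order)
w_S2 = sgd(X2, Y2, params, T, 0.5, order)
print("parameter divergence:", param_distance(w_S, w_S2))
```

Sharing `order` is the key design choice. The two runs can only diverge at steps that sample the perturbed index, and the stepsize schedule controls how much each such step, and the steps after it, amplify the gap.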
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Neural Networks and Applications
