The role of optimization geometry in single neuron learning
Nicholas M. Boffi, Stephen Tu, and Jean-Jacques E. Slotine

TL;DR
This paper investigates how the choice of optimization geometry affects generalization when learning generalized linear models under the square loss, a simplified setting that admits a single neuron as a special case. It proves bounds on the generalization error and supports them with experiments.
Contribution
The paper gives a theoretical analysis of a family of pseudogradient methods for learning generalized linear models under the square loss, showing how optimization geometry influences out-of-sample performance.
Findings
Optimization geometry impacts generalization error bounds.
Non-asymptotic generalization bounds characterize the interplay between the optimization geometry and the feature space geometry.
Experimental results show improved performance with geometry-informed methods.
Abstract
Recent numerical experiments have demonstrated that the choice of optimization geometry used during training can impact generalization performance when learning expressive nonlinear model classes such as deep neural networks. These observations have important implications for modern deep learning but remain poorly understood due to the difficulty of the associated nonconvex optimization problem. Towards an understanding of this phenomenon, we analyze a family of pseudogradient methods for learning generalized linear models under the square loss - a simplified problem containing both nonlinearity in the model parameters and nonconvexity of the optimization which admits a single neuron as a special case. We prove non-asymptotic bounds on the generalization error that sharply characterize how the interplay between the optimization geometry and the feature space geometry sets the…
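As a rough illustration of the setting the abstract describes, the sketch below fits a single neuron with a pseudogradient update carried out in a non-Euclidean geometry. It is not the authors' code. It assumes the pseudogradient takes the GLM-tron form (the square-loss gradient with the derivative of the link function dropped), and it assumes the geometry is set by the mirror-descent potential psi(theta) = 0.5 * ||theta||_p^2. The sigmoid link, the step size, and all function names (`pnorm_mirror_map`, `pseudogradient`, `fit_single_neuron`, `square_loss`) are hypothetical choices made for the example.

```python
import numpy as np


def sigmoid(z):
    """Link function of the single neuron (an illustrative choice)."""
    return 1.0 / (1.0 + np.exp(-z))


def pnorm_mirror_map(v, p):
    """Gradient of psi(v) = 0.5 * ||v||_p^2.

    Its inverse is the same map evaluated with the conjugate exponent
    q = p / (p - 1). For p = 2 this is the identity (Euclidean geometry).
    """
    norm = np.linalg.norm(v, ord=p)
    if norm == 0.0:
        return np.zeros_like(v)
    return np.sign(v) * np.abs(v) ** (p - 1) / norm ** (p - 2)


def pseudogradient(theta, X, y, link=sigmoid):
    """Square-loss gradient with the link derivative dropped (assumed form)."""
    residual = link(X @ theta) - y
    return X.T @ residual / len(y)


def fit_single_neuron(X, y, p=2.0, step=0.5, n_steps=500, link=sigmoid):
    """Mirror-descent-style pseudogradient iteration under the p-norm potential."""
    q = p / (p - 1.0)  # conjugate exponent, used to invert the mirror map
    theta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        # Take the step in the dual (mirror) space, then map back.
        dual = pnorm_mirror_map(theta, p) - step * pseudogradient(theta, X, y, link)
        theta = pnorm_mirror_map(dual, q)
    return theta


def square_loss(theta, X, y, link=sigmoid):
    """Empirical square loss of the neuron with parameters theta."""
    return 0.5 * np.mean((link(X @ theta) - y) ** 2)
```

Calling `fit_single_neuron(X, y, p=2.0)` gives the Euclidean update, while values of `p` between 1 and 2 change the geometry in which the same pseudogradient is applied. Comparing held-out `square_loss` across values of `p` is one simple way to probe the kind of geometry dependence the abstract refers to.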
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Neural Networks and Applications · Machine Learning and ELM
Methods: Linear Regression
