Gradient Descent as a Perceptron Algorithm: Understanding Dynamics and Implicit Acceleration
Alexander Tyurin

TL;DR
This paper reinterprets gradient descent in neural network training as a perceptron algorithm, revealing insights into its dynamics and implicit acceleration, supported by theoretical analysis and numerical experiments.
Contribution
It establishes a reduction of gradient descent steps to perceptron algorithms for nonlinear models, providing a new perspective on optimization dynamics and implicit acceleration.
Findings
Nonlinearity in two-layer models can achieve a faster iteration complexity of $\tilde{O}(\sqrt{d})$.
The reduction simplifies analysis of GD dynamics using classical linear algebra.
Theoretical insights are validated through extensive numerical experiments.
Abstract
Even for the gradient descent (GD) method applied to neural network training, understanding its optimization dynamics, including convergence rate, iterate trajectories, function value oscillations, and especially its implicit acceleration, remains a challenging problem. We analyze nonlinear models with the logistic loss and show that the steps of GD reduce to those of generalized perceptron algorithms (Rosenblatt, 1958), providing a new perspective on the dynamics. This reduction yields significantly simpler algorithmic steps, which we analyze using classical linear algebra tools. Using these tools, we demonstrate on a minimalistic example that the nonlinearity in a two-layer model can provably yield a faster iteration complexity $\tilde{O}(\sqrt{d})$ compared to that achieved by linear models, where $d$ is the number of features. This helps explain the optimization dynamics and…
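To make the GD-to-perceptron connection concrete, here is a minimal sketch for the simplest case only: a linear model with the logistic loss. This is an assumption made for illustration; the paper's reduction concerns nonlinear models and generalized perceptron algorithms, which this sketch does not reproduce. The function names (`gd_logistic_step`, `batch_perceptron_step`) are hypothetical. The sketch shows that a GD step on the logistic loss adds each example `y_i * x_i` with weight `sigmoid(-margin_i)`, and that this weight approaches the perceptron's 0/1 mistake indicator as the margins grow in magnitude.

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def gd_logistic_step(w, X, y, lr):
    """One full-batch GD step on the mean logistic loss log(1 + exp(-y * <w, x>)).

    Illustrative linear model only (an assumption, not the paper's setting).
    """
    margins = y * (X @ w)
    # Per-example weight in (0, 1): large when the example is badly classified.
    weights = sigmoid(-margins)
    grad = -(weights * y) @ X / len(y)
    return w - lr * grad


def batch_perceptron_step(w, X, y, lr):
    """One batch perceptron step: add y_i * x_i for every example with margin <= 0."""
    margins = y * (X @ w)
    mistakes = (margins <= 0).astype(float)  # hard 0/1 weight instead of a sigmoid
    return w + lr * (mistakes * y) @ X / len(y)


# Small hand-made dataset (hypothetical values chosen for illustration).
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]])
y = np.array([1.0, -1.0, 1.0])

# With large-magnitude margins, the sigmoid weights are nearly 0 or 1,
# so the two updates almost coincide.
w_large = np.array([50.0, 50.0])
gd_next = gd_logistic_step(w_large, X, y, lr=0.1)
perceptron_next = batch_perceptron_step(w_large, X, y, lr=0.1)
```

At `w = 0` every sigmoid weight equals 1/2, so the GD step is exactly half of the batch perceptron step; the two rules differ only in how strongly each example is weighted.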
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Neural Networks and Reservoir Computing
