From Logistic Regression to the Perceptron Algorithm: Exploring Gradient Descent with Large Step Sizes
Alexander Tyurin

TL;DR
This paper investigates the behavior of logistic regression with gradient descent using large step sizes, revealing its connection to the perceptron algorithm, and proposes a new method with improved convergence guarantees.
Contribution
It uncovers the link between large step size logistic regression and the perceptron algorithm, analyzes convergence behavior, and introduces a normalized method with better theoretical guarantees.
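The link to the perceptron can be made concrete with a short derivation. The notation below is an assumption introduced for illustration, not taken from the paper: features $a_i$, labels $y_i \in \{-1, +1\}$, step size $\gamma$, iterates $w_t$, and the rescaled iterates $\theta_t = w_t / \gamma$. The paper's own statement and tie-breaking convention may differ.

```latex
\begin{align*}
f(w) &= \frac{1}{n} \sum_{i=1}^{n} \log\bigl(1 + \exp(-y_i a_i^\top w)\bigr),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}, \\
\nabla f(w) &= -\frac{1}{n} \sum_{i=1}^{n} \sigma(-y_i a_i^\top w)\, y_i a_i, \\
w_{t+1} &= w_t - \gamma \nabla f(w_t)
         = w_t + \frac{\gamma}{n} \sum_{i=1}^{n} \sigma(-y_i a_i^\top w_t)\, y_i a_i, \\
\theta_{t+1} &= \theta_t + \frac{1}{n} \sum_{i=1}^{n} \sigma(-\gamma\, y_i a_i^\top \theta_t)\, y_i a_i
\qquad \text{with } \theta_t = w_t / \gamma, \\
\lim_{\gamma \to \infty} \sigma(-\gamma m) &=
\begin{cases}
1, & m < 0, \\
1/2, & m = 0, \\
0, & m > 0.
\end{cases}
\end{align*}
```

For a fixed $\theta_t$ with no zero margins, the rescaled update therefore tends to $\theta_t + \frac{1}{n} \sum_{i : y_i a_i^\top \theta_t < 0} y_i a_i$, which adds the average of the misclassified examples, as a batch perceptron step does.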
Findings
LR+GD with large step sizes reduces to a batch version of the perceptron algorithm
Larger step sizes lead to faster convergence despite higher logistic loss
Proposed normalized LR+GD has improved iteration complexity guarantees
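The first finding can be checked numerically on a toy problem. The sketch below is a minimal illustration under stated assumptions, not the paper's code: the four-point dataset, the function names, and the strict-inequality convention for "misclassified" are all hypothetical choices. It compares one LR+GD step, rescaled by the step size gamma, with one averaged batch perceptron step from the same point.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function: exp(-log(1 + exp(-z))).
    return np.exp(-np.logaddexp(0.0, -z))

def logistic_grad(w, A, y):
    # Gradient of the mean logistic loss (1/n) * sum log(1 + exp(-y_i a_i^T w)).
    margins = y * (A @ w)
    return -(sigmoid(-margins) * y) @ A / len(y)

def lr_gd_step(w, A, y, gamma):
    # One gradient descent step on the logistic loss with step size gamma.
    return w - gamma * logistic_grad(w, A, y)

def batch_perceptron_step(theta, A, y):
    # Assumed convention: add the average of y_i a_i over strictly misclassified points.
    margins = y * (A @ theta)
    misclassified = (margins < 0).astype(float)
    return theta + (misclassified * y) @ A / len(y)

# Hypothetical linearly separable toy data (w = (1, 1) separates it).
A = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
theta = np.array([1.0, -1.0])  # current point in rescaled coordinates, w = gamma * theta

perceptron_next = batch_perceptron_step(theta, A, y)
gaps = []
for gamma in (1.0, 10.0, 100.0):
    w_next = lr_gd_step(gamma * theta, A, y, gamma)
    gap = np.linalg.norm(w_next / gamma - perceptron_next)
    gaps.append(gap)
    print(f"gamma={gamma:g}  gap to perceptron step={gap:.3e}")
```

The gap shrinks as gamma grows because the sigmoid weights approach 0 or 1 whenever the margins are nonzero. A point with an exactly zero margin would receive weight 1/2 from the sigmoid, so the comparison there depends on the tie-breaking convention.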
Abstract
We focus on the classification problem with a separable dataset, one of the most important and classical problems from machine learning. The standard approach to this task is logistic regression with gradient descent (LR+GD). Recent studies have observed that LR+GD can find a solution with arbitrarily large step sizes, defying conventional optimization theory. Our work investigates this phenomenon and makes three interconnected key observations about LR+GD with large step sizes. First, we find a remarkably simple explanation of why LR+GD with large step sizes solves the classification problem: LR+GD reduces to a batch version of the celebrated perceptron algorithm when the step size tends to infinity. Second, we observe that larger step sizes lead LR+GD to higher logistic losses when it tends to the perceptron algorithm, but larger step sizes also lead to faster convergence to a…
Taxonomy
Topics: Neural Networks and Applications
Methods: Focus · Logistic Regression
