From Logistic Regression to the Perceptron Algorithm: Exploring Gradient Descent with Large Step Sizes

Alexander Tyurin

arXiv:2412.08424·cs.LG·December 12, 2024

TL;DR

This paper investigates the behavior of logistic regression with gradient descent using large step sizes, revealing its connection to the perceptron algorithm, and proposes a new method with improved convergence guarantees.

Contribution

The paper uncovers the link between logistic regression trained with large step sizes and the perceptron algorithm, analyzes the resulting convergence behavior, and introduces a normalized method with better theoretical guarantees.

Findings

01. LR+GD with large step sizes reduces to a batch version of the perceptron algorithm as the step size tends to infinity.

02. Larger step sizes lead to faster convergence despite higher logistic loss.

03. The proposed normalized LR+GD has improved iteration-complexity guarantees.
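
The first finding can be made plausible with a short calculation. The following is a sketch in notation assumed here, not taken from the paper: samples $a_i$, labels $y_i \in \{-1, 1\}$, the mean logistic loss $f$, the sigmoid $\sigma(z) = 1/(1+e^{-z})$, and rescaled iterates $\hat{\theta}_t = \theta_t / \gamma$.

```latex
\begin{align*}
f(\theta) &= \frac{1}{n} \sum_{i=1}^{n} \log\bigl(1 + \exp(-y_i a_i^\top \theta)\bigr), \\
\theta_{t+1} &= \theta_t - \gamma \nabla f(\theta_t)
  = \theta_t + \frac{\gamma}{n} \sum_{i=1}^{n} \sigma\bigl(-y_i a_i^\top \theta_t\bigr)\, y_i a_i, \\
\hat{\theta}_{t+1} &= \hat{\theta}_t + \frac{1}{n} \sum_{i=1}^{n}
  \sigma\bigl(-\gamma\, y_i a_i^\top \hat{\theta}_t\bigr)\, y_i a_i, \\
\lim_{\gamma \to \infty} \sigma(-\gamma m) &=
  \begin{cases} 1, & m < 0, \\ 1/2, & m = 0, \\ 0, & m > 0. \end{cases}
\end{align*}
```

For a fixed $\hat{\theta}_t$, the limiting update adds the average of $y_i a_i$ over the currently misclassified samples (with weight $1/2$ on ties), which is one step of a batch perceptron.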

Abstract

We focus on the classification problem with a separable dataset, one of the most important and classical problems from machine learning. The standard approach to this task is logistic regression with gradient descent (LR+GD). Recent studies have observed that LR+GD can find a solution with arbitrarily large step sizes, defying conventional optimization theory. Our work investigates this phenomenon and makes three interconnected key observations about LR+GD with large step sizes. First, we find a remarkably simple explanation of why LR+GD with large step sizes solves the classification problem: LR+GD reduces to a batch version of the celebrated perceptron algorithm when the step size $\gamma \to \infty$. Second, we observe that larger step sizes lead LR+GD to higher logistic losses when it tends to the perceptron algorithm, but larger step sizes also lead to faster convergence to a…
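
The first observation in the abstract can be checked numerically on a toy problem. The sketch below is illustrative and not the paper's code: the three-point dataset, the step size `gamma = 1e4`, the zero initialization, and all function names are assumptions chosen for this example. It runs LR+GD and a batch perceptron side by side and compares the LR+GD iterate divided by `gamma` with the perceptron iterate.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lr_gd_step(theta, X, y, gamma):
    """One gradient-descent step on the mean logistic loss with step size gamma."""
    n = len(X)
    step = [0.0] * len(theta)
    for a, label in zip(X, y):
        # sigmoid(-margin) is the weight of this sample in the negative gradient.
        w = sigmoid(-label * dot(a, theta))
        for j, aj in enumerate(a):
            step[j] += w * label * aj / n
    return [t + gamma * s for t, s in zip(theta, step)]


def batch_perceptron_step(theta, X, y):
    """One batch perceptron step: add the mean of y_i * a_i over misclassified
    samples. Zero-margin samples get weight 1/2, the limit of the sigmoid weight."""
    n = len(X)
    step = [0.0] * len(theta)
    for a, label in zip(X, y):
        m = label * dot(a, theta)
        w = 1.0 if m < 0 else (0.5 if m == 0 else 0.0)
        for j, aj in enumerate(a):
            step[j] += w * label * aj / n
    return [t + s for t, s in zip(theta, step)]


# Hypothetical linearly separable toy data (separated, e.g., by theta = (1, 1)).
X = [(3.0, 0.0), (-4.0, 0.0), (-1.0, 2.0)]
y = [1, -1, 1]
gamma = 1e4  # assumed "large" step size

theta_lr = [0.0, 0.0]
theta_pc = [0.0, 0.0]
for _ in range(5):
    theta_lr = lr_gd_step(theta_lr, X, y, gamma)
    theta_pc = batch_perceptron_step(theta_pc, X, y)

scaled_lr = [t / gamma for t in theta_lr]
gap = max(abs(p - q) for p, q in zip(scaled_lr, theta_pc))
separated = all(label * dot(a, theta_lr) > 0 for a, label in zip(X, y))
print("LR+GD iterate / gamma:", scaled_lr)
print("batch perceptron     :", theta_pc)
print("max abs gap:", gap, "| LR+GD separates data:", separated)
```

On this dataset the perceptron misclassifies the third point after its first step and separates all points after the second, and the rescaled LR+GD iterates follow the same path. With a small `gamma` the two trajectories would generally differ, since the sigmoid weights are then far from a 0/1 indicator.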

Taxonomy

Topics: Neural Networks and Applications

Methods: Focus · Logistic Regression