A Mirror Descent Perspective of Smoothed Sign Descent

Shuyang Wang; Diego Klabjan

arXiv:2410.14158·cs.LG·October 21, 2024

TL;DR

This paper extends the mirror descent framework to analyze smoothed sign descent, showing how tuning the stability constant ε affects convergence and the quality of the solution reached in overparameterized regression problems.

Contribution

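The paper proposes a mirror map for smoothed sign descent that, under some assumptions, makes its dynamics equivalent to dual dynamics, and it characterizes the convergent solution as an approximate KKT point of minimizing a Bregman-divergence-style function.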

Findings

01 Tuning the stability constant ε can reduce the KKT error.

02 Under some assumptions, the proposed mirror map establishes equivalence to dual dynamics.

03 The convergent solution is characterized as an approximate KKT point.

Abstract

Recent work by Woodworth et al. (2020) shows that the optimization dynamics of gradient descent for overparameterized problems can be viewed as low-dimensional dual dynamics induced by a mirror map, explaining the implicit regularization phenomenon from the mirror descent perspective. However, the methodology does not apply to algorithms where update directions deviate from true gradients, such as Adam. We use the mirror descent framework to study the dynamics of smoothed sign descent with a stability constant $\epsilon$ for regression problems. We propose a mirror map that establishes equivalence to dual dynamics under some assumptions. By studying dual dynamics, we characterize the convergent solution as an approximate KKT point of minimizing a Bregman-divergence-style function, and show the benefit of tuning the stability constant $\epsilon$ to reduce the KKT error.
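To make the role of the stability constant concrete, here is a minimal sketch of smoothed sign descent on an overparameterized least-squares problem. The abstract does not state the exact update rule, so the form g / (|g| + ε) used below is an assumption (it mirrors the Adam denominator without moment estimates), and the function names, step size, and synthetic data are hypothetical choices made only for illustration.

```python
import numpy as np


def smoothed_sign(grad, eps):
    """Assumed smoothed sign direction: g / (|g| + eps), applied per coordinate.

    As eps -> 0 this approaches sign(g); for large eps it behaves like g / eps,
    i.e. a rescaled gradient step.
    """
    return grad / (np.abs(grad) + eps)


def smoothed_sign_descent(X, y, eps=1e-3, lr=1e-2, steps=5000, w0=None):
    """Run the assumed smoothed sign update on the loss (1/2n) * ||Xw - y||^2."""
    n, d = X.shape
    w = np.zeros(d) if w0 is None else np.array(w0, dtype=float)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n          # gradient of the squared loss
        w = w - lr * smoothed_sign(grad, eps)  # per-coordinate smoothed step
    return w


# Hypothetical overparameterized instance: 5 samples, 20 parameters.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))
y = rng.standard_normal(5)

for eps in (1e-3, 1.0):
    w = smoothed_sign_descent(X, y, eps=eps)
    print(f"eps={eps}: residual norm={np.linalg.norm(X @ w - y):.3e}, "
          f"l2 norm of w={np.linalg.norm(w):.3f}")
```

Because the problem has more parameters than samples, many weight vectors fit the data, so different values of ε may lead to different solutions. This sketch only lets one observe that dependence numerically; it does not implement the paper's mirror map or its KKT error analysis.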

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Sparse and Compressive Sensing Techniques