Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data

Yan Li; Caleb Ju; Ethan X. Fang; Tuo Zhao

arXiv:2108.06808 · cs.LG · August 28, 2023

TL;DR

This paper investigates the implicit regularization effects of the Bregman proximal point algorithm and mirror descent on learning linear classifiers with separable data, providing theoretical bounds on the margin and demonstrating the influence of divergence choice.

Contribution

The paper offers the first theoretical analysis linking the choice of Bregman divergence to bounds on the classifier margin, extends these results to mirror descent, and supports them with numerical experiments.

Findings

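01  The margin lower bound differs from the maximal margin by a multiplicative factor that depends inversely on the condition number of the distance-generating function.

02  This dependence on the condition number is shown to be tight.

03  Numerical experiments validate the theoretical predictions.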

Abstract

Bregman proximal point algorithm (BPPA) has witnessed emerging machine learning applications, yet its theoretical understanding has been largely unexplored. We study the computational properties of BPPA through learning linear classifiers with separable data, and demonstrate provable algorithmic regularization of BPPA. For any BPPA instantiated with a fixed Bregman divergence, we provide a lower bound of the margin obtained by BPPA with respect to an arbitrarily chosen norm. The obtained margin lower bound differs from the maximal margin by a multiplicative factor, which inversely depends on the condition number of the distance-generating function measured in the dual norm. We show that the dependence on the condition number is tight, thus demonstrating the importance of divergence in affecting the quality of the learned classifiers. We then extend our findings to mirror descent, for…
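
To make the setting in the abstract concrete, here is a minimal sketch of mirror descent on linearly separable data. It is not the paper's experimental setup: the toy dataset, the exponential loss, the quadratic distance-generating function w(θ) = ½ θᵀAθ, the step size, and the function names `mirror_descent_quadratic` and `l2_margin` are all assumptions chosen for illustration. With this choice of w, the mirror step reduces to θ ← θ − η A⁻¹∇L(θ), and the condition number of A is the knob that changes the geometry of the update.

```python
import numpy as np

def mirror_descent_quadratic(X, y, A, step=0.1, iters=2000):
    """Mirror descent on the mean exponential loss.

    Assumed distance-generating function: w(theta) = 0.5 * theta^T A theta,
    with A symmetric positive definite (an illustrative choice).
    """
    theta = np.zeros(X.shape[1])
    A_inv = np.linalg.inv(A)
    for _ in range(iters):
        margins = y * (X @ theta)
        # Gradient of mean_i exp(-y_i <x_i, theta>) with respect to theta.
        grad = -(X * (y * np.exp(-margins))[:, None]).mean(axis=0)
        # Mirror step: A @ theta_new = A @ theta - step * grad.
        theta = theta - step * (A_inv @ grad)
    return theta

def l2_margin(X, y, theta):
    """Normalized l2 margin: min_i y_i <x_i, theta> / ||theta||_2."""
    return np.min(y * (X @ theta)) / np.linalg.norm(theta)

# Hypothetical toy data: two points per class, separable through the origin.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

for kappa in (1.0, 10.0, 100.0):
    A = np.diag([1.0, kappa])  # condition number of A equals kappa
    theta = mirror_descent_quadratic(X, y, A)
    print(f"kappa={kappa:6.1f}  l2 margin={l2_margin(X, y, theta):.4f}")
```

For this toy dataset the maximal l2 margin is 3/√2 (attained in the direction (1, 1)), so the printed margins can be compared against that value after a fixed number of iterations. The sketch only illustrates how the quantities in the abstract can be computed; it does not reproduce the paper's bounds.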

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Statistical Mechanics and Entropy