Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data
Yan Li, Caleb Ju, Ethan X. Fang, Tuo Zhao

TL;DR
This paper investigates the implicit regularization effects of the Bregman proximal point algorithm and mirror descent on learning linear classifiers with separable data, providing theoretical bounds on the margin and demonstrating the influence of divergence choice.
Contribution
The paper offers the first theoretical analysis linking the choice of Bregman divergence to bounds on the classifier margin, and extends these insights to mirror descent, supported by numerical experiments.
Findings
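The margin lower bound differs from the maximal margin by a multiplicative factor that depends inversely on the condition number of the distance-generating function
This dependence on the condition number is shown to be tight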
Numerical experiments validate theoretical predictions
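For reference, the objects named in these findings can be written down as follows. This is a sketch in standard textbook notation, not necessarily the paper's own: the symbols $\phi$ (distance-generating function), $\mathcal{L}$ (empirical loss), $\eta_t$ (step size), and the exact scaling of the proximal term are assumptions made here for illustration.

```latex
% Assumed notation: data (x_i, y_i) with y_i in {-1, +1}, linear classifier w.
% Bregman divergence generated by a differentiable convex function \phi:
D_\phi(w, w') = \phi(w) - \phi(w') - \langle \nabla \phi(w'),\, w - w' \rangle

% One common form of the Bregman proximal point update (step-size scaling assumed):
w_{t+1} = \operatorname*{argmin}_{w} \Big\{ \mathcal{L}(w) + \tfrac{1}{\eta_t}\, D_\phi(w, w_t) \Big\}

% Margin of a classifier w, normalized by a chosen norm (one common convention):
\gamma(w) = \min_i \frac{y_i \langle w, x_i \rangle}{\|w\|}

% Maximal margin under the same convention:
\gamma^\star = \max_{\|u\| \le 1} \; \min_i \; y_i \langle u, x_i \rangle
```

Under this notation, the findings say that the margin reached by the algorithm is at least a fraction of $\gamma^\star$, with the fraction shrinking as the condition number of $\phi$ grows.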
Abstract
Bregman proximal point algorithm (BPPA) has witnessed emerging machine learning applications, yet its theoretical understanding has been largely unexplored. We study the computational properties of BPPA through learning linear classifiers with separable data, and demonstrate provable algorithmic regularization of BPPA. For any BPPA instantiated with a fixed Bregman divergence, we provide a lower bound of the margin obtained by BPPA with respect to an arbitrarily chosen norm. The obtained margin lower bound differs from the maximal margin by a multiplicative factor, which inversely depends on the condition number of the distance-generating function measured in the dual norm. We show that the dependence on the condition number is tight, thus demonstrating the importance of divergence in affecting the quality of the learned classifiers. We then extend our findings to mirror descent, for…
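The effect of the divergence on the learned margin can be illustrated with a small numerical sketch. The setup below is an assumption chosen for simplicity and is not taken from the paper's experiments: a quadratic distance-generating function phi(w) = 0.5 * w^T A w (for which mirror descent reduces to preconditioned gradient descent), the exponential loss, and a hand-made four-point separable dataset. The function names are hypothetical.

```python
import numpy as np

def mirror_descent_quadratic(X, y, A, step=0.1, iters=2000):
    """Mirror descent on the exponential loss with phi(w) = 0.5 * w^T A w.

    Illustrative assumption: with this quadratic potential the mirror step
    grad_phi(w_new) = grad_phi(w) - step * grad_loss(w)
    becomes A w_new = A w - step * grad_loss(w).
    """
    Z = X * y[:, None]                # rows are y_i * x_i
    A_inv = np.linalg.inv(A)
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        # gradient of sum_i exp(-y_i <w, x_i>)
        grad = -(np.exp(-Z @ w)[:, None] * Z).sum(axis=0)
        w = w - step * (A_inv @ grad)
    return w

def l2_margin(X, y, w):
    """Smallest signed distance to the hyperplane, normalized in the l2 norm."""
    return np.min(y * (X @ w)) / np.linalg.norm(w)

# Toy separable data (hypothetical, chosen so the max l2 margin is 3 / sqrt(2)).
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

for kappa in (1.0, 10.0, 100.0):
    A = np.diag([1.0, kappa])         # condition number of phi is kappa
    w = mirror_descent_quadratic(X, y, A)
    print(f"kappa={kappa:6.1f}  l2 margin={l2_margin(X, y, w):.4f}")
```

On this toy problem the well-conditioned potential (kappa = 1) recovers the maximal l2 margin, while larger kappa yields a strictly smaller l2 margin, which is consistent in spirit with the dependence on the condition number described in the abstract. It does not reproduce the paper's bound or its constants.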
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Statistical Mechanics and Entropy
