The Statistical Complexity of Early-Stopped Mirror Descent
Tomas Vaškevičius, Varun Kanade, Patrick Rebeschini

TL;DR
This paper studies the excess risk of early-stopped, unconstrained mirror descent applied to the unregularized squared-loss empirical risk for linear models and kernel methods. It ties excess risk guarantees to offset Rademacher complexities and improves upon recent implicit regularization results.
Contribution
It establishes a novel connection between offset Rademacher complexities and mirror descent convergence, providing new excess risk bounds and simplifying proofs of existing results.
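For orientation, the objects involved can be written out as follows. This is a sketch in assumed notation (the symbols $w$, $x_i$, $y_i$, $\psi$, $\eta$, $c$ are not taken from this summary, and the paper's own normalization constants may differ). The second line is the exact identity obtained by completing the convexity inequality for the squared loss; the last line is one common form of the offset Rademacher complexity.

```latex
% Squared-loss empirical risk of a linear model (assumed notation)
R_n(w) = \frac{1}{n}\sum_{i=1}^{n}\bigl(\langle w, x_i\rangle - y_i\bigr)^2

% Convexity gives R_n(w) - R_n(w') \le \langle \nabla R_n(w), w - w' \rangle.
% For the squared loss the gap is exactly a negative quadratic term:
R_n(w) - R_n(w') = \langle \nabla R_n(w),\, w - w' \rangle
  - \frac{1}{n}\sum_{i=1}^{n}\langle w - w', x_i\rangle^2

% Unconstrained mirror descent with mirror map \psi and step size \eta
\nabla\psi(w_{t+1}) = \nabla\psi(w_t) - \eta\,\nabla R_n(w_t)

% Offset Rademacher complexity of a class \mathcal{F} with offset c > 0,
% where \varepsilon_1, \dots, \varepsilon_n are i.i.d. Rademacher signs
\mathfrak{R}_n^{\mathrm{off}}(\mathcal{F}, c)
  = \mathbb{E}_{\varepsilon}\,\sup_{f\in\mathcal{F}}\,
    \frac{1}{n}\sum_{i=1}^{n}\bigl[\varepsilon_i f(x_i) - c\, f(x_i)^2\bigr]
```

The negative quadratic term in the second identity has the same shape as the offset term $-c\,f(x_i)^2$, which is what makes a link between the two analyses plausible.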
Findings
Provides excess risk guarantees based on offset complexities.
Recovers recent implicit regularization results with shorter proofs.
Shows potential improvements over existing bounds in certain settings.
Abstract
Recently there has been a surge of interest in understanding implicit regularization properties of iterative gradient-based optimization algorithms. In this paper, we study the statistical guarantees on the excess risk achieved by early-stopped unconstrained mirror descent algorithms applied to the unregularized empirical risk with the squared loss for linear models and kernel methods. By completing an inequality that characterizes convexity for the squared loss, we identify an intrinsic link between offset Rademacher complexities and potential-based convergence analysis of mirror descent methods. Our observation immediately yields excess risk guarantees for the path traced by the iterates of mirror descent in terms of offset complexities of certain function classes depending only on the choice of the mirror map, initialization point, step-size, and the number of iterations. We apply…
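The setting described in the abstract can be illustrated with a minimal sketch of unconstrained mirror descent on the squared-loss empirical risk of a linear model. Everything named here is an assumption made for illustration: the function names, the way the mirror map is passed in (as its gradient and the inverse of its gradient), and in particular the hold-out rule used to choose the stopping time, which is a generic heuristic and not the paper's stopping rule.

```python
import numpy as np


def empirical_risk(w, X, y):
    """Mean squared error of the linear model x -> <w, x>."""
    return float(np.mean((X @ w - y) ** 2))


def mirror_descent_path(X, y, grad_psi, grad_psi_inv, w0, step, n_iters):
    """Return the iterates w_0, ..., w_T of unconstrained mirror descent.

    grad_psi and grad_psi_inv are the gradient of the mirror map and its
    inverse (hypothetical argument names chosen for this sketch).
    """
    n = X.shape[0]
    w = np.asarray(w0, dtype=float).copy()
    path = [w.copy()]
    for _ in range(n_iters):
        # Gradient of the unregularized squared-loss empirical risk.
        grad = (2.0 / n) * (X.T @ (X @ w - y))
        # Take the step in the dual (mirror) space, then map back.
        theta = grad_psi(w) - step * grad
        w = grad_psi_inv(theta)
        path.append(w.copy())
    return path


def pick_stopping_time(path, X_val, y_val):
    """Index of the iterate with the smallest hold-out risk (a heuristic)."""
    risks = [empirical_risk(w, X_val, y_val) for w in path]
    return int(np.argmin(risks))


def identity(v):
    """Gradient (and inverse gradient) of psi(w) = ||w||^2 / 2."""
    return v
```

With the Euclidean mirror map the update reduces to plain gradient descent, so `mirror_descent_path(X, y, identity, identity, w0, step, n_iters)` traces the gradient descent path. Other mirror maps are plugged in by passing a different `grad_psi` and `grad_psi_inv` pair.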
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM
