A Unified View of Regularized Dual Averaging and Mirror Descent with Implicit Updates
H. Brendan McMahan

TL;DR
This paper unifies and analyzes several online convex optimization algorithms, revealing their equivalence and providing improved regret bounds, while also extending their applicability to more general objectives and implicit updates.
Contribution
The paper proves that FTRL-Proximal, RDA, and composite mirror descent are all instantiations of a general FTRL update, and offers a unified analysis with improved regret bounds and extensions to implicit updates.
Findings
FTRL-Proximal outperforms both FOBOS and RDA on a large, real-world dataset.
Unified analysis yields regret bounds matching or surpassing previous results.
Implicit updates extend the algorithms' applicability to more general settings.
Abstract
We study three families of online convex optimization algorithms: follow-the-proximally-regularized-leader (FTRL-Proximal), regularized dual averaging (RDA), and composite-objective mirror descent. We first prove equivalence theorems that show all of these algorithms are instantiations of a general FTRL update. This provides theoretical insight on previous experimental observations. In particular, even though the FOBOS composite mirror descent algorithm handles L1 regularization explicitly, it has been observed that RDA is even more effective at producing sparsity. Our results demonstrate that FOBOS uses subgradient approximations to the L1 penalty from previous rounds, leading to less sparsity than RDA, which handles the cumulative penalty in closed form. The FTRL-Proximal algorithm can be seen as a hybrid of these two, and outperforms both on a large, real-world dataset. Our second…
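The sparsity contrast described in the abstract can be illustrated with a minimal per-coordinate sketch. This is not the paper's code: the function names, the constant step size `eta`, and the fixed quadratic regularizer in the RDA-style step are all assumptions made for illustration. The FOBOS-style step shrinks by only the current round's L1 penalty, while the RDA-style step applies the cumulative penalty `t * lam` to the gradient sum in closed form.

```python
from typing import List


def soft_threshold(z: float, tau: float) -> float:
    """Shrink z toward zero by tau; anything in [-tau, tau] becomes exactly 0."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0


def fobos_l1_step(x: List[float], g: List[float], eta: float, lam: float) -> List[float]:
    """FOBOS-style step (assumed constant eta): take a gradient step,
    then shrink by this round's penalty eta * lam only."""
    return [soft_threshold(xi - eta * gi, eta * lam) for xi, gi in zip(x, g)]


def rda_l1_step(g_sum: List[float], t: int, eta: float, lam: float) -> List[float]:
    """RDA-style step (assumed fixed quadratic regularizer):
    argmin_x  g_sum . x + t * lam * ||x||_1 + ||x||^2 / (2 * eta),
    solved per coordinate in closed form with the cumulative penalty t * lam."""
    return [-eta * soft_threshold(gi, t * lam) for gi in g_sum]


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise both updates.
    print(fobos_l1_step([1.0, 0.05], [0.5, 0.0], eta=0.1, lam=1.0))
    print(rda_l1_step([-3.0, 0.5], t=2, eta=0.1, lam=1.0))
```

In the RDA-style step a coordinate is zero whenever the magnitude of its summed gradient is at most `t * lam`, regardless of earlier iterates; the FOBOS-style step depends on the previous iterate, so past shrinkage is only carried forward through `x`.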
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Numerical methods in inverse problems · Model Reduction and Neural Networks
