Entropic Risk-Averse Generalized Momentum Methods
Bugra Can, Mert Gürbüzbalaban

TL;DR
This paper introduces a risk-averse framework for momentum methods under random gradient noise. It trades off the convergence rate against the risk of suboptimality, quantified with entropic risk measures, and reports improved practical performance for the resulting risk-averse parameter choices.
Contribution
It develops a unified convergence theory for generalized momentum methods and proposes risk-averse parameter selection strategies based on entropic risk measures.
Findings
New convergence bounds for GMM, AGD, and HB methods.
Explicit risk bounds for suboptimality using entropic measures.
Enhanced performance of risk-averse methods on optimization tasks.
Abstract
In the context of first-order algorithms subject to random gradient noise, we study the trade-offs between the convergence rate (which quantifies how fast the initial conditions are forgotten) and the "risk" of suboptimality, i.e. deviations from the expected suboptimality. We focus on a general class of momentum methods (GMM) which recover popular methods such as gradient descent (GD), accelerated gradient descent (AGD), and heavy-ball (HB) method as special cases depending on the choice of GMM parameters. We use well-known risk measures "entropic risk" and "entropic value at risk" to quantify the risk of suboptimality. For strongly convex smooth minimization, we first obtain new convergence rate results for GMM with a unified theory that is also applicable to both AGD and HB, improving some of the existing results for HB. We then provide explicit bounds on the entropic risk and…
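As a rough illustration of the objects named in the abstract, the sketch below runs a generalized momentum iteration on a noisy diagonal quadratic and estimates the entropic risk of the final suboptimality from repeated runs. Several things here are assumptions and not taken from the text above: the update form (y_k = x_k + gamma (x_k - x_{k-1}), x_{k+1} = x_k - alpha grad f(y_k) + beta (x_k - x_{k-1})), the textbook entropic risk (1/theta) log E[exp(theta X)], which may be scaled differently in the paper, the additive Gaussian noise model, and the helper names `gmm_quadratic` and `entropic_risk`. Under the assumed update, gamma = beta = 0 gives GD, gamma = 0 gives HB, and gamma = beta gives AGD.

```python
import math
import random


def gmm_quadratic(alpha, beta, gamma, eigs, x0, steps, sigma=0.0, seed=0):
    """Run the assumed GMM update on f(x) = 0.5 * sum(eigs[i] * x[i]**2).

    Returns the final suboptimality f(x_steps) - f*, where f* = 0.
    """
    rng = random.Random(seed)
    x_prev = list(x0)
    x = list(x0)
    for _ in range(steps):
        x_next = []
        for lam, xi, xpi in zip(eigs, x, x_prev):
            y = xi + gamma * (xi - xpi)                # extrapolated point
            g = lam * y + sigma * rng.gauss(0.0, 1.0)  # noisy gradient at y (assumed noise model)
            x_next.append(xi - alpha * g + beta * (xi - xpi))
        x_prev, x = x, x_next
    return 0.5 * sum(lam * xi * xi for lam, xi in zip(eigs, x))


def entropic_risk(samples, theta):
    """Empirical (1/theta) * log(mean(exp(theta * s))), computed with log-sum-exp."""
    m = max(theta * s for s in samples)
    total = sum(math.exp(theta * s - m) for s in samples)
    return (m + math.log(total / len(samples))) / theta


if __name__ == "__main__":
    eigs, x0 = [1.0, 10.0], [1.0, 1.0]  # mu = 1, L = 10 (illustrative values)
    alpha = 1.0 / 10.0
    beta = (1 - math.sqrt(alpha * 1.0)) / (1 + math.sqrt(alpha * 1.0))
    # AGD-style choice gamma = beta, repeated over independent noise seeds.
    subopt = [gmm_quadratic(alpha, beta, beta, eigs, x0, 200, sigma=1.0, seed=s)
              for s in range(200)]
    print("mean suboptimality:", sum(subopt) / len(subopt))
    print("entropic risk (theta=1):", entropic_risk(subopt, 1.0))
```

By Jensen's inequality the entropic risk is never below the mean for theta > 0, so the gap between the two printed numbers gives a crude sense of how heavy the upper tail of the suboptimality is for a given (alpha, beta, gamma).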
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Statistical Methods and Inference
