# Accelerated Extra-Gradient Descent: A Novel Accelerated First-Order Method

**Authors:** Jelena Diakonikolas, Lorenzo Orecchia

arXiv: 1706.04680 · 2018-02-13

## TL;DR

This paper introduces AXGD, an accelerated first-order method built on a predictor-corrector scheme. It attains the asymptotically optimal convergence rate for smooth functions, extends to other objective classes through the choice of step lengths, and in experiments matches Nesterov's AGD while appearing more robust to noise in some cases.

## Contribution

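The paper presents AXGD, an accelerated method that differs significantly from AGD. It uses a predictor-corrector scheme similar to Mirror-Prox and Extra-Gradient Descent, is motivated by an implicit Euler discretization of accelerated continuous-time dynamics, and is analyzed through a novel primal-dual viewpoint that makes the effects of discretization explicit.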

## Key findings

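- Achieves the asymptotically optimal convergence rate for smooth functions in the first-order oracle model.
- Matches the performance of Nesterov's method in experiments, while appearing more robust to noise in some cases.
- Attains optimal rates for other objective classes (e.g., generalized smoothness, or non-smooth and Lipschitz-continuous) with appropriate choices of step lengths.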

## Abstract

We provide a novel accelerated first-order method that achieves the asymptotically optimal convergence rate for smooth functions in the first-order oracle model. To this day, Nesterov's Accelerated Gradient Descent (AGD) and variations thereof were the only methods achieving acceleration in this standard blackbox model. In contrast, our algorithm is significantly different from AGD, as it relies on a predictor-corrector approach similar to that used by Mirror-Prox [Nemirovski, 2004] and Extra-Gradient Descent [Korpelevich, 1977] in the solution of convex-concave saddle point problems. For this reason, we dub our algorithm Accelerated Extra-Gradient Descent (AXGD). Its construction is motivated by the discretization of an accelerated continuous-time dynamics [Krichene et al., 2015] using the classical method of implicit Euler discretization. Our analysis explicitly shows the effects of discretization through a conceptually novel primal-dual viewpoint. Moreover, we show that the method is quite general: it attains optimal convergence rates for other classes of objectives (e.g., those with generalized smoothness properties or that are non-smooth and Lipschitz-continuous) using the appropriate choices of step lengths. Finally, we present experiments showing that our algorithm matches the performance of Nesterov's method, while appearing more robust to noise in some cases.
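The predictor-corrector idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption of the sketch rather than something stated in this summary: the unconstrained Euclidean setting, the exact form of the four updates, the step sizes `a_k = k / (2L)` (chosen so that `a_k^2 / A_k <= 1/L`), and the names `axgd`, `grad_f`, `diag`, and `b`. Consult the full paper for the authors' algorithm and step-length rules.

```python
from typing import Callable, List

Vector = List[float]


def axgd(grad: Callable[[Vector], Vector], x0: Vector, L: float, iters: int) -> Vector:
    """Predictor-corrector accelerated scheme (Euclidean, unconstrained sketch).

    Assumed step sizes: a_k = k / (2L) and A_k = a_1 + ... + a_k.
    """
    x = list(x0)  # primal iterate
    z = list(x0)  # x0 minus the accumulated a_i-weighted gradients
    A = 0.0
    for k in range(1, iters + 1):
        a = k / (2.0 * L)
        A_next = A + a
        theta = a / A_next
        # Predictor: extrapolate toward z, then take a tentative gradient step.
        x_hat = [(1 - theta) * xi + theta * zi for xi, zi in zip(x, z)]
        g_hat = grad(x_hat)
        z_hat = [zi - a * gi for zi, gi in zip(z, g_hat)]
        # Corrector: form the new iterate from z_hat and redo the step with its gradient.
        x = [(1 - theta) * xi + theta * zhi for xi, zhi in zip(x, z_hat)]
        g = grad(x)
        z = [zi - a * gi for zi, gi in zip(z, g)]
        A = A_next
    return x


# Hypothetical test problem: f(x) = sum_i (0.5 * d_i * x_i^2 - b_i * x_i).
diag = [0.01 + 0.99 * i / 19 for i in range(20)]
b = [1.0] * 20
L_smooth = max(diag)  # smoothness constant of this quadratic


def f(x: Vector) -> float:
    return sum(0.5 * d * xi * xi - bi * xi for d, xi, bi in zip(diag, x, b))


def grad_f(x: Vector) -> Vector:
    return [d * xi - bi for d, xi, bi in zip(diag, x, b)]


x_star = [bi / d for d, bi in zip(diag, b)]  # exact minimizer
x0 = [0.0] * 20
iters = 200

x_out = axgd(grad_f, x0, L_smooth, iters)
gap = f(x_out) - f(x_star)

# With these step sizes, A_k = k(k+1)/(4L), so the O(1/k^2) bound
# ||x0 - x*||^2 / (2 A_k) becomes 2L ||x0 - x*||^2 / (k(k+1)).
dist_sq = sum((xs - xi) ** 2 for xs, xi in zip(x_star, x0))
bound = 2.0 * L_smooth * dist_sq / (iters * (iters + 1))
```

Each iteration uses two gradient evaluations, one at the predicted point and one at the corrected point. That is the structural difference from a single-gradient AGD step which this sketch is meant to make visible.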

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.04680/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1706.04680/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1706.04680/full.md

---
Source: https://tomesphere.com/paper/1706.04680