Algorithms of Inertial Mirror Descent in Convex Problems of Stochastic Optimization
Alexander Nazin

TL;DR
This paper introduces an inertial mirror descent method for convex stochastic optimization problems, extending classical mirror descent with a new approach inspired by the heavy ball method, and provides theoretical error bounds.
Contribution
It proposes a novel inertial mirror descent algorithm that does not require averaging, applicable to convex problems, with proven error bounds and a discrete implementation.
Findings
Inertial MD generalizes classical mirror descent.
The method achieves a proven upper bound on objective function error.
Discrete algorithm implementation is provided.
Abstract
The goal is to modify the known method of mirror descent (MD), proposed by A.S. Nemirovsky and D.B. Yudin in 1979. The paper shows the idea of a new, so-called inertial MD method with the example of a deterministic optimization problem in continuous time. In particular, in the Euclidean case, the heavy ball method by B.T. Polyak is realized. It is noted that the new method does not use additional averaging. A discrete algorithm of inertial MD is described. The theorem on the upper bound on the error in the objective function is proved.
Workshop “Optimization and Statistical Learning”
April 10–14, 2017, Les Houches, France
**Algorithms of Inertial Mirror Descent in Convex Problems of Stochastic Optimization¹**
¹ The full paper is accepted at the Russian journal Automatika i Telemekhanika, whose title translates as Automation and Remote Control.
Alexander Nazin
(April 12, 2017)
ICS RAS, Moscow, Russia
1. The idea of the inertial mirror descent method
Let be a convex, differentiable function with a unique minimum point and minimal value . Consider the following continuous-time algorithm, which extends the mirror descent method (MDM):
[TABLE]
The functional parameter in (2) is a convex, continuously differentiable function with conjugate function
[TABLE]
Let , , and for simplicity.
Remark 1
With parameter in (2), algorithm (1)–(2) is the MDM in continuous time [1]; in particular, the identity map and lead to the standard gradient method
[TABLE]
With and , algorithm (1)–(2) leads to the continuous-time heavy ball method (MHB) [9]
[TABLE]
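As a rough numerical illustration of the two special cases in Remark 1, the sketch below integrates a gradient flow and a heavy ball flow with explicit Euler steps. The equations are not legible in this copy, so the forms used here are assumptions: the standard gradient flow `x' = -grad f(x)` and the standard continuous-time heavy ball equation `mu * x'' + x' = -grad f(x)` with a constant `mu`. The objective `f(x) = 0.5 * x**2`, the step size, and all function names are hypothetical choices made only for this example.

```python
def grad_f(x):
    # Gradient of the illustrative objective f(x) = 0.5 * x**2 (minimum at x = 0).
    return x


def gradient_flow(x0, dt=0.01, steps=2000):
    # Explicit Euler discretization of x' = -grad f(x).
    x = x0
    for _ in range(steps):
        x = x - dt * grad_f(x)
    return x


def heavy_ball_flow(x0, mu=0.5, dt=0.01, steps=2000):
    # Explicit Euler discretization of mu * x'' + x' = -grad f(x),
    # rewritten as the first-order system x' = v, mu * v' = -v - grad f(x).
    # Assumption: constant inertia parameter mu > 0 and zero initial velocity.
    x, v = x0, 0.0
    for _ in range(steps):
        accel = (-v - grad_f(x)) / mu
        x, v = x + dt * v, v + dt * accel
    return x


if __name__ == "__main__":
    print("gradient flow end point:", gradient_flow(1.0))
    print("heavy ball end point:   ", heavy_ball_flow(1.0))
```

On this quadratic both trajectories approach the minimizer `x = 0`; the heavy ball trajectory oscillates on the way because of the inertia term.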
In what follows, we assume that the differentiable parameter , and we call method (1)–(2) the Method of Inertial Mirror Descent (MIDM).
Assume that a solution , to the system of equations (1)–(2) exists.
Consider the function
[TABLE]
as a candidate Lyapunov function.
Its derivative along the trajectories of system (1)–(2) is
[TABLE]
where the last inequality follows from the convexity of . Integrating over the interval with , we obtain
[TABLE]
where the last two terms on the right-hand side are obtained by integration by parts. Taking (3) into account, we continue (7):
[TABLE]
Therefore, it is reasonable to impose the following constraints on the parameter :
[TABLE]
which lead to the inequality
[TABLE]
Maximizing under constraints (9), we get
[TABLE]
The corresponding continuous-time IMD algorithm
[TABLE]
satisfies the upper bound
[TABLE]
2. Stochastic optimization problem
Consider the minimization problem
[TABLE]
where the loss function contains a random variable with unknown distribution on the space , is the mathematical expectation, the set is a given convex compact set in -dimensional space, and the random function is convex a.s. on .
Let an i.i.d. sample be given, where all have the same distribution on as . Introduce the notation for stochastic subgradients
[TABLE]
such that ,
[TABLE]
The goal is to construct and justify novel recursive MD algorithms for the minimization problem (14) that use the stochastic subgradients (15) at the current points , .
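To make the notion of a stochastic subgradient concrete, here is a minimal sketch for a one-dimensional toy problem. It assumes that the condition on (15), whose formula is not legible in this copy, is the usual requirement that the expected stochastic subgradient is a subgradient of the expected loss. The loss `Q(x, z) = |x - z|`, the standard normal distribution of `Z`, and all function names are hypothetical choices for illustration only.

```python
import math
import random


def stochastic_subgradient(x, z):
    # A subgradient in x of the toy loss Q(x, z) = |x - z|.
    if x > z:
        return 1.0
    if x < z:
        return -1.0
    return 0.0


def expected_loss_derivative(x):
    # For Z ~ N(0, 1), f(x) = E|x - Z| is differentiable with
    # f'(x) = P(Z < x) - P(Z > x) = erf(x / sqrt(2)).
    return math.erf(x / math.sqrt(2.0))


def monte_carlo_mean_subgradient(x, n=200_000, seed=0):
    # Average of stochastic subgradients over an i.i.d. sample Z_1, ..., Z_n.
    rng = random.Random(seed)
    total = sum(stochastic_subgradient(x, rng.gauss(0.0, 1.0)) for _ in range(n))
    return total / n


if __name__ == "__main__":
    x = 0.5
    print("sample mean of subgradients:", monte_carlo_mean_subgradient(x))
    print("derivative of expected loss:", expected_loss_derivative(x))
```

The two printed numbers should agree up to Monte Carlo error, which is the sense in which the noisy subgradients carry first-order information about the expected loss.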
3. Algorithm IMD. Main results.
Let be a norm in the primal space , and let be the associated norm in the dual space ; the set is a convex compact set.
Assumption (L). The convex function is such that its -conjugate is continuously differentiable on with gradient satisfying the Lipschitz condition
[TABLE]
where is a positive constant independent of .
Now consider discrete time . We write a discrete version of the IMD algorithm (11)–(12), using the stochastic subgradients (15) in place of the gradients :
[TABLE]
Here the function is defined by the proxy function via the Legendre–Fenchel transform, i.e.
[TABLE]
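A standard example of a proxy function and its Legendre–Fenchel conjugate may help here. The sketch below assumes the negative entropy `V(x) = sum_i x_i * ln(x_i)` on the probability simplex and the scaled conjugate `W_beta(zeta) = sup_x { <zeta, x> - beta * V(x) }`, which has the closed form `beta * ln(sum_i exp(zeta_i / beta))` with gradient equal to the softmax of `zeta / beta`. The sign convention, the choice of proxy function, and all names are assumptions for illustration; the paper's own definition is not legible in this copy and may differ, for instance in the sign of `zeta`.

```python
import math


def neg_entropy(x):
    # Proxy function V(x) = sum_i x_i * ln(x_i) on the probability simplex,
    # with the convention 0 * ln(0) = 0.
    return sum(xi * math.log(xi) for xi in x if xi > 0.0)


def conjugate(zeta, beta):
    # Closed form of sup over the simplex of <zeta, x> - beta * V(x):
    # beta * ln(sum_i exp(zeta_i / beta)), computed in a numerically stable way.
    m = max(zeta)
    return m + beta * math.log(sum(math.exp((z - m) / beta) for z in zeta))


def conjugate_gradient(zeta, beta):
    # Gradient of the conjugate: the softmax of zeta / beta.
    # It is also the maximizer in the definition of the conjugate.
    m = max(zeta)
    weights = [math.exp((z - m) / beta) for z in zeta]
    total = sum(weights)
    return [w / total for w in weights]


def objective(zeta, x, beta):
    # The function of x that is maximized in the Legendre-Fenchel transform.
    return sum(z * xi for z, xi in zip(zeta, x)) - beta * neg_entropy(x)


if __name__ == "__main__":
    zeta, beta = [0.3, -1.2, 2.0], 0.7
    x_star = conjugate_gradient(zeta, beta)
    print("maximizer on the simplex:", x_star)
    print("objective at maximizer:  ", objective(zeta, x_star, beta))
    print("closed-form conjugate:   ", conjugate(zeta, beta))
```

In this example the gradient of the conjugate always lands inside the simplex, which is the mechanism by which a mirror map keeps iterates feasible without an explicit projection step.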
Remark 2
Equation (18) may be written as
[TABLE]
Since the vectors for each , equations (16)–(17) show by induction that .
Further, let the sequences and have the form
[TABLE]
Then the system of equations (16)–(18) leads to the IMD algorithm:
[TABLE]
Theorem 1
Let be a convex closed set in , and let the loss function satisfy the conditions of Section 2 and, moreover,
[TABLE]
where the constant . Let be a proxy function on with parameter from Assumption (L), and let a minimum point exist. Then, for any , the estimate defined by algorithm (22), (23) with stochastic subgradients (15) and the sequence from (21) with arbitrary satisfies the inequality
[TABLE]
Corollary 1
If the constant in the assumptions of Theorem 1 is such that and , then
[TABLE]
In particular, one may get .
References
[1] Nemirovskii, A.S. and Yudin, D.B., Problem Complexity and Method Efficiency in Optimization, Chichester: Wiley, 1983.
[2] A. Ben-Tal, T. Margalit, A. Nemirovski. The ordered subsets mirror descent optimization method with applications to tomography. SIOPT 12(1), 79–108, 2001.
[3] A. Beck, M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175, 2003.
[4] Yu. Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming, 2007. DOI: 10.1007/s10107-007-0149-x.
[5] A.B. Juditsky, A.V. Nazin, A.B. Tsybakov, and N. Vayatis. Recursive aggregation of estimators by the mirror descent algorithm with averaging. Problems of Information Transmission, 41(4):368–384, 2005.
[6] Nemirovski A., Juditsky A., Lan G. and Shapiro A. Robust stochastic approximation approach to stochastic programming // SIAM J. Optim. 2009. V. 19. No. 4. P. 1574–1609.
[7] Rockafellar R.T., Wets R.J.B. Variational Analysis. N.-Y.: Springer, 1998.
[8] Polyak B.T. Some methods of speeding up the convergence of iteration methods // Zh. Vych. Mat., 4, No. 5, 791–803, 1964.
