Adaptive Mirror Descent for Constrained Optimization
Anastasia Bayandina

TL;DR
This paper introduces an adaptive Mirror Descent method for constrained convex optimization that improves convergence rates and can generate dual solutions, especially effective for strongly convex problems.
Contribution
The paper proposes an adaptive stepsize Mirror Descent algorithm with enhanced convergence rates and dual solution generation capabilities for constrained convex optimization.
Findings
Improved convergence rate over fixed stepsize MD
Method generates dual solutions for certain constraints
Effective restart technique for strongly convex problems
Abstract
This paper addresses how to solve non-smooth convex and strongly convex optimization problems with functional constraints. The introduced Mirror Descent (MD) method with adaptive stepsizes is shown to have a better convergence rate than MD with fixed stepsizes due to the improved constant. For certain types of constraints, the method is proved to generate a dual solution. For the strongly convex case, the restart technique is applied.
Adaptive Mirror Descent
for Constrained Optimization
Anastasia Bayandina
Moscow Institute of Physics and Technology
Moscow, Russia
I Introduction
Optimizing non-smooth functions with constraints is attracting widespread interest in large-scale optimization and its applications [1], [2]. Various methods exist for problems of this kind, for example the bundle-level method [3], the penalty method [4], [5], and the Lagrange multipliers method [6]. Among them, Mirror Descent (MD) [7], [8] is viewed as a simple method for non-smooth convex optimization.
In this paper, we propose to modify MD so that the stepsizes, and with them the rate of convergence, no longer depend on the global Lipschitz constant [10], but rather on the norms of the subgradients at the current points. These norms are averaged in a certain sense and take the place of the Lipschitz constant. If the constraint can be represented as the maximum of convex functions, which often arises in applications with many scalar constraints, the proposed method also makes it possible to build a dual solution. The idea of restarts [11] is adopted to construct the algorithm for the case of a strongly convex objective and constraints. Both proposed methods are optimal in terms of the lower bounds [7].
The paper is organized as follows: in Section II we state the problem and notation; in Section III we describe the MD algorithm with adaptive stepsizes and prove the convergence theorem for it; Section IV is focused on the strongly convex case with restarting MD algorithm and theoretical estimates of its convergence; finally, Section V is about duality of the proposed MD method.
II Preliminaries and Problem Statement
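Let $E$ be the $n$-dimensional vector space. Let $\|\cdot\|$ be an arbitrary norm in $E$ and $\|\cdot\|_*$ be the conjugate norm in $E^*$:
$$\|\xi\|_* = \max_{x \in E} \big\{ \langle \xi, x \rangle : \|x\| \le 1 \big\}.$$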
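Let $X \subset E$ be a closed convex set. We consider two convex functions $f: X \to \mathbb{R}$ and $g: X \to \mathbb{R}$ that are subdifferentiable and Lipschitz continuous, i.e.
$$|f(x) - f(y)| \le M_f \|x - y\| \quad \forall x, y \in X,$$
and the same goes for $g$ (with a constant $M_g$).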
We focus on the problem expressed in the form
$$f(x) \to \min_{x \in X}, \tag{1}$$
$$\text{s.t. } g(x) \le 0. \tag{2}$$
Denote by $x_*$ the genuine solution of the problem (1), (2).
Assume that we are equipped with a first-order oracle which, given a point $x \in X$, returns the values of $\big(f(x), \nabla f(x)\big)$ and $\big(g(x), \nabla g(x)\big)$.
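Consider a distance generating function (d.g.f.) $d: X \to \mathbb{R}$ which is continuously differentiable and strongly convex with modulus 1 w.r.t. the norm $\|\cdot\|$, i.e.
$$\langle \nabla d(x) - \nabla d(y), x - y \rangle \ge \|x - y\|^2 \quad \forall x, y \in X,$$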
and assume that
[TABLE]
Suppose we are given a constant $\Theta_0 > 0$ such that
$$d(x_*) \le \Theta_0^2.$$
Note that if there is a set of optimal points $X_*$, then we may assume that
$$\min_{x_* \in X_*} d(x_*) \le \Theta_0^2.$$
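For all $x, y \in X$ consider the corresponding Bregman divergence
$$V(x, y) = d(y) - d(x) - \langle \nabla d(x), y - x \rangle.$$
For all $x \in X$, $p \in E^*$ define the proximal mapping operator
$$\mathrm{Mirr}_x(p) = \arg\min_{u \in X} \big\{ \langle p, u \rangle + V(x, u) \big\}.$$
We make the simplicity assumption, which means that $\mathrm{Mirr}_x(p)$ is easily computable.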
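To make the simplicity assumption concrete, here is a minimal Python sketch of two setups in which the proximal mapping has a closed form. The function names and the choice of sets (a box with the Euclidean d.g.f., the probability simplex with the entropy d.g.f.) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def prox_euclidean_box(x, p, lo, hi):
    """Euclidean setup, d(u) = 0.5 * ||u||_2^2 on the box [lo, hi]^n.

    Here V(x, u) = 0.5 * ||u - x||_2^2, so
    argmin_u <p, u> + V(x, u) is the projection of x - p onto the box.
    """
    return np.clip(x - p, lo, hi)

def prox_entropy_simplex(x, p):
    """Entropy setup, d(u) = sum_i u_i * log(u_i) on the probability simplex.

    Here V(x, u) is the Kullback-Leibler divergence KL(u || x), and the
    minimizer is the multiplicative update u_i proportional to x_i * exp(-p_i).
    x is assumed to have strictly positive entries.
    """
    w = x * np.exp(-(p - p.min()))  # shift by p.min() for numerical stability
    return w / w.sum()

# Illustrative inputs (assumed values).
u_box = prox_euclidean_box(np.array([1.0, -1.0]), np.array([2.0, -3.0]), -1.5, 1.5)
u_simplex = prox_entropy_simplex(np.array([0.5, 0.5]), np.array([0.0, np.log(3.0)]))
```

In both cases the mapping costs a number of operations linear in the dimension, which is the kind of situation the simplicity assumption is meant to cover.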
III Mirror Descent for Constrained Optimization
The following algorithm is proposed to solve the problem (1), (2).
Denote , .
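We are going to adopt the following lemma [9].
Lemma 1
Let $f$ be some convex subdifferentiable function over the convex set $X$. Let the sequence $\{x^k\}$ be defined by the proximal mapping update
$$x^{k+1} = \mathrm{Mirr}_{x^k}\big(h_k \nabla f(x^k)\big).$$
Then, for any $x \in X$, in terms of the Bregman divergence $V$,
$$h_k \big(f(x^k) - f(x)\big) \le \frac{h_k^2}{2} \|\nabla f(x^k)\|_*^2 + V(x^k, x) - V(x^{k+1}, x).$$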
Theorem 1
The point supplied by Algorithm 1 satisfies
[TABLE]
for the number of oracle calls equal to
[TABLE]
where is found from
[TABLE]
Proof:
By the definition of and the convexity of ,
[TABLE]
Using (3) and the definitions of the stepsizes, consider the summation
[TABLE]
Since for
[TABLE]
recalling (7), we get
[TABLE]
As long as the inequality is strict, the case of the empty is impossible.
For holds . Then, by the definition of and the convexity of ,
[TABLE]
∎
It is worth mentioning that the resulting constant acts as an average of the subgradient norms at the visited points, rather than the largest possible Lipschitz constant over the whole feasible set.
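A minimal Python sketch of an adaptive scheme of this kind is given below for the Euclidean setup. It rests on assumptions that are not confirmed by the text above: the stepsize rule $h_k = \varepsilon / \|s_k\|_2^2$, the stopping rule $\sum_k 1/\|s_k\|_2^2 \ge 2\Theta_0^2/\varepsilon^2$, and the output taken as the stepsize-weighted average of the productive points (those with $g(x^k) \le \varepsilon$). The names `adaptive_md`, `theta_sq` and `project` are hypothetical.

```python
import numpy as np

def adaptive_md(f_grad, g, g_grad, x0, eps, theta_sq,
                project=lambda x: x, max_iter=10**6):
    """Adaptive mirror descent sketch with d(x) = 0.5 * ||x||_2^2.

    Assumed rule: h_k = eps / ||s_k||^2, where s_k is a subgradient of f on
    productive steps (g(x_k) <= eps) and a subgradient of g otherwise.
    theta_sq must upper-bound 0.5 * ||x0 - x_*||^2.
    """
    x = np.asarray(x0, dtype=float)
    inv_sq_sum = 0.0                  # running sum of 1 / ||s_k||^2
    weighted_sum = np.zeros_like(x)   # sum of h_k * x_k over productive steps
    h_sum = 0.0                       # sum of h_k over productive steps
    for _ in range(max_iter):
        if inv_sq_sum >= 2.0 * theta_sq / eps**2:
            break                     # stopping rule reached
        productive = g(x) <= eps
        s = f_grad(x) if productive else g_grad(x)
        m_sq = float(s @ s)
        if m_sq == 0.0:
            if productive:
                return x              # eps-feasible point with zero subgradient of f
            raise ValueError("zero subgradient of g at an infeasible point")
        h = eps / m_sq
        if productive:
            weighted_sum += h * x
            h_sum += h
        inv_sq_sum += 1.0 / m_sq
        x = project(x - h * s)        # Euclidean proximal step
    if h_sum == 0.0:
        raise RuntimeError("no productive steps were made")
    return weighted_sum / h_sum

# Toy problem (assumed): minimize ||x - c||_1 subject to ||x||_2^2 - 1 <= 0.
c = np.array([2.0, 0.0])
f = lambda x: np.abs(x - c).sum()
f_grad = lambda x: np.sign(x - c)
g = lambda x: float(x @ x) - 1.0
g_grad = lambda x: 2.0 * x
eps = 0.05
x_bar = adaptive_md(f_grad, g, g_grad, x0=np.zeros(2), eps=eps, theta_sq=0.5)
```

For this toy problem the solution is $(1, 0)$ with optimal value 1, so `f(x_bar) - 1.0` and `g(x_bar)` can both be compared against `eps`. Note that only the norms of the subgradients actually encountered enter the stepsizes and the stopping rule.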
IV Restarting Mirror Descent
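In this section we assume that $f$ and $g$ in the problem (1), (2) are $\mu$-strongly convex on $X$, i.e.
$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{\mu}{2} \|y - x\|^2 \quad \forall x, y \in X,$$
and the same goes for $g$.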
Also the d.g.f is assumed to be bounded on the unit ball, that is,
[TABLE]
where the bounding constant is dimension-dependent and in most setups behaves asymptotically as $O(\log n)$ [9].
Suppose we are given a constant such that
[TABLE]
The following algorithm is proposed to solve the problem (1), (2) in the case of strong convexity [11].
At each iteration of the loop the algorithm performs a restart: it calls the procedure described in the previous section with an accuracy that decreases with each subsequent restart.
Denote by the numbers of oracle calls at each restart in Algorithm 2 and by the corresponding sets of indices.
Further, for the sake of brevity, we accept the following statement without proof.
Lemma 2
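Suppose $f$ and $g$ are $\mu$-strongly convex functions w.r.t. the norm $\|\cdot\|$ and $x_*$ is the genuine solution of the problem (1), (2). Then if for some $\tilde{x} \in X$
$$f(\tilde{x}) - f(x_*) \le \varepsilon, \quad g(\tilde{x}) \le \varepsilon,$$
then
$$\frac{\mu}{2} \|\tilde{x} - x_*\|^2 \le \varepsilon.$$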
Now we are ready to prove the following
Theorem 2
The point supplied by Algorithm 2 satisfies
[TABLE]
for the total number of oracle calls equal to
[TABLE]
where is found from
[TABLE]
Proof:
Observe [10] that the function defined in Algorithm 2 is 1-strongly convex w.r.t. the norm . The conjugate of this norm is . It means that at each restart the actual Lipschitz constants are . Then, by (9) and (10) at the end of the first restart we obtain
[TABLE]
which by Theorem 1 guarantees the -solution of the problem.
Further, by Lemma 2, after the th restart it holds that
[TABLE]
Due to the choice of the d.g.f. , the starting point of the th restart is and
[TABLE]
In that way we have justified the redefinition of the d.g.f. and the ’distance’ argument of the procedure.
After the th restart by the definition of and we obtain
[TABLE]
Thus, for the whole procedure considering the definition of
[TABLE]
∎
Note that, due to Lemma 2, the argument converges to $x_*$ along with the function value, which is a typical property of strongly convex optimization.
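The restart idea can be sketched in Python as follows for the Euclidean setup. This is an illustration under stated assumptions rather than the paper's Algorithm 2: the inner routine uses the assumed stepsize $\varepsilon / \|s_k\|_2^2$, the squared radius is halved at each restart, and the accuracy of restart $k$ is set to $\mu R_k^2 / 2$ so that a bound of the Lemma 2 type, $\frac{\mu}{2}\|x - x_*\|^2 \le \varepsilon_k$, keeps the next starting point within distance $R_k$ of $x_*$. All function and parameter names are hypothetical.

```python
import numpy as np

def _adaptive_md(f_grad, g, g_grad, x0, eps, theta_sq):
    """Inner routine: Euclidean adaptive MD with assumed stepsize eps / ||s||^2."""
    x = np.asarray(x0, dtype=float)
    acc, h_sum, inv_sq = np.zeros_like(x), 0.0, 0.0
    while inv_sq < 2.0 * theta_sq / eps**2:
        productive = g(x) <= eps
        s = f_grad(x) if productive else g_grad(x)
        m_sq = float(s @ s)
        if m_sq == 0.0:
            if productive:
                return x
            raise ValueError("zero subgradient of g at an infeasible point")
        h = eps / m_sq
        if productive:
            acc += h * x
            h_sum += h
        inv_sq += 1.0 / m_sq
        x = x - h * s
    return acc / h_sum

def restarted_md(f_grad, g, g_grad, x0, mu, r0_sq, eps):
    """Restart scheme: r0_sq must upper-bound ||x0 - x_*||^2."""
    x, r_sq = np.asarray(x0, dtype=float), float(r0_sq)
    while True:
        eps_k = mu * r_sq / 4.0   # equals mu * R_k^2 / 2 with R_k^2 = R_{k-1}^2 / 2
        # With d_k(x) = 0.5 * ||x - x_{k-1}||^2 we have d_k(x_*) <= 0.5 * R_{k-1}^2.
        x = _adaptive_md(f_grad, g, g_grad, x, eps_k, theta_sq=0.5 * r_sq)
        r_sq /= 2.0               # now ||x - x_*||^2 <= r_sq by the Lemma 2 bound
        if eps_k <= eps:
            return x

# Toy problem (assumed): minimize ||x - c||^2 subject to ||x||^2 - 1 <= 0.
# Both functions are 2-strongly convex; the solution is (1, 0) with value 1.
c = np.array([2.0, 0.0])
f = lambda x: float((x - c) @ (x - c))
f_grad = lambda x: 2.0 * (x - c)
g = lambda x: float(x @ x) - 1.0
g_grad = lambda x: 2.0 * x
eps = 1e-3
x_out = restarted_md(f_grad, g, g_grad, x0=np.zeros(2), mu=2.0, r0_sq=1.0, eps=eps)
```

Since the accuracy halves at each restart, the work of the final restarts dominates the total number of oracle calls in this sketch.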
V Dual Problem Solution
Following [12], in this section we regard the problem of the type (1), (2) where the constraints appear in the form
$$g(x) = \max_{m = 1, \dots, M} g_m(x). \tag{14}$$
Consider the dual problem
$$\varphi(\lambda) = \min_{x \in X} \Big\{ f(x) + \sum_{m=1}^{M} \lambda_m g_m(x) \Big\} \to \max_{\lambda \ge 0}. \tag{15}$$
Denote by $\lambda_*$ the genuine solution of (15). Then, by the weak duality property [6] we have
$$f(x_*) - \varphi(\lambda_*) \ge 0.$$
Assume that Slater's condition holds, i.e. there exists $\tilde{x} \in X$ such that $g(\tilde{x}) < 0$. This ensures strong duality, $f(x_*) = \varphi(\lambda_*)$. It means that if the algorithm is able to generate a dual solution of the problem (1), (2) with (14), the accuracy of this solution can be estimated via the size of the duality gap.
As long as the constraints are of the form (14), we can define the function
[TABLE]
Theorem 3
Consider Algorithm 1 and define dual Lagrange multipliers as
[TABLE]
where
[TABLE]
Then, the point supplied by Algorithm 1 satisfies
[TABLE]
for the number of oracle calls equal to
[TABLE]
where is found from
[TABLE]
Proof:
Combining (7) and (8) together with (16) we obtain
[TABLE]
Recalling (17) and rearranging the terms,
[TABLE]
∎
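The construction of dual multipliers can be illustrated with a short Python sketch in the Euclidean setup. It assumes, without confirmation from the text above, that each non-productive step is attributed to the constraint attaining the maximum in (14), that the multiplier of a constraint is the sum of the stepsizes attributed to it divided by the sum of the productive stepsizes, and that `theta_sq` bounds $\frac{1}{2}\|x - x^0\|_2^2$ over the whole (compact) set. The toy problem and all names are hypothetical.

```python
import numpy as np

def adaptive_md_with_duals(f_grad, gs, g_grads, x0, eps, theta_sq, project):
    """Euclidean adaptive MD that also accumulates dual multipliers (sketch)."""
    x = np.asarray(x0, dtype=float)
    acc, h_prod, inv_sq = np.zeros_like(x), 0.0, 0.0
    lam = np.zeros(len(gs))           # stepsizes attributed to each constraint
    while inv_sq < 2.0 * theta_sq / eps**2:
        vals = np.array([gm(x) for gm in gs])
        m = int(np.argmax(vals))      # index of the active constraint
        productive = vals[m] <= eps
        s = f_grad(x) if productive else g_grads[m](x)
        m_sq = float(s @ s)
        if m_sq == 0.0:
            raise ValueError("zero subgradient is not handled in this sketch")
        h = eps / m_sq                # assumed adaptive stepsize
        if productive:
            acc += h * x
            h_prod += h
        else:
            lam[m] += h
        inv_sq += 1.0 / m_sq
        x = project(x - h * s)
    return acc / h_prod, lam / h_prod

# Toy problem (assumed): minimize -x_0 - x_1 over the box [-2, 2]^2
# subject to x_0 - 1 <= 0 and x_1 - 1 <= 0.
f = lambda x: -x[0] - x[1]
f_grad = lambda x: np.array([-1.0, -1.0])
gs = [lambda x: x[0] - 1.0, lambda x: x[1] - 1.0]
g_grads = [lambda x: np.array([1.0, 0.0]), lambda x: np.array([0.0, 1.0])]
project = lambda x: np.clip(x, -2.0, 2.0)

def phi(lam):
    # Dual function of the toy problem: minimum over the box of the linear
    # function (lam_0 - 1) * x_0 + (lam_1 - 1) * x_1 - lam_0 - lam_1.
    return -2.0 * np.abs(lam - 1.0).sum() - lam.sum()

eps = 0.05
x_bar, lam_bar = adaptive_md_with_duals(
    f_grad, gs, g_grads, x0=np.zeros(2), eps=eps, theta_sq=4.0, project=project)
gap = f(x_bar) - phi(lam_bar)
```

For this toy problem the dual function is available in closed form, so the duality gap `gap` can be evaluated directly and compared against `eps`.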
VI Conclusion
We proved that the MD algorithm with adaptive stepsizes achieves optimal rates in both the convex and strongly convex cases, with an improved constant in place of the global Lipschitz constant. For problems with constraints in the form of a maximum of convex functions, we showed that the method also produces a dual solution. However, it remains open whether high-probability bounds can be constructed for adaptive steps in the case of a stochastic oracle.
Acknowledgment
The author gratefully acknowledges the help and valuable discussion kindly provided by Dr. Gasnikov.
This research was funded by the Russian Science Foundation (project 17-11-01027).
[1] S. Shpirko and Yu. Nesterov, "Primal-dual Subgradient Methods for Huge-scale Linear Conic Problem", SIAM Journal on Optimization, no. 24, pp. 1444-1457, 2014.
[2] A. Ben-Tal and A. Nemirovski, "Robust Truss Topology Design via Semidefinite Programming", SIAM J. Optim., vol. 7, no. 4, pp. 991-1016, Nov. 1997.
[3] Yu. Nesterov, Introduction to Convex Optimization. Moscow, Russia: MCCME, 2010.
[4] F. Vasilyev, Optimization Methods. Moscow, Russia: FP, 2002.
[5] G. Lan, "Gradient Sliding for Composite Optimization", Math. Program., vol. 159, no. 1-2, pp. 201-235, 2016.
[6] S. Boyd and L. Vandenberghe, Convex Optimization. New York, NY: Cambridge University Press, 2004.
[7] A. Nemirovski and D. Yudin, Problem Complexity and Method Efficiency in Optimization. New York, NY: Wiley, 1983.
[8] A. Beck and M. Teboulle, "Mirror Descent and Nonlinear Projected Subgradient Methods for Convex Optimization", Operations Research Letters, vol. 31, pp. 167-175, 2003.
