Gradient Methods with Regularization for Constrained Optimization Problems and Their Complexity Estimates
Igor Konnov

TL;DR
This paper introduces modified gradient and conditional gradient methods for smooth convex optimization in Hilbert spaces, achieving strong convergence and comparable complexity estimates to traditional weakly convergent methods.
Contribution
It proposes simple, implementable modifications that ensure strong convergence and provide complexity estimates for these optimization methods.
Findings
Achieve strong convergence in convex optimization
Maintain similar complexity estimates to weakly convergent methods
Provide practical modifications for gradient-based algorithms
Abstract
We suggest simple implementable modifications of conditional gradient and gradient projection methods for smooth convex optimization problems in Hilbert spaces. Usually, the custom methods attain only weak convergence. We prove strong convergence of the new versions and establish their complexity estimates, which appear similar to the convergence rate of the weakly convergent versions.
I.V. Konnov
Department of System Analysis and Information Technologies, Kazan Federal University, ul. Kremlevskaya, 18, Kazan 420008, Russia.
Key words: Convex optimization; Hilbert space; gradient projection method; conditional gradient method; strong convergence; complexity estimates.
MSC codes: 90C25, 65K05, 65J20
1 Introduction
Let $D$ be a convex set in a real Hilbert space $H$ and $f : H \to \mathbb{R}$ a convex function. Then one can define the optimization problem of finding the minimal value of the function $f$ over the feasible set $D$. For brevity, we write this problem as
$$\min_{x \in D} \to f(x); \tag{1}$$
its solution set is denoted by $D^{*}$ and the optimal value of the function by $f^{*}$, i.e.
$$f^{*} = \inf_{x \in D} f(x).$$
For many significant applications this problem appears ill-posed, i.e. its solution does not depend continuously on the input data. At the same time, the custom convex optimization methods can in general provide only weak convergence to a solution. Hence, they do not guarantee a sufficiently close approximation, in norm, of the solution set; besides, even a small perturbation of the input data may give large deviations from the solution. In order to overcome these drawbacks, various regularization techniques that yield strong convergence can be applied; see e.g. [1]–[4]. The most popular and efficient regularization method was suggested by A.N. Tikhonov; see [5].
That is, a family of perturbed problems with better properties is solved instead of the initial one. However, the solution of such a perturbed problem within a prescribed accuracy may be too difficult even for the convex optimization problem (1). At the same time, various simple and implementable versions of the regularization methods yield slow convergence due to the special restrictive rules for the choice of step-size and regularization parameters; see e.g. [2, 3].
In this paper, we suggest an intermediate variant of the implementable regularization method. We take the conditional gradient and gradient projection methods as basic ones. Each iteration of the selected method is applied to some perturbed convex optimization problem. Unlike the known iterative regularization methods (see [2]), we change the perturbed problem only after some simple estimate inequality is satisfied, which allows us to utilize rather mild rules for the choice of the parameters. Within these rules we prove strong convergence and establish some complexity estimates for these two-level methods. In particular, they show that this way of incorporating the regularization techniques gives almost the same convergence rate as the custom single-level methods, which provide only weak convergence.
2 Properties of regularization methods
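We first recall some definitions. Given a set $X$, a function $f : X \to \mathbb{R}$ is said to be
(a) convex, if for each pair of points $x, y \in X$ and for all $\alpha \in [0,1]$, it holds that
$$f(\alpha x + (1-\alpha) y) \le \alpha f(x) + (1-\alpha) f(y);$$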
(b) strongly convex with constant , if for each pair of points and for all , it holds that
[TABLE]
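(c) upper (lower) semicontinuous at a point $z \in X$, if for each sequence $\{x^{k}\} \to z$, $x^{k} \in X$, it holds that
$$\limsup_{k \to \infty} f(x^{k}) \le f(z) \quad \Big( \liminf_{k \to \infty} f(x^{k}) \ge f(z) \Big).$$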
We will consider problem (1) under the following basic assumptions.
(A1) $D$ is a nonempty, convex and closed subset of a real Hilbert space $H$, $f : H \to \mathbb{R}$ is a lower semicontinuous and convex function.
The classical Tikhonov regularization method (see [5]) consists in replacing problem (1) with a sequence of perturbed problems of the form
$$\min_{x \in D} \to \{ f(x) + \varepsilon \varphi(x) \}, \tag{2}$$
where $\varphi : H \to \mathbb{R}$ is a lower semicontinuous and strongly convex function, $\varepsilon > 0$ is a regularization parameter. We recall the basic approximation property; see e.g. [1, Chapter II, Section 5, Theorem 1].
Proposition 1
Suppose that all the assumptions in (A1) are fulfilled, , and that is a lower semicontinuous and strongly convex function. Then:
(i) problem (2) has a unique solution for each ;
(ii) if as , the corresponding sequence converges strongly to the point that is the unique solution of the problem
[TABLE]
The main difficulty of the above regularization method lies in its implementation, since we cannot find the point exactly in the general nonlinear case. Clearly, instead of we can in principle take any point such that with as . Then also converges strongly to the point in case (ii) of Proposition 1. However, it is not so easy to guarantee even the prescribed distance approximation to the point in the general case.
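The approximation property of Proposition 1 can be illustrated on a small finite-dimensional example. The sketch below is only an illustration under stated assumptions: the feasible set is the whole space, the function is the rank-deficient least-squares objective 0.5‖Ax − b‖² (both `A` and `b` are made up for the example), and the perturbation is 0.5‖x‖². In this special case the perturbed problem has a closed-form solution, and its distance to the minimal-norm solution shrinks as the regularization parameter decreases.

```python
import numpy as np

# Illustrative data (assumption): f(x) = 0.5 * ||A x - b||^2 has a whole line
# of minimizers, namely all x with x[0] + x[1] = 2.
A = np.array([[1.0, 1.0]])
b = np.array([2.0])

def tikhonov_solution(eps):
    """Unique minimizer of f(x) + 0.5 * eps * ||x||^2 over the whole space.

    For this quadratic the optimality condition is the linear system
    (A^T A + eps * I) x = A^T b.
    """
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + eps * np.eye(n), A.T @ b)

# Minimal-norm minimizer of f, obtained from the pseudoinverse.
x_normal = np.linalg.pinv(A) @ b

# Distance between the regularized solution and the minimal-norm solution
# for a decreasing sequence of regularization parameters.
eps_values = (1.0, 1e-1, 1e-2, 1e-3)
errors = [np.linalg.norm(tikhonov_solution(eps) - x_normal) for eps in eps_values]

for eps, err in zip(eps_values, errors):
    print(f"eps = {eps:g}: distance to normal solution = {err:.2e}")
```

In the general nonlinear, constrained case no such closed form is available, which is exactly the implementation difficulty discussed above.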
In [6], the so-called iterative regularization method was proposed; see [2] for more details. The idea of this method consists in simultaneous changes of the regularization parameters and step-sizes of a chosen basic approximation method. In particular, if the functions and are smooth, we can take the basic gradient projection method for problem (2). Then the corresponding iterative procedure can be determined as follows:
[TABLE]
where
[TABLE]
and . Here and below, denotes the projection of onto .
Proposition 2
([2, Theorem 3.1]) Suppose that all the assumptions in (A1) are fulfilled, , the function is smooth, the function is smooth and strongly convex, there exists a constant such that
[TABLE]
Then any sequence generated in conformity with rules (3) – (4) converges strongly to the point .
Of course, the implementation of method (3) – (4) is relatively simple. Observe that the conditions in (4) are fulfilled if we set
[TABLE]
This means that the convergence of the iterative regularization method may be rather slow in comparison with that of the basic method. In fact, let us consider the custom gradient projection method:
[TABLE]
and . For brevity, set .
(A2) The function is smooth and its gradient satisfies the Lipschitz condition with constant .
Proposition 3
([7, Theorem 5.1] and [8, Chapter III, Theorem 2.6]) Suppose that (A1) and (A2) are fulfilled, a sequence is generated in conformity with rule (5) where
[TABLE]
Then there exists some constant such that
[TABLE]
It is well known that method (5) – (6), unlike (3) – (4), provides only weak convergence. At the same time, comparing the step-size rules (4) and (6) we can conclude that it seems rather difficult to obtain the estimate similar to (7) for the iterative regularization method (3) – (4). The same convergence properties were established for the gradient projection method with some other known step-size rules such as the exact one-dimensional minimization and Armijo rules.
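A minimal sketch of the usual gradient projection iteration may help to fix ideas. It assumes a finite-dimensional least-squares objective, a box as the feasible set, and the constant step-size 1/L, where L is the Lipschitz constant of the gradient; the data `A`, `b`, the bounds and the step-size choice are assumptions of the example and only one instance of an admissible rule. The iterates reach a minimizer, but which minimizer is reached depends on the starting point, and it need not be the one of minimal norm.

```python
import numpy as np

# Illustrative data (assumption): minimizers of f over the box [0, 2]^2
# form the segment x[0] + x[1] = 2; the minimal-norm one is (1, 1).
A = np.array([[1.0, 1.0]])
b = np.array([2.0])
lo, hi = 0.0, 2.0

def f(x):
    r = A @ x - b
    return 0.5 * (r @ r)

def grad_f(x):
    return A.T @ (A @ x - b)

def project(x):
    # Projection onto the box is a componentwise clip.
    return np.clip(x, lo, hi)

def gradient_projection(x0, n_iter=50):
    """Projected gradient steps with the constant step-size 1/L."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of grad_f
    x = np.asarray(x0, dtype=float)
    values = [f(x)]
    for _ in range(n_iter):
        x = project(x - grad_f(x) / L)
        values.append(f(x))
    return x, values

x_gp, values = gradient_projection([2.0, 1.0])
print("limit point:", x_gp, " f =", values[-1])
```

Starting from (2, 1), the iterates settle at (1.5, 0.5), a solution of the problem that differs from the minimal-norm solution (1, 1).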
3 Two-level gradient projection method with regularization
We now describe some other way to create an implementable regularization method, which is based on the gradient projection method. The method is applied to problem (1) under the assumptions (A1) and (A2). At each iteration, the gradient projection method is applied to some perturbed problem of form (2), however, the perturbed problem is changed only after satisfying some simple estimate inequality, unlike the above regularization methods. For the simplicity of exposition, we take the standard perturbation function , then we rewrite the perturbed problem
[TABLE]
and set
[TABLE]
Observe that problem (8) has the unique solution for each under the assumptions (A1) and (A2) due to Proposition 1 (i), hence . Denote by the set of non-negative integers.
Method (GPRM).
Step 0: Choose a point , numbers , , sequences and . Set .
Step 1: Set , .
Step 2: Take . If
[TABLE]
set , and go to Step 1. (Change the perturbation)
Step 3: Set , determine as the smallest number in such that
[TABLE]
set , , , and go to Step 2.
We see that the upper level changes the current perturbed problem which is associated to the index , whereas the lower level with iterations in is nothing but the custom gradient projection method with the Armijo step-size rule applied to the fixed perturbed problem (8) with . Clearly, condition (9) is very simple and suitable for the verification.
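The two-level structure described above can be sketched in a few lines. This is a hedged illustration rather than a transcription of (GPRM): it assumes that the switching test compares the norm of the projected gradient displacement with the current tolerance, that the Armijo inequality requires a decrease of at least `beta * step * ||d||^2`, and that the parameter sequences are eps_l = 2^(-l) and delta_l = eps_l^2. All of these concrete forms, as well as the test problem, are assumptions of the example.

```python
import numpy as np

# Illustrative problem (assumption): least squares over the box [0, 2]^2.
A = np.array([[1.0, 1.0]])
b = np.array([2.0])
lo, hi = 0.0, 2.0

def f(x):
    r = A @ x - b
    return 0.5 * (r @ r)

def grad_f(x):
    return A.T @ (A @ x - b)

def project(x):
    return np.clip(x, lo, hi)

def gprm(f, grad_f, project, x0, n_levels=8, beta=0.5, theta=0.5):
    """Two-level gradient projection sketch with the perturbation 0.5*eps*||x||^2."""
    x = np.asarray(x0, dtype=float)
    n_inner = 0
    for l in range(n_levels):          # upper level: change the perturbation
        eps = 0.5 ** l                 # assumed regularization sequence
        delta = eps ** 2               # assumed tolerance sequence, delta/eps -> 0
        phi = lambda v: f(v) + 0.5 * eps * (v @ v)
        grad_phi = lambda v: grad_f(v) + eps * v
        while True:                    # lower level: fixed perturbed problem
            d = project(x - grad_phi(x)) - x
            dd = d @ d
            if np.sqrt(dd) <= delta:   # assumed switching test
                break
            step = 1.0                 # Armijo backtracking along d
            while phi(x + step * d) > phi(x) - beta * step * dd:
                step *= theta
            x = x + step * d
            n_inner += 1
    return x, n_inner

# Start at (2, 0), which already minimizes f, so the plain method would not move.
x_hat, n_inner = gprm(f, grad_f, project, [2.0, 0.0])
print("approximate normal solution:", x_hat, " inner iterations:", n_inner)
```

In this example the plain gradient projection method started at (2, 0) stays there, whereas the two-level scheme moves toward the minimal-norm solution (1, 1) at no extra cost per iteration beyond the norm test.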
We now give some useful properties of the gradient projection method.
Lemma 1
Suppose that (A1) and (A2) are fulfilled. Fix any . Then we have
[TABLE]
for any ; besides, for any
Proof. Relation (11) follows directly from the projection properties. Next, under the assumptions made, the gradient of the function satisfies the Lipschitz condition with constant . Hence, for any pair of points we now have
[TABLE]
see [8, Chapter III, Lemma 1.2]. Then (11) gives
[TABLE]
if . It follows from (10) that .
We show that the sequence of perturbed problems is infinite.
Lemma 2
Suppose that (A1) and (A2) are fulfilled. Then the number of iterations in for each number is finite.
Proof. It follows from (10) and Lemma 1 that , but , hence , and the result follows.
The next property enables us to evaluate the approximation error.
Lemma 3
Suppose that (A1) and (A2) are fulfilled. Fix any . Then
[TABLE]
for any
Proof. Since is strongly convex with modulus , we have
[TABLE]
see e.g. [1, Chapter I, Section 2]. Next, (11) gives
[TABLE]
It follows that (12) holds true.
We are ready to establish the basic convergence property for (GPRM).
Theorem 1
Suppose that (A1) and (A2) are fulfilled and , we apply (GPRM) with
[TABLE]
Then:
(i) the number of iterations in for each number is finite;
(ii) the sequence converges strongly to the point .
Proof. Assertion (i) has been obtained in Lemma 2. Fix any and denote by the maximal value of the index for this , i.e. . Then (12) gives
[TABLE]
but
[TABLE]
hence
[TABLE]
Therefore, by (13),
[TABLE]
Due to Proposition 1 (ii), converges strongly to . Therefore, assertion (ii) is also true.
We observe that inserting the control sequence does not require additional computational expenses per iteration, but implies the strong convergence, whereas the usual gradient projection method provides only weak convergence as indicated above. Besides, rule (13) is clearly less restrictive than (4) and maintains significant freedom for the choice of parameters.
4 Complexity estimate
It was observed in Section 2 that the usual gradient projection method has the convergence rate under the assumptions (A1) and (A2); see Proposition 3 and the remarks below. This means that the total number of iterations that is necessary for attaining some prescribed accuracy is estimated as follows:
[TABLE]
We intend to obtain a similar estimate for (GPRM). Namely, we define the complexity of (GPRM), denoted by , as the total number of iterations in that is necessary for attaining any accuracy . In order to establish an upper bound for we need certain auxiliary properties. We recall that denotes the solution of the perturbed problem (8) for , which is defined uniquely under (A1). Hence denotes any solution of problem (1).
Lemma 4
Suppose that (A1) holds. Then for any numbers and such that we have
[TABLE]
Proof. By definition,
[TABLE]
These relations give (15) and (16), besides, we also have
[TABLE]
which gives (17).
Denote by the total number of iterations in for any fixed in (GPRM) and by the maximal number of the upper iteration such that for any given . Then we can evaluate the complexity of (GPRM) as follows:
[TABLE]
Using this inequality, we now obtain the basic estimate.
Theorem 2
Suppose that (A1) and (A2) are fulfilled and , we apply (GPRM) with
[TABLE]
Then (GPRM) has the complexity estimate
[TABLE]
where and .
Proof. First we note that (19) implies (13), hence all the assertions of Theorem 1 remain true. Fix any . Then, due to (10) and Lemma 1, we have
[TABLE]
therefore,
[TABLE]
However,
[TABLE]
From (12) we have
[TABLE]
hence
[TABLE]
It follows that
[TABLE]
[TABLE]
Therefore,
[TABLE]
where
[TABLE]
Using these relations in (20) we have
[TABLE]
where
[TABLE]
In view of (18) and (22) we obtain
[TABLE]
We now proceed to evaluate . By definition,
[TABLE]
hence,
[TABLE]
From (21) we have
[TABLE]
whereas applying (15) with and gives
[TABLE]
Therefore,
[TABLE]
In view of (19) we have
[TABLE]
It follows that . Applying this inequality in (23) we obtain
[TABLE]
and the result follows.
From Theorem 2 we conclude that the complexity estimate of (GPRM) tends to (14) when . However, we can choose arbitrarily in . Therefore, taking small enough, we can obtain any approximation of the convergence rate of the usual gradient projection method under the same assumptions. At the same time, (GPRM), unlike the gradient projection method, attains the strong convergence.
5 Two-level conditional gradient method with regularization
We now describe a similar modification of the conditional gradient method under the following basic assumptions for problem (1).
(A3) * is a nonempty, convex, closed, and bounded subset of a real Hilbert space , is a smooth convex function and its gradient satisfies the Lipschitz condition with constant .*
The boundedness of guarantees the method is well-defined. Besides, now problem (1) has a solution, i.e. . We recall that the conditional gradient method was first suggested in [9] for the case when the goal function is quadratic and the feasible set is polyhedral and further was developed by many authors; see e.g. [7, 8, 10, 11, 12]. The main idea of this method consists in linearization of the goal function, so that solution of the linearized problem over the initial feasible set serves for finding the descent direction.
Following [7, 8], we describe one of the various versions of the custom conditional gradient method.
Method (CGM).
Step 0: Choose a point , set .
Step 1: Find a point as a solution of the problem
[TABLE]
set .
Step 2: If , stop. Otherwise choose a number , set , , , , and go to Step 1.
Clearly, termination of the method yields a solution. For this reason, we will consider only the non-trivial case where the sequence is infinite.
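The steps of the conditional gradient method can be sketched as follows. The example assumes a least-squares objective over a box, for which the linearized subproblem is solved by picking a vertex componentwise, and it uses the exact one-dimensional minimization (available in closed form for a quadratic) as the step-size rule. The data, the tolerance `tol` and the iteration cap are assumptions of the example.

```python
import numpy as np

def conditional_gradient(A, b, lo, hi, x0, tol=1e-10, max_iter=1000):
    """Conditional gradient sketch for 0.5*||A x - b||^2 over the box [lo, hi]^n."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = A.T @ (A @ x - b)
        # Linearized problem: minimize <g, y> over the box, solved componentwise.
        y = np.where(g > 0, lo, hi)
        gap = g @ (x - y)              # nonnegative; zero at a solution
        if gap <= tol:
            break
        d = y - x
        Ad = A @ d
        # Exact minimization of the quadratic along d, restricted to [0, 1].
        step = min(1.0, gap / (Ad @ Ad))
        x = x + step * d
    return x

# Illustrative data (assumption): solutions form the segment x[0] + x[1] = 2.
A = np.array([[1.0, 1.0]])
b = np.array([2.0])
x_cg = conditional_gradient(A, b, 0.0, 2.0, [2.0, 1.0])
print("limit point:", x_cg)
```

Here the method stops after one step at (4/3, 2/3), which solves the problem but is not the minimal-norm solution (1, 1).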
Proposition 4
([7, Theorem 6.1] and [8, Chapter III, Theorem 1.7]) Suppose that (A3) is fulfilled, a sequence is generated by (CGM) where
[TABLE]
Then there exists some constant such that
[TABLE]
That is, estimate (24) is the same as (7), but it cannot be improved even if the function is strongly convex. Besides, (CGM) also provides only weak convergence. The same convergence properties were established for the conditional gradient method with the other known step-size rules such as the exact one-dimensional minimization and Armijo rules; see [8, 10, 11].
Some versions of the iterative regularization method based on the conditional gradient iterations were described in [1, Chapter II, Section 11] and [2, Chapter IV, Section 1]. They provide strong convergence but utilize restrictive control rules for the regularization parameters and step-sizes, which are similar to (4). In particular, the version from [2] utilizes the exact one-dimensional minimization for the choice of the step-size and takes the rule
[TABLE]
for the regularization parameter. This means that the convergence of the iterative regularization version may be rather slow in comparison with that of the basic conditional gradient method.
We now describe some other implementable conditional gradient method with regularization, which follows the approach given in Section 3. That is, the custom conditional gradient method is applied to some perturbed problem of form (2), however, the perturbed problem is changed only after satisfying some simple estimate inequality. We also take the standard perturbation function , hence we take the perturbed problem (8), which has the unique solution for each under the assumptions in (A3).
Method (CGRM).
Step 0: Choose a point , numbers , , sequences and . Set .
Step 1: Set , .
Step 2: Find a point as a solution of the problem
[TABLE]
set , . If
[TABLE]
set , and go to Step 1. (Change the perturbation)
Step 3: Determine as the smallest number in such that
[TABLE]
set , , , and go to Step 2.
We see again that the upper level changes the current perturbed problem associated to the index , whereas the lower level with iterations in is nothing but the conditional gradient method with the Armijo step-size rule applied to the fixed perturbed problem. Clearly, condition (25) is very simple and suitable for the verification.
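A corresponding sketch of the two-level conditional gradient scheme is given below. As before, it is an illustration under assumptions and not a transcription of (CGRM): the switching test is assumed to compare the linearization gap with the current tolerance, the Armijo inequality is assumed to require a decrease of at least `beta * step * gap`, and the sequences eps_l = 2^(-l), delta_l = eps_l^2 and the test problem are chosen only for the example.

```python
import numpy as np

# Illustrative problem (assumption): least squares over the box [0, 2]^2.
A = np.array([[1.0, 1.0]])
b = np.array([2.0])
lo, hi = 0.0, 2.0

def f(x):
    r = A @ x - b
    return 0.5 * (r @ r)

def grad_f(x):
    return A.T @ (A @ x - b)

def cgrm(f, grad_f, lo, hi, x0, n_levels=5, beta=0.5, theta=0.5):
    """Two-level conditional gradient sketch with the perturbation 0.5*eps*||x||^2."""
    x = np.asarray(x0, dtype=float)
    n_inner = 0
    for l in range(n_levels):          # upper level: change the perturbation
        eps = 0.5 ** l                 # assumed regularization sequence
        delta = eps ** 2               # assumed tolerance sequence, delta/eps -> 0
        phi = lambda v: f(v) + 0.5 * eps * (v @ v)
        while True:                    # lower level: fixed perturbed problem
            g = grad_f(x) + eps * x
            y = np.where(g > 0, lo, hi)   # solves the linearized problem on the box
            gap = g @ (x - y)
            if gap <= delta:           # assumed switching test
                break
            d = y - x
            step = 1.0                 # Armijo backtracking along d
            while phi(x + step * d) > phi(x) - beta * step * gap:
                step *= theta
            x = x + step * d
            n_inner += 1
    return x, n_inner

# Start at (2, 0), which already solves the unperturbed problem.
x_hat, n_inner = cgrm(f, grad_f, lo, hi, [2.0, 0.0])
print("approximate normal solution:", x_hat, " inner iterations:", n_inner)
```

With only five perturbation levels the final point is still a coarse approximation of the minimal-norm solution (1, 1); more levels tighten it at the price of more lower-level iterations.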
We now give a lower bound for the step-size.
Lemma 5
Suppose that (A3) is fulfilled. Fix any . Then
[TABLE]
for any
Proof. It was noticed that, under the assumptions made, the gradient of the function satisfies the Lipschitz condition with constant . Hence, for any pair of points we now have
[TABLE]
Therefore,
[TABLE]
if
[TABLE]
or , where denotes the diameter of the set . Fix any point . Then
[TABLE]
hence setting gives . Set . It follows now from (26) that .
We now show that the sequence of perturbed problems is infinite.
Lemma 6
Suppose that (A3) is fulfilled. Then the number of iterations in for each number is finite.
Proof. It follows from (26) that , but , hence , and the result follows.
The next property enables us to evaluate the approximation error.
Lemma 7
Suppose that (A3) is fulfilled. Fix any . Then
[TABLE]
for any
Proof. Since is strongly convex with modulus , we have
[TABLE]
see e.g. [1, Chapter I, Section 2]. By definition, we have
[TABLE]
which gives (27).
We are ready to establish the basic convergence property for (CGRM).
Theorem 3
Suppose that (A3) is fulfilled, we apply (CGRM) with (13). Then:
(i) the number of iterations in for each number is finite;
(ii) the sequence converges strongly to the point .
Proof. Assertion (i) has been obtained in Lemma 6. Fix any and denote by the maximal value of the index for this . Then and (27) gives
[TABLE]
hence, by (13),
[TABLE]
Due to Proposition 1 (ii), converges strongly to . Therefore, assertion (ii) is also true.
We also notice that rule (13) is clearly less restrictive than (4) and maintains significant freedom for the choice of parameters.
Due to Proposition 4, the total number of iterations of the conditional gradient method that is necessary for attaining some prescribed accuracy is estimated as follows:
[TABLE]
We intend to obtain a similar estimate for (CGRM). As above in Section 4, we define the complexity of (CGRM), denoted by , as the total number of iterations in that is necessary for attaining any given accuracy .
Denote by the total number of iterations in for any fixed in (CGRM) and by the maximal number of the upper iteration such that for any given . Then we can evaluate the complexity of (CGRM) as follows:
[TABLE]
cf. (18). Using this inequality, we now obtain the basic estimate. Its proof differs somewhat from the proof of Theorem 2.
Theorem 4
Suppose that (A3) is fulfilled, we apply (CGRM) with (19). Then (CGRM) has the complexity estimate
[TABLE]
where and .
Proof. First we note that (19) implies (13), hence all the assertions of Theorem 3 remain true. Fix any . Then, due to (26) and Lemma 5, we have
[TABLE]
therefore,
[TABLE]
However,
[TABLE]
From (27) we have
[TABLE]
[TABLE]
Therefore,
[TABLE]
where
[TABLE]
Using these relations in (30) we have
[TABLE]
where
[TABLE]
In view of (29) and (32) we obtain
[TABLE]
We now proceed to evaluate . By definition,
[TABLE]
Applying (15) with and gives
[TABLE]
From (31) it now follows that
[TABLE]
In view of (19) we have
[TABLE]
It follows that . Applying this inequality in (33) we obtain
[TABLE]
and the result follows.
From Theorem 4 we conclude that the complexity estimate of (CGRM) tends to (28) when . Due to (19), we can choose arbitrarily in . Therefore, taking small enough, we can obtain any approximation of the best convergence rate of the usual conditional gradient method under the same assumptions. At the same time, (CGRM) attains the strong convergence.
6 Conclusions
We suggested simple implementable versions of the combined regularization and gradient methods for smooth convex optimization problems in Hilbert spaces. We took the basic conditional gradient and gradient projection methods and proved strong convergence of their modified versions under rather mild rules for the choice of the parameters. Within these rules we also established complexity estimates for the methods. They show that this way of incorporating the regularization techniques gives the convergence rate similar to that of the custom method, which provides only weak convergence under the same assumptions.
Acknowledgement
This work was supported by the RFBR grant, project No. 13-01-00368-a.
[1] F.P. Vasil’yev, Methods for Solving Extremal Problems, Nauka, Moscow, 1981.
[2] A.B. Bakushinskii and A.V. Goncharskii, Iterative Solution Methods for Ill-Posed Problems, Nauka, Moscow, 1989.
[3] V.V. Vasin and A.L. Ageev, Incorrect Problems with A Priori Information, Nauka, Ekaterinburg, 1993.
[4] H.W. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems, Kluwer Academic Publishers, Dordrecht, 1996.
[5] A.N. Tikhonov, On the solution of ill-posed problems and regularization method, Dokl. Akad. Nauk SSSR, vol. 151 (1963), pp. 501–504.
[6] A.B. Bakushinskii and B.T. Polyak, On the solution of variational inequalities, Sov. Math. Dokl., vol. 15 (1974), pp. 1705–1710.
[7] E.S. Levitin and B.T. Polyak, Constrained minimization methods, USSR Comp. Maths. Math. Phys., vol. 6 (1966), pp. 1–50.
[8] V.F. Dem’yanov and A.M. Rubinov, Approximate Methods for Solving Extremum Problems, Leningrad Univ. Press, Leningrad, 1968. (Engl. transl. in Elsevier, Amsterdam, 1970)
