Tikhonov regularized exterior penalty methods for hierarchical variational inequalities
Meggie Marschner, Mathias Staudigl

TL;DR
This paper introduces a novel double loop prox-penalization algorithm for solving hierarchical variational inequalities, with strong convergence guarantees and applications to bilevel optimization and multi-follower games.
Contribution
It develops a new algorithm for nested variational inequalities with proven convergence, expanding solution methods for hierarchical equilibrium problems.
Findings
The algorithm is proven to converge strongly in real Hilbert spaces.
Applicable to hierarchical convex bilevel problems and multi-follower games.
Preliminary numerical results show promising performance.
Abstract
We consider nested variational inequalities consisting of an (upper-level) variational inequality whose feasible set is given by the solution set of another (lower-level) variational inequality. This class of hierarchical equilibrium problems contains a wealth of important applications, including purely hierarchical convex bilevel optimization problems and certain multi-follower games. Working within a real Hilbert space setting, we develop a double loop prox-penalization algorithm with strong convergence guarantees towards a solution of the nested VI problem. We present various applications that fit into our framework and also present some preliminary numerical results.
Taxonomy
Topics: Numerical methods in inverse problems · Optimization and Variational Analysis · Contact Mechanics and Variational Inequalities
Meggie Marschner and Mathias Staudigl, Mannheim University, Department of Mathematics, B6 26, 68159 Mannheim (m.staudigl@uni-mannheim)
This research benefited from the support of the FMJH Program Gaspard Monge for optimization and operations research and their interactions with data science. MST research is sponsored by the Deutsche Forschungsgemeinschaft (DFG), Projektnummer 556222748 ("Non-stationary hierarchical optimization").
I Introduction
In a real Hilbert space setting, we consider the hierarchical variational inequality problem
[TABLE]
in which the problem data satisfy the following conditions:
Assumption 1
1. is a monotone and Lipschitz continuous map with Lipschitz constant and weakly sequentially continuous, i.e., whenever a sequence converges weakly to a point, its image converges weakly to the image of that point;
2. is monotone and Lipschitz continuous with Lipschitz constant and weakly sequentially continuous;
3. is maximally monotone with bounded domain;
4. The set is nonempty.
Our assumptions imply that , and thus is maximally monotone. Furthermore, it follows that is closed and convex. The problem is to find a point such that
[TABLE]
This nested VI problem encompasses a wide range of optimization and equilibrium problems involving a hierarchical structure. Concrete applications of this model framework can be found in signal processing [1], equilibrium selection in Nash games [2, 3, 4], inverse problems [5], certain classes of bilevel optimization [6, 7], and power allocation [8], to mention a few. We give below a few concrete examples.
Example 1 (Simple Bilevel Optimization)
The seminal references [6, 9, 10] introduced simple bilevel problems of the form
[TABLE]
Here are convex Fréchet-differentiable functions with Lipschitz continuous gradients (Lipschitz constants and , respectively). The function is a convex, proper and lower semicontinuous function, giving rise to the maximally monotone operator . Under the convexity assumptions, we can reformulate problem (2) as the hierarchical VI (P). More generally, we can consider structured convex optimization problems of the form
[TABLE]
Applying Fenchel-Rockafellar duality to the lower-level problem, we can reformulate the above as the saddle-point problem
[TABLE]
A saddle point is characterized by the monotone inclusion
[TABLE]
which can be compactly written as the problem of finding a point in , with and . Note that the operator is not cocoercive, which precludes a direct approach via viscosity techniques such as the celebrated BiG-SAM method [11].
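Why cocoercivity fails here can be checked numerically. This is a minimal illustration under an assumption of ours, because the extracted text does not preserve the paper's operators: we keep only a bilinear coupling through a hypothetical matrix K. The resulting saddle operator M(x, y) = (K^T y, -K x) is linear and skew-symmetric, so <Mz, z> = 0 for every z. Cocoercivity would instead require <Mz, z> >= beta * ||Mz||^2 for some beta > 0.

```python
import random


def saddle_operator(K, x, y):
    """Return M(x, y) = (K^T y, -K x) for a dense matrix K given as a list of rows."""
    m, n = len(K), len(K[0])
    kt_y = [sum(K[i][j] * y[i] for i in range(m)) for j in range(n)]
    minus_kx = [-sum(K[i][j] * x[j] for j in range(n)) for i in range(m)]
    return kt_y, minus_kx


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 3x2 coupling matrix; any nonzero K leads to the same conclusion.
K = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(2)]
y = [random.uniform(-1.0, 1.0) for _ in range(3)]

gx, gy = saddle_operator(K, x, y)
pairing = dot(gx, x) + dot(gy, y)     # <M z, z>: zero up to rounding (skew operator)
image_sq = dot(gx, gx) + dot(gy, gy)  # ||M z||^2: strictly positive here
print(f"<Mz, z> = {pairing:.1e}, ||Mz||^2 = {image_sq:.3f}")
```

The pairing vanishes while the image does not. No beta > 0 can therefore satisfy the cocoercivity inequality at this z, even though M is monotone.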
Example 2 (Equilibrium Selection in Nash games)
The Nash equilibrium problem is described by a finite set of players , each characterized by a strategy space , and a real-valued function . An -tuple is a Nash equilibrium if
[TABLE]
We define the game as a tuple and denote the set of Nash equilibrium points by . Assuming that each mapping is convex and differentiable, it is a classical result that the set of Nash equilibria can be characterized as the solution set of a variational inequality , determined by the operator
[TABLE]
Setting , the normal cone operator (see Section II), we can reformulate problem as the inclusion problem . In many design problems, one asks for the "best" equilibrium point, relative to a pre-defined loss function with gradient mapping . This leads to the equilibrium selection problem
[TABLE]
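The sketch below makes the pseudo-gradient concrete for a hypothetical two-player zero-sum game with costs theta1(x) = x1*x2 and theta2(x) = -x1*x2 on X = [0, 1]^2. This is not the game of Section V. Its pseudo-gradient V(x) = (x2, -x1) is monotone but not strongly monotone. The equilibrium set {0} x [0, 1] is a whole segment, which is exactly the situation in which equilibrium selection is meaningful. Equilibria are recognized through the fixed-point form of the VI, using the projection as resolvent of the normal cone.

```python
def theta1(x1, x2):
    return x1 * x2          # player 1's cost (hypothetical)


def theta2(x1, x2):
    return -x1 * x2         # player 2's cost (hypothetical, zero-sum)


def pseudo_gradient(x, h=1e-6):
    """V(x) = (d theta1 / d x1, d theta2 / d x2): each player differentiates its own cost."""
    x1, x2 = x
    d1 = (theta1(x1 + h, x2) - theta1(x1 - h, x2)) / (2.0 * h)
    d2 = (theta2(x1, x2 + h) - theta2(x1, x2 - h)) / (2.0 * h)
    return (d1, d2)


def project_box(x, lo=0.0, hi=1.0):
    return tuple(min(max(v, lo), hi) for v in x)


def is_equilibrium(x, t=0.5, tol=1e-8):
    """x solves VI(V, X) iff x is a fixed point of x -> P_X(x - t V(x)) for any t > 0."""
    v = pseudo_gradient(x)
    z = project_box(tuple(xi - t * vi for xi, vi in zip(x, v)))
    return max(abs(a - b) for a, b in zip(x, z)) <= tol


# Monotonicity check: <V(x) - V(y), x - y> vanishes (up to rounding) because V is skew.
x, y = (0.2, 0.7), (0.9, 0.1)
vx, vy = pseudo_gradient(x), pseudo_gradient(y)
gap = sum((a - b) * (c - d) for a, b, c, d in zip(vx, vy, x, y))
print(gap, is_equilibrium((0.0, 0.5)), is_equilibrium((0.5, 0.5)))
```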
I-A Contributions
In order to resolve the hierarchical VI problem (P), we follow [1] and adopt a regularization framework involving a parametric family of structured monotone inclusions , where is defined by
[TABLE]
The parameter is a proximal parameter, while is a Tikhonov parameter that regulates the relative importance of the lower level versus the upper level. Auxiliary mappings of the form (3) have a long tradition in mathematical optimization in connection with the proximal penalization framework [12, 13, 14]. To motivate this construction, let us investigate the structure of the operator when applied to the simple bilevel optimization problem (Example 1). To iteratively solve this problem, suppose that is our best-so-far solution candidate. To obtain a new proposal, a common strategy within the proximal penalization framework is to solve the following strongly convex auxiliary problem
[TABLE]
Thanks to the proximal term, this problem admits a unique solution . Under a constraint qualification, we can compute
Given an anchor point , the auxiliary problem admits a unique solution . Our algorithmic framework builds upon a double loop architecture, in which the inner loop is an iterative method that drives the process to a neighborhood of the temporary solution . Upon termination of the inner loop, an outer loop is activated, which updates the anchor point and the Tikhonov parameter . The combination of Tikhonov and proximal regularization terms is borrowed from [1]. The present paper develops a general template for designing iterative methods that resolve problem (P). To keep the exposition concrete, we illustrate our machinery only in the special case in which the forward-backward splitting method is used as the main algorithmic map. The general abstract setting will be described in a forthcoming publication.
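The anchor-update mechanism can be tried on a toy simple bilevel problem. This is a sketch with hypothetical data. It assumes that the Tikhonov parameter eps weights the upper-level objective, and it omits the nonsmooth term.

- Lower level: g(x) = 0.5*(x1 + x2 - 1)^2, whose solution set is the line x1 + x2 = 1.
- Upper level: f(x) = 0.5*||x||^2, which selects (0.5, 0.5).

Each auxiliary problem min eps*f + g + (alpha/2)*||x - anchor||^2 is a strongly convex quadratic, so its unique solution comes from a 2x2 linear system. The loop re-anchors at that solution while eps decreases slowly.

```python
def solve_auxiliary(anchor, eps, alpha):
    """Unique minimizer of eps*f + g + (alpha/2)*||x - anchor||^2 for the toy data.

    Optimality: eps*x + (x1 + x2 - 1)*(1, 1) + alpha*(x - anchor) = 0, i.e.
    [[c + 1, 1], [1, c + 1]] x = alpha*anchor + (1, 1) with c = eps + alpha.
    """
    c = eps + alpha
    r1 = alpha * anchor[0] + 1.0
    r2 = alpha * anchor[1] + 1.0
    det = (c + 1.0) ** 2 - 1.0          # positive because c > 0
    x1 = ((c + 1.0) * r1 - r2) / det    # Cramer's rule
    x2 = ((c + 1.0) * r2 - r1) / det
    return (x1, x2)


alpha = 1.0
anchor = (1.0, 0.0)                      # already lower-level optimal, but not the selected point
for k in range(2000):
    eps = 1.0 / (k + 1) ** 0.5           # slowly vanishing, non-summable schedule (assumption)
    anchor = solve_auxiliary(anchor, eps, alpha)
print(anchor)                            # close to (0.5, 0.5), slightly biased by the final eps
```

The starting point (1, 0) already solves the lower-level problem. Even so, the iterates drift along the lower-level solution set towards the point preferred by f. With a constant eps they would instead settle at the minimizer of eps*f + g.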
With respect to [1], the innovations of this note are the following. First, we consider a more general class of splitting problems, with explicit iterations via a resolvent step. Second, our scheme is robust to inexact computations and is formulated in Hilbert spaces, as required for potentially infinite-dimensional control applications. Our inner loop scheme is a relaxed-inertial forward-backward implementation involving the regularized monotone operators that we employ to iteratively track the temporal solutions. This algorithmic technique is a classical tool in monotone splitting approaches; in fact, the inertial-relaxation version was first studied as a discrete-time algorithm in [15]. However, the analysis in [15] assumes that the discrete velocity of the process is summable. This is a frequently encountered technical assumption which, however, cannot be checked before the algorithm is actually executed. Our proof does not make such an assumption and instead uses a new Lyapunov-type analysis, inspired by [16]. Using this technique, we establish linear convergence of the inner loop scheme. Additionally, we develop convergence guarantees in an inexact computational framework. Inexact forward-backward algorithms are studied in [17] and [18].
Hybrid models for approaching problem (P) have been studied in [19, 20, 21]. These methods require to be co-coercive (inverse strongly monotone), a condition that essentially holds when the lower-level VI reduces to a convex optimization problem. In contrast, here we only require to be monotone and Lipschitz continuous, which allows us to treat more general problems than these classical approaches.
II Preliminaries
Let be a real Hilbert space with inner product and corresponding norm . Given a closed convex set , we define the orthogonal projection operator . A set-valued operator is monotone if
[TABLE]
where is the graph of the operator . The operator is maximally monotone if it is monotone and there exists no other monotone operator whose graph properly contains . The resolvent of an operator is defined as . If is maximally monotone, then the resolvent is single-valued, defined on the whole space, and firmly nonexpansive (in particular, nonexpansive) [22].
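Two resolvents are available in closed form, shown here as a minimal sketch with hypothetical data.

- For the normal cone of a box, the resolvent (for any step size) is the componentwise projection onto the box.
- For gamma times the subdifferential of the l1 norm, it is soft-thresholding.

The check samples random pairs and confirms the nonexpansiveness stated above.

```python
import math
import random


def project_box(x, lo, hi):
    """Resolvent of the normal cone of [lo, hi]^n (independent of the step size)."""
    return [min(max(v, lo), hi) for v in x]


def soft_threshold(x, gamma):
    """Resolvent of gamma * (subdifferential of the l1 norm)."""
    return [math.copysign(max(abs(v) - gamma, 0.0), v) for v in x]


def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


random.seed(1)
worst = 0.0
resolvents = (lambda z: project_box(z, -1.0, 1.0), lambda z: soft_threshold(z, 0.5))
for _ in range(1000):
    u = [random.gauss(0.0, 2.0) for _ in range(5)]
    v = [random.gauss(0.0, 2.0) for _ in range(5)]
    for resolvent in resolvents:
        worst = max(worst, dist(resolvent(u), resolvent(v)) / dist(u, v))
print(f"largest observed ||J(u) - J(v)|| / ||u - v|| = {worst:.4f}")
```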
Lemma 1
Let be a maximally monotone operator. A point belongs to if and only if
[TABLE]
This lemma also shows that for all , the set is closed and convex.
Given a mapping and a set , the variational inequality problem , is the problem of finding a point such that The solution set of can be expressed as , involving the normal cone operator
[TABLE]
If is continuous and monotone, and is closed and convex, then is convex.
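The equivalence between a VI and the inclusion with the normal cone can be checked numerically. The data are hypothetical: F(x) = x - c on the box C = [0, 1]^3. Because the resolvent of the normal cone is the projection, x solves VI(F, C) exactly when x = P_C(x - t F(x)) for some (equivalently, every) t > 0. For this particular F the solution is P_C(c).

```python
import random

c = (1.5, -0.2, 0.4)                      # hypothetical data


def F(x):
    return tuple(xi - ci for xi, ci in zip(x, c))


def project_box(x, lo=0.0, hi=1.0):
    return tuple(min(max(v, lo), hi) for v in x)


x_star = project_box(c)                   # candidate solution (1.0, 0.0, 0.4)

# Fixed-point test x = P_C(x - t F(x)) for several step sizes t.
residuals = []
for t in (0.1, 1.0, 10.0):
    z = project_box(tuple(xi - t * fi for xi, fi in zip(x_star, F(x_star))))
    residuals.append(max(abs(a - b) for a, b in zip(x_star, z)))

# VI inequality <F(x*), y - x*> >= 0 at random feasible points y.
random.seed(2)
fx = F(x_star)
points = [tuple(random.random() for _ in range(3)) for _ in range(1000)]
min_gap = min(sum(f * (yi - xi) for f, yi, xi in zip(fx, y, x_star)) for y in points)
print(residuals, min_gap)
```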
Lemma 2
Let be a monotone operator and convex. Let . If satisfies then .
Lemma 3
For all and it holds that
[TABLE]
Lemma 4
For all and it holds that
[TABLE]
The next important result is implicit in Theorem 2.1 of [23].
Lemma 5
Let and be nonnegative sequences such that for all , and . Consider a real sequence such that
[TABLE]
for all . Then is convergent. If , then . Either way, if for all with and nonnegative, then is convergent. If , then .
In the next results, we gather some basic properties of the operator .
Lemma 6
For all the mapping is -strongly monotone and -Lipschitz continuous.
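The computation behind Lemma 6 can be spelled out, but only in notation we assume ourselves, since the symbols are not recoverable from the extracted text. Write B for the lower-level map (L_B-Lipschitz) and G for the upper-level map (L_G-Lipschitz). The regularized map is then T(x) = B(x) + eps G(x) + alpha (x - xbar), that is, the Tikhonov parameter eps is assumed to multiply the upper-level map and alpha is the proximal weight. If display (3) weights the terms differently, the same two lines go through with the constants rearranged.

```latex
% Assumed: B, G monotone; B is L_B-Lipschitz, G is L_G-Lipschitz; eps >= 0, alpha > 0.
% T(x) = B(x) + \varepsilon G(x) + \alpha (x - \bar{x}); the anchor \bar{x} cancels in differences.
\begin{aligned}
\langle T(x) - T(y), x - y \rangle
  &= \langle B(x) - B(y), x - y \rangle
   + \varepsilon \langle G(x) - G(y), x - y \rangle
   + \alpha \|x - y\|^{2}
  \;\ge\; \alpha \|x - y\|^{2}, \\
\|T(x) - T(y)\|
  &\le \|B(x) - B(y)\| + \varepsilon \|G(x) - G(y)\| + \alpha \|x - y\|
  \;\le\; (L_B + \varepsilon L_G + \alpha)\, \|x - y\|.
\end{aligned}
```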
Following the classical literature on proximal diagonal schemes, we consider the sequence of auxiliary problems . Denote by the unique solution of the monotone inclusion problem . Associated with this monotone inclusion problem, we have the equivalent fixed point reformulation using the forward-backward operator. Specifically, for , we define the merit function
[TABLE]
This function has the property that
[TABLE]
Hence, we use the norm to measure the accuracy of the point with respect to the unique element of , a strategy that is formally justified by the next result.
Lemma 7
For all and , we have
[TABLE]
Proof:
Let , so that
[TABLE]
Additionally, satisfies
[TABLE]
From the monotonicity of , it follows
[TABLE]
Rearranging the above immediately yields the estimate
[TABLE]
Finally, using the -strong monotonicity of the operator , uniformly in , we can lower bound the left-hand side expression by
[TABLE]
We conclude ∎
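The estimate of Lemma 7 can be probed numerically. This sketch assumes the bound has the standard form ||x - x_hat|| <= (1 + t*L)/(t*mu) * ||x - J(x - t*T(x))||, for a mu-strongly monotone, L-Lipschitz map T and the resolvent J of the normal cone of a box. That is the inequality produced by the proof steps above (monotonicity of the resolvent's operator, then strong monotonicity), although the constant in the paper's display is not recoverable here. All data are hypothetical: T(x) = Bx + alpha*(x - anchor) with a skew matrix B, so mu = alpha and L <= 1 + alpha.

```python
import math
import random

alpha, anchor = 0.5, (2.0, -1.0)        # hypothetical proximal weight and anchor
mu, L = alpha, 1.0 + alpha              # strong monotonicity modulus and Lipschitz bound of T
t = mu / L**2                           # step size of the forward-backward map


def T(x):
    """Skew part (x2, -x1) plus proximal term alpha*(x - anchor)."""
    return (x[1] + alpha * (x[0] - anchor[0]), -x[0] + alpha * (x[1] - anchor[1]))


def fb_map(x):
    """Forward-backward map x -> P_C(x - t T(x)) with C = [0, 1]^2."""
    return tuple(min(max(xi - t * gi, 0.0), 1.0) for xi, gi in zip(x, T(x)))


def merit(x):
    """Fixed-point residual ||x - fb_map(x)||; it vanishes exactly at the solution."""
    return math.dist(x, fb_map(x))


# Reference solution: iterate the contraction fb_map to machine precision.
x_hat = (0.0, 0.0)
for _ in range(5000):
    x_hat = fb_map(x_hat)

const = (1.0 + t * L) / (t * mu)        # error-bound constant, equal to 12 for these data
random.seed(3)
violations = 0
for _ in range(1000):
    x = (random.uniform(-2.0, 3.0), random.uniform(-2.0, 3.0))
    if math.dist(x, x_hat) > const * merit(x) + 1e-12:
        violations += 1
print(x_hat, merit(x_hat), violations)
```

With these numbers, a residual below 1e-6 certifies that x lies within 1.2e-5 of the regularized solution.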
III Algorithm
As mentioned, our algorithmic design uses a double loop architecture, in which a forward-backward splitting method serves as an inner loop scheme to produce an iterate that is close to a temporal solution of the auxiliary problem . We denote by the outer loop iteration count, and by the corresponding approximate solution at that time, which is produced in such a way that it guarantees
[TABLE]
where is a positive sequence of accuracies that is dynamically adjusted when the outer loop is called upon.
III-A The inner loop
The main iteration performed by our method is a forward-backward step, involving a set of user-provided parameters that are updated over time. We denote this scheme by . The main computational step in this procedure is the recursive update
[TABLE]
where is a given anchor point and is a positive sequence of step sizes. Given the pair and , our method recursively constructs sequences and which iteratively approximate the unique solution of the auxiliary problem . In the update of the sequence , instead of using the exact forward-backward map , we assume that we only have access to a -perturbation , in the sense that
[TABLE]
The inclusion of the numerical error is motivated by the fact that the resolvent is in general not available in closed form, or its computation may be very demanding. This happens, for example, when applying proximal methods to image deblurring with total variation [24], or to structured sparsity regularization problems in machine learning and inverse problems [25]. In those cases, the proximity operator is usually computed using ad hoc algorithms, and therefore inexactly. Employing this inexact computational model, we propose an inertial forward-backward algorithm for iteratively solving (P) by updating the sequence as follows:
[TABLE]
is a momentum term, while the parameter is a relaxation factor for the inexact Krasnoselskii-Mann iteration. It is well known that if we choose and set , then the operator is a contraction.
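Below is a minimal sketch of the inner iteration. The update formula is not recoverable from the extracted text, so we assume the common relaxed-inertial form y_k = x_k + beta*(x_k - x_{k-1}), x_{k+1} = (1 - lam)*y_k + lam*(Phi(y_k) + e_k). Here Phi is the forward-backward map of the regularized operator, beta the momentum, lam the relaxation factor and e_k the perturbation. The constant beta and lam and the toy operator are hypothetical choices. The step t = mu/L^2 makes Phi a contraction.

```python
def relaxed_inertial_fb(fb_map, x0, beta, lam, n_iter, error=None):
    """Relaxed inertial Krasnoselskii-Mann iteration driven by fb_map(y) + e_k.

    y_k     = x_k + beta * (x_k - x_{k-1})                  (momentum)
    x_{k+1} = (1 - lam) * y_k + lam * (fb_map(y_k) + e_k)   (relaxation)
    """
    x_prev, x = x0, x0
    for k in range(n_iter):
        y = tuple(xi + beta * (xi - pi) for xi, pi in zip(x, x_prev))
        fy = fb_map(y)
        if error is not None:                      # inexact evaluation of the FB map
            fy = tuple(f + e for f, e in zip(fy, error(k)))
        x_prev, x = x, tuple((1.0 - lam) * yi + lam * fi for yi, fi in zip(y, fy))
    return x


# Hypothetical regularized operator T(x) = (x2, -x1) + alpha * (x - anchor) on C = [0, 1]^2.
alpha, anchor = 1.0, (2.0, -1.0)
mu, L = alpha, 1.0 + alpha        # strong monotonicity modulus and Lipschitz bound of T
t = mu / L**2                     # makes the forward-backward map a contraction


def T(x):
    return (x[1] + alpha * (x[0] - anchor[0]), -x[0] + alpha * (x[1] - anchor[1]))


def fb_map(x):
    """Forward step on T followed by projection onto the box (resolvent of its normal cone)."""
    return tuple(min(max(xi - t * gi, 0.0), 1.0) for xi, gi in zip(x, T(x)))


# Summable perturbations mimic inexactly evaluated resolvents.
x = relaxed_inertial_fb(fb_map, (0.0, 0.0), beta=0.05, lam=0.9, n_iter=1000,
                        error=lambda k: (0.1 * 0.5**k, -0.1 * 0.5**k))
print(x)   # for these data the unique zero of T + N_C is (1, 0)
```

With constant, small momentum and summable errors, the iterates approach the temporal solution at a geometric rate. This is consistent with the linear convergence of the inner loop claimed in the introduction.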
As a last ingredient of the inner loop, we need an operational stopping criterion which informs us when the iterates are sufficiently close to the temporal solution . To this end, we define the stopping time
[TABLE]
We introduce the following assumptions on the parameters:
Assumption 2
and .
Assumption 3
and there exists a such that for all k and monotonically increasing such that there exists a for all k.
III-B General Stopping Time
Before tackling the outer loop, we justify this stopping criterion, which informs us when the iterates are sufficiently close to the temporal solution . Suppose there exists an such that ; then
[TABLE]
Moreover, again with , we get
[TABLE]
so that . This suggests taking as a sensible stopping criterion the stopping time
[TABLE]
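For reference, here is the generic argument behind a residual-based stopping rule of this kind. The notation is our own assumption: Phi is a q-contraction (q < 1) whose fixed point is the temporal solution x_hat, and r(x) = ||x - Phi(x)|| is the fixed-point residual.

```latex
\begin{aligned}
\|x - \hat{x}\| &\le \|x - \Phi(x)\| + \|\Phi(x) - \Phi(\hat{x})\| \le r(x) + q\,\|x - \hat{x}\|
  &&\Longrightarrow\quad \|x - \hat{x}\| \le \frac{r(x)}{1 - q}, \\
r(x) &\le \|x - \hat{x}\| + \|\Phi(\hat{x}) - \Phi(x)\| \le (1 + q)\,\|x - \hat{x}\|.
\end{aligned}
```

Stopping as soon as r(x) <= (1 - q) delta therefore guarantees ||x - x_hat|| <= delta. The second line shows that any sequence converging to x_hat eventually passes this test.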
III-C The outer loop
In the outer loop we update the parameters in order to restart the inner loop of the method . We set , and update the anchor point by setting , so that condition (6) holds with accuracy . Within this loop, we assume that the step size is fixed to . Indeed, by using Lemma 7, we observe for :
[TABLE]
As an illustrative example, let us assume that the error model is given by for a sequence . Then, there exists such that , and thus we can evaluate the previous inequality at the last iteration counter of the inner loop , to obtain
[TABLE]
Having obtained this new anchor point, we update the parameters and restart the inner loop with these parameters and the new anchor point .
We summarize our approach with the following method.
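The algorithm listing itself is not recoverable from the extracted text, so what follows is only a sketch of the double loop described in Section III. The following choices are our own assumptions:

- The regularized map is T(x) = F(x) + eps_k V(x) + alpha (x - anchor_k), with A the normal cone of the box [0, 1]^n.
- The inner loop is the relaxed inertial forward-backward iteration with constant beta and lam.
- The inner loop stops once the residual falls below (1 - q) delta_k.
- The schedules eps_k = 0.2/sqrt(k+1) and delta_k = 0.1/(k+1)^2 are illustrative.

The usage example is a hypothetical zero-sum game with F(x) = (x2, -x1), whose equilibria on [0, 1]^2 form the segment {0} x [0, 1]. The upper-level gradient V(x) = (x1 - 1, x2 - 0.3) selects (0, 0.3).

```python
import math


def project_box(x, lo=0.0, hi=1.0):
    """Resolvent of the normal cone of the box [lo, hi]^n, i.e. the projection."""
    return tuple(min(max(v, lo), hi) for v in x)


def double_loop(F, V, x0, alpha, L_F, L_V, eps, delta, n_outer,
                beta=0.05, lam=0.9, max_inner=10_000):
    """Hypothetical double loop: outer steps update eps_k and the anchor,
    inner steps run a relaxed inertial forward-backward iteration."""
    anchor = tuple(x0)
    for k in range(n_outer):
        e_k, d_k = eps(k), delta(k)

        def T(x, e_k=e_k, anchor=anchor):
            # Assumed regularized map: F(x) + e_k * V(x) + alpha * (x - anchor).
            return tuple(f + e_k * v + alpha * (xi - a)
                         for f, v, xi, a in zip(F(x), V(x), x, anchor))

        L = L_F + e_k * L_V + alpha             # Lipschitz bound of T (cf. Lemma 6)
        t = alpha / L**2                        # T is (at least) alpha-strongly monotone
        q = math.sqrt(1.0 - (alpha / L) ** 2)   # contraction factor of the FB map

        def fb(x, T=T, t=t):
            return project_box(tuple(xi - t * gi for xi, gi in zip(x, T(x))))

        x_prev = x = anchor
        for _ in range(max_inner):
            if math.dist(x, fb(x)) <= (1.0 - q) * d_k:   # certifies ||x - x_hat_k|| <= d_k
                break
            y = tuple(xi + beta * (xi - pi) for xi, pi in zip(x, x_prev))
            fy = fb(y)
            x_prev, x = x, tuple((1.0 - lam) * yi + lam * fi for yi, fi in zip(y, fy))
        anchor = x                              # the inner output becomes the new anchor
    return anchor


def F(x):
    """Lower level: pseudo-gradient of the zero-sum game with cost x1 * x2 (hypothetical)."""
    return (x[1], -x[0])


def V(x):
    """Upper level: gradient of phi(x) = 0.5 * ((x1 - 1)**2 + (x2 - 0.3)**2) (hypothetical)."""
    return (x[0] - 1.0, x[1] - 0.3)


x = double_loop(F, V, x0=(1.0, 1.0), alpha=1.0, L_F=1.0, L_V=1.0,
                eps=lambda k: 0.2 / math.sqrt(k + 1),
                delta=lambda k: 0.1 / (k + 1) ** 2,
                n_outer=500)
print(x)   # expected to be close to the selected equilibrium (0, 0.3)
```

The default arguments inside the loop bind the current eps_k and anchor to each inner-loop closure. Without them, Python's late binding would make all closures see the last values.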
IV Convergence Analysis
IV-A Inner Loop
In the following, we simplify the notation by setting
[TABLE]
Assumption 4
The parameter sequences employed in the inner loop satisfy
[TABLE]
Theorem 8
Assume that , and Assumptions 3 and 4 hold. If is generated by procedure without stopping criterion, then converge strongly to . Moreover .
Proof:
To simplify the notation, we abbreviate . We start by applying Lemma 3 with
[TABLE]
Next, we employ Lemma 3 with to obtain
[TABLE]
Using that for all and the -contraction property of the mapping , we obtain
[TABLE]
Substituting all these terms, we can continue our development with
[TABLE]
As a next step, we look at the second term. Notice that . Applying Lemma 3 again, we get
[TABLE]
Define , with for all and multiply both sides in the display above by in order to arrive at
[TABLE]
Combining all these estimates, together with some elementary algebra, we can continue our energy bound as
[TABLE]
Rearranging the terms and invoking Assumption 4, we get
[TABLE]
Subtracting from both sides and using that is monotonically increasing, we can continue our estimation by
[TABLE]
Consequently, if we define
[TABLE]
we obtain for all . Notice that is indeed summable and , and therefore we can apply Lemma 5 to obtain , and hence converges to [math]. Moreover,
[TABLE]
where . Applying Lemma 5 again with the identification and , we can conclude and therefore as . ∎
IV-B Outer Loop
We now prove the asymptotic convergence of the sequence produced in the outer loop of the method. Let . Note that , and is a closed, convex and nonempty set.
Theorem 9
Let Assumptions 1 and 2 hold. Then, the outer loop of our method produces a bounded sequence whose weak limit points are contained in .
Proof:
Define the anchor function We then have
[TABLE]
For all we define the unique solution
[TABLE]
where . Let . By monotonicity of and the fact that , we conclude that
[TABLE]
Therefore, for all ,
[TABLE]
We now add and subtract terms to estimate the inner products as follows:
[TABLE]
In the same way,
[TABLE]
and Moreover we have
[TABLE]
Combining all these estimates, we can continue from the above to obtain
[TABLE]
where is defined as
[TABLE]
Using the inclusion , we can particularize the above estimate by choosing the point (indeed, and therefore ) to obtain
[TABLE]
where
[TABLE]
and . To complete the proof, we consider several cases:
Case 1
Assume there exists such that for all it holds In other words, the set
[TABLE]
contains , and all subsequent iterates. Then, for all , (15) immediately yields
[TABLE]
Hence, is monotonically decreasing, so that exists. Furthermore,
[TABLE]
which implies for Since for all , we deduce that . We claim that
[TABLE]
Suppose not, i.e., assume there exists such that for all sufficiently large. Since and are both contained in for all , and the operators and are Lipschitz continuous, it follows that is a bounded sequence. Hence, there exists for which . This implies that for sufficiently large, we have
[TABLE]
Assumption 2 guarantees that we can choose large enough so that . Hence, we can continue from the above display to obtain the bound
[TABLE]
Since is not summable, we conclude , a contradiction. Hence, (17) holds true.
Now, let be a weakly converging subsequence of with weak limit . We can extract such a weakly converging subsequence since, by Assumption 1, is closed and bounded. Suppose . By maximal monotonicity of , there exists such that
[TABLE]
For this pair , inequality (13) implies
[TABLE]
Since , and the sequence is bounded, we have
[TABLE]
Since is monotone, we further see
[TABLE]
Moreover, since by Assumption 2 , we conclude . Thus, passing to the limit along the weakly converging subsequence, we get
[TABLE]
This is a contradiction to (18). We therefore conclude that .
Now consider a subsequence of , such that
[TABLE]
Since , and , we have by the uniqueness of the weak limit. Moreover, the following holds
[TABLE]
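By the definition of , which we know converges, we deduce that converges weakly and that its norms converge. Therefore, it converges strongly, since in a Hilbert space weak convergence together with convergence of the norms implies strong convergence.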
Now, by the weak sequential continuity of , the continuity of the inner product, and (17), the following holds
[TABLE]
Furthermore, is convex and (weakly) closed as it is the zero set of a maximally monotone operator; hence the normal cone of is maximally monotone. On the other hand, is monotone and Lipschitz continuous, which implies that is maximally monotone. Therefore is convex and weakly closed. It follows that .
But then, via Lemma 2, we see .
Case 2
Consider the sets
[TABLE]
Assume that both sets are infinite. It then suffices to show that converges to [math]. To this end, observe first that for every , it holds that
[TABLE]
The sequence of temporal solution candidates admits a weakly converging subsequence; we take such a subsequence with and weak limit . Then, is bounded and thus the last display shows that . Reasoning exactly as in Case 1, we conclude . Additionally, from (19), we see
[TABLE]
The first summand on the left-hand side is bounded, and by the slow-control condition of Assumption 2 the second summand is a null sequence. Hence, is bounded. In particular, by weak sequential continuity of , we have:
[TABLE]
which implies by Lemma 2.
Case 3
is infinite while is finite. In this case, is infinite and hence the only relevant phase for the asymptotic analysis of the algorithm. The sequence is decreasing and hence converging. This means for . Using (15), we see
[TABLE]
Since the sequence is bounded and as well as as , it follows
[TABLE]
Now let denote a weak limit point of . As in Case 2, we see , hence for . But since the entire sequence converges, we conclude . ∎
V Numerical Experiments
We consider the two-player zero-sum game introduced in [26], given by
[TABLE]
We seek a saddle point of the function , namely a point that satisfies
[TABLE]
The solution set is characterized through the inclusion
[TABLE]
The set of solutions is . We solve the problem
[TABLE]
where , whose analytical solution is . Equivalently, we aim to find a point such that $\langle\nabla\phi(x_{1}^{*},x_{2}^{*}),(w_{1},w_{2})-(x_{1}^{*},x_{2}^{*})\rangle\geq 0$ for all $(w_{1},w_{2})\in\operatorname{zer}(F+\mathsf{N}_{X})$. The subproblem of Algorithm is thus given by determining a solution to
[TABLE]
We define the smooth part to be , which is Lipschitz continuous with constant . We note that is -strongly monotone and select a step size , such that the operator defined by the forward-backward map is contractive with constant . We take the relaxation parameter of the inner loop to be constant. Across inner loops, we take the acceleration parameter to be constant and to be the largest value satisfying equation (11). We assume , where , and use the stopping criterion , where . We run a total of iterations. The iterates for various starting points and different values of , along with the feasible region, are shown in Figures 1, 2 and 3.
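The step-size rule above can be made explicit under the standard assumption that the regularized map is mu-strongly monotone and L-Lipschitz. In that case the forward-backward map is a contraction with factor q(t) = sqrt(1 - 2*t*mu + t^2*L^2) for t in (0, 2*mu/L^2), minimized at t = mu/L^2. The constants below are hypothetical and are not the ones used in the experiment. The helper inner_tolerance is likewise an illustrative name for the residual threshold that certifies accuracy delta.

```python
import math


def fb_contraction(t, mu, L):
    """Contraction bound of x -> J(x - t T(x)) for a mu-strongly monotone, L-Lipschitz T."""
    return math.sqrt(max(0.0, 1.0 - 2.0 * t * mu + (t * L) ** 2))


def best_step(mu, L):
    """Step size minimizing the bound; it yields q = sqrt(1 - (mu / L)**2)."""
    t = mu / L**2
    return t, fb_contraction(t, mu, L)


def inner_tolerance(q, delta):
    """Residual threshold certifying distance <= delta to the temporal solution."""
    return (1.0 - q) * delta


mu, L = 1.0, 2.5                        # hypothetical constants
t, q = best_step(mu, L)
print(t, q, inner_tolerance(q, 1e-3))
```

At the endpoint t = 2*mu/L^2 the bound equals 1, so admissible steps must stay strictly inside the interval.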
References
- [1] F. Facchinei, J.-S. Pang, G. Scutari, and L. Lampariello, "VI-constrained hemivariational inequalities: distributed algorithms and power control in ad-hoc networks," Mathematical Programming, vol. 145, no. 1, pp. 59–96, 2014. https://doi.org/10.1007/s10107-013-0640-5
- [2] E. Benenati, W. Ananduta, and S. Grammatico, "On the optimal selection of generalized Nash equilibria in linearly coupled aggregative games," IEEE 61st Conference on Decision and Control (CDC), pp. 6389–6394, 2022.
- [3] E. Benenati, W. Ananduta, and S. Grammatico, "A semi-decentralized Tikhonov-based algorithm for optimal generalized Nash equilibrium selection," 62nd IEEE Conference on Decision and Control (CDC), pp. 4243–4248, 2023.
- [4] H. D. Kaushik and F. Yousefian, "A method with convergence rates for optimization problems with variational inequality constraints," SIAM Journal on Optimization, vol. 31, no. 3, pp. 2171–2198, 2021.
- [5] I. Yamada, M. Yukawa, and M. Yamagishi, "Minimizing the Moreau envelope of nonsmooth convex functions over the fixed point set of certain quasi-nonexpansive mappings," in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, 2011. https://api.semanticscholar.org/CorpusID:124791335
- [6] M. Solodov, "An explicit descent method for bilevel convex optimization," Journal of Convex Analysis, vol. 14, no. 2, p. 227, 2007.
- [7] F. Yousefian, "Bilevel distributed optimization in directed networks," in 2021 American Control Conference (ACC). IEEE, 2021, pp. 2230–2235.
- [8] J. S. Pang, G. Scutari, F. Facchinei, and C. Wang, "Distributed power allocation with rate constraints in Gaussian parallel interference channels," IEEE Transactions on Information Theory, vol. 54, no. 8, pp. 3471–3489, 2008.
