A gradient descent akin method for constrained optimization: algorithms and applications
Long Chen, Kai-Uwe Bletzinger, Nicolas R. Gauger, Yinyu Ye

TL;DR
This paper introduces GDAM, a new gradient descent-like method for constrained optimization, with proven convergence and demonstrated effectiveness in engineering applications like shape optimization and sensor network localization.
Contribution
The paper develops a simplified, theoretically guaranteed gradient descent akin method (GDAM) for constrained optimization, expanding its applicability to large-scale engineering problems.
Findings
GDAM is globally convergent under certain conditions.
GDAM performs competitively on large, challenging engineering problems.
The method is robust and adaptable to various constrained optimization scenarios.
Abstract
We present a first-order method for solving constrained optimization problems. The method is derived from our previous work, a modified search direction method inspired by singular value decomposition. In this work, we simplify its computational framework to a ``gradient descent akin'' method (GDAM), i.e., the search direction is computed using a linear combination of the negative and normalized objective and constraint gradient. We give fundamental theoretical guarantees on the global convergence of the method. This work focuses on the algorithms and applications of GDAM. We present computational algorithms that adapt common strategies for the gradient descent method. We demonstrate the potential of the method using two engineering applications, shape optimization and sensor network localization. When practically implemented, GDAM is robust and very competitive in solving the…
| Prob. | step size | iters. | f() | f() | obj. error |
|---|---|---|---|---|---|
| G01 | 0.002 | 2362 | -15 | -14.7215 | 1.74e-2 |
| G04 | 0.2 | 136 | -3.0665e+4 | -3.0657e+4 | 2.61e-4 |
| G06 | 0.002 | 4826 | -6.9618e+3 | -6.8371e+3 | 1.79e-2 |
| G07 | 0.0027 | 3009 | 24.3062 | 24.7876 | 1.90e-2 |
| G08 | 0.01 | 66 | -9.5825e-2 | -9.5063e-2 | 6.95e-4 |
| G09 | 0.05 | 120 | 6.8063e+2 | 6.9238e+2 | 1.72e-2 |
| G10 | 0.35 | 5319 | 7.0492e+3 | 7.1898e+3 | 1.99e-2 |
| G18 | 0.01 | 257 | -0.8660 | -0.8546 | 6.10e-3 |
| G19 | 0.05 | 294 | 32.6556 | 2.7120e+2 | 7.09 |
| G24 | 0.02 | 268 | -5.5080 | -5.4147 | 1.43e-2 |
| Problem | Runtime [s] | Num. iters. | Obj. error | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Matlab | Gurobi | OSQP | GDAM | Matlab | Gurobi | OSQP | GDAM | Matlab | Gurobi | OSQP | GDAM | |
| AUG2D | 0.37 | 0.02 | 0.03 | 0.27 | 1 | 2 | 25 | 300 | 7.06e-16 | opt444A feasible solution that has the lowest objective function value is chosen as reference and is considered “optimal”. | 5.84e-6 | 2.33e-7 |
| AUG2DC | 0.11 | 0.02 | 0.03 | 0.33 | 1 | 2 | 25 | 350 | opt | 2.62e-16 | 5.81e-6 | 1.29e-7 |
| AUG2DCQP | 0.95 | 0.06 | 0.59 | 1.46 | 10 | 16 | 1075 | 1050 | 4.89e-12 | opt | 7.30e-5 | 3.36e-2 |
| AUG2DQP | 0.90 | 0.06 | 0.55 | 1.53 | 10 | 16 | 975 | 1100 | 7.17e-12 | opt | 1.98e-4 | 3.34e-2 |
| AUG3D | 0.09 | 0.004 | 0.008 | 0.05 | 1 | 2 | 25 | 400 | 5.32e-15 | opt | 2.02e-10 | 1.03e-4 |
| AUG3DC | 0.01 | 0.006 | 0.009 | 0.05 | 1 | 2 | 50 | 350 | opt | 1.39e-15 | 6.69e-11 | 4.24e-5 |
| AUG3DCQP | 0.11 | 0.02 | 0.009 | 0.16 | 8 | 15 | 50 | 640 | 1.21e-8 | opt | 3.09e-6 | 8.61e-5 |
| AUG3DQP | 0.12 | 0.02 | 0.01 | 0.13 | 7 | 15 | 75 | 500 | 2.93e-7 | opt | 1.32e-9 | 4.67e-4 |
| BOYD1 | 24.56 | 0.36 | 9.76 | 28.70 | 26 | 26 | 2750 | 10000 | 4.31e-12 | opt | 5.26 | 1.97e-2 |
| CONT-050 | 0.29 | 0.02 | 1.01 | 0.32 | 11 | 15 | 5150 | 900 | 2.83e-6 | opt | 1.24e-4 | 2.18e-4 |
| CONT-100 | 0.50 | 0.11 | 12.45 | 2.49 | 4 | 15 | 10000 | 1450 | 2.48e-2 | opt | 2.80e-4 | 1.47e-3 |
| CONT-101 | 0.31 | 0.09 | 14.67 | 0.83 | 3 | 11 | 10000 | 452 | 0.69 | opt | 1.44e-2 | 5.97e-7 |
| CONT-200 | 3.76 | 0.75 | 119.59 | 9.89 | 4 | 20 | 10000 | 750 | 0.30 | opt | 0.46 | 3.91e-3 |
| CONT-201 | 1.93 | 0.52 | 112.91 | 5.21 | 3 | 14 | 10000 | 397 | 0.67 | opt | 0.69 | 1.06e-6 |
| CONT-300 | 5.13 | 1.32 | 295.89 | 22.02 | 3 | 14 | 10000 | 715 | 0.67 | opt | 0.52 | 7.56e-6 |
| CVXQP1_L | 11.48 | 12.66 | 61.25 | 0.56 | 13 | 59 | 9075 | 800 | opt | 4.63e-8 | 2.23e-5 | 3.24e-5 |
| CVXQP2_L | 4.81 | 1.23 | 5.84 | 0.27 | 13 | 13 | 425 | 850 | 7.66e-12 | opt | 3.63e-9 | 3.12e-5 |
| CVXQP3_L | 17.06 | 8.99 | 29.56 | 0.83 | 15 | 51 | 3225 | 800 | opt | 8.52e-6 | 4.62e-5 | 2.76e-5 |
| DTOC3 | 0.03 | 0.02 | 0.03 | 0.19 | 1 | 2 | 50 | 400 | opt | 1.37e-13 | 0.18 | 8.73e-5 |
| HUES-MOD | 0.25 | 0.01 | 0.03 | 0.17 | 11 | 14 | 175 | 650 | 1.46e-11 | opt | 2.03e-5 | |
| HUESTIS | 0.26 | 0.01 | 0.01 | 0.16 | 11 | 15 | 50 | 600 | opt | 5.78e-15 | 1.76e-5 | |
| STCQP1 | 4.06 | 0.09 | 0.04 | 4.17 | 9 | 11 | 150 | 650 | 4.42e-10 | opt | 2.78e-8 | 9.99e-5 |
| STCQP2 | 0.59 | 0.16 | 0.06 | 0.11 | 9 | 11 | 125 | 500 | 4.55e-10 | opt | 9.99e-7 | 1.72e-4 |
| UBH1 | 0.16 | 0.01 | 0.03 | 1.78 | 3 | 7 | 50 | 2300 | 6.72e-11 | opt | 1.21 | 8.68 |
| n | m | r | iters. | RMSD | runtime (hh:mm:ss) | |
|---|---|---|---|---|---|---|
| 100 | 1058 | 0.3 | 0.9999 | 256 | 3.09e-3 | |
| 500 | 14699 | 0.21 | 0.9999 | 948 | 2.29e-3 | 00:00:09 |
| 1500 | 98007 | 0.18 | 0.9999 | 1312 | 3.27e-3 | 00:02:41 |
| 3000 | 246177 | 0.14 | 0.9999 | 1164 | 5.38e-3 | 00:12:37 |
| 5000 | 510116 | 0.12 | 0.9999 | 1231 | 5.63e-3 | 00:49:58 |
| 10000 | 1446144 | 0.10 | 0.9999 | 1427 | 3.97e-3 | 05:51:51 |
| n | iters. | RMSD | runtime (hh:mm:ss) | |
|---|---|---|---|---|
| 100 | 1e-6 | 36 | 5.06e-10 | |
| 500 | 1e-6 | 32 | 6.37e-10 | 00:06:47 |
| n | iters. | RMSD | runtime (hh:mm:ss) | |
|---|---|---|---|---|
| 100 | 1e-2 | 400 | 7.64e-2 | 00:00:02 |
| 1e-3 | 650 | 2.91e-3 | 00:00:03 | |
| 500 | 1e-2 | 400 | 1.57e-01 | 00:00:48 |
| 1e-3 | 950 | 1.03e-02 | 00:01:23 | |
| 1500 | 1e-2 | 400 | 2.29e-1 | 00:13:21 |
| 1e-3 | 1225 | 1.82e-03 | 00:40:50 | |
| 3000 | 1e-3 | 1950 | 1.21e-1 | 05:28:20 |
| 1e-4 | 7225 | 6.14e-4 | 18:35:00 |
| n | it. ADMM+/SSN | RMSD | runtime (hh:mm:ss) | |
|---|---|---|---|---|
| 100 | 1e-2 | 114/9 | 8.013e-4 | 00:00:01 |
| 500 | 1e-2 | 370/21 | 8.38e-3 | 00:00:42 |
| 1e-3 | 500/22 | 1.43e-3 | 00:00:47 | |
| 1500 | 1e-2 | 300/31 | 4.90e-2 | 00:20:43 |
| 1e-3 | 3600/43 | 1.85e-5 | 00:36:53 | |
| 3000 | 1e-2 | 414/38 | 2.15e-1 | 01:55:06 |
| 1e-3 | 21603/165 | 1.40e-3 | 19:49:07 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSparse and Compressive Sensing Techniques · Robotics and Sensor-Based Localization · Advanced Vision and Imaging
A gradient descent akin method for constrained optimization: algorithms and applications
\nameLong Chena, Kai-Uwe Bletzingerb, Nicolas R. Gaugera and Yinyu Yec L. Chen. Email: [email protected].-U. Bletzinger. Email: [email protected]. R. Gauger. Email: [email protected]. Ye. Email: [email protected] aChair for Scientific Computing, University of Kaiserslautern-Landau (RPTU), Germany;
bChair of Structural Analysis, Technical University of Munich, Germany;
cDepartment of Management Science and Engineering and ICME, Stanford University, USA
Abstract
We present a first-order method for solving constrained optimization problems. The method is derived from our previous work [28], a modified search direction method inspired by singular value decomposition. In this work, we simplify its computational framework to a “gradient descent akin” method (GDAM), i.e., the search direction is computed using a linear combination of the negative and normalized objective and constraint gradient. We give fundamental theoretical guarantees on the global convergence of the method. This work focuses on the algorithms and applications of GDAM. We present computational algorithms that adapt common strategies for the gradient descent method. We demonstrate the potential of the method using two engineering applications, shape optimization and sensor network localization. When practically implemented, GDAM is robust and very competitive in solving the considered large and challenging optimization problems.
keywords:
Negative and normalized gradients; inequality constrained optimization; gradient descent; interior-point method; shape optimization; sensor network localization
{amscode}
65K05; 90C22; 90C30; 90C51; 90C90
1 Introduction
In this paper, we propose a first-order method for solving inequality constrained optimization problems. The problem of interest is
[TABLE]
where are twice differentiable. We propose a gradient descent akin method (GDAM) for (COP) with a single inequality constraint ,
[TABLE]
where denotes the step size, “” denotes the Euclidean norm, and is a parameter. and are the gradient column vectors of the objective function and constraint function, respectively.
Provided the well-known logarithmic barrier function
[TABLE]
where denotes the natural logarithm, we propose a generalization of (GDAM) for multiple constrained problems with as
[TABLE]
1.1 Main contributions
Theory, computational algorithms, and applications are the three pillars of an optimization method. Just as the title suggests, this manuscript mainly focuses on the algorithms and applications of GDAM. We also give essential theoretical guarantees on the global convergence of the method for a complete presentation.
We summarize the main contributions of this work:
We present (GDAM), which is derived from our previous work [28], whereby the equivalence of the two methods is not obvious.
- 2.
We give fundamental theoretical guarantees to the method using a continuous-time dynamical systems approach.
- 3.
We present two computational algorithms based on (GDAM), a vanilla and an accelerated implementation.
- 4.
We apply the developed algorithms to two engineering applications, node-based shape optimization and sensor network localization, which demonstrate the practical usefulness of GDAM in solving large-scale and difficult optimization problems.
1.2 Organization of paper
In section 2, we first review related works focusing on algorithm and application aspects. In section 3, we derive the method for single inequality constrained problems. Illustrative examples are presented in section 4 for an informal but intuitive presentation of the method. We give fundamental theoretical guarantees of the method in 5 and show its generalization to multiple constraints in section 6. We present computational algorithms of GDAM in section 7 and show preliminary numerical experiments. Two engineering applications, shape optimization and sensor network localization, are presented in sections 8 and 9. We conclude the work with a discussion in section 10.
2 Related works and applications
2.1 Gradient descent method
The gradient descent method, originally proposed by Cauchy, is a first-order method for unconstrained optimization that uses the negative gradient of the objective function for the variable update [64]. In a canonical form, it writes
[TABLE]
The direction of the gradient descent is the steepest descent direction in Euclidean norm. To see this, we write the first-order Taylor expansion at the current iterate of the objective function,
[TABLE]
where “” denotes the inner product in Euclidean space. The steepest descent direction is found by the optimization problem
[TABLE]
From the Cauchy-Schwarz inequality, we obtain
[TABLE]
which is the negative and normalized objective gradient. Comparing (4) and (GDAM),
[TABLE]
which linearly combines the negative and normalized objective and constraint gradient, we consider the present method: a gradient descent akin method for inequality constrained optimization problems. The present method can be explained intuitively: For an inequality constrained problem, we use the objective gradient descent direction to minimize , and use the constraint gradient descent direction to minimize (so as to maintain feasibility by moving away from the constraint boundary). Since the gradient directions are normalized to , a parameter ensures that the computed direction is a descent direction of the objective function.
2.2 Interior-point methods
Interior-point methods (IPMs), which are mostly based on the Newton method, are among the most competitive methods for constrained optimization problems. The signature of IPM is the existence of continuously parameterized families of approximate solutions that converge to the exact solution asymptotically [40]. IPMs find a wide variety of applications of convex and nonconvex optimizations in broad fields. There are a vast amount of excellent works that have been devoted to IPM. A comprehensive review of this class of methods is certainly beyond the scope of the present paper, however we refer the interested reader to [40][87][112], more recently, in [43], and many other excellent optimization books.
The present method results in trajectories that asymptotically converge to the central path for a particular IPM, the logarithmic barrier method [38]. We introduce its connections/differences to the present method in the next. Consider a convex optimization problem of the form (COP), we start with its approximated unconstrained subproblem using the logarithmic barrier function ,
[TABLE]
where the barrier parameter is a positive parameter. As , the solution of the approximated problem converges to the original one. The central path is characterized by the set of points that satisfy the necessary and sufficient conditions [24]:
[TABLE]
where . The conditions (6) are interpreted as a modified KKT system in the literature [24][27][40]. The barrier method finds an approximated solution for the original problem by 1) iteratively decreasing the barrier parameter , and 2) in each iteration, solving the subproblem (the modified KKT system) defined by (6) using the Newton method. Therefore, the barrier parameter can be considered a central path parameter.
In the present method, we do not parameterize the central path. Instead, we normalize the objective and constraint gradients. For optimization problems with a single inequality constraint, we propose the normalized central path condition as
[TABLE]
The condition (7) is used in section 5 for the analysis. A generalization of the normalized central path condition from single inequality to multiple inequalities is introduced by the use of the logarithmic barrier function (see section 6) as
[TABLE]
Compared with the barrier method, the path parameter has vanished. The condition (8) characterizes the central path, which differs from (6), which instead characterizes a point on the central path.
To see the connections between GDAM and the interior-point method, we first write the gradient descent direction for a barrier -subproblem (5),
[TABLE]
A straightforward way to design a first-order method is to follow the double-loop structure of the second-order IPM: 1) iteratively decrease the barrier parameter in the outer loop, and 2) solve the subproblem (5) by the gradient descent (9) in the inner loop. Unfortunately, this approach has long been known to be computationally impractical, see, e.g., the discussions in [70, p. 469]. Roughly speaking, when is very small, tends to result in an optimization trajectory that travels alongside the boundary of the constraints and suffers severely from poor conditioning.
Next, we scale the GDAM search direction with ,
[TABLE]
Comparing (10) with (9), it is then clear that the (scaled) GDAM suggests a dynamic computation of the barrier parameter at each step ,
[TABLE]
GDAM results in an optimization trajectory that travels alongside the central path (within a neighborhood relative to the parameter ) —a typical behavior of a path-following method. Therefore, we consider GDAM as a first-order interior-point method. As one of the main contributions of this work, we will show that GDAM, in contrast to the sequential application of gradient descent to the barrier subproblems, is a computationally practical optimization method.
2.3 Dynamical systems approaches
Dynamical systems approaches have been used to study optimization methods in many works, and our literature review could not be exhaustive. Extensive studies on the connections between interior-point flows with linear programming methods can be found in [49, Chapter 4] and the references therein. Nonlinear dissipative dynamical systems are studied in [11] in view of unconstrained optimization. [102] studies the celebrated Nesterov’s accelerated gradient method using a dynamical system as the analysis tool. Recently, dynamical systems have been used to study optimization algorithms for solving problems arising from machine learning applications, see, e.g., [10][30][56][110]. In some literature, optimization methods that use dynamical systems are called trajectory methods. These methods construct optimization paths in a way so that one or all solutions to the optimization problem are a priori known to lie on these paths [35]. Typically, these optimization paths are solution trajectories to ODE of first or second-order. Trajectory methods are mainly studied for unconstrained optimizations for finding local solutions [13][22], and global solutions [45][96]. Studies for constrained optimization are, however, very limited, see [3][21][108] and the references therein. In this work, we give basic theoretical guarantees of the present method using a dynamical system’s perspective —we show that the method is globally convergent to first-order stationarities.
2.4 Feasible direction methods
The method of feasible directions (MFD) dates back to the 1960’s by the work of Zoutendijk [115] and has enjoyed fruitful developments for decades. MFDs have been especially popular in the engineering community because of the importance of ending up with a design that satisfies the hard specifications expressed by a set of inequalities [29]. The general idea behind the MFD is to move from one feasible design to an improved feasible design iteratively so that a local solution can be found [9]. In the present method, the search direction must not be a feasible direction. The idea behind the method design is to find a search direction that approaches the central path while maintaining a descent direction of the objective function. Due to this major difference, we do not categorize our method as a method of feasible directions.
In the present method, we compute a search direction that uses normalized gradients. In [99], the authors also present a feasible direction method that applies normalized gradients. Their work is then continued and further developed in [33] and [100]. In these works, the active-set strategy is used. The common idea is to formulate a linear system under given input criteria on a chosen working set of (active) constraints, and a feasible descent search direction is obtained by solving the linear system. In the present method, we use a barrier function based formulation to treat multiple inequality constraints. The search direction is computed as a linear combination of the negative and normalized objective and barrier gradient.
2.5 Shape optimization
As a subset of design optimization, shape optimization is characterized by a very large or even infinite number of design variables that describe the varying boundary in the optimization process. Introductions to shape optimization are given in [48][98]. Shape optimization is distinct from another well-known problem in design optimization: topology optimization [14] (sometimes referred to as the homogenization method [4]). The main difference is that the topology optimization method removes smoothness and topological constraints in shape optimization, which results in different optimization formulations. Many topology optimization problems can be formulated in an (equivalent) convex optimization problem, while shape optimizations are typically nonlinear, and often nonconvex [53]. This difference partially contributes to the fact that there are successful implementations of IPM for large-scale111In this work, we refer large-scale optimizations to problems that have a high-dimensional variable space. topology optimization problems [55][58][61][73], but only a few works have presented a shape optimization that uses an IPM as the optimizer [8][50]. In the latter works, the size of the shape optimization problem is only moderate so that the power of IPM is not fully exploited. One of the most successful methods for nonlinear topology optimization is the method of moving asymptotes (MMA) that was introduced by Svanberg in 1987 [105]. In each iteration, MMA generates and solves an approximated convex problem related to the original one. For shape optimization, however, there is as yet no literature that discusses a large-scale problem using MMA.
Shape optimization is a subject frequently considered in multidisciplinary design optimization (MDO) [47]. To design complex engineering systems, MDO considers multiple disciplines and their interactions. For shape optimization, ongoing efforts are devoted to the computation of the shape gradient for coupled disciplines, for example, for steady-state coupled problems [59][79] and for transient coupled problems [63]. While the development for the coupled-gradient is challenging, the reward is accurate derivatives and massive reductions in computational cost, which are essential for large-scale optimizations [76]. In many practical circumstances, the shape geometry is reparameterized with finitely many parameters. A shape reparameterization is typically needed if there are producibility restrictions [67] or if the initial geometric variable is not differentiable [54]. Well-parameterized shape geometry often allows the application of standard optimization approaches [41][51][84] or Newton-Krylov type methods [34]. On the other hand, the achievable shape is limited and dependent on the chosen parameterization. High-fidelity shape optimizations, which are directly based on finite element meshes [68][114] or level-set methods [5][95], exploit the largest shape space possible for real-life problems but lead to challenging optimization problems [37][71].
Manually deriving and implementing shape derivatives can be a laborious and error-prone task. A prominent approach to tackle this challenge is Algorithmic Differentiation (AD) [44], which computes derivatives of a function given as a computer program. In the context of shape optimization, AD has been successfully implemented in open-source solvers, such as OpenFOAM[107], SU2[2], and FEniCS[36], enabling shape design optimization practice on increasingly complex problems. Another ongoing research is the computation of shape Hessians, which are complex objects even for moderate problems. Recently, several works compute approximated shape Hessians and use a Newton-based method for the design optimization [92][93]. In general, large-scale shape optimization is mainly performed using gradient descent type methods so far [94]. In engineering practice, a large number of constraints may be considered. The lack of literature in this regard has motivated our development of MSDM in [28]. In the present work, we simplify its computational framework to a gradient descent akin method. As an important result, the implementation effort and computational cost are reduced significantly. It opens the possibility of shape optimization to a wider range of applications.
2.6 Sensor network localization via semidefinite programming relaxation
Wireless sensor networks (WSNs), which consist of low-cost, low-energy, and multi-functional sensors that can communicate over short distances, provide many opportunities for monitoring and controlling the physical environment when the sensors are wireless linked and deployed in large numbers [1]. WSNs find a wide range of applications in environmental, industrial, urban, health, and other sectors, and we refer to comprehensive surveys in [57][88]. For many applications, awareness of the sensor locations is crucial for a meaningful interpretation of the gathered sensing data, see, e.g., [26][65][90][104]. Sensor network localization (SNL), which estimates sensor locations based on pairwise distance, is therefore considered one of the key enabling technologies for WSN applications.
SNL is a challenging optimization problem, as it is a nonconvex problem and is generally intractable to find the global solution. Among various solution methods, convex relaxation based on semidefinite programming (originally introduced in Biswas-Ye [18]) is regarded as one of the most prominent approaches for global localization [31]. In theory, the semidefinite programming (SDP) relaxation provides the exact solution to an SNL problem if the problem is uniquely localizable [97]. The challenge that has hindered the practical application of SDP relaxation to large-scale SNL problems is primarily computational222Indeed, scalability is a major challenge to the successful application of semidefinite programming to many large-size practical problems.. Well-established SDP solvers that are based on second-order interior-point methods ([7][15][86][106][111]) can solve SNL problems with up to a few hundreds of sensors to arbitrary high accuracy, but they are unable to solve larger problems efficiently. This has motivated the development of advanced modelling approaches to alleviate the computational difficulty, see, e.g., [19][60][109]. Meanwhile, in the last two decades, a number of scalable SDP solvers were successfully developed. For example, the packages PENNON [62] and SDPNAL+ [103] are based on the Augmented Lagrangian method, and SCS [82] is based on the Alternating Direction of Multipliers method (ADMM). A comprehensive review of the latest advancements in the scalability of SDP solvers is given in [74]. In this work, we apply GDAM to solve the Biswas-Ye SDP relaxation of SNL problems. We use the classical logarithmic barrier function approach for the semidefinite cone constraint, and thus our method to solve SDP problems is general. Our computational experiments show that GDAM, when practically implemented, is very competitive in finding moderate accurate solutions to SNL problems of moderate size and is capable of solving very large problems (over 5000 sensors) that are intractable for comparing solvers.
3 Deriving the search direction for single inequality constrained optimizations
In this section, we show the consistent derivation of (GDAM) from our previous work [28] considering a single inequality constraint. First, we review the basic ideas of the modified search direction method and then show the derivation. For the remainder of the paper, we use and instead of and to lighten the notation when it is clear from the context.
In MSDM, at each iterate , we construct a sensitivity matrix that contains the normalized objective and constraint gradient,
[TABLE]
The total differential of the objective and constraint function at are
[TABLE]
A perspective from the input-output system established by the matrix gives
[TABLE]
Applying SVD to the matrix , we obtain
[TABLE]
Thus, an orthonormal bases set is obtained. Each can be used as a base search direction for the variable update, and and are defined as follows:
: by taking as the variable update, we obtain a change in objective as well as in constraint function , which is a decrease in the objective function and an increase in the constraint function.
- -
: by taking as the variable update, we obtain a change in the objective as well as in the constraint function , which is a decrease in the objective function and a decrease in the constraint function.
It is worth mentioning that the update step provides a similar result as the filter approach presented in [39], which tries to minimize the so-called bi-objective optimization problem with two goals of minimizing the objective function and the constraint violation .
With and , we can then rewrite the normalized steepest descent direction as
[TABLE]
where is the angle between and , and is the angle between and . The modified search direction reads
[TABLE]
where is introduced to enlarge the contribution of for the variable update.
In the following, we show the derivation of (GDAM) from (17). By (15), we have
[TABLE]
which is a diagonalization of the symmetric matrix . We have
[TABLE]
where is the angle between and . The eigenvalues of are
[TABLE]
By the definition of and in MSDM, the eigenvectors of can be factorized,
[TABLE]
The respective singular values are
[TABLE]
By the definition of SVD,
[TABLE]
Therefore, we obtain
[TABLE]
where is the angle between the objective function gradient and the constraint function gradient (full details in Appendix A). With we also have
[TABLE]
Inserting (24), (25) into (17) we have
[TABLE]
As we are mainly interested in the direction of the vector field , we can rewrite it as
[TABLE]
with . With we have and thus we get the present search direction (GDAM).
4 Illustrative examples and intuition
In this section, we demonstrate GDAM with two illustrative examples and provide intuitive explanations for its optimization behavior.
4.1 The behavior of the method with a fixed
We consider a quadratically constrained nonconvex optimization problem,
[TABLE]
The problem has multiple local solutions due to nonconvexity. Figure 1 shows that there are two local solutions to the considered problem in the depicted variable space:
A critical point of , illustrated as a star, lies inside the feasible set.
- 2.
A KKT solution, illustrated as a dot, is located at the boundary of the constraint.
It can be observed that the optimization trajectories all converge to the critical points (shown as stars). Moreover, the intersection points of the two left trajectories and the constraint boundary are approximate KKT points. While the former case is a common behavior of gradient flow approaches, the latter case is unique to the present method.
4.2 The behavior of the method in terms of
We show the behavior of the method in terms of different by solving a 2D optimization problem analytically. The optimization problem reads,
[TABLE]
The present search direction field reads
[TABLE]
We define an initialization as . Let , then, the trajectory of the present search direction field is
[TABLE]
Let be a point on the trajectory with a maximal component, then we have
[TABLE]
and
[TABLE]
Let , then, , , the trajectory will converge to the curve that is a union of the parabola
[TABLE]
and the interval on -axis as is shown in figure 2 (full derivation details in Appendix A).
We now give an estimation of the intersection point of the trajectory () with the constraint boundary,
[TABLE]
For a simple presentation, we first assume that . By
[TABLE]
the optimization trajectory is monotonically decreasing as , and thus . Furthermore, ensures that holds for any . Combining both arguments, we obtain an estimate for the intersection point ,
[TABLE]
Obviously, allows a similar study. Thus we obtain
[TABLE]
where is the KKT solution. We note that the error bound (36) is derived specifically for the problem (29) and is a very rough estimation that is based on a special point . In the following section, we will give a general result regarding such an error bound, as well as other fundamental theoretical guarantees.
5 Global convergence
In this section, we give fundamental theoretical guarantees of the method regarding global convergence. We shall first analyze the single constraint optimization problem (SCOP), whose results can then be easily generalized to multiple constraints via the logarithmic barrier function (see next section).
For now, let’s consider
[TABLE]
Recall the iterative update formula of (GDAM),
[TABLE]
We make use of the following dynamical system to study the convergence of the method,
[TABLE]
In addition to the theoretical convergence guarantee, we report results on the global behavior of the continuous-time trajectory of the method, which can be useful for the design of practical algorithms.
5.1 Assumptions and basic results
For the analysis of the system (DS), we make the following assumptions:
- (A1)
Coercive condition for the objective function ,
[TABLE]
- (A2)
in the feasible set ;
- (A3)
in the feasible set ;
- (A4)
and are twice continuously differentiable functions.
If the function is coercive and continuous, then it has a global minimizer. The assumptions (A2) and (A3) are introduced due to our continuous-time analysis, as we need system (DS) to be well-defined. We will discuss the case where (A2) is excluded in subsection 5.2. In practice, (A3) may not always be satisfied. To escape the critical points of constraint functions, we suggest using the gradient descent search direction (see also section 6).
Let be the solution of the system:
[TABLE]
with as the initial design in the feasible set and as the maximal existence interval of in the set ,
[TABLE]
We refer to [85] for a formal definition of the maximal interval of existence. In the following, we show some basic properties of the trajectory .
5.1.1 Time derivative of and
We first observe the changes of the objective function and constraint function along the solution trajectory . This is done by taking the time derivatives of and ,
[TABLE]
similarly,
[TABLE]
where
[TABLE]
Obviously, (38), (39), and (40) hold under Assumptions and as . In the following, we derive basic theoretical results based on the above three equations.
5.1.2 Boundedness of the solution
First, it is easy to observe from (38) that decreases along with a fixed . Furthermore, there exists a global lower bound for the objective function by the Assumption (1). We can show that the solution trajectory always stays in a bounded set that is dependent on the initialization . The result is formalized as follows.
First, we define the bounded set in relation to a feasible initialization .
Definition 5.1** (Bounded set ).**
Given a feasible initialization ,
[TABLE]
The bounded set is defined by
[TABLE]
Lemma 5.2** (Boundedness).**
Suppose that assumptions (A1) - (A4) hold, and let , . Then the trajectory always stays within the bounded set associated with the feasible initialization .
Proof.
Based on the deformation of the objective function along the trajectory shown in (38), and with , , we have
[TABLE]
Therefore,
[TABLE]
Thus, the proof is complete. ∎
Remark 1*.*
Lemma 5.2 is a basic result that implies that the integral of (38) is always bounded under our assumptions, i.e.,
[TABLE]
5.1.3 Lipschitz continuity of the vector field
The intersection of the feasible set and the bounded set defines the bounded feasible set.
Definition 5.3** (Bounded feasible set ).**
Given a feasible initialization ,
[TABLE]
The bounded feasible set is defined by
[TABLE]
In the bounded feasible set , we show Lipschitz continuity of the present vector field .
Lemma 5.4**.**
Suppose that assumptions (A1)-(A4) hold and let the bounded feasible set be non-empty. Then, the vector field is Lipschitz continuous for , i.e.,
[TABLE]
where is a positive constant.
Proof.
By assumption (A4), the second derivatives of and are bounded in . Therefore, and are Lipschitz continuous in . By assumption (A2) and (A3), we have
[TABLE]
for some positive numbers . Therefore, and are Lipschitz continuous. Thus, we have a Lipschitz continuity for with . ∎
5.1.4 A cosine measure for the central path neighborhood
For the studies of inequality constrained optimizations, the central path is recognized as “a fundamental mathematical object” [12]. For example, its total curvature is used in the study of the existence of strongly polynomial algorithms for linear programs in the context of IPMs [6]. The equation (40) inspired us to give a cosine measure for the central path and its neighborhood.
Recall the normalized central path condition (7),
[TABLE]
which can be formulated as the angular relation between and
[TABLE]
Under our assumptions (A1)-(A4), is a continuously differentiable function of in . Additionally, we note that reaches its own minimum when is at the central path. Combining both arguments allows us to conveniently define a neighborhood of the central path using the level set of .
Definition 5.5** (neighborhood).**
The neighborhood of a central path is defined as
[TABLE]
Intuitively, is a cone-like neighborhood around the central path. Obviously, shrinks to the central path as With the definition of the neighborhood, we can show the following behavior of the optimization trajectory related to the constraint function .
Lemma 5.6**.**
Suppose that assumptions (A1)-(A4) hold, is fixed, and , then the constraint function decreases along the trajectory out of the neighborhood , and increases in , with defined as
[TABLE]
Proof.
Based on the deformation of the constraint function along the trajectory shown in (39),
[TABLE]
we get a proof directly. ∎
Remark 2*.*
Lemma 5.6 hints that the optimization progresses faster when an iterate than vice versa. This result is useful in designing practical algorithms where mechanisms/heuristics can be developed to adjust the stepsize and/or the parameter so to achieve a faster convergence.
5.2 Global convergence to critical points of
We show that, as long as , the present optimization trajectory always converges to a critical point of the objective function.
Theorem 5.7**.**
Suppose that assumptions (A1),(A3), (A4) hold and in the whole . The trajectory converges to a connected subset of critical points of the objective function by . Especially, if the critical points of the objective function are isolated, then
[TABLE]
where is a critical point of .
Proof.
First, notice that the present flow may be finite-time convergent333If , the present system reduces to the normalized gradient flow for unconstrained minimization, which is shown to be finite-time convergent using nontrivial nonsmooth stability analysis [32, Theorem 8].. The analytical example in section 4.2 shows that the present system can be finite-time convergent: 1) the analytical trajectory has a finite length, and 2) that the speed of the flow has a minimum norm of by the definition of the present system. To overcome this difficulty, we introduce a time-reparameterized Ysystem:
[TABLE]
The system (50) has the same orbit as (37) in the subset . Under our assumptions, the Y-system is uniformly Lipschitz continuous and continuous in . By Picard’s existence theorem, it has a unique solution with an infinite existence interval.
For , consider an integral along ,
[TABLE]
Lemma 5.2 ensures
[TABLE]
with some positive number independent of . Hence
[TABLE]
Notice that
[TABLE]
Notice that , we have
[TABLE]
By Cauchy-Schwarz inequality, we have
[TABLE]
Further, we have
[TABLE]
Therefore,
[TABLE]
By assumption (A4) and Lemma 5.2, the right-hand side of above equation is bounded. There is a constant so that
[TABLE]
Hence, we have a Lipschitz continuity for :
[TABLE]
Now, we claim
[TABLE]
Otherwise, there is a sequence of and a positive constant so that
[TABLE]
Choosing , then
[TABLE]
for any Therefore,
[TABLE]
This is a contradiction to (52). Hence
[TABLE]
This means that approaches the connected subset of the critical points. An isolated condition makes sure that
[TABLE]
for some critical point of the objective function. Recall that the system (50) and (37) are orbit equivalent in , therefore
[TABLE]
∎
Remark 3*.*
The global behavior of the system (37) looks like a gradient flow. Obviously, as , the system reduces to the normalized gradient flow of the objective function. Specifically, when converging to the same critical point, the present trajectory is homotopic with the gradient descent trajectory for by the continuous dependence of the solutions to differential equations on parameters.
Remark 4*.*
The analytical trajectory may find a critical point of constraint without condition (A3). In practical implementations, we suggest using the gradient descent to escape these critical points.
Remark 5*.*
We note that the assumption (A2) is excluded in Theorem 5.7. If there exists some critical point of in the feasible set , the optimization trajectory may converge to it (also compare to Lemma 5.9 where the assumption (A2) is included for the analysis). In such a case, obviously, a first-order solution is found for the considered constrained optimization problem. An example is illustrated in figure 1 for a 2D problem.
5.3 Global convergence to KKT solutions at the constraint boundary
In the previous subsection, we have shown that the continuous optimization trajectory converges to a critical point of the objective function , regardless if lies in the feasible set or not. For cases that , this implies that we are able to use the present optimization trajectory to obtain a (first-order) optimal solution. What remains is the question of whether the present method can find a (near-optimal) solution at the constraint boundary, i.e., the second case presented in section 4.1. In the following, we give an affirmative answer to this question. To give a rigorous presentation, we shall first define what we meant by the wording “find”, based on which we establish an error bound of the solution to a KKT point.
5.3.1 A solution is found at the intersection point of the optimization trajectory and the constraint boundary
As illustrated in figure 2 of the second preliminary example, the optimization trajectories , with different values, converge to the critical point of the objective function while intersecting the constraint boundary at some time . The larger the chosen, the closer the intersection point is to the KKT solution. We consider these intersection points as the solutions found by the present method to a single constrained problem. A formal definition is given as follows.
Definition 5.8**.**
(Solution point ) Let and , and suppose that the optimization trajectory intersects the constraint boundary at a point , where is the minimum time for the trajectory to reach the constraint boundary. We say the intersection point is a solution point obtained by the system (DS) for the problem (SCOP). We have
[TABLE]
Next, we argue that under the assumption , the optimization trajectory , with and , always intersects the constraint boundary. Although this statement can readily be made by following Theorem 5.7, we give an independent and simple proof for clarity.
Lemma 5.9**.**
Suppose that assumptions (A1) - (A4) hold, and let , . Then the trajectory must go out of the feasible set. Hence, a solution point must exist.
Proof.
We prove the lemma by contradiction. Suppose that stays in the feasible set for any , then the vector field keeps continuous in a neighborhood of trajectory , since there is no critical point of and in . Lemma 5.2 ensures that
[TABLE]
with some positive numbers and . On the other hand, is uniformly Lipschitz continuous in by Lemma 46. Picard’s existence theorem implies that The integral of (38) shows (see also Remark 1)
[TABLE]
For any , holds, therefore
[TABLE]
(61) is a contradiction to (60). This completes the proof. ∎
5.3.2 An upper bound of the solution time relative to
We estimate the solution time that is equivalent for the trajectory to reach the constraint boundary. An upper bound for is given as follows.
Lemma 5.10**.**
Suppose that the assumptions (A1)- (A4) hold, and let , , the time in which the optimization trajectory reaches the boundary of the feasible set is at most with independent of .
Proof.
By Lemma 5.9, the trajectory must go out of the feasible set. Let be the time in which the trajectory first reaches the boundary of the feasible set, then (60) holds for . By (38),
[TABLE]
Hence
[TABLE]
which implies
[TABLE]
so that holds. This ends the proof. ∎
Lemma 5.10 implies that the system (DS) finds a solution in .
5.3.3 An upper bound of the solution error relative to
Lastly, we shall give an error measure of a solution (intersection) point to a KKT solution . The KKT conditions for the problem (SCOP) write
[TABLE]
It is obvious that the conditions (65a) and (65b) are equivalent to the normalized centrality condition (7), repeated as the follows for convenience,
[TABLE]
In addition, (65c) is satisfied with (59). Following the normalized central path condition (7), a proper residual for an intersection point can be defined using the L2-norm,
[TABLE]
Based on (66), we give a formal definition for an error measure .
Definition 5.11** (Solution error).**
An error measure for a solution point is defined as the L2-norm of the residual (66),
[TABLE]
In the following, we show that the error of the method is upper bounded relative to the parameter .
Theorem 5.12**.**
Suppose that assumptions (A1)-(A4) hold, and let , . Then, the solution error of is upper bounded in relative to the parameter ,
[TABLE]
Proof.
By Lemma 5.9, the optimization trajectory must intersect the constraint boundary when initialized in the feasible set . By (59), we have at the intersection point ,
[TABLE]
By (39),
[TABLE]
Thus,
[TABLE]
By Definition 67, we have
[TABLE]
Therefore,
[TABLE]
Our proof is thus complete. ∎
Remark 6*.*
By Lemma 5.10 and Theorem 68, we can conclude that the time complexity for the optimization trajectory to achieve an approximate solution is
[TABLE]
The result matches the ergodic convergence rate of first-order methods for general optimization problems.
6 The method for multiple constraints: a formulation based on the logarithmic barrier function
Recall the logarithmic barrier function for the problem (COP),
[TABLE]
We first notice that the barrier function grows without bound if and it is not differentiable at the boundary of the feasible set. To overcome this difficulty, we consider a subset of the feasible set,
[TABLE]
where M is a sufficiently large positive number so that approximates the original feasible set . The barrier function is twice continuously differentiable in and thus fulfills the assumption (A4).
Set
[TABLE]
The original problem (COP) can be approximately reformulated as
[TABLE]
so that the present system (DS) may be applied.
Notice that the global behavior shown in section 5 is based on the assumption that in the feasible set . In general, this may not always be the case for the function . To escape the points where , we suggest using the gradient descent . Thus we get a dynamical system for solving the approximate problem (75),
[TABLE]
Note that in computational practice, rarely occurs. The second ODE of the above system serves as a safeguard for the present method.
Corollary 6.1**.**
Suppose that the assumptions (A1) (A2) and (A4) hold, and in the feasible subset . Let be the solution point of the system (DSM) for the approximate problem (75), then
[TABLE]
And the solution error to a KKT solution of the problem (75) is
[TABLE]
Proof.
A similar method as used in Theorem 68 gives the proof. ∎
To conclude, using the barrier function formulation for problem (COP), the resulting trajectory achieves an approximated local solution that locates on the boundary of the subset . Notice, too, that the system (DSM) does not depend on the choice of , and exhaust , i.e., . This means that the resulting trajectory keeps approaching the boundary of the original feasible set by crossing the boundary of any subset dependent on . Therefore, it eases the practical implementation of the method since no extra parameter needs to be defined for the stopping criterion, but rather, a check for each constraint violation would be sufficient.
7 Algorithms and preliminary computational experiments
We present two computational implementations of GDAM and show their preliminary computational tests on common benchmarks for both convex and nonconvex constrained optimizations.
7.1 Computational algorithms
When implementing the present method for the problem (COP), the canonical first-order optimization procedure is given as below,
[TABLE]
with
[TABLE]
In this work, we design computational algorithms for problems whose minimizers are non-trivially located at the constraint boundary, as is the case for our targeted applications, shape optimization and sensor network localization.
7.1.1 Fix-length stepsize
Obviously, the choice of is essential to the computational performance of the method, and, certainly, there are many ways that can satisfy the purpose of the convergence. In this work, we propose a fixed-length step size rule for the method, i.e.,
[TABLE]
where is some positive parameter. As a result,
[TABLE]
The step size rule can be considered aggressive, since at a central point , as , . A fixed-length step size (78) aggressively re-scales the step size using the reciprocal of , resulting in a finite variable update. A fix-length stepsize is also used in subgradient methods to deal with nonsmoothness at the solution by dynamically adjusting towards zero, see, e.g., [16, Proposition 3.2.7]. For constrained problems, when more than one inequality constraints are active at a solution, the solution vertex itself can be nonsmooth and contributes to the ill-conditioning of the logarithmic barrier function at its close neighborhood. A fix-length stepsize bounds the range of an update and can thus be considered a trust-region-like approach that stabilizes the iterative process. Finally, various line search methods can be added to dynamically adjust the parameter .
7.1.2 Vanilla implementation
A vanilla implementation based on the fixed-length stepsize approach is summarized in the pseudocode in Algorithm 1. For constant parameter , we terminate the optimization whenever a constraint is violated.
When an additional line search is equipped, the stopping criteria is set as to simultaneously fulfill the following two conditions:
Any constraint is violated;
- 2.
.
For example, the algorithm can be equipped with a simple backtracking line search to improve the performance, i.e., we reduce the stepsize by a scaling factor whenever a constraint is violated until .
7.1.3 Accelerated implementation
A prominent approach to improve the convergence rate of the gradient descent method is Nesterov’s Accelerated Gradient (NAG) method [81]. We apply NAG to the present method by replacing the gradient descent update step with a GDAM update step. We summarize the pseudocode in Algorithm 2. Notice that compared to the vanilla implementation, a momentum parameter and a sequence of auxiliary variables are added.
The following shows preliminary computational experiments of the proposed vanilla and accelerated GDAM algorithms on common benchmark tests. Computational results show that the present implementations are robust in finding near-optimal solutions, which can be desirable in practical engineering applications.
7.2 Experiments on IEEE CEC 2006 tests with the vanilla implementation
We first show numerical experiments for the inequality constrained problems presented at the EA competition at the 2006 IEEE Congress on Evolutionary Computation [66]. These benchmark tests are widely used in the community of evolutionary algorithms. We choose them as test examples for three reasons. First, they are well defined constrained optimization problems and have different characteristics [89]. Second, they are nontrivial to solve with first-order methods. Third, the relatively simple formulation of the optimization problems allows us to gain deeper insight into the numerical behavior of the present method. We choose the inequality constrained optimization problems for the experimentation. The problem is excluded because it has a feasible set consisting of disjointed spheres. For the problem , which has a feasible set consisting of two disconnected sub-regions, we choose initial designs in the sub-region that contains the reported optimal solution.
We conduct numerical experiments for the vanilla implementation (Algorithm 1) with and tuned fixed-length stepsize parameter . All bound constraints are treated as inequalities. Random initializations that are away from the reported optimal solution are selected in the feasible sets. We evaluate the error in the objective function value by
[TABLE]
where is the reported global optimum. As shown in table 1, apart from the problem G19, vanilla GDAM finds solutions with objective function value errors less than 2e-2 compared to the reported global optima. This is somewhat surprising because GDAM is a local search algorithm. More accurate results can be obtained when we choose shorter step sizes and larger parameters . For the problem G19, the reported optimal solution (1.6699e-17, 3.9637e-16, 3.9459, 1.060e-16, 3.2831, 9.9999, 1.1283e-17, 1.2026e-17, 2.5071e-15, 2.2462e-15, 0.3708, 0.2785, 0.5238, 0.3886, 0.2982) is not achievable with the present fixed-length stepsize rule with a reasonable constant parameter . For the problems G01, G04, G06, G07, G08, G24, we find sub-optimal solutions that are close to the reported optimal designs. For the problems G09, G10, G18, G19, sub-optimal solutions close to local minima are found.
7.3 Experiments on Maros and Meszaros QPs with the accelerated implementation
The Maros and Meszaros test set [75] is a collection of hard convex quadratic programming problems from a variety of sources. The problems included are of the following form
[TABLE]
where , is a symmetric positive definite matrix, is a sparse matrix, , , and . Note that components of and may be infinite, and .
The algorithms 1 and 2 can easily be extended to handle linear equality constraints by the gradient projection method. Suppose that is nondegenerate, following Rosen’s method [91], the projection matrix for the subspace spanned by the linear equalities writes
[TABLE]
The projected vector of any vector reads
[TABLE]
Then, the projected GDAM search direction writes,
[TABLE]
Practically, we can precompute and store the projection matrix . For large-scale and sparse problems ( and are large, and is sparse), we may make use of Cholesky factorization for for more efficient computation. More advanced methods for treating linear equalities, such as [25], may be applied to the present method.
We use a subset of the Maros and Meszaros instances for the experiments. These instances were carefully chosen to cover a variety of problem models for performance comparisons among dedicated QP solvers [78]. We evaluate the error in the objective function value by
[TABLE]
where is the smallest objective function value of the solvers. In table 2, we show the results of the present method and compare it with well-established solvers Matlab quadprog, Gurobi [46], and OSQP [101]. Both Matlab quadprog and Gurobi implement state-of-the-art second-order solvers for convex QPs, and OSQP is a first-order convex QP solver based on the alternating direction method of multipliers [23]. For Matlab quadprog and Gurobi, we use the default settings. For OSQP, we set the absolute and relative tolerance to for more accurate solutions (default settings are both ). For GDAM, and a backtracking line search with reduction parameter are chosen. We set the maximum number of iterations to 10000 for both OSQP and GDAM. All experiments are run on a Linux workstation with a 2.70 GHz 6-core AMD Ryzen 5 5600U processor and 16 GB RAM.
As shown in table 2, accelerated GDAM finds near-optimal solutions for all instances except UBH1, which demonstrates its robustness. Gurobi is the fastest and the most robust solver in general. GDAM is more efficient in solving the instances CVXQP*. By setting smaller tolerances and a larger number of maximum iterations, OSQP finds more accurate solutions for the instances BOYD1, CONT*, and DTOC3. We note that no presolve and preconditioning strategies have been implemented for GDAM, unlike other mature QP solvers, since the goal here is to show the robustness of the present algorithm. On the other hand, it leaves room for improvements in computational efficiency.
8 Application to Shape Optimization
We show applications of vanilla GDAM (Algorithm 1) to shape optimization problems arising in Computational Mechanics. We begin by briefly introducing the shape optimization framework used in this work and then show the optimization results using GDAM.
8.1 Vertex Morphing
The shape optimization problems we aim to solve are almost always ill-posed due to the very large discrete design space based on the finite element mesh. To tackle the problem, we use the Vertex Morphing method (VM) introduced in [52].
The idea of the Vertex Morphing is to control the discrete surface coordinates with design controls , filtered by a filter function. The explicit filtering used in VM is the convolution of the coordinate field with a kernel. In the discretized system, it is a matrix-vector multiplication that is the summation of nodal contributions that are weighted with the kernel function. VM distinguishes itself with standard explicit filtering, in which it applies the filtering process twice.
First, the so-called forward mapping step that uses the linear filtering matrix R is defined as follows:
[TABLE]
Similarly, the change of the control is mapped onto the change of the design configuration
[TABLE]
The design controls are the control variables of the gradient-based optimization. The number of variables is equivalent to the number of surface coordinates. Following the chain rule of differentiation, the sensitivities of a response function with respect to the discretized geometry x are backward mapped to the design control using the adjoint or backward mapping matrix , with for regular grids,
[TABLE]
Equations (87) and (88) can be used in a gradient descent framework, ensuring smooth shape updates in each iteration.
8.2 Academic example
First, we show an academic convex problem. The objective is to maximize the volume of a small sphere. This small sphere is located inside a bigger sphere that acts as the geometric constraint. The shape of both spheres are represented with finite element meshes [114]. The optimization problem writes
[TABLE]
where is the volume function, is a point-wise defined geometric constraint for the -th design node, is the number of nodes of the design mesh, and is the field of nodal coordinates of the design sphere mesh. The number of nodes of the small sphere (design sphere) is 19897. Thus, the total number of design variables is 59691 and the total number of constraints is 19897. We use the logarithmic barrier function for multiple constraints and choose the parameter . In figure 3, the shape variation process with depicted iterations is shown. Initially, the design sphere is located close to the boundary of the constraint sphere. During the shape variation process, it moves towards the center of the constraint sphere while adapting its shape at each iteration. One could recognize easily that the central path of this optimization problem is being approached and followed until the solution is found when constraints become active.
8.3 Real-world example
We consider a real-world application to shape optimization. The present method is implemented in ShapeModule, which is a flexible solver-agnostic optimization platform and provides optimization algorithms as well as shape control methods, such as Vertex Morphing [52]. The optimization problem is to minimize the mass of a frame structure under load-displacement constraint (i.e., the displacement of every surface node is bounded). The optimization problem writes
[TABLE]
where is the function for the mass, is a point-wise formulated displacement constraint for the -th node, is the number of nodes of the design surface mesh, is the field of nodal coordinates of the design surface mesh, and is the nodal displacement field. The number of design variables is 144423, and the number of constraints is 48141. Note that for multiple constraints, we can use the logarithmic barrier function formulation as in the previous test examples. Each single displacement constraint gradient can be efficiently computed using the adjoint sensitivity analysis. In this application, we use the load-displacement sensitivity provided by the software OptiStruct to conform with a standard industrial design chain. We choose the parameter . We note that there is a rather large gap between our theoretical analysis and the large-scale implementation of the method, as our results are asymptotic and the oracle is a black-box. Nevertheless, we were able to obtain results that are qualitatively in agreement with our theory.
In figure 4 we show the initial frame design and the optimized shape design after 194 iterations. The mass of the structure is reduced by as shown in figure 5a. In figure 5b, we show a plot of maximum constraint value of each iteration. In figure 6, we show that the optimization is able to approach and follow a central path within the -neighborhood.
Remark 7*.*
By following a central path, an intermediate design improves not only the objective function but also the constraint function. Take the design of iteration 80 as an example: the mass is reduced by , and the displacement is reduced by . These designs may enrich the design options if the original problem is reformulated as a bi-objective optimization problem, in which both mass and the maximum displacement are set as objectives. The resulting intermediate designs alongside a central path are approximated Pareto solutions.
Remark 8*.*
This section focuses on the vanilla implementation of GDAM and demonstrates its robustness in solving node-based shape optimization problems, as robustness is considered a key to practical success for real-world shape design applications [20]. Accelerated GDAM can be applied to further reduce the number of iterations. However, both accelerated GDAM and Vertex Morphing introduce their own auxiliary variables into the optimization process: in the Algorithm 2 for GDAM and the control variables in Vertex Morphing, respectively. Moreover, the forward and backward mapping between the control space and variable space in VM makes an application of accelerated GDAM even more delicate. The most efficient and robust way to combine both methods is worthy of further investigation. And this is of independent interest than this manuscript, which we leave to future work.
9 Application to Sensor Network Localization via Semidefinite Programming
In this section, we apply the present method to solve sensor network localization (SNL) problems via the semidefinite programming (SDP) formulation [18]. First, we briefly introduce the basics of SDP-based SNL, after which we show computational experiments and comparisons with state-of-the-art first- and second-order SDP solvers.
9.1 Sensor Network Localization via Semidefinite Relaxation
Sensor Network Localization (SNL) aims to recover unknown sensor locations with (partially) given distance information between them, and thus can be considered an inverse problem. Consider a nonlinear least square (NLS) formulation for SNL problems with sensors and anchors in 2d:
[TABLE]
where the location of each sensor is to be determined, and the location of each anchor is known. The distance and are known observations between sensor-sensor pairs and sensor-anchor pairs, respectively. Often, the distances are detected in a given radius .
We can formulate SNL as a constrained optimization problem,
[TABLE]
SNL is a typical nonconvex quadratic optimization problem [72][80] for which one is not content with finding a local solution but the global optimum, so that the true sensor locations can be recovered. Following the work [18], we relax the NP-hard nonconvex problem (91) using SDP. First, we rewrite the distances of sensor-sensor pairs,
[TABLE]
where is the -th unit vector, is a matrix whose -th column is , and . Similarly, we have
[TABLE]
where
[TABLE]
and
[TABLE]
Furthermore, let
[TABLE]
By lifting the variables, we obtain an equivalent formulation in the variable for (91),
[TABLE]
The problem (96) is, still, a very challenging nonconvex optimization problem. However, by lifting the variable, the nonconvexity is now only allocated at the rank constraint
[TABLE]
The semidefinite relaxation is to drop out this rank constraint, and thus a convex optimization problem is obtained,
[TABLE]
The dual of (SNLP) writes,
[TABLE]
The semidefinite relaxation is mathematically elegant, and many applications show that, indeed, SDRs result in tight approximations to the original NP-hard problems, see, e.g., the seminal work on the maximum cut problem [42]. In the case of SNL, the work [97] shows that SDR provides the exact solution to the original problem if the sensors are uniquely localizable with provided distance information. On the other side, the cost of SDR is the much-increased problem dimensions and the introduced nontrivial PSD cone constraint, which are computationally challenging for large-scale applications.
Notice that the SDR formulation (SNLP) of an SNL problem is a feasibility program —any feasible solution solves the optimization problem. The present method is not directly applicable since the optimization iterates in the feasible set. On the other hand, the dual formulation (SNLD) is an inequality constrained problem and is, therefore, a well-suited optimization formulation for the present method. We implement the accelerated GDAM (Algorithm 2) in MATLAB R2022a to solve the dual problem (SNLD). Further details on implementation are reported in Appendix B.
9.2 Sensor Network Localization examples
We randomly generate a set of examples with nodes representing the true sensor locations , with ranging from 100 to 10000. Additional four anchor nodes are generated whose positions are known a priori. A total of pairs of distances information are measured within a radio detection range . To measure the estimation accuracy, we use the root-mean-square-distance (RMSD), which has been widely used on SNL to test the performance and is defined as
[TABLE]
where is the estimated position and is the true position of sensor , respectively. All experiments are run on a Linux workstation with a 2.70 GHz 6-core AMD Ryzen 5 5600U processor and 16 GB RAM.
9.2.1 GDAM results
We report the runtimes and RMSDs of the present method in table 3. It can be seen that the obtained solutions are of moderate accuracy in terms of the RMSD. The eigenvalues of the solution matrices show that they are near rank 2 matrices. Thus, the present method not only solves the SDR problem (SNLP) approximately, but also solves the rank-constrained SDP problem (96) to moderate accuracy. As (96) is equivalent to the original feasibility problem (91), we can hope that the present method provides near-optimal solutions to the original NP-hard SNL problems. Indeed, as we observed in our extensive numerical experiments, using the obtained SDR solutions as initializations, we can accurately recover all the true sensor locations by applying local search methods such as gradient descent for (NLS). We show some of the graphical results in figure 7 and 8. The left figures show the results obtained by applying accelerated GDAM for the SDR (SNLD); the right figures show refinements of the GDAM solutions by applying gradient descent to the original and low-dimensional problem (NLS).
9.2.2 Comparisons
We compare results in terms of runtime and RMSD with three well-established solvers: DSDP v5.8 [15], a generic SDP solver based on a second-order dual interior-point method, SCS (Splitting Conic Solver) version 3.1.0 [82], a generic large-scale conic solver based on ADMM, and SDPNAL+ version 1.0 [103], a large-scale SDP solver based on a semismooth Newton-CG augmented Lagrangian method. We run the same examples listed in table 3.
DSDP: We use the default settings of DSDP, where the tolerance for the optimality gap is set to be . DSDP is an efficient second-order solver for SNL problems that exploits the problem feature which leads to reduced storage and increased efficiency. Table 4 shows that DSDP can solve problems to very high accuracy. For problems with size , the workstation is running out of memory. While DSDP achieves high accuracy for the SNL problems, GDAM requires a much less time in finding an approximate solution. To recover the true sensor locations, we run GD for the low-dimensional nonlinear least squares problem (NLS), a well-established approach for SNL problems [17] that belongs to the general two-phase strategy for solving SDP relaxation problems, where a SDP solutions is used as an initialization for nonlinear programming methods to locally solve the original NP-hard problem [72].
SCS: We use ‘sparse-indirect’ for the linear solve setting and choose convergence tolerances between and , and set the maximum runtime to be 24 hours. We observe that SCS encounters numerical difficulties if the objective is set to be empty, as is the case of our feasibility formulation (SNLP). Therefore, We add an objective to the model. We find that the runtime is quite sensible with the choice of the sign. Therefore, we tuned the sign and report results with better performance. We use Yalmip (Version 31-March-2021) [69] for the problem modeling and the interface to SCS.
We report the results of SCS in table 5. For problems with , a tolerance with yields comparably accurate results as GDAM. For the problem with , a smaller tolerance is needed to obtain a good estimation as initialization for local refinement, as shown in figure 9. For all the instances, GDAM is more competitive in the runtime. For example, for the problem with , GDAM takes about 12 minutes to find a moderate accurate solution with RMSD , and SCS finds an inferior solution after five and a half hours.
SDPNAL+: SDPNAL+ v1.0 is a two-phase solver in which an ADMM algorithm is used in phase I to provide an initial point for the phase II algorithm, which is a semismooth Newton-CG augmented Lagrangian method. We use the default settings of SDPNAL+ and set the tolerances to and , and set the maximum runtime to be 24 hours. We solve the test examples shown in section 9.2.1 and report results in table 6. Comparing table 6 and table 5, we see that the performance of SDPNAL+ and SCS are fairly close. In about the same runtime, SDPNAL+ finds more accurate solutions for problems with , while SCS finds a more accurate solution for the problem with (see SDPNAL+ result in figure 10). A comparison of table 6 and table 3 shows that GDAM is more efficient than SDPNAL+ in finding moderate accurate solutions in terms of RMSD for the considered SNL problems.
Remark 9*.*
Large-scale semidefinite programs arising from the relaxation of SNL problems are challenging convex optimization problems. Even advanced large-scale SDP solvers, such as SCS and SDPNAL+, need to make a compromise between the computational time and the accuracy of a solution. A practical difficulty lies in how to choose such a tolerance a priori. Take the example of , SCS requires to obtain a moderate accurate localization solution, which is an order of magnitude smaller than that required for problems with . For the same problem , a practical tolerance for SDPNAL+ may lie between and , whereas the difference in the runtime of the two tolerances is about 18 hours. On the other hand, without the knowledge of the true sensor locations, it may not be straightforward to design a practical stopping criterion. In this regard, GDAM shows its practical advantage by only using a single fixed parameter (universally for different numbers of sensors, radius, and anchor locations). In theory, is related to solution accuracy (see Theorem 68), but its practical implications for large-scale SDP problems may be more profound, which is worth further investigation.
Remark 10*.*
Compared to SCS and SDPNAL+, the number of iterations needed for GDAM increases only slowly as the dimension of the problem grows. Indeed, GDAM is the only method we are aware of that is capable of solving the SDP model (SNLP) (or (SNLD)) to 10000 sensors on a laptop workstation. This clearly shows the potential of GDAM as a practical optimization method for some of the large-scale and difficult constrained optimization problems.
10 Discussion
This paper proposes a gradient descent akin method for solving constrained optimization problems. GDAM may be considered a gradient descent method for constrained optimization and a first-order interior-point method. We give essential theoretical guarantees on the global convergence of the method to first-order solutions. We present computational algorithms based on GDAM and show their applications to two engineering problems, shape optimization and sensor network localization. Computational experiments demonstrate that GDAM is robust and very competitive in finding moderate accurate solutions and scales well to very large problems. Given the rather big difference in the characteristics of the two applications, we believe that GDAM could be suitable for other large-scale constrained optimization problems by providing inexact but useful solutions.
Acknowledgements
The author LC thanks the financial support of the German National High Performance Computing (NHR) Alliance. We are grateful to the ShapeModule team at the BMW Group for providing a real-world model and their shape optimization framework.
Appendix A Derivation of the analytic trajectory for problem (29)
For problem (29), the search direction field reads
[TABLE]
By the uniqueness theorem of the ordinary differential equation, the integral curves of the vector field has non-vanishing component when the initial design satisfies Hence we have,
[TABLE]
While , let , then we have . Thus,
[TABLE]
We rewrite (100) as
[TABLE]
Solving equation (101) we get
[TABLE]
Or,
[TABLE]
Similarly, for , the trajectory has the form
[TABLE]
Assume an initial design , then we get
[TABLE]
Hence,
[TABLE]
Let , then we obtain the trajectory of
[TABLE]
Let be the point in the trajectory with maximal component, then
[TABLE]
This gives
[TABLE]
and
[TABLE]
Let , then, and . The trajectory converges to the curve that is a union of the parabola
[TABLE]
and the interval on -axis.
Appendix B Implementation detail of GDAM for SNL
We report more implementation details in addition to the section 7.1 for solving the SNL problems.
B.1 Logarithmic barrier function for the positive semidefinite cone constraint
In the following, we briefly show how the PSD constraint can be tackled using the standard logarithmic barrier function. For the primal SDP formulation (SNLP), the PSD cone constraint of the matrix variable can be expressed as only has non-negative eigenvalues, i.e.,
[TABLE]
where we assume for a general presentation. Using the logarithmic barrier function for (112), we have
[TABLE]
The gradient of with respect to the matrix variable writes
[TABLE]
For the dual SDP formulation (SNLD), the PSD cone constraint can be formulated similarly. Notice now that the variable is the dual variable , and so we have
[TABLE]
The gradient of writes
[TABLE]
where . From (114) and (116), we see that in order to use a gradient-based method with the logarithmic barrier framework, a matrix inversion needs to be done for a matrix in for both the primal and dual formulation.
B.2 Presolve
Presolve is an important part of the practical implementation of optimization algorithms that transforms the input problem into an easier one. For SNL problems, we use a simple presolve strategy that rescales the network geometry to the range . Numerically, it can be regarded as a preconditioning of the formulated dual SDP problem (SNLD). Compared to the canonical sensor range , which is often presented the literature, the dual IPM solver DSDP justifies the efficacy of our presolve strategy with a reduced number of iterations. We note that general purpose presolve/preconditioning strategies (see, e.g., [113]) will probably further improve the performance of the method, which we leave for future work.
B.3 Strictly feasible initialization
Different initialization strategies are available for IPMs in solving SDPs. In this work, we use the Phase I-then-Phase II method, i.e., we first try to find a feasible point (and in our case one for the dual problem), and then start the main solve routine. Notice that the dual problem (SNLD) is feasible when and for all and for all , and we denote this zero initialization as for convenience. The point is, however, not an interior point in the feasible set for which we can start GDAM. To overcome this, we propose to run a few steps of gradient descent, starting from , for the following auxiliary logarithmic barrier problem,
[TABLE]
where is a positive number, which we choose to be in our implementation. After a few steps of GD, an feasible interior initialization for the dual problem (SNLD) can be readily obtained.
B.4 Main solve
After a strictly feasible initialization is found, the main solution phase starts that applies the accelerated Algorithm 2 for the dual problem (SNLD). To further improve the practical performance, we developed adaptive stepsize and restart strategies, whose heuristics were tuned for the SNL problems. We implement a backtracking-like stepsize rule for the iterate to remain in the feasible set, i.e., the stepsize is reduced by a scaling factor whenever a constraint is violated. We choose the initial stepsize to be and set the minimum stepsize to be . The restart strategy is motivated by the work [83]. Specifically, we restart the optimization algorithm when the objective function value increases or the objective reduction is heavily slowed down555In our practical implementation, the objective function values are monitored in a fixed frequency.. For efficiency, we do not reset the stepsize to , but to a multiple of the stepsize prior to the restart, i.e., , where is a constant and is the index of each restart. It should be noted that the implemented heuristic strategies are still basic, and more sophisticated methods can be used to further enhance the practical performance. Nonetheless, we found the present implementation is robust and scales well to large-scale problems.
B.5 Postsolve
After solving the dual SDR problem (SNLD), an additional step is needed to recover the primal solution that contains the actual sensor locations. Denote the feasible sets of (SNLP) and (SNLD) by and , respectively. Denote , and the interior of by . Assume that , the central path of a SDP can be expressed as
[TABLE]
where is the barrier parameter (compare (5)), is the identity matrix. The primal solution can be recovered from the dual solution by
[TABLE]
where is the barrier parameter, and, within the computational framework of GDAM, it can be approximated by (11),
[TABLE]
where is a solution found by the parameter . To achieve higher accuracy, we run GDAM with by taking the optimizer of the main solve as the initialization. In our implementation, we set for the postsolve. Note that smaller (for both the main solve and the postsolve) leads to more accurate solutions. However, too small step sizes result in (almost) singular matrices for , and numerical instabilities occur when inverting them.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, Wireless sensor networks: a survey , Computer networks 38 (2002), pp. 393–422.
- 2[2] T.A. Albring, M. Sagebaum, and N.R. Gauger, Efficient aerodynamic design using the discrete adjoint method in SU 2 , in 17th AIAA/ISSMO multidisciplinary analysis and optimization conference . 2016, p. 3518.
- 3[3] M.M. Ali and T.L. Oliphant, A trajectory-based method for constrained nonlinear optimization problems , Journal of Optimization Theory and Applications 177 (2018), pp. 479–497.
- 4[4] G. Allaire, Shape optimization by the homogenization method , Vol. 146, Springer Science & Business Media, 2012.
- 5[5] G. Allaire, F. Jouve, and A.M. Toader, Structural optimization using sensitivity analysis and a level-set method , Journal of computational physics 194 (2004), pp. 363–393.
- 6[6] X. Allamigeon, P. Benchimol, S. Gaubert, and M. Joswig, Log-barrier interior point methods are not strongly polynomial , SIAM Journal on Applied Algebra and Geometry 2 (2018), pp. 140–178.
- 7[7] E.D. Andersen and K.D. Andersen, The MOSEK interior point optimizer for linear programming: an implementation of the homogeneous algorithm , in High performance optimization , Springer, 2000, pp. 197–232.
- 8[8] H. Antil, R.H. Hoppe, and C. Linsenmann, Path-following primal-dual interior-point methods for shape optimization of stationary flow problems , Journal of Numerical Mathematics 15 (2007), pp. 81–100.
