Escaping Locally Optimal Decentralized Control Policies via Damping
Han Feng, Javad Lavaei

TL;DR
This paper investigates how increasing damping in decentralized control systems causes local optima to merge into a single global optimum, simplifying the control design landscape.
Contribution
It introduces a theoretical framework using hemi-continuity to analyze the evolution of local optima under damping and proves the elimination of spurious local solutions with large damping.
Findings
Damping merges local solutions into the global solution.
Large damping eliminates spurious local optima.
Numerical examples illustrate complex trajectories and convergence.
Abstract
We study the evolution of locally optimal decentralized controllers with the damping of the control system. Empirically it is shown that even for instances with an exponential number of connected components, damping merges all local solutions to the one global solution. We characterize the evolution of locally optimal solutions with the notion of hemi-continuity and further derive asymptotic properties of the objective function and of the locally optimal controllers as the damping becomes large. Especially, we prove that with enough damping, there is no spurious locally optimal controller with favorable control structures. The convoluted behavior of the locally optimal trajectory is illustrated with numerical examples.
This work was supported by grants from ARO, ONR, AFOSR, and NSF.
1 Introduction
The optimal decentralized control problem (ODC) adds controller constraints to the classical centralized optimal control problem. This addition breaks down the separation principle and the classical solution formulas that culminated in [4]. Although ODC has been proved intractable in general [23, 1], the problem has convex formulations under assumptions such as partial nestedness [19], positiveness [17], and quadratic invariance [10]. A recently proposed System Level Approach [21] convexifies the problem in the space of system response matrices. Convex relaxation techniques have been extensively documented in [2], though solving large-scale optimization problems with linear matrix inequalities is considered challenging.
The line of research on convexification is in contrast with the success of stochastic gradient descent, which is well documented in machine learning practice [9, 8]. Admittedly, the machine learning concerns of generalizability, training speed, and fairness depart from the traditional control focus on stability, robustness, and safety. Nevertheless, the interplay of the two has inspired fruitful results. As an example, to solve the linear-quadratic optimal control problem, the traditional nonlinear programming methods include Gauss-Newton, augmented Lagrangian, and Newton’s methods [11, 22, 12, 13]. Only in the last few years have researchers started to look at the classical problem with the newly developed optimization techniques and proved the efficiency of policy gradient methods in model-based and model-free optimal control problems [6]. This efficiency statement of local search, however, is unlikely to carry over trivially to ODC, due to the NP-hardness of the problem and the recent investigation of the topological properties of ODC in [7]. Nevertheless, questions can be answered without contradicting the general complexity statement. For example, it is known that damping of the system reduces the number of connected components of the set of stabilizing decentralized controllers. Does damping reduce the number of locally optimal decentralized controllers? This paper attempts an answer with (1) a study of the continuity properties of the trajectories of the locally optimal solutions formed by varying damping, and (2) an asymptotic analysis of the trajectories as the damping becomes large. The observations of our study should shed light on the properties of local minima in reinforcement learning, whose aim is to design optimal control policies and where different local minima have different practical behaviors.
This work is closely related to continuation methods such as homotopy, which are known to be appealing yet theoretically poorly understood [15]. Homotopy has been used as an initialization strategy in optimal control: in [3], the authors mentioned the idea of gradually moving from a stable system to the original system to obtain a stabilizing controller. The paper [24] considered the $H_2$ reduced-order problem and proposed several homotopy maps and initialization strategies; in its numerical experiments, initialization with a large multiple of $-I$ was found appealing. [5] compared descent and continuation algorithms for the optimal reduced-order control problem and concluded that homotopy methods are empirically superior to descent methods. The difficulty of obtaining a convergence theory for general constrained optimal control problems can be appreciated from the examples in [14]. Compared with those earlier works, we consider a special kind of continuation, namely damping, to improve the locally optimal solutions in optimal decentralized control. Our focus is not so much on following a specific path as on the evolution of several paths and the movement of locally optimal solutions from one path to another.
The remainder of this paper is organized as follows. Notations and problem formulations are given in Section 2. Continuity and asymptotic properties of our damping strategies are outlined in Section 3 and Section 4, respectively. Numerical experiments are detailed in Section 5. Concluding remarks are drawn in Section 6.
2 Problem Formulation
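Consider the linear time-invariant system
\[ \dot{x}(t) = A x(t) + B u(t), \]
where $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times m}$ are real matrices of compatible sizes. The vector $x(t)$ is the state of the system with an unknown initialization $x(0) = x_0$, where $x_0$ is modeled as a random variable with zero mean and a positive definite covariance $D_0 = \mathbb{E}[x_0 x_0^\top]$. The control input $u(t)$ is to be determined via a static state-feedback law $u(t) = K x(t)$ with the gain $K \in \mathbb{R}^{m \times n}$ such that some quadratic performance measure is minimized. Given a controller $K$, the closed-loop system is
\[ \dot{x}(t) = (A + BK) x(t). \]
A matrix is said to be stable if all its eigenvalues lie in the open left half plane. The controller $K$ is said to stabilize the system if $A + BK$ is stable. ODC optimizes over the set of structured stabilizing controllers
\[ \{ K \in \mathcal{S} : A + BK \text{ is stable} \}, \]
where $\mathcal{S} \subseteq \mathbb{R}^{m \times n}$ is a linear subspace of matrices, often specified by fixing certain entries of the matrix to zero. In that case, the sparsity pattern can be equivalently described with the indicator matrix $I_{\mathcal{S}}$, whose $(i,j)$-entry is defined to be
\[ [I_{\mathcal{S}}]_{ij} = \begin{cases} 1, & \text{if } K_{ij} \text{ is free}, \\ 0, & \text{if } K_{ij} \text{ is fixed to zero}. \end{cases} \]
The structural constraint $K \in \mathcal{S}$ is then equivalent to $K \circ I_{\mathcal{S}} = K$, where $\circ$ denotes entry-wise multiplication. In the following, we will consider the discounted, or damped cost, which is defined as
\[ J(K, \alpha) = \mathbb{E} \int_0^\infty e^{-2\alpha t} \left[ x(t)^\top Q x(t) + u(t)^\top R u(t) \right] dt \quad \text{s.t. } \dot{x}(t) = A x(t) + B u(t), \ u(t) = K x(t), \tag{1} \]
where $Q$ is positive semi-definite and $R$ is positive definite. The expectation is taken over $x_0$. Setting $\hat{x}(t) = e^{-\alpha t} x(t)$ and $\hat{u}(t) = e^{-\alpha t} u(t)$, the cost can be equivalently written as
\[ J(K, \alpha) = \mathbb{E} \int_0^\infty \left[ \hat{x}(t)^\top Q \hat{x}(t) + \hat{u}(t)^\top R \hat{u}(t) \right] dt \quad \text{s.t. } \dot{\hat{x}}(t) = (A - \alpha I) \hat{x}(t) + B \hat{u}(t), \ \hat{u}(t) = K \hat{x}(t). \tag{2} \]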
The two equivalent formulations above motivate the notion of “damping property”. We make a formal statement below.
Lemma 1.
The function $J(K, \alpha)$ defined in (1) and (2) satisfies the following “damping property”: suppose that $K$ stabilizes the system $(A - \alpha I, B)$; then for all $\beta > \alpha$, $K$ stabilizes the system $(A - \beta I, B)$ with $J(K, \beta) < J(K, \alpha)$.
Proof.
From the formulation (3), when $A - \alpha I + BK$ is stable and $\beta > \alpha$, it holds that $A - \beta I + BK$ is stable. Therefore, $J(K, \beta)$ is well-defined. From formulation (1), $J(K, \beta) < J(K, \alpha)$. ∎
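The damping property can be checked numerically. The sketch below assumes the standard Lyapunov characterization of the cost, $J(K, \alpha) = \operatorname{tr}(D_0 P)$ with $(A - \alpha I + BK)^\top P + P (A - \alpha I + BK) + Q + K^\top R K = 0$, where $D_0$ denotes the covariance of the initial state. The helper names `solve_lyapunov` and `damped_cost` and the 2-by-2 test system are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def solve_lyapunov(F, W):
    """Solve F X + X F^T + W = 0 via Kronecker vectorization (small n only)."""
    n = F.shape[0]
    eye = np.eye(n)
    # Row-major vec: vec(F X) = (F kron I) vec(X), vec(X F^T) = (I kron F) vec(X).
    M = np.kron(F, eye) + np.kron(eye, F)
    return np.linalg.solve(M, -W.reshape(-1)).reshape(n, n)


def damped_cost(A, B, K, Q, R, D0, alpha):
    """Return J(K, alpha), or inf if K does not stabilize (A - alpha I, B)."""
    F = A - alpha * np.eye(A.shape[0]) + B @ K
    if np.max(np.linalg.eigvals(F).real) >= 0:
        return float("inf")
    # P solves F^T P + P F + Q + K^T R K = 0.
    P = solve_lyapunov(F.T, Q + K.T @ R @ K)
    return float(np.trace(D0 @ P))


# Hypothetical example: K stabilizes the undamped system (alpha = 0).
A = np.array([[0.5, 1.0], [0.0, -1.0]])
B = np.eye(2)
K = np.diag([-1.0, 0.0])
Q = R = D0 = np.eye(2)

costs = [damped_cost(A, B, K, Q, R, D0, a) for a in (0.0, 0.5, 2.0)]
print(costs)  # the cost should shrink as the damping alpha grows
```

For a fixed stabilizing controller the printed costs should be finite and strictly decreasing in the damping parameter, which is what Lemma 1 asserts.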
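The ODC problem can be succinctly written as
\[ \min_{K \in \mathcal{S}} \ J(K, \alpha) \quad \text{s.t. } A - \alpha I + BK \text{ is stable}. \tag{3} \]
We denote its set of globally optimal controllers by $K^*(\alpha)$, and its set of locally optimal controllers by $K^\dagger(\alpha)$. The paper studies the properties of $K^*(\alpha)$, $K^\dagger(\alpha)$, and $J(K, \alpha)$ for $K \in K^*(\alpha)$ or $K \in K^\dagger(\alpha)$.
To motivate the study of $K^\dagger(\alpha)$, consider Figure 1. The set-up of the experiments will be detailed in Section 5. It is known that systems of this type have a large number of locally optimal controllers [7]. The left figure plots selected trajectories of $J(K, \alpha)$ against $\alpha$, where $K \in K^\dagger(\alpha)$. The selected trajectories are connected to a stabilizing controller in $K^\dagger(0)$. The lowest curve corresponds to $K^*(\alpha)$. The right figure plots the distance of the selected $K \in K^\dagger(\alpha)$ to the one $K \in K^*(\alpha)$.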
The fact that modest damping causes the locally optimal trajectories to “collapse” onto each other is a very attractive phenomenon. In particular, it suggests two improvement heuristics.
- Solve (3) starting from a large $\alpha$ and then gradually decrease $\alpha$ to $0$.
- Start from a locally optimal $K$, solve (3) while gradually increasing $\alpha$ to a positive value, and then decrease $\alpha$ to $0$.
The first idea should avoid many unnecessary local optima, and its empirical behavior has been documented in [24]. The second idea has the potential to improve the locally optimal controllers obtained from many other methods. Due to the NP-hardness of general ODC, we expect no guarantee of producing a globally optimal, or even a stabilizing, decentralized controller. The breakdown of these heuristics will be discussed in Section 5.
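The second heuristic amounts to a warm-started sweep over the damping parameter. The sketch below is a toy illustration only: it replaces the ODC cost with a hypothetical scalar double-well function whose two minima merge once the quadratic term $\alpha k^2$ is strong enough, and it uses plain gradient descent as the local search. The function `toy_cost`, the step size, and the damping schedule are all assumptions made for illustration and are not the paper's experiment.

```python
import numpy as np


def toy_cost(k, alpha):
    """Hypothetical damped cost: a tilted double well plus alpha * k^2."""
    return (k**2 - 1.0) ** 2 + 0.3 * k + alpha * k**2


def toy_grad(k, alpha):
    """Derivative of toy_cost with respect to k."""
    return 4.0 * k**3 + (2.0 * alpha - 4.0) * k + 0.3


def local_search(k, alpha, step=0.02, iters=5000):
    """Plain gradient descent, standing in for any local search method."""
    for _ in range(iters):
        k -= step * toy_grad(k, alpha)
    return k


def damping_sweep(k0, alpha_max=3.0, num=13):
    """Increase alpha from 0 to alpha_max, then decrease it back to 0.

    Each problem is warm-started at the solution of the previous one.
    """
    up = np.linspace(0.0, alpha_max, num)
    schedule = np.concatenate([up, up[::-1][1:]])
    k = k0
    path = []
    for alpha in schedule:
        k = local_search(k, alpha)
        path.append((float(alpha), float(k)))
    return k, path


k_start = local_search(1.0, 0.0)      # sub-optimal local minimum near k = +1
k_end, path = damping_sweep(k_start)  # sweep the damping up and back down
print(k_start, toy_cost(k_start, 0.0))
print(k_end, toy_cost(k_end, 0.0))
```

In this toy problem the sweep leaves the sub-optimal well once its minimum disappears and returns to the undamped problem in the better well. As the surrounding text stresses, no such guarantee should be expected for general ODC instances.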
3 Continuity
This section studies the continuity properties of the sets of globally and locally optimal controllers, $K^*(\alpha)$ and $K^\dagger(\alpha)$. The key notion of hemi-continuity captures the evolution of parametrized optimization problems.
Definition 1.
The set-valued map $\Gamma: \Theta \rightrightarrows X$ is said to be upper hemi-continuous (uhc) at a point $\theta$ if for any open neighborhood $V$ of $\Gamma(\theta)$ there exists a neighborhood $U$ of $\theta$ such that $\Gamma(\theta') \subseteq V$ for all $\theta' \in U$.
A related notion of lower hemi-continuity is provided in the supplement. A set-valued map is said to be continuous if it is both upper and lower hemi-continuous. A single-valued function is continuous if and only if it is uhc. We restate a version of the Berge Maximum Theorem with a compactness assumption from [16].
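Lemma 2 (Berge Maximum Theorem).
Let $X \subseteq \mathbb{R}^n$ and $\Theta \subseteq \mathbb{R}^m$, assume that $f: X \times \Theta \to \mathbb{R}$ is jointly continuous and $\Gamma: \Theta \rightrightarrows X$ is a compact-valued correspondence. Define
\[ f^*(\theta) = \max \{ f(x, \theta) : x \in \Gamma(\theta) \} \]
and
\[ \Gamma^*(\theta) = \arg\max \{ f(x, \theta) : x \in \Gamma(\theta) \}. \]
If $\Gamma$ is continuous at some $\theta$, then $f^*$ is continuous at $\theta$. Furthermore, $\Gamma^*$ is non-empty, compact-valued, closed, and upper hemi-continuous.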
The Berge Maximum Theorem does not apply directly to ODC: the set of stabilizing controllers is open and often unbounded. However, a lower-level-set trick applies.
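Theorem 1.
Assume that the set $K^*(0)$ of globally optimal controllers of the undamped problem is non-empty. Then the set $K^*(\alpha)$ is non-empty for all $\alpha > 0$. Moreover, $K^*(\alpha)$ is upper hemi-continuous, and the optimal cost $J(K^*(\alpha), \alpha)$ is continuous and strictly decreasing in $\alpha$.
Proof.
When $K^*(0)$ is non-empty, there is an optimal decentralized controller for the undamped system. With the set of stabilizing controllers non-empty, we invoke the “damping property” in Lemma 1 and conclude, for $\beta > \alpha \ge 0$,
\[ J(K^*(\beta), \beta) \le J(K^*(\alpha), \beta) < J(K^*(\alpha), \alpha). \]
The inequality above assumed the existence of a globally optimal controller for all values of the damping parameter $\alpha$. This is true because the lower-level set of $J(\cdot, \alpha)$ is compact [20]. Precisely, define $\Gamma_M(\alpha)$ to be
\[ \Gamma_M(\alpha) = \{ K \in \mathcal{S} : A - \alpha I + BK \text{ is stable}, \ J(K, \alpha) \le M \}. \]
The set-valued function $\Gamma_M$ is compact-valued for every fixed $\alpha$ and every fixed $M$. From the damping property, we can select any $M > J(K^*(0), 0)$ and optimize instead over $\Gamma_M(\alpha)$ without losing any globally optimal controller. The continuity of $\Gamma_M$ at $\alpha$ for almost all $M$ is proved in the supplement. The Berge Maximum Theorem then applies and yields the desired continuity of $K^*(\alpha)$ and $J(K^*(\alpha), \alpha)$. ∎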
The argument above can be extended to characterize all locally optimal controllers. A caveat is the possible existence of locally optimal controllers with unbounded cost. Their existence does not contradict the damping property — damping can introduce locally optimal controllers that are not stabilizing without the damping.
Theorem 2.
Assume that is non-empty, then the set is nonempty for all . Suppose furthermore that at an
[TABLE]
then is upper hemi-continuous at and the optimal cost is upper hemi-continuous at .
Proof.
That is non-empty follows from the existence of globally optimal controllers in Theorem 1. Consider the parametrized optimization problem
[TABLE]
The assumption ensures the existence of an and an such that for where . This choice of guarantees that the formulation (5) does not cut off any locally optimal controllers. As proved in the supplement, is continuous at for almost any , and a large can be selected to make continuous at . Berge Maximum Theorem applies to conclude that is upper hemi-continuous. Since is jointly continuous in , is upper hemi-continuous. ∎
4 Asymptotic Properties
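In this section, we state asymptotic properties of the local solutions $K^\dagger(\alpha)$. The controllers $K \in K^\dagger(\alpha)$ satisfy the first-order necessary conditions in the following equations (6)-(9); their derivation can be found in [18].
\[ \begin{aligned} (A - \alpha I + BK)^\top P_\alpha(K) + P_\alpha(K) (A - \alpha I + BK) + Q + K^\top R K &= 0, && (6) \\ (A - \alpha I + BK) L_\alpha(K) + L_\alpha(K) (A - \alpha I + BK)^\top + D_0 &= 0, && (7) \\ \left[ \left( R K + B^\top P_\alpha(K) \right) L_\alpha(K) \right] \circ I_{\mathcal{S}} &= 0, && (8) \\ K \circ I_{\mathcal{S}} &= K. && (9) \end{aligned} \]
The above conditions provide a closed-form expression of the cost
\[ J(K, \alpha) = \operatorname{tr}\left( D_0 P_\alpha(K) \right). \tag{10} \]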
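The first-order conditions can be evaluated numerically with two Lyapunov solves. The sketch below assumes the usual linear-quadratic formulas: $P$ solves $F^\top P + P F + Q + K^\top R K = 0$ and $L$ solves $F L + L F^\top + D_0 = 0$ with $F = A - \alpha I + BK$, the cost is $\operatorname{tr}(D_0 P)$, and the gradient restricted to the free entries is $2 (R K + B^\top P) L \circ I_{\mathcal{S}}$. The function names, the 2-state system, and the diagonal mask are hypothetical; a central finite difference serves as a consistency check of the gradient formula.

```python
import numpy as np


def solve_lyapunov(F, W):
    """Solve F X + X F^T + W = 0 via Kronecker vectorization (small n only)."""
    n = F.shape[0]
    eye = np.eye(n)
    M = np.kron(F, eye) + np.kron(eye, F)
    return np.linalg.solve(M, -W.reshape(-1)).reshape(n, n)


def cost_and_projected_gradient(K, alpha, A, B, Q, R, D0, mask):
    """Return J(K, alpha) and its gradient restricted to the free entries of K."""
    F = A - alpha * np.eye(A.shape[0]) + B @ K
    assert np.max(np.linalg.eigvals(F).real) < 0, "K must be stabilizing"
    P = solve_lyapunov(F.T, Q + K.T @ R @ K)   # F^T P + P F + Q + K^T R K = 0
    L = solve_lyapunov(F, D0)                  # F L + L F^T + D0 = 0
    J = float(np.trace(D0 @ P))
    grad = 2.0 * (R @ K + B.T @ P) @ L
    return J, grad * mask


# Hypothetical 2-state example with a diagonal (fully decentralized) pattern.
A = np.array([[-1.0, 1.0], [0.5, -2.0]])
B = np.eye(2)
Q = R = D0 = np.eye(2)
mask = np.eye(2)
K = np.diag([0.3, -0.2])
alpha = 0.5

J, G = cost_and_projected_gradient(K, alpha, A, B, Q, R, D0, mask)

# Central finite difference on the free entry K[0, 0].
h = 1e-6
E = np.zeros((2, 2))
E[0, 0] = 1.0
J_plus, _ = cost_and_projected_gradient(K + h * E, alpha, A, B, Q, R, D0, mask)
J_minus, _ = cost_and_projected_gradient(K - h * E, alpha, A, B, Q, R, D0, mask)
fd = (J_plus - J_minus) / (2.0 * h)
print(G[0, 0], fd)  # the two numbers should agree closely
```

A controller is a first-order stationary point of the structured problem exactly when the masked gradient returned here vanishes.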
It is worth pointing out that equations (6)-(10) are algebraic, involving only polynomial functions of the unknown matrices $K$, $P$, and $L$. The matrices $P$ and $L$, the solutions of the two Lyapunov equations (6) and (7), are written as functions of $K$ because they are uniquely determined from (6) and (7) given a stabilizing controller $K$. The following theorem characterizes the evolution of locally optimal controllers for a specific sparsity pattern. The theorem justifies the practice of random initialization around zero.
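Theorem 3.
Suppose that the sparsity pattern $I_{\mathcal{S}}$ is block-diagonal with square blocks and that $R$ has the same sparsity pattern as $I_{\mathcal{S}}$. Then, all locally optimal controllers, that is, all points in $K^\dagger(\alpha)$, converge to the zero matrix as $\alpha \to \infty$. Furthermore, $J(K, \alpha) \to 0$ as $\alpha \to \infty$ for all $K \in K^\dagger(\alpha)$.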
Not only do all locally optimal controllers approach zero, the problem is in fact convex over bounded regions with enough damping.
Theorem 4.
For any given $r > 0$, the Hessian matrix $\nabla^2 J(K, \alpha)$ is positive definite over the ball $\mathcal{B}_r = \{ K : \|K\| \le r \}$ for all large $\alpha$.
The proofs of the two theorems above are given in the supplement.
Corollary 1.
With the assumption of Theorem 3, there is no spurious locally optimal controller for large $\alpha$. That is, $K^\dagger(\alpha) = K^*(\alpha)$ for all large $\alpha$.
Proof.
For any given $r > 0$, all controllers in the ball $\mathcal{B}_r = \{ K : \|K\| \le r \}$ are stabilizing when $\alpha$ is large. As a result, stability constraints can be relaxed over $\mathcal{B}_r$. Furthermore, from Theorem 3, when $\alpha$ is large, all locally optimal controllers will be inside $\mathcal{B}_r$. From Theorem 4, the objective function becomes convex over $\mathcal{B}_r$ for large enough $\alpha$. These observations imply that local and global solutions coincide. ∎
The theorems above rely on the “damping property” in Lemma 1. It is worth commenting that damping the system with is almost the only continuation method for general system matrices that achieves a monotonic increase of the stable sets. Formally,
Theorem 5.
When , for any -by- real matrix that is not a multiple of , there exists a stable matrix for which is unstable.
The proof is given in the supplement. This theorem justifies the use of as the continuation parameter. However, in a given system with structure, matrices other than may be appropriate.
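A small numerical example illustrates why uniform damping is special. The sketch below reads the theorem as a statement about perturbations of the form $A + tH$ with $t \ge 0$, which is an interpretation, and compares the direction $-I$ with a hypothetical direction $H = \operatorname{diag}(-1, 0)$ that damps only the first state. The matrices and the helper `is_stable` are illustrative choices.

```python
import numpy as np


def is_stable(M):
    """True if every eigenvalue of M has a strictly negative real part."""
    return bool(np.max(np.linalg.eigvals(M).real) < 0)


A = np.array([[-3.0, 2.0], [-2.0, 1.0]])   # stable: trace -2, determinant 1
H = np.diag([-1.0, 0.0])                   # damps only the first state
ts = np.linspace(0.0, 5.0, 51)

uniform = [is_stable(A - t * np.eye(2)) for t in ts]   # direction -I
partial = [is_stable(A + t * H) for t in ts]           # direction H

print(all(uniform))   # whether uniform damping keeps the matrix stable
print(all(partial))   # whether partial damping keeps the matrix stable
```

Here $\det(A + tH) = 1 - t$, so the partially damped matrix acquires an eigenvalue in the right half plane once $t > 1$, whereas $A - tI$ only shifts every eigenvalue of $A$ further to the left.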
5 Numerical Experiments
In this section, we document various homotopy behaviors as the damping parameter varies. The focus is on the evolution of locally optimal trajectories, which can be tracked by any local search method. The experiments are performed on small-sized systems so that random initialization can find a reasonable number of distinct locally optimal solutions. Despite the small system dimension, the existence of many locally optimal solutions and their convoluted trajectories demonstrates what is possible in a theory of homotopy.
The local search method we use is simple projected gradient descent. At a controller , we perform line search along the direction . The step size is determined with backtracking and the Armijo rule, that is, we select as the largest number in such that is stabilizing while
[TABLE]
Our choice of parameters are , , and . We terminate the iteration when the norm of the gradient is less than .
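A minimal sketch of such a local search is given below. It assumes the Lyapunov-based cost and gradient formulas for the linear-quadratic problem and a standard Armijo sufficient-decrease test along the masked negative gradient. The parameter values (`s_bar`, `beta`, `sigma`, `tol`), the function names, and the test system are assumptions chosen for illustration; they are not the values used in the paper's experiments.

```python
import numpy as np


def solve_lyapunov(F, W):
    """Solve F X + X F^T + W = 0 via Kronecker vectorization (small n only)."""
    n = F.shape[0]
    eye = np.eye(n)
    M = np.kron(F, eye) + np.kron(eye, F)
    return np.linalg.solve(M, -W.reshape(-1)).reshape(n, n)


def cost_and_grad(K, alpha, A, B, Q, R, D0, mask):
    """Return (J, projected gradient), or (inf, None) if K is not stabilizing."""
    F = A - alpha * np.eye(A.shape[0]) + B @ K
    if np.max(np.linalg.eigvals(F).real) >= 0:
        return float("inf"), None
    P = solve_lyapunov(F.T, Q + K.T @ R @ K)
    L = solve_lyapunov(F, D0)
    return float(np.trace(D0 @ P)), 2.0 * ((R @ K + B.T @ P) @ L) * mask


def projected_gradient_descent(K, alpha, A, B, Q, R, D0, mask,
                               s_bar=1.0, beta=0.5, sigma=1e-4,
                               tol=1e-6, max_iter=2000):
    """Backtracking (Armijo) descent along the masked negative gradient."""
    J, G = cost_and_grad(K, alpha, A, B, Q, R, D0, mask)
    if G is None:
        raise ValueError("the initial controller is not stabilizing")
    for _ in range(max_iter):
        if np.linalg.norm(G) < tol:
            break
        s = s_bar
        accepted = False
        while s > 1e-14:
            K_try = K - s * G
            J_try, G_try = cost_and_grad(K_try, alpha, A, B, Q, R, D0, mask)
            # Non-stabilizing trial points have infinite cost and are rejected.
            if J_try < J - sigma * s * np.sum(G * G):
                accepted = True
                break
            s *= beta
        if not accepted:
            break
        K, J, G = K_try, J_try, G_try
    return K, J


# Hypothetical 2-state system with a diagonal controller structure.
A = np.array([[-1.0, 1.0], [0.5, -2.0]])
B = np.eye(2)
Q = R = D0 = np.eye(2)
mask = np.eye(2)
K0 = np.zeros((2, 2))

J0, _ = cost_and_grad(K0, 0.0, A, B, Q, R, D0, mask)
K_opt, J_opt = projected_gradient_descent(K0, 0.0, A, B, Q, R, D0, mask)
print(J0, J_opt)
print(K_opt)
```

Because the search direction is masked, every iterate keeps the prescribed sparsity pattern, and because a non-stabilizing trial point is assigned infinite cost, the backtracking loop never leaves the set of stabilizing controllers.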
5.1 Systems with a large number of local minima
We first consider the examples from [7], where the feasible set is reasonably disconnected and admits many local minima. The system matrices are given by
[TABLE]
When the dimension is , it is known that the set of stabilizing decentralized controllers has at least connected components. We sample the initial controllers from and, after 1000 samples, obtain initial optimal solutions. We gradually increase the damping parameter from $0$ to with increment, and track the trajectories of locally optimal solutions by solving the newly damped system with the previous locally optimal solution as the initialization. The evolution of the optimal cost and the distance from the best known optimal controller are plotted in Figure 1. Notice that all sub-optimal local trajectories terminate after a modest damping . After that, the minimization algorithm always tracks a single trajectory. This illustrates the prediction of Corollary 1. In particular, if we start tracking a sub-optimal controller trajectory from , we will be on the better trajectory when . At that time, if we gradually decrease to zero, we obtain a stabilizing controller with a lower cost.
5.2 Experiments on Random Systems
With the same initialization and optimization procedure, we perform the experiments with -by- system matrices and randomly generated from the distribution . For 92 out of 100 samples we are not able to find more than one locally optimal trajectory. Examples with more than one local trajectory are listed below. All figures to the left plot the cost of locally optimal controllers. All figures to the right plot the distance of the locally optimal controllers to the controller with the lowest cost. Note that the order of the costs of the trajectories may be preserved during the damping (Figure 2) and may also be disrupted (Figure 3). More than one trajectory may have the lowest cost during the damping (Figure 4).
Figure 5 shows a hysteresis-like loop as the damping coefficient is first increased and then decreased. The trajectory of the controller first leads up to a large cost, and then the local search method escapes this local minimum to another one with a smaller cost. As the damping decreases, the trajectory returns to where it started along a different route.
6 Conclusion
This paper studied the trajectories of locally and globally optimal solutions to the optimal decentralized control problem as the damping of the decentralized control system varies. Asymptotic and continuity properties of the trajectories are proved. The complicated phenomenon of continuation is illustrated with numerical examples. The fact that damping merges all locally optimal solutions is strong evidence that the idea of homotopy can be fruitfully used to improve locally optimal solutions.
Acknowledgments
The authors are grateful to Salar Fattahi and Cédric Josz for their constructive comments and feedback. The authors thank Yuhao Ding for sharing the implementation of local search algorithms.
Appendix A Notions of continuity
We recount the notion of upper and lower hemi-continuity and prove the continuity properties of the lower level-set map. The reader is referred to [16] for an accessible treatment.
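Definition 2.
The set-valued map $\Gamma: \Theta \rightrightarrows X$ is said to be upper hemi-continuous (uhc) at a point $\theta$ if for any open neighborhood $V$ of $\Gamma(\theta)$ there exists a neighborhood $U$ of $\theta$ such that $\Gamma(\theta') \subseteq V$ for all $\theta' \in U$.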
If is compact, uhc is equivalent to the graph of being closed, that is, if and , then .
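Definition 3.
The set-valued map $\Gamma: \Theta \rightrightarrows X$ is said to be lower hemi-continuous (lhc) at a point $\theta$ if for any open set $V$ intersecting $\Gamma(\theta)$ there exists a neighborhood $U$ of $\theta$ such that $\Gamma(\theta')$ intersects $V$ for all $\theta' \in U$.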
Equivalently, for all and , there exists subsequence of and a corresponding , such that .
We prove the upper hemi-continuity of the lower level set map in Lemma 3 below.
Lemma 3.
Given matrices and the objective cost that satisfies the damping property. Define
[TABLE]
Assume that is not empty for all and a given , then is an upper hemi-continuous set-valued map.
Proof.
From [20], is compact for all . From the damping property, for any , we have . Therefore, to characterize the continuity of at a , it suffices to consider the restricted map for some , that is, to consider the range of to be compact. Therefore, the sequence characterization of uhc applies. Suppose , pick a sequence of that converges to . The continuity of implies . The fact that the cost is bounded implies is stable. Since subspaces of matrices are closed, . We have verified all conditions for , so is upper hemi-continuous. ∎
The lower hemi-continuity of is more subtle.
Lemma 4.
At any given , is lower hemi-continuous at except when , which is a finite set of locally optimal costs.
Proof.
We argue by contradiction. Consider a sequence and a , but suppose there exists no subsequence of and such that . We must have ; otherwise for large and, since the set of stabilizing controllers is open, for large . Furthermore, must be a local minimum of ; otherwise there exists a sequence with and, by the continuity of , there exists a sequence of large enough indices such that ; the sequence converges to . The argument above suggests that belongs to the costs of locally optimal controllers at . Because as a function over can be described as a linear function over an algebraic set, the number of local minimum values is finite. ∎
Appendix B Convergence of locally optimal controllers
We prove the asymptotic properties of the locally optimal controllers in Section 4 of the main paper.
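Theorem.
Suppose that the sparsity pattern $I_{\mathcal{S}}$ is block-diagonal with square blocks and that $R$ has the same sparsity pattern as $I_{\mathcal{S}}$. Then, all locally optimal controllers, that is, all points in $K^\dagger(\alpha)$, converge to the zero matrix as $\alpha \to \infty$. Furthermore, $J(K, \alpha) \to 0$ as $\alpha \to \infty$ for all $K \in K^\dagger(\alpha)$.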
Proof.
Recall the expression of the objective function
[TABLE]
and the first order necessary conditions
[TABLE]
Those first order conditions can be used to characterize the objective function
[TABLE]
As increases, some local solution may disappear, some new local solution may appear. The appearance cannot happen infinitely often because the equations (17)-(20) are algebraic. Suppose when , the number of local solutions does not change. The damping property ensures for ,
[TABLE]
The right hand side optimizes over a fixed, finite set of controllers and goes to zero as from the formulation (16) and the dominated convergence theorem. The left hand side, therefore, also converges to zero as . From (21) and the assumption that is positive definite, for all as .
The assumption on sparsity allows the expression of the locally optimal controllers in (19) as
[TABLE]
Especially we can bound
[TABLE]
Pre- and post- multiply (18) by ’s unit minimum eigenvector ,
[TABLE]
Therefore
[TABLE]
This simplifies to
[TABLE]
Take the trace of (18) and consider the estimate
[TABLE]
where for clarity denotes and denotes . The second and the third inequalities use the fact that for a positive definite matrix and any matrix . This estimate, combined with previous argument that , concludes . We also obtain from the inequality that
[TABLE]
for small enough . Combining (26) and (27)
[TABLE]
which converges to [math] as . ∎
Appendix C The Positive Definiteness of Hessian
Theorem.
For any given $r > 0$, the Hessian matrix $\nabla^2 J(K, \alpha)$ is positive definite over the ball $\{ K : \|K\| \le r \}$ for all large $\alpha$.
Proof.
The proof requires the vectorized Hessian formula given in Lemma 3.7 of [18], restated below.
Lemma 5 ([18]).
Define by . The Hessian of is given by the formula
[TABLE]
where
[TABLE]
and is an permutation matrix.
We show that in the lemma is positive definite for any fixed when is large. Recall the definition of and .
[TABLE]
With triangle inequality
[TABLE]
which means and as . The minimum eigenvalue of can be bounded similarly: let be the unit eigenvector of corresponding to , pre- and post- multiply (28) by , we obtain
[TABLE]
The first Hessian term can be bounded from below with (30)
[TABLE]
We bound the norm of the second and the third Hessian term as follows, where hides constants that do not depend on .
[TABLE]
Comparing the two estimates above, we find the first term dominates the two following terms with large , uniformly over bounded . The Hessian is therefore positive definite over bounded when is large. The conclusion carries over to the Hessian of the decentralized controller, which is a principal sub-matrix of the Hessian of the centralized controller. ∎
Appendix D The uniqueness of the continuation direction
This section aims to prove the following result
Theorem.
When , for any -by- real matrix that is not a multiple of , there exists a stable matrix for which is unstable.
Define the set of stable directions
[TABLE]
where and are -by- real matrices.
Lemma 6.
Every matrix in is similar to a diagonal matrix with non-positive diagonal entries. In particular, such a matrix cannot have complex eigenvalues.
Proof.
When is large, is a small perturbation of , hence the eigenvalues of have to be in the closed left half plane. With a suitable similarity transform, assume is in real Jordan form. First consider the case of two-by-two matrices, and we denote the matrices by and . Assume for contradiction that is not diagonalizable. The non-diagonal real Jordan form of has the following possibilities:
- , where has real eigenvalues . Pick , which is stable because and . We have , whose stability criterion and amounts to
[TABLE]
or equivalently . Especially when , is not stable.
- . Pick a stable matrix . is not stable when .
- , where . Pick , is not stable.
- , where and . By rescaling assume . Consider the following matrix function
[TABLE]
We have
[TABLE]
Especially,
[TABLE]
Hence as long as
[TABLE]
for small enough , is a stable matrix and there will be matrices with whose trace is negative and whose determinant is smaller. Consider the minimal value the determinant can take
[TABLE]
which means when
[TABLE]
The matrix with is unstable. There certainly exist and that satisfy (33) and (34).
For general , ’s real Jordan form is an block upper-triangular matrix
[TABLE]
where can take the four possibilities mentioned above. We take the corresponding stable constructed above, which has the property that is not stable for some . Form the block diagonal matrix
[TABLE]
Then is stable, while is not stable. ∎
We can strengthen the argument above and further characterize in the case .
Lemma 7.
When , the set of stable directions does not contain any matrices of rank , , …, .
Proof.
From lemma 6, we only need to consider the case where is diagonal with negative diagonal entries. Assume there is a rank one matrix , write
[TABLE]
where . This is possible with the rank assumption. We will construct a stable -by- matrix , such that there is some that makes unstable, and then carry the instability to with the extended matrix
[TABLE]
From [7], the set
[TABLE]
has two disconnected components. Consider the Jordan decomposition of the matrix
[TABLE]
where is some invertible matrix. Write
[TABLE]
After this similarity transform, the set can be written with .
[TABLE]
Since is disconnected there exists some such that is stable, while is unstable with some eigenvalue in the right half plane. Setting and completes the proof. ∎
Since we can perturb the direction and make full-rank, the fact that has rank one is not the substantial property. This is indeed the case.
Lemma 8.
When , .
Proof.
From lemma 6, we only need to consider the case where is diagonal with negative diagonal entries. Write
[TABLE]
where . The diagonal entries are non-positive and not all equal. We will construct an and a corresponding such that is stable while is not stable, and extend to the general as in Lemma 7. The case where has rank has been considered in Lemma 7. We show the remaining rank is impossible. Without loss of generality we rescale and assume .
- , where . Consider the matrix function
[TABLE]
The characteristic polynomial of is
[TABLE]
The Routh-Hurwitz Criterion insists
[TABLE]
which is simplified with to
[TABLE]
Especially, when , (36) simplifies to the obvious expression . When , (35) implies is not stable. Setting and concludes the proof.
- , where without loss of generality we assume
[TABLE]
Consider the matrix
[TABLE]
Its Routh-Hurwitz Criterion insists
[TABLE]
We claim that when
[TABLE]
the set of that satisfy Routh-Hurwitz Criterion is disconnected. To see this, write the positive local minimum of in (38) as , and write the positive local minimum of in (39) as . The condition (37) ensures that and the condition (40) ensures that and are negative. Furthermore, consider , which is the root of . It holds that and both and are positive, which implies that the positive intersection and are positive. We conclude that when , the matrix is stable, and when is large, is again stable. Yet when , the matrix is not stable.
∎
References
- [1] Vincent D. Blondel and John N. Tsitsiklis. A survey of computational complexity results in systems and control. Automatica, 36(9):1249–1274, 2000.
- [2] Stephen P. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory, volume 15. 1994.
- [3] J. R. Broussard and N. Halyo. Active Flutter Control using Discrete Optimal Constrained Dynamic Compensators. In 1983 American Control Conference, pages 1026–1034, June 1983.
- [4] J. C. Doyle, K. Glover, P. P. Khargonekar, and B. A. Francis. State-space solutions to standard H-2 and H-infinity control problems. IEEE Transactions on Automatic Control, 34(8):831–847, 1989.
- [5] Emmanuel G. Collins Jr. and Debashis Sadhukhan. A comparison of descent and continuation algorithms for H2 optimal, reduced-order control designs. International Journal of Control, 69(5):647–662, January 1998.
- [6] Maryam Fazel, Rong Ge, Sham M. Kakade, and Mehran Mesbahi. Global Convergence of Policy Gradient Methods for Linearized Control Problems. January 2018.
- [7] Han Feng and Javad Lavaei. On the Exponential Number of Connected Components for the Feasible Set of Optimal Decentralized Control Problems. To appear in Proceedings of the 2019 American Control Conference, page 8.
- [8] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
