Escaping Locally Optimal Decentralized Control Polices via Damping

Han Feng; Javad Lavaei

arXiv:1905.09915·math.OC·May 27, 2019·ACC

TL;DR

This paper investigates how increasing damping in decentralized control systems causes local optima to merge into a single global optimum, simplifying the control design landscape.

Contribution

It introduces a theoretical framework using hemi-continuity to analyze the evolution of local optima under damping and proves the elimination of spurious local solutions with large damping.

Findings

1. Damping merges local solutions into the global solution.

2. Large damping eliminates spurious local optima.

3. Numerical examples illustrate complex trajectories and convergence.

Abstract

We study the evolution of locally optimal decentralized controllers with the damping of the control system. Empirically it is shown that even for instances with an exponential number of connected components, damping merges all local solutions to the one global solution. We characterize the evolution of locally optimal solutions with the notion of hemi-continuity and further derive asymptotic properties of the objective function and of the locally optimal controllers as the damping becomes large. Especially, we prove that with enough damping, there is no spurious locally optimal controller with favorable control structures. The convoluted behavior of the locally optimal trajectory is illustrated with numerical examples.

Equations

$$\dot{x}(t)=Ax(t)+Bu(t)$$

$$\dot{x}(t)=(A+BK)x(t)$$

$$\{K:A+BK\text{ is stable},\ K\in\mathcal{S}\}$$

$$[I_{\mathcal{S}}]_{ij}=\begin{cases}1,&\text{if }K_{ij}\text{ is free}\\0,&\text{if }K_{ij}=0\end{cases}$$

$$J(K,\alpha)=\mathbb{E}\int_{0}^{\infty}e^{-2\alpha t}\left[\hat{x}^{\top}(t)Q\hat{x}(t)+\hat{u}^{\top}(t)R\hat{u}(t)\right]dt\quad\text{s.t.}\quad\dot{\hat{x}}(t)=A\hat{x}(t)+B\hat{u}(t),\ \ \hat{u}(t)=K\hat{x}(t)$$

$$J(K,\alpha)=\mathbb{E}\int_{0}^{\infty}\left[x^{\top}(t)Qx(t)+u^{\top}(t)Ru(t)\right]dt\quad\text{s.t.}\quad\dot{x}(t)=(A-\alpha I)x(t)+Bu(t),\ \ u(t)=Kx(t)$$

$$\min_{K\in\mathcal{S}}\ J(K,\alpha)\quad\text{s.t.}\quad A-\alpha I+BK\text{ is stable}$$

$$K^{*}(\alpha)=\arg\min\{J(K,\alpha)\mid K\in\Gamma(\alpha)\},\quad\text{for all }\alpha\in\mathcal{A}$$

$$J(K^{*}(\alpha),\alpha)=\min\{J(K,\alpha)\mid K\in\Gamma(\alpha)\},\quad\text{for all }\alpha\in\mathcal{A}$$

$$J(K^{*}(\alpha),\alpha)\leq J(K^{*}(0),\alpha)<J(K^{*}(0),0)$$

$$\Gamma_{M}(\alpha)=\{K\in\mathcal{S}:A-\alpha I+BK\text{ is stable and }J(K,\alpha)\leq M\}$$

$$\lim_{\epsilon\to 0^{+}}\ \sup_{\alpha\in[\alpha_{0}-\epsilon,\alpha_{0}+\epsilon]}\ \sup_{K\in K^{\dagger}(\alpha)}J(K,\alpha)<\infty$$

$$\min_{K}\ J(K,\alpha)\quad\text{s.t.}\quad K\in\Gamma_{M}(\alpha)$$

$$(A-\alpha I+BK)^{\top}P_{\alpha}(K)+P_{\alpha}(K)(A-\alpha I+BK)+K^{\top}RK+Q=0$$

$$L_{\alpha}(K)(A-\alpha I+BK)^{\top}+(A-\alpha I+BK)L_{\alpha}(K)+D_{0}=0$$

$$\left((B^{\top}P_{\alpha}(K)+RK)L_{\alpha}(K)\right)\circ I_{S}=0$$

$$K\circ I_{S}=K$$

$$J(K,\alpha)=\operatorname{tr}(D_{0}P_{\alpha}(K))$$

$$J(K^{i}+s^{i}\tilde{K}^{i})<J(K^{i})+\alpha s^{i}\langle\nabla J(K^{i}),\tilde{K}^{i}\rangle$$

$$A=\begin{bmatrix}-1&-2&0&0\\2&0&-1&0\\0&1&0&-2\\0&0&2&0\end{bmatrix},\quad B$$

$$\max_{K\in K^{\dagger}(\beta)}J(K,\beta)\leq\max_{K\in K^{\dagger}(\alpha)}J(K,\beta)$$

$$K=-R^{-1}\left((B^{\top}P_{\alpha}(K)L_{\alpha}(K))\circ I_{S}\right)\left(L_{\alpha}(K)\circ I_{S}\right)^{-1}$$

$$\|BK\|\leq\|BR^{-1}B^{\top}P_{\alpha}(K)L_{\alpha}(K)\|\,\lambda_{\min}(L_{\alpha}(K))^{-1}$$

Full text

This work was supported by grants from ARO, ONR, AFOSR, and NSF.

1 Introduction

The optimal decentralized control problem (ODC) adds controller constraints to the classical centralized optimal control problem. This addition breaks down the separation principle and the classical solution formulas that culminated in [4]. Although ODC has been proved intractable in general [23, 1], the problem has convex formulations under assumptions such as partial nestedness [19], positiveness [17], and quadratic invariance [10]. A recently proposed System Level Approach [21] convexifies the problem in the space of system response matrices. Convex relaxation techniques have been extensively documented in [2], though it is considered challenging to solve large-scale optimization problems with linear matrix inequalities.

The line of research on convexification is in contrast with the success of stochastic gradient descent well-documented in machine learning practice [9, 8]. Admittedly, the machine learning concerns of generalizability, training speed, and fairness depart from the traditional control focus on stability, robustness, and safety. Nevertheless, the interplay of the two has inspired fruitful results. As an example, to solve the linear-quadratic optimal control problem, the traditional nonlinear programming methods include Gauss-Newton, augmented Lagrangian, and Newton’s methods [11, 22, 12, 13]. Only in the last few years have researchers started to look at the classical problem with newly developed optimization techniques and proved the efficiency of policy gradient methods in model-based and model-free optimal control problems [6]. This efficiency statement of local search, however, is unlikely to carry over trivially to ODC, due to the NP-hardness of the problem and the recent investigation of the topological properties of ODC in [7]. Nevertheless, questions can be answered without contradicting the general complexity statement. For example, it is known that damping of the system reduces the number of connected components of the set of stabilizing decentralized controllers. Does damping reduce the number of locally optimal decentralized controllers? This paper attempts an answer with (1) a study of the continuity properties of the trajectories of the locally optimal solutions formed by varying damping, and (2) an asymptotic analysis of the trajectories as the damping becomes large. The observations of our study shed light on the properties of local minima in reinforcement learning, whose aim is to design optimal control policies and where different local minima have different practical behaviors.

This work is closely related to continuation methods such as homotopy. They are known to be appealing yet theoretically poorly understood [15]. Homotopy has been used as an initialization strategy in optimal control: in [3], the author mentioned the idea of gradually moving from a stable system to the original system to obtain a stabilizing controller. The paper [24] considered the $H_{2}$ reduced-order problem and proposed several homotopy maps and initialization strategies; in its numerical experiments, initialization with a large multiple of $-I$ was found appealing. The work [5] compared descent and continuation algorithms for the $H_{2}$ optimal reduced-order control problem and concluded that homotopy methods are empirically superior to descent methods. The difficulty of obtaining a convergence theory for general constrained optimal control problems can be appreciated from the examples in [14]. Compared with those earlier works, we consider a special kind of continuation, that is, damping, to improve the locally optimal solutions in optimal decentralized control. Our focus is not so much on following a specific path but on the evolution of several paths and the movement of locally optimal solutions from one path to another.

The remainder of this paper is organized as follows. Notations and problem formulations are given in Section 2. Continuity and asymptotic properties of our damping strategies are outlined in Section 3 and Section 4, respectively. Numerical experiments are detailed in Section 5. Concluding remarks are drawn in Section 6.

2 Problem Formulation

Consider the linear time-invariant system

$$\dot{x}(t)=Ax(t)+Bu(t),$$

where $A\in\mathbb{R}^{n\times n}$ and $B\in\mathbb{R}^{n\times m}$ are real matrices of compatible sizes. The vector $x(t)$ is the state of the system with an unknown initialization $x(0)=x_{0}$, where $x_{0}$ is modeled as a random variable with zero mean and a positive definite covariance $\mathbb{E}[x(0)x(0)^{\top}]=D_{0}$. The control input $u(t)$ is to be determined via a static state-feedback law $u(t)=Kx(t)$ with the gain $K\in\mathbb{R}^{m\times n}$ such that some quadratic performance measure is minimized. Given a controller $K$, the closed-loop system is

$$\dot{x}(t)=(A+BK)x(t).$$

A matrix is said to be stable if all its eigenvalues lie in the open left half plane. The controller $K$ is said to stabilize the system if $A+BK$ is stable. ODC optimizes over the set of structured stabilizing controllers

$$\{K:A+BK\text{ is stable},\ K\in\mathcal{S}\},$$

where $\mathcal{S}\subseteq\mathbb{R}^{m\times n}$ is a linear subspace of matrices, often specified by fixing certain entries of the matrix to zero. In that case, the sparsity pattern can be equivalently described with the indicator matrix $I_{\mathcal{S}}$ , whose $(i,j)$ -entry is defined to be

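$$[I_{\mathcal{S}}]_{ij}=\begin{cases}1,&\text{if }K_{ij}\text{ is free}\\0,&\text{if }K_{ij}=0.\end{cases}$$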

The structural constraint $K\in\mathcal{S}$ is then equivalent to $K\circ I_{\mathcal{S}}=K$ , where $\circ$ denotes entry-wise multiplication. In the following, we will consider the discounted, or damped cost, which is defined as

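$$J(K,\alpha)=\mathbb{E}\int_{0}^{\infty}e^{-2\alpha t}\left[\hat{x}^{\top}(t)Q\hat{x}(t)+\hat{u}^{\top}(t)R\hat{u}(t)\right]dt\quad\text{s.t.}\quad\dot{\hat{x}}(t)=A\hat{x}(t)+B\hat{u}(t),\ \ \hat{u}(t)=K\hat{x}(t),\tag{1}$$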

where $Q\succeq 0$ is positive semi-definite and $R\succ 0$ is positive definite. The expectation is taken over $x_{0}$ . Setting $x(t)=e^{-\alpha t}\hat{x}(t),u(t)=e^{-\alpha t}\hat{u}(t)$ , the cost $J(K,\alpha)$ can be equivalently written as

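$$J(K,\alpha)=\mathbb{E}\int_{0}^{\infty}\left[x^{\top}(t)Qx(t)+u^{\top}(t)Ru(t)\right]dt\quad\text{s.t.}\quad\dot{x}(t)=(A-\alpha I)x(t)+Bu(t),\ \ u(t)=Kx(t).\tag{2}$$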

The two equivalent formulations above motivate the notion of “damping property”. We make a formal statement below.

Lemma 1.

The function $J(K,\alpha)$ defined in (1) and (2) satisfies the following “damping property”: suppose that $K$ stabilizes the system $(A-\alpha I,B)$ , then for all $\beta>\alpha$ , $K$ stabilizes the system $(A-\beta I,B)$ with $J(K,\beta)<J(K,\alpha)$ .

Proof.

From the formulation (3), when $A-\alpha I+BK$ is stable and $\beta>\alpha$ , it holds that $A-\beta I+BK=(A-\alpha I+BK)-(\beta-\alpha)I$ is stable. Therefore, $J(K,\beta)$ is well-defined. From formulation (1), $J(K,\beta)<J(K,\alpha)$ . ∎
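The damping property can be checked by hand in the scalar case. The sketch below is an illustration only: the one-dimensional system, the values $a=b=q=r=d_{0}=1$, and the name `scalar_cost` are assumptions made here, not taken from the paper. For a scalar closed loop $c=a-\alpha+bk<0$, the state is $x(t)=e^{ct}x_{0}$, so the cost integrates to $J(k,\alpha)=d_{0}(q+rk^{2})/(-2c)$.

```python
def scalar_cost(k, alpha, a=1.0, b=1.0, q=1.0, r=1.0, d0=1.0):
    """Damped cost J(k, alpha) for x' = (a - alpha) x + b u with u = k x.

    Returns None when the closed loop a - alpha + b k is not stable.
    All default values are illustrative assumptions.
    """
    closed_loop = a - alpha + b * k
    if closed_loop >= 0.0:
        return None
    # E int_0^inf (q + r k^2) x(t)^2 dt with x(t) = exp(closed_loop * t) x0, E[x0^2] = d0
    return d0 * (q + r * k * k) / (-2.0 * closed_loop)


k = -2.0  # stabilizes the undamped system because a + b k = -1 < 0
costs = [scalar_cost(k, alpha) for alpha in (0.0, 0.5, 1.0, 2.0)]
print(costs)  # the cost of a fixed gain shrinks as the damping grows

# A gain that fails to stabilize the undamped system can become admissible with damping.
print(scalar_cost(-0.5, 0.0), scalar_cost(-0.5, 1.0))
```

The second print shows the other half of the lemma's picture: the stabilizing set only grows with $\alpha$, so a gain rejected at $\alpha=0$ may have a finite cost at a larger $\alpha$.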

The ODC problem can be succinctly written as

$$\min_{K\in\mathcal{S}}\ J(K,\alpha)\quad\text{s.t.}\quad A-\alpha I+BK\text{ is stable}.\tag{3}$$

We denote its set of globally optimal controllers by $K^{*}(\alpha)$ , and its set of locally optimal controllers by $K^{\dagger}(\alpha)$ . The paper studies the properties of $K^{*}(\alpha)$ , $K^{\dagger}(\alpha)$ , and $J(K,\alpha)$ for $K\in K^{*}(\alpha)$ or $K^{\dagger}(\alpha)$ .

To motivate the study of $K^{\dagger}(\alpha)$ , consider Figure 1 below. The set-up of the experiments will be detailed in Section 5. It is known that systems of this type have a large number of locally optimal controllers [7]. The left figure plots selected trajectories of $J(K,\alpha)$ against $\alpha$ , where $K\in K^{\dagger}(\alpha)$ . The selected trajectories are connected to a stabilizing controller in $K^{\dagger}(0)$ . The lowest curve corresponds to $J(K^{*}(\alpha),\alpha)$ . The right figure plots the distance of the selected $K\in K^{\dagger}(\alpha)$ to the one $K\in K^{*}(\alpha)$ .

The fact that modest damping causes the locally optimal trajectories to “collapse” to each other is a very attractive phenomenon. In particular, it suggests two improvement heuristics.

• Solve (3) from a large $\alpha$ and then gradually decrease $\alpha$ to $0$.

• Start from a locally optimal $K\in K^{\dagger}(\alpha)$, solve (3) while gradually increasing $\alpha$ to a positive value, and then decrease $\alpha$ to $0$.

The first idea should avoid many unnecessary local optima and its empirical behavior has been documented in [24]. The second idea has the potential to improve the locally optimal controllers obtained from many other methods. Due to the NP-hardness of general ODC, we expect no guarantee of producing a globally optimal, or even a stabilizing, decentralized controller. The breakdown of these heuristics will be discussed in Section 5.
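The second heuristic can be sketched as a warm-started loop over a damping schedule. This is a minimal outline under stated assumptions: `damping_schedule`, `follow_path`, and the callable `local_search(K, alpha)` are hypothetical names introduced here, and the local solver itself is left to the caller.

```python
def damping_schedule(alpha_max, step):
    """Damping values that rise from 0 to alpha_max and then return to 0."""
    n = int(round(alpha_max / step))
    up = [i * step for i in range(n + 1)]
    return up + up[-2::-1]


def follow_path(local_search, K0, alphas):
    """Warm-started continuation: each solve starts from the previous solution.

    `local_search(K, alpha)` is a hypothetical callable that returns a locally
    optimal controller for damping alpha, or None when it fails.
    """
    K, path = K0, []
    for alpha in alphas:
        K_next = local_search(K, alpha)
        if K_next is None:  # the heuristic carries no guarantee, so stop here
            break
        K = K_next
        path.append((alpha, K))
    return path


print(damping_schedule(0.2, 0.1))  # rises to about 0.2, then returns to 0.0
```

Returning the whole path rather than only the final controller makes it possible to compare the cost at the start and at the end of the loop, which is where an improvement would show up.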

3 Continuity

This section studies the continuity properties of $K^{*}(\alpha)$ and $K^{\dagger}(\alpha)$ . The key notion of hemi-continuity captures the evolution of parametrized optimization problems.

Definition 1.

The set valued map $\Gamma:\mathcal{A}\to\mathcal{B}$ is said to be upper hemi-continuous (uhc) at a point $a$ if for any open neighborhood $V$ of $\Gamma(a)$ there exists a neighborhood $U$ of $a$ such that $\Gamma(U)\subseteq V$ .
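A standard pair of examples, not taken from the paper, may help fix the definition. Both maps below send a real number $a$ to a subset of $\mathbb{R}$.

```latex
\Gamma_{1}(a)=\begin{cases}[0,1], & a=0\\ \{0\}, & a\neq 0\end{cases}
\qquad
\Gamma_{2}(a)=\begin{cases}\{0\}, & a=0\\ [0,1], & a\neq 0\end{cases}
```

The map $\Gamma_{1}$ is uhc at $a=0$: any open $V$ containing $[0,1]$ also contains $\{0\}=\Gamma_{1}(a)$ for every $a\neq 0$. The map $\Gamma_{2}$ is not uhc at $a=0$: the open set $V=(-1/2,1/2)$ contains $\Gamma_{2}(0)=\{0\}$, yet $\Gamma_{2}(a)=[0,1]$ is not contained in $V$ for any $a\neq 0$. Informally, upper hemi-continuity allows the set to jump up in size at the point but not immediately around it.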

A related notion of lower hemi-continuity is provided in the supplement. A set-valued map is said to be continuous if it is both upper and lower hemi-continuous. A single-valued function is continuous if and only if it is uhc. We restate a version of the Berge Maximum Theorem with a compactness assumption from [16].

Lemma 2 (Berge Maximum Theorem).

Let $\mathcal{A}\subseteq\mathbb{R}$ and $\mathcal{S}\subseteq\mathbb{R}^{m\times n}$ , assume that $J:\mathcal{S}\times\mathcal{A}\to\mathbb{R}$ is jointly continuous and $\Gamma:\mathcal{A}\to\mathcal{S}$ is a compact-valued correspondence. Define

$$K^{*}(\alpha)=\arg\min\{J(K,\alpha)\mid K\in\Gamma(\alpha)\},\quad\text{for all }\alpha\in\mathcal{A},$$

and

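$$J(K^{*}(\alpha),\alpha)=\min\{J(K,\alpha)\mid K\in\Gamma(\alpha)\},\quad\text{for all }\alpha\in\mathcal{A}.$$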

If $\Gamma$ is continuous at some $\alpha\in\mathcal{A}$ , then $J(K^{*}(\alpha),\alpha)$ is continuous at $\alpha$ . Furthermore, $K^{*}$ is non-empty, compact-valued, closed, and upper hemi-continuous.

Berge Maximum Theorem does not trivially apply to ODC: the set of stabilizing controllers is open and often unbounded. However, a lower-level set trick applies.

Theorem 1.

Assume that $K^{*}(0)$ is non-empty, then the set $K^{*}(\alpha)$ is non-empty for all $\alpha>0$ . $K^{*}(\alpha)$ is upper hemi-continuous and the optimal cost $J(K^{*}(\alpha),\alpha)$ is continuous and strictly decreasing in $\alpha$ .

Proof.

When $K^{*}(0)$ is non-empty, there is an optimal decentralized controller for the undamped system. With the set of stabilizing controllers non-empty, we invoke the “damping property” in Lemma 1 and conclude

$$J(K^{*}(\alpha),\alpha)\leq J(K^{*}(0),\alpha)<J(K^{*}(0),0).$$

The inequality above assumes the existence of a globally optimal controller for all values of the damping parameter $\alpha$. This is true because the lower-level set of $J(K,\alpha)$ is compact [20]. Precisely, define $\Gamma_{M}(\alpha)$ to be

$$\Gamma_{M}(\alpha)=\{K\in\mathcal{S}:A-\alpha I+BK\text{ is stable and }J(K,\alpha)\leq M\}.$$

The set-valued function $\Gamma_{M}$ is compact-valued for all fixed $\alpha$ given a fixed $M$ . From the damping property, we can select any $M>J(K^{*}(0),0)$ and optimize instead over $\Gamma_{M}(\alpha)$ without losing any globally optimal controller. The continuity of $\Gamma_{M}(\alpha)$ at $\alpha$ for almost all $M$ is proved in the supplement. Berge maximum theorem then applies and yields the desired continuity of $K^{*}(\alpha)$ and $J(K^{*}(\alpha),\alpha)$ . ∎

The argument above can be extended to characterize all locally optimal controllers. A caveat is the possible existence of locally optimal controllers with unbounded cost. Their existence does not contradict the damping property — damping can introduce locally optimal controllers that are not stabilizing without the damping.

Theorem 2.

Assume that $K^{\dagger}(0)$ is non-empty, then the set $K^{\dagger}(\alpha)$ is nonempty for all $\alpha>0$ . Suppose furthermore that at an $\alpha_{0}>0$

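$$\lim_{\epsilon\to 0^{+}}\ \sup_{\alpha\in[\alpha_{0}-\epsilon,\alpha_{0}+\epsilon]}\ \sup_{K\in K^{\dagger}(\alpha)}J(K,\alpha)<\infty,$$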

then $K^{\dagger}(\alpha)$ is upper hemi-continuous at $\alpha_{0}$ and the optimal cost $J(K^{\dagger}(\alpha),\alpha)$ is upper hemi-continuous at $\alpha_{0}$ .

Proof.

That $K^{\dagger}(\alpha)$ is non-empty follows from the existence of globally optimal controllers in Theorem 1. Consider the parametrized optimization problem

$$\min_{K}\ J(K,\alpha)\quad\text{s.t.}\quad K\in\Gamma_{M}(\alpha).\tag{5}$$

The assumption ensures the existence of an $M$ and an $\epsilon>0$ such that $M>J(K,\alpha)$ for $K\in K^{\dagger}(\alpha)$ where $\alpha\in[\alpha_{0}-\epsilon,\alpha_{0}+\epsilon]$ . This choice of $M$ guarantees that the formulation (5) does not cut off any locally optimal controllers. As proved in the supplement, $\Gamma_{M}(\alpha)$ is continuous at $\alpha_{0}$ for almost any $M$ , and a large $M$ can be selected to make $\Gamma_{M}(\alpha)$ continuous at $\alpha_{0}$ . Berge Maximum Theorem applies to conclude that $K^{\dagger}(\alpha)$ is upper hemi-continuous. Since $J(K,\alpha)$ is jointly continuous in $(K,\alpha)$ , $J(K^{\dagger}(\alpha),\alpha)$ is upper hemi-continuous. ∎

4 Asymptotic Properties

In this section, we state asymptotic properties of the local solutions $K^{\dagger}(\alpha)$ . The controllers $K\in K^{\dagger}(\alpha)$ satisfy the first order necessary conditions in the following equations (6)-(9); their derivation can be found in [18].

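$$(A-\alpha I+BK)^{\top}P_{\alpha}(K)+P_{\alpha}(K)(A-\alpha I+BK)+K^{\top}RK+Q=0\tag{6}$$

$$L_{\alpha}(K)(A-\alpha I+BK)^{\top}+(A-\alpha I+BK)L_{\alpha}(K)+D_{0}=0\tag{7}$$

$$\left((B^{\top}P_{\alpha}(K)+RK)L_{\alpha}(K)\right)\circ I_{S}=0\tag{8}$$

$$K\circ I_{S}=K.\tag{9}$$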

The above conditions provide a closed-form expression of the cost

$$J(K,\alpha)=\operatorname{tr}(D_{0}P_{\alpha}(K)).\tag{10}$$
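For a given stabilizing $K$, the two Lyapunov equations and the trace formula can be evaluated numerically. The following is a minimal NumPy sketch under stated assumptions: the helper names `solve_lyapunov` and `cost_and_gradient`, the dense Kronecker solver (adequate only for small $n$), and the example matrices are all illustrative and not from the paper. The gradient expression $2(B^{\top}P+RK)L$ is the standard one for this cost; the paper only states the stationarity condition (8).

```python
import numpy as np


def solve_lyapunov(M, W):
    """Solve M X + X M^T + W = 0 for X via the row-major Kronecker form."""
    n = M.shape[0]
    eye = np.eye(n)
    # With row-major vec: vec(M X) = (M kron I) vec(X), vec(X M^T) = (I kron M) vec(X)
    lhs = np.kron(M, eye) + np.kron(eye, M)
    return np.linalg.solve(lhs, -W.reshape(-1)).reshape(n, n)


def cost_and_gradient(K, alpha, A, B, Q, R, D0, mask):
    """Return J(K, alpha) = tr(D0 P) and the masked gradient 2 ((B^T P + R K) L) o mask."""
    Acl = A - alpha * np.eye(A.shape[0]) + B @ K
    if np.max(np.linalg.eigvals(Acl).real) >= 0:
        raise ValueError("K does not stabilize the damped system")
    P = solve_lyapunov(Acl.T, K.T @ R @ K + Q)   # Lyapunov equation for P
    L = solve_lyapunov(Acl, D0)                  # Lyapunov equation for L
    J = float(np.trace(D0 @ P))                  # closed-form cost
    grad = (2.0 * (B.T @ P + R @ K) @ L) * mask  # vanishes at a first-order point
    return J, grad


# Illustrative data (assumed, not from the paper): two states, diagonal controller.
A = np.array([[-1.0, 2.0], [0.0, -1.0]])
B = np.eye(2)
Q = R = D0 = np.eye(2)
mask = np.eye(2)
K = np.diag([-0.5, 0.3])
J, grad = cost_and_gradient(K, 0.2, A, B, Q, R, D0, mask)
print(J)
print(grad)
```

A controller is a first-order point of the structured problem exactly when the returned masked gradient is zero, which is the numerical counterpart of the stationarity condition.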

It is worth pointing out that equations (6)-(10) are algebraic, involving only polynomial functions of the unknown matrices $K,P_{\alpha}$ and $L_{\alpha}$ . The matrices $P_{\alpha}$ and $L_{\alpha}$ are written as a function of $K$ because they are uniquely determined from (6) and (7) given a stabilizing controller $K$ . The following theorem characterizes the evolution of locally optimal controllers for a specific sparsity pattern. The theorem justifies the practice of random initialization around zero.

Theorem 3.

Suppose that the sparsity pattern $I_{S}$ is block-diagonal with square blocks and that $R$ has the same sparsity pattern as $I_{S}$. Then, all points in $K^{\dagger}(\alpha)$ converge to the zero matrix as $\alpha\to\infty$. Furthermore, $J(K,\alpha)\to 0$ as $\alpha\to\infty$ for all $K\in K^{\dagger}(\alpha)$.

Not only do all locally optimal controllers approach zero, the problem is in fact convex over bounded regions with enough damping.

Theorem 4.

For any given $r>0$ , the Hessian matrix $\nabla^{2}J(K,\alpha)$ is positive definite over $\|K\|\leq r$ for all large $\alpha$ .

The proofs of the two theorems above are given in the supplement.

Corollary 1.

With the assumption of Theorem 3, there is no spurious locally optimal controller for large $\alpha$ . That is, $K^{\dagger}(\alpha)=K^{*}(\alpha)$ for all large $\alpha$ .

Proof.

For any given $r>0$, all controllers in the ball $\mathcal{B}=\{K:\|K\|\leq r\}$ are stabilizing when $\alpha$ is large. As a result, stability constraints can be relaxed over $\mathcal{B}$. Furthermore, from Theorem 3, when $\alpha$ is large, all locally optimal controllers will be inside $\mathcal{B}$. From Theorem 4, the objective function becomes convex over $\mathcal{B}$ for large enough $\alpha$. These observations imply that local and global solutions coincide. ∎

The theorems above rely on the “damping property” in Lemma 1. It is worth commenting that damping the system with $-I$ is almost the only continuation method for general system matrices $A$ that achieves a monotonic enlargement of the stable sets. Formally,

Theorem 5.

When $n\geq 3$ , for any $n$ -by- $n$ real matrix $H$ that is not a multiple of $-I$ , there exists a stable matrix $A$ for which $A+H$ is unstable.

The proof is given in the supplement. This theorem justifies the use of $-\alpha I$ as the continuation parameter. However, in a given system with structure, matrices other than $-I$ may be appropriate.
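A small numerical instance illustrates the statement of Theorem 5. The matrices below are hand-picked for this sketch and are not from the paper: $H$ damps only the first state, so it is negative semi-definite but not a multiple of $-I$, and it turns a stable $A$ into an unstable $A+H$.

```python
import numpy as np


def is_stable(M):
    """True when every eigenvalue of M has a negative real part."""
    return bool(np.max(np.linalg.eigvals(M).real) < 0)


# Illustrative choice: the leading 2-by-2 block has trace -2 and determinant 1,
# so both of its eigenvalues equal -1.
A = np.array([[-3.0, 2.0, 0.0],
              [-2.0, 1.0, 0.0],
              [0.0, 0.0, -1.0]])
H = np.diag([-2.0, 0.0, 0.0])  # damping applied to the first state only

print(is_stable(A))                     # stable
print(is_stable(A + H))                 # the leading block now has determinant -1
print(is_stable(A - 2.0 * np.eye(3)))   # uniform damping keeps stability
```

Uniform damping shifts every eigenvalue left by the same amount, which is why it cannot break stability, whereas the partial damping above changes the sign of a determinant.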

5 Numerical Experiments

In this section, we document various homotopy behaviors as the damping parameter $\alpha$ varies. The focus is on the evolution of locally optimal trajectories, which can be tracked by any local search method. The experiments are performed on small systems so that random initialization can find a reasonable number of distinct locally optimal solutions. Despite the small system dimension, the existence of many locally optimal solutions and their convoluted trajectories demonstrates what is possible in a theory of homotopy.

The local search method we use is projected gradient descent. At a controller $K^{i}$, we perform a line search along the direction $\tilde{K}^{i}=-\nabla J(K^{i})\circ I_{S}$. The step size is determined by backtracking with the Armijo rule, that is, we select $s^{i}$ as the largest number in $\{\bar{s},\bar{s}\beta,\bar{s}\beta^{2},...\}$ such that $K^{i}+s^{i}\tilde{K}^{i}$ is stabilizing while

$$J(K^{i}+s^{i}\tilde{K}^{i})<J(K^{i})+\alpha s^{i}\langle\nabla J(K^{i}),\tilde{K}^{i}\rangle.$$

Our choice of parameters is $\alpha=0.001$, $\beta=0.5$, and $\bar{s}=1$, where $\alpha$ here denotes the Armijo constant of the line search rather than the damping parameter. We terminate the iteration when the norm of the gradient is less than $10^{-3}$.
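The local search described above can be sketched in a few lines of NumPy. This is an illustrative reimplementation under stated assumptions, not the authors' code: the function names, the dense Kronecker Lyapunov solver, the gradient formula $2(B^{\top}P+RK)L$, the iteration cap, and the decoupled test system are all choices made here. The Armijo constant is named `armijo` to avoid a clash with the damping parameter `alpha`.

```python
import numpy as np


def lyap(M, W):
    """Solve M X + X M^T + W = 0 (row-major Kronecker form, small n only)."""
    n = M.shape[0]
    eye = np.eye(n)
    lhs = np.kron(M, eye) + np.kron(eye, M)
    return np.linalg.solve(lhs, -W.reshape(-1)).reshape(n, n)


def cost_grad(K, alpha, A, B, Q, R, D0, mask):
    """Return (J, masked gradient), or (inf, None) if K is not stabilizing."""
    Acl = A - alpha * np.eye(A.shape[0]) + B @ K
    if np.max(np.linalg.eigvals(Acl).real) >= 0:
        return np.inf, None
    P = lyap(Acl.T, Q + K.T @ R @ K)
    L = lyap(Acl, D0)
    return float(np.trace(D0 @ P)), (2.0 * (B.T @ P + R @ K) @ L) * mask


def projected_gradient_descent(K, alpha, A, B, Q, R, D0, mask,
                               armijo=1e-3, beta=0.5, s_bar=1.0,
                               tol=1e-3, max_iter=10000):
    """Backtracking projected gradient descent over the sparsity pattern `mask`."""
    J, g = cost_grad(K, alpha, A, B, Q, R, D0, mask)
    if g is None:
        raise ValueError("initial K must be stabilizing")
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        direction = -g  # already masked, so the iterate keeps the pattern
        s = s_bar
        while s > 1e-12:
            J_new, g_new = cost_grad(K + s * direction, alpha, A, B, Q, R, D0, mask)
            # A non-stabilizing trial has infinite cost and is rejected here.
            if J_new < J + armijo * s * np.sum(g * direction):
                break
            s *= beta
        else:
            break  # no acceptable step was found
        K, J, g = K + s * direction, J_new, g_new
    return K, J


# Illustrative decoupled system (assumed): the optimum is known in closed form.
A = np.diag([1.0, -0.5])
B = np.eye(2)
Q = R = D0 = np.eye(2)
mask = np.eye(2)
K_opt, J_opt = projected_gradient_descent(np.diag([-3.0, -1.0]), 0.0, A, B, Q, R, D0, mask)
print(K_opt, J_opt)
```

Treating a non-stabilizing trial point as having infinite cost lets the same backtracking loop enforce both the stability requirement and the sufficient-decrease condition.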

5.1 Systems with a large number of local minima

We first consider the examples from [7], where the feasible set is reasonably disconnected and admits many local minima. The system matrices are given by

[TABLE]

When the dimension $n$ is $4$, it is known that the set of stabilizing decentralized controllers has at least $5$ connected components. We sample the initial controllers from $N(0,1)$ and, after 1000 samples, obtain $5$ initial locally optimal solutions. We gradually increase the damping parameter from $0$ to $0.6$ in increments of $0.002$, and track the trajectories of locally optimal solutions by solving the newly damped system with the previous locally optimal solution as the initialization. The evolution of the optimal cost and the distance from the best known optimal controller are plotted in Figure 1. Notice that all sub-optimal local trajectories terminate after a modest damping $\alpha\approx 0.2$. After that, the minimization algorithm always tracks a single trajectory. This illustrates the prediction of Corollary 1. In particular, if we start tracking a sub-optimal controller trajectory from $\alpha=0$, we will be on the better trajectory when $\alpha\approx 0.2$. At that point, if we gradually decrease $\alpha$ to zero, we obtain a stabilizing controller with a lower cost.

5.2 Experiments on Random Systems

With the same initialization and optimization procedure, we perform the experiments with $3$-by-$3$ system matrices $A$ and $B$ randomly generated from the distribution $N(0,1)$. For 92 out of 100 samples we are not able to find more than one locally optimal trajectory. Examples with more than one local trajectory are listed below. All figures to the left plot the cost of locally optimal controllers. All figures to the right plot the distance of the locally optimal controllers to the controller with the lowest cost. Note that the order of the cost of the trajectories may be preserved during the damping (Figure 2) and may also be disrupted (Figure 3). More than one trajectory may have the lowest cost during the damping (Figure 4).

Figure 5 shows a hysteresis-like loop as the damping coefficient is first increased and then decreased. The trajectory of the controller first leads up to a large cost, and the local search method then escapes this local minimum to another one with a smaller cost. As the damping decreases, it returns to where it started along a different route.

6 Conclusion

This paper studied the trajectories of locally and globally optimal solutions to the optimal decentralized control problem as the damping of the decentralized control system varies. Asymptotic and continuity properties of the trajectories are proved. The complicated phenomenon of continuation is illustrated with numerical examples. The fact that damping merges all locally optimal solutions is strong evidence that the idea of homotopy can be fruitfully used to improve locally optimal solutions.

Acknowledgments

The authors are grateful to Salar Fattahi and Cédric Josz for their constructive comments and feedback. The authors thank Yuhao Ding for sharing the implementation of local search algorithms.

Appendix A Notions of continuity

We recount the notion of upper and lower hemi-continuity and prove the continuity properties of the lower level-set map. The reader is referred to [16] for an accessible treatment.

Definition 2.

The set valued map $\Gamma:A\to B$ is said to be upper hemi-continuous (uhc) at a point $a$ if for any open neighborhood $V$ of $\Gamma(a)$ there exists a neighborhood $U$ of $a$ such that $\Gamma(U)\subseteq V$ .

If $B$ is compact, uhc is equivalent to the graph of $\Gamma$ being closed, that is, if $a_{n}\to a^{*}$ and $b_{n}\in\Gamma(a_{n})\to b^{*}$ , then $b^{*}\in\Gamma(a^{*})$ .

Definition 3.

The set-valued map $\Gamma:A\to B$ is said to be lower hemi-continuous (lhc) at a point $a$ if for any open neighborhood $V$ intersecting $\Gamma(a)$ there exists a neighborhood $U$ of $a$ such that $\Gamma(x)$ intersects $V$ for all $x\in U$.

Equivalently, for all $a_{m}\to a\in A$ and $b\in\Gamma(a)$, there exist a subsequence $a_{m_{k}}$ of $a_{m}$ and corresponding $b_{k}\in\Gamma(a_{m_{k}})$ such that $b_{k}\to b$.

We prove the upper hemi-continuity of the lower level set map in Lemma 3 below.

Lemma 3.

Consider matrices $A,B$ and an objective cost $J(K,\alpha)$ that satisfies the damping property. Define

$$\Gamma_{M}(\alpha)=\{K\in\mathcal{S}:A-\alpha I+BK\text{ is stable and }J(K,\alpha)\leq M\}.$$

Assume that $\Gamma_{M}(\alpha)$ is not empty for all $\alpha\geq 0$ and a given $M>0$ , then $\Gamma_{M}(\alpha)$ is an upper hemi-continuous set-valued map.

Proof.

From [20], $\Gamma_{M}(\alpha)$ is compact for all $\alpha$. From the damping property, for any $\alpha<\beta$, we have $\Gamma_{M}(\alpha)\subseteq\Gamma_{M}(\beta)$. Therefore, to characterize the continuity of $\Gamma_{M}$ at an $\alpha^{*}\geq 0$, it suffices to consider the restricted map $\Gamma_{M}:[\alpha^{*}-\epsilon,\alpha^{*}+\epsilon]\to\Gamma_{M}(\alpha^{*}+\epsilon)$ for some $\epsilon>0$, that is, to take the range of $\Gamma_{M}$ to be compact. Therefore, the sequence characterization of uhc applies. Suppose $\alpha_{i}\to\alpha^{*}$, and pick a sequence $K_{i}\in\Gamma_{M}(\alpha_{i})$ that converges to $K^{*}$. The continuity of $J(K,\alpha)$ implies $J(K^{*},\alpha^{*})\leq M$. The fact that the cost is bounded implies $A-\alpha^{*}I+BK^{*}$ is stable. Since subspaces of matrices are closed, $K^{*}\in\mathcal{S}$. We have verified all conditions for $K^{*}\in\Gamma_{M}(\alpha^{*})$, so $\Gamma_{M}$ is upper hemi-continuous. ∎

The lower hemi-continuity of $\Gamma_{M}$ is more subtle.

Lemma 4.

At any given $\alpha^{*}\geq 0$ , $\Gamma_{M}(\alpha)$ is lower hemi-continuous at $\alpha^{*}$ except when $M\in\{J(K,\alpha^{*}):K\in K^{\dagger}(\alpha^{*})\}$ , which is a finite set of locally optimal costs.

Proof.

We prove the claim by contradiction. Consider a sequence $\alpha_{i}\to\alpha^{*}$ and a $K^{*}\in\Gamma_{M}(\alpha^{*})$ for which there exists no subsequence of $\alpha_{i}$ with $K_{i}\in\Gamma_{M}(\alpha_{i})$ such that $K_{i}\to K^{*}$. We must have $J(K^{*},\alpha^{*})=M$; otherwise $J(K^{*},\alpha_{i})<M$ for large $i$ and, since the set of stabilizing controllers is open, $K^{*}\in\Gamma_{M}(\alpha_{i})$ for large $i$. Furthermore, $K^{*}$ must be a local minimum of $J(K,\alpha^{*})$; otherwise there exists a sequence $K_{j}\to K^{*}$ with $J(K_{j},\alpha^{*})<M$ and, by the continuity of $J$, there exists a sequence of large enough indices $n_{j}$ such that $J(K_{j},\alpha_{n_{j}})<M$, so the sequence $K_{j}\in\Gamma_{M}(\alpha_{n_{j}})$ converges to $K^{*}$. The argument above shows that $M$ belongs to the set of costs of locally optimal controllers at $\alpha^{*}$. Because $J(K,\alpha^{*})$ as a function of $K$ can be described as a linear function over an algebraic set, the set of locally optimal values is finite. ∎

Appendix B Convergence of locally optimal controllers

We prove the asymptotic properties of the locally optimal controllers in Section 4 of the main paper.

Theorem.

Suppose the sparsity pattern $I_{S}$ is block-diagonal with square blocks, and $R$ has the same sparsity pattern as $I_{S}$. Then all points in $K^{\dagger}(\alpha)$ converge to the zero matrix as $\alpha\to\infty$. Furthermore, $J(K,\alpha)\to 0$ as $\alpha\to\infty$ for all $K\in K^{\dagger}(\alpha)$.

Proof.

Recall the expression of the objective function

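$$J(K,\alpha)=\mathbb{E}\int_{0}^{\infty}e^{-2\alpha t}\left[\hat{x}^{\top}(t)Q\hat{x}(t)+\hat{u}^{\top}(t)R\hat{u}(t)\right]dt\quad\text{s.t.}\quad\dot{\hat{x}}(t)=A\hat{x}(t)+B\hat{u}(t),\ \ \hat{u}(t)=K\hat{x}(t),\tag{16}$$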

and the first order necessary conditions

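$$(A-\alpha I+BK)^{\top}P_{\alpha}(K)+P_{\alpha}(K)(A-\alpha I+BK)+K^{\top}RK+Q=0\tag{17}$$

$$L_{\alpha}(K)(A-\alpha I+BK)^{\top}+(A-\alpha I+BK)L_{\alpha}(K)+D_{0}=0\tag{18}$$

$$\left((B^{\top}P_{\alpha}(K)+RK)L_{\alpha}(K)\right)\circ I_{S}=0\tag{19}$$

$$K\circ I_{S}=K.\tag{20}$$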

Those first order conditions can be used to characterize the objective function

$$J(K,\alpha)=\operatorname{tr}(D_{0}P_{\alpha}(K)).\tag{21}$$

As $\alpha$ increases, some local solutions may disappear and some new local solutions may appear. The appearance cannot happen infinitely often because the equations (17)-(20) are algebraic. Suppose that the number of local solutions does not change when $\alpha\geq\alpha_{0}$. The damping property ensures that for $\beta>\alpha>\alpha_{0}$,

$$\max_{K\in K^{\dagger}(\beta)}J(K,\beta)\leq\max_{K\in K^{\dagger}(\alpha)}J(K,\beta).$$

The right hand side optimizes over a fixed, finite set of controllers and goes to zero as $\beta\to\infty$ from the formulation (16) and the dominated convergence theorem. The left hand side, therefore, also converges to zero as $\beta\to\infty$ . From (21) and the assumption that $D_{0}$ is positive definite, $\|P_{\beta}(K)\|\to 0$ for all $K\in K^{\dagger}(\beta)$ as $\beta\to\infty$ .

The assumption on sparsity allows the expression of the locally optimal controllers in (19) as

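$$K=-R^{-1}\left((B^{\top}P_{\alpha}(K)L_{\alpha}(K))\circ I_{S}\right)\left(L_{\alpha}(K)\circ I_{S}\right)^{-1}.$$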
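The step from the stationarity condition to a closed-form $K$ uses the identity $(RKL)\circ I_{S}=RK(L\circ I_{S})$, which holds when $R$ and $K$ share a block-diagonal pattern with square blocks. The NumPy sketch below checks this algebra on random data. Everything in it is an assumption for illustration: the sizes, the random matrices, and the matrix `G`, which merely stands in for $B^{\top}P_{\alpha}(K)L_{\alpha}(K)$ and is not computed from a control system.

```python
import numpy as np

rng = np.random.default_rng(0)
mask = np.kron(np.eye(2), np.ones((2, 2)))  # two 2-by-2 diagonal blocks

R = rng.standard_normal((4, 4)) * mask
R = R @ R.T + np.eye(4)                     # positive definite, still block-diagonal
K = rng.standard_normal((4, 4)) * mask      # structured controller
W = rng.standard_normal((4, 4))
L = W @ W.T + np.eye(4)                     # dense positive definite matrix
G = rng.standard_normal((4, 4))             # stand-in for B^T P L (illustrative)

# Identity behind the closed form: masking R K L only needs the diagonal blocks of L.
lhs = (R @ K @ L) * mask
rhs = R @ K @ (L * mask)
print(np.allclose(lhs, rhs))

# Solving (G + R K L) o mask = 0 for a structured K with G and L held fixed.
K_sol = -np.linalg.inv(R) @ (G * mask) @ np.linalg.inv(L * mask)
residual = (G + R @ K_sol @ L) * mask
print(np.allclose(residual, 0.0), np.allclose(K_sol * mask, K_sol))
```

The identity fails for general sparsity patterns, which is one way to see why the theorem restricts attention to block-diagonal structures.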

Especially we can bound

$$\|BK\|\leq\|BR^{-1}B^{\top}P_{\alpha}(K)L_{\alpha}(K)\|\,\lambda_{\min}(L_{\alpha}(K))^{-1}.$$

Pre- and post- multiply (18) by $L_{\alpha}(K)$ ’s unit minimum eigenvector $v$ ,

[TABLE]

Therefore

[TABLE]

This simplifies to

[TABLE]

Take the trace of (18) and consider the estimate

[TABLE]

where for clarity $L_{\alpha}$ denotes $L_{\alpha}(K)$ and $P_{\alpha}$ denotes $P_{\alpha}(K)$ . The second and the third inequalities use the fact that $|\operatorname*{tr}(AL)|\leq\|A\|\operatorname*{tr}(L)$ for a positive definite matrix $L$ and any matrix $A$ . This estimate, combined with previous argument that $\|P_{\alpha}\|\to 0$ , concludes $\|L_{\alpha}\|\to 0$ . We also obtain from the inequality that

[TABLE]

for small enough $P_{\alpha}$ . Combining (26) and (27)

[TABLE]

which converges to $0$ as $\alpha\to\infty$. ∎

Appendix C The Positive Definiteness of Hessian

Theorem.

For any given $r>0$ , the Hessian matrix $\nabla^{2}J(K,\alpha)$ is positive definite over $\|K\|\leq r$ for all large $\alpha$ .

Proof.

The proof requires the vectorized Hessian formula given in Lemma 3.7 of [18], restated below.

Lemma 5 ([18]).

Define $j_{\alpha}:\mathbb{R}^{m\cdot n}\to\mathbb{R}$ by $j_{\alpha}(vec(K))=J(K,\alpha)$ . The Hessian of $j_{\alpha}$ is given by the formula

[TABLE]

where

[TABLE]

and $P(n,n)$ is an $n^{2}\times n^{2}$ permutation matrix.

We show that $H_{\alpha}(K)$ in the lemma is positive definite for any fixed $K$ when $\alpha$ is large. Recall the definitions of $L_{\alpha}$ and $P_{\alpha}$.

[TABLE]

By the triangle inequality,

[TABLE]

which means $\|P_{\alpha}(K)\|\to 0$ and $\|L_{\alpha}(K)\|\to 0$ as $\alpha\to\infty$. The minimum eigenvalue of $L_{\alpha}(K)$ can be bounded similarly: let $v$ be the unit eigenvector of $L_{\alpha}(K)$ corresponding to $\lambda_{\min}(L_{\alpha}(K))$. Pre- and post-multiplying (28) by $v$, we obtain

[TABLE]

The first Hessian term $L_{\alpha}(K)\otimes R$ can be bounded from below with (30):

[TABLE]

We bound the norm of the second and the third Hessian terms, $\|G_{\alpha}(K)\|$, as follows, where $\lesssim$ hides constants that do not depend on $\alpha$.

[TABLE]

Comparing the two estimates above, we find the first term dominates the two following terms with large $\alpha$ , uniformly over bounded $K$ . The Hessian $H_{\alpha}(K)$ is therefore positive definite over bounded $K$ when $\alpha$ is large. The conclusion carries over to the Hessian of the decentralized controller, which is a principal sub-matrix of the Hessian of the centralized controller. ∎
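To make the theorem concrete, the scalar case $n=m=1$ can be worked out in closed form. The sketch below assumes a scalar system $\dot{x}=(a-\alpha)x+bu$ with $u=kx$, weights $q,r>0$, and initial variance $d_{0}$; these symbols and default values are illustrative assumptions, not taken from the paper. Under them, $J(k,\alpha)=d_{0}(q+rk^{2})/(2(\alpha-a-bk))$ and $\partial^{2}J/\partial k^{2}=d_{0}\,(r(\alpha-a)^{2}+b^{2}q)/(\alpha-a-bk)^{3}$, which the code compares against a central finite difference.

```python
def cost(k, alpha, a=1.0, b=1.0, q=1.0, r=1.0, d0=1.0):
    """Scalar LQ cost J(k, alpha) for the damped closed loop (assumed model)."""
    s = alpha - a - b * k               # decay rate; must be positive
    if s <= 0:
        raise ValueError("closed loop is not stable for this k and alpha")
    return d0 * (q + r * k * k) / (2 * s)


def hessian_closed_form(k, alpha, a=1.0, b=1.0, q=1.0, r=1.0, d0=1.0):
    """Second derivative of the scalar cost with respect to k."""
    s = alpha - a - b * k
    return d0 * (r * (alpha - a) ** 2 + b * b * q) / s ** 3


def hessian_finite_difference(k, alpha, h=1e-3):
    """Central second difference of cost in k, used as a cross-check."""
    return (cost(k + h, alpha) - 2 * cost(k, alpha) + cost(k - h, alpha)) / h ** 2


if __name__ == "__main__":
    for k in (-2.0, 0.0, 2.0):
        print(k, hessian_closed_form(k, 10.0), hessian_finite_difference(k, 10.0))
```

In this scalar setting the second derivative is positive wherever the closed loop is stable, so the example does not capture the difficulty of the matrix case; it only illustrates that the curvature stays positive on a bounded set of gains once $\alpha$ is large enough to keep that set stable, and that it decays like $rd_{0}/\alpha$.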

Appendix D The uniqueness of the continuation direction

This section proves the following result.

Theorem.

When $n\geq 3$ , for any $n$ -by- $n$ real matrix $H$ that is not a multiple of $-I$ , there exists a stable matrix $A$ for which $A+H$ is unstable.

Define the set of stable directions

[TABLE]

where $A$ and $H$ are $n$ -by- $n$ real matrices.

Lemma 6.

Every matrix in $\mathcal{H}$ is similar to a diagonal matrix with non-positive diagonal entries. In particular, such a matrix cannot have complex eigenvalues.

Proof.

When $t$ is large, $A+tH$ is a small perturbation of $tH$, hence the eigenvalues of $H$ have to be in the closed left half plane. With a suitable similarity transformation, assume $H$ is in real Jordan form. First consider the case of two-by-two matrices, which we denote by $H_{2}$ and $A_{2}$. Assume for contradiction that $H_{2}$ is not diagonalizable. The non-diagonal real Jordan form of $H_{2}$ has the following possibilities:

•

$H_{2}=\begin{bmatrix}h&1\\ 0&h\end{bmatrix}$, where $H_{2}$ has a repeated real eigenvalue $h<0$. Pick $A_{2}=\begin{bmatrix}4h&-2\\ 10h^{2}&-3h\end{bmatrix}$, which is stable because $\operatorname*{tr}(A_{2})=h<0$ and $\det(A_{2})=8h^{2}>0$. We have $A_{2}+tH_{2}=\begin{bmatrix}ht+4h&t-2\\ 10h^{2}&ht-3h\end{bmatrix}$, whose stability criterion $\operatorname*{tr}(A_{2}+tH_{2})<0$ and $\det(A_{2}+tH_{2})>0$ amounts to

[TABLE]

or equivalently $t\in(-1/2,1)\cup(8,+\infty)$. In particular, when $t=2$, $A_{2}+tH_{2}$ is not stable.

•

$H_{2}=\begin{bmatrix}0&1\\ 0&0\end{bmatrix}$. Pick the stable matrix $A_{2}=\begin{bmatrix}-1&0\\ 1&-1\end{bmatrix}$. Then $A_{2}+tH_{2}$ is not stable when $t=2$.

•

$H_{2}=\begin{bmatrix}0&f\\ -f&0\end{bmatrix}$, where $f>0$. Pick the stable matrix $A_{2}=\begin{bmatrix}-1&-4\\ 1&-1\end{bmatrix}$. Then $A_{2}+\frac{2}{f}H_{2}=\begin{bmatrix}-1&-2\\ -1&-1\end{bmatrix}$ is not stable.

•

$H_{2}=\begin{bmatrix}h&f\\ -f&h\end{bmatrix}$ , where $h<0$ and $f>0$ . By rescaling assume $f=1$ . Consider the following matrix function

[TABLE]

We have

[TABLE]

In particular,

[TABLE]

Hence as long as

[TABLE]

for small enough $\epsilon>0$ , $A_{2}=G(-\frac{1}{2}+\epsilon)$ is a stable matrix and there will be matrices $G(t)$ with $t>-\frac{1}{2}$ whose trace is negative and whose determinant is smaller. Consider the minimal value the determinant can take

[TABLE]

which means when

[TABLE]

the matrix $G(t)$ with $t=-\frac{1}{2}-\frac{hw}{1+h^{2}}$ is unstable. There certainly exist $u$ and $w$ that satisfy (33) and (34).

For general $n$, the real Jordan form of $H$ is a block upper-triangular matrix

[TABLE]

where $H_{2}$ can take the four possibilities mentioned above. We take the corresponding stable $A_{2}$ constructed above, which has the property that $A_{2}+t_{0}H_{2}$ is not stable for some $t_{0}>0$ . Form the block diagonal matrix

[TABLE]

Then $A$ is stable, while $A+t_{0}H=\begin{bmatrix}A_{2}+t_{0}H_{2}&*\\ 0&*\end{bmatrix}$ is not stable. ∎
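The three explicit two-by-two counterexamples in the proof above can be verified with the trace and determinant test for $2\times 2$ matrices (stable if and only if the trace is negative and the determinant is positive). The sketch below is a check under sample parameter values, $h=-1$ and $f=2$, which are chosen here for illustration only; the function names are likewise hypothetical.

```python
def is_stable_2x2(m):
    """A real 2x2 matrix is Hurwitz iff its trace < 0 and determinant > 0."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return trace < 0 and det > 0


def shifted(a, h, t):
    """Return A + t H for 2x2 matrices given as nested lists."""
    return [[a[i][j] + t * h[i][j] for j in range(2)] for i in range(2)]


def jordan_block_case(h=-1.0):
    """Repeated eigenvalue h < 0: H2 = [[h, 1], [0, h]]."""
    h2 = [[h, 1.0], [0.0, h]]
    a2 = [[4 * h, -2.0], [10 * h * h, -3 * h]]
    return a2, h2


def nilpotent_case():
    """H2 = [[0, 1], [0, 0]]."""
    return [[-1.0, 0.0], [1.0, -1.0]], [[0.0, 1.0], [0.0, 0.0]]


def rotation_case(f=2.0):
    """Purely imaginary eigenvalues: H2 = [[0, f], [-f, 0]] with f > 0."""
    return [[-1.0, -4.0], [1.0, -1.0]], [[0.0, f], [-f, 0.0]]


if __name__ == "__main__":
    a2, h2 = jordan_block_case()
    # Stable for t in (-1/2, 1) and (8, inf); unstable in between.
    print([is_stable_2x2(shifted(a2, h2, t)) for t in (0.0, 0.5, 2.0, 5.0, 9.0)])
    a2, h2 = nilpotent_case()
    print(is_stable_2x2(a2), is_stable_2x2(shifted(a2, h2, 2.0)))
    a2, h2 = rotation_case(2.0)
    print(is_stable_2x2(a2), is_stable_2x2(shifted(a2, h2, 2.0 / 2.0)))
```

For the Jordan-block case with $h=-1$, the determinant of $A_{2}+tH_{2}$ equals $(t-1)(t-8)$, which matches the interval $t\in(-1/2,1)\cup(8,+\infty)$ stated in the proof.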

We can strengthen the argument above and further characterize $\mathcal{H}$ in the case $n\geq 3$ .

Lemma 7.

When $n\geq 3$ , the set of stable directions $\mathcal{H}$ does not contain any matrices of rank $1$ , $2$ , …, $n-2$ .

Proof.

By Lemma 6, we only need to consider the case where $H$ is diagonal with non-positive diagonal entries. Assume there is a rank one matrix $H\in\mathcal{H}$, and write

[TABLE]

where $H_{3}=\operatorname*{diag}(-1,0,0)$ . This is possible with the rank assumption. We will construct a stable $3$ -by- $3$ matrix $A_{3}$ , such that there is some $t_{0}>0$ that makes $A_{3}+t_{0}H_{3}$ unstable, and then carry the instability to $A+t_{0}H$ with the extended matrix

[TABLE]

From [7], the set

[TABLE]

has two disconnected components. Consider the Jordan decomposition of the matrix

[TABLE]

where $P$ is some invertible matrix. Write

[TABLE]

After this similarity transformation, the set $T$ can be written with $G(t,0)$.

[TABLE]

Since $T$ is disconnected, there exist $t_{1}<t_{2}$ such that $G(t_{1})$ is stable, while $G(t_{2})$ is unstable with some eigenvalue in the right half plane. Setting $A_{3}=G(t_{1})$ and $t_{0}=t_{2}-t_{1}$ completes the proof. ∎
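For intuition, the claim that $H_{3}=\operatorname*{diag}(-1,0,0)$ is not a stable direction can also be seen on a small hand-picked instance. The matrix $A_{3}$ below is a hypothetical choice made for this illustration; it is not the matrix $G(t)$ obtained from [7] in the proof.

```latex
A_{3}=\begin{bmatrix}-3&2&0\\ -2&1&0\\ 0&0&-1\end{bmatrix},\qquad
H_{3}=\operatorname{diag}(-1,0,0),
\\[4pt]
\det\bigl(sI-(A_{3}+tH_{3})\bigr)
  =(s+1)\bigl(s^{2}+(2+t)s+(1-t)\bigr)
  =s^{3}+(3+t)s^{2}+3s+(1-t).
```

At $t=0$ the roots are $-1,-1,-1$, so $A_{3}$ is stable. For $t\geq 0$ the Routh-Hurwitz conditions $3+t>0$, $1-t>0$, and $3(3+t)>1-t$ reduce to $t<1$, so $A_{3}+t_{0}H_{3}$ is unstable for, say, $t_{0}=2$.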

Since we can perturb the direction and make $H$ full-rank, one expects that the rank-one property of $H$ is not essential. This is indeed the case.

Lemma 8.

When $n\geq 3$ , $\mathcal{H}=\{-\lambda I,\lambda\geq 0\}$ .

Proof.

By Lemma 6, we only need to consider the case where $H$ is diagonal with non-positive diagonal entries. Write

[TABLE]

where $H_{3}=\operatorname*{diag}(h_{1},h_{2},h_{3})$. The diagonal entries $h_{i}$, $i=1,2,3$, are non-positive and not all equal. We will construct an $A_{3}$ and a corresponding $t_{0}$ such that $A_{3}$ is stable while $A_{3}+t_{0}H_{3}$ is not stable, and extend to the general $A$ as in Lemma 7. The case where $H_{3}$ has rank $1$ has been considered in Lemma 7. We show that the remaining ranks are impossible. Without loss of generality, we rescale $H_{3}$ and assume $h_{1}=-1$.

•

$H_{3}=\operatorname*{diag}(-1,h_{2},0)$ , where $h_{2}<0$ . Consider the matrix function

[TABLE]

The characteristic polynomial of $G(t)$ is

[TABLE]

The Routh-Hurwitz criterion requires

[TABLE]

which is simplified with $h_{2}<0$ to

[TABLE]

In particular, when $t=\frac{3}{2}$, (36) simplifies to the obvious expression $\frac{1}{8}(11-15h_{2})>0$. When $t=3$, (35) implies that $G(t)$ is not stable. Setting $A_{3}=G(\frac{3}{2})$ and $t_{0}=\frac{3}{2}$ concludes the proof.

•

$H_{3}=\operatorname*{diag}(-1,h_{2},h_{3})$, where without loss of generality we assume

[TABLE]

Consider the matrix

[TABLE]

Its Routh-Hurwitz criterion requires

[TABLE]

We claim that when

[TABLE]

the set of $t$ that satisfy the Routh-Hurwitz criterion is disconnected. To see this, write the positive local minimum of $f_{1}(t)$ in (38) as $t_{1}=\sqrt{\frac{1}{3}}$, and write the positive local minimum of $f_{2}(t)$ in (39) as $t_{2}=\sqrt{\frac{h_{2}h_{3}}{3(1-h_{1})(1-h_{2})}}$. The condition (37) ensures that $t_{1}<t_{2}$, and the condition (40) ensures that $f_{1}(t_{1})$ and $f_{2}(t_{2})$ are negative. Furthermore, consider $t_{0}=a\frac{h_{2}+h_{3}-h_{2}h_{3}}{h_{2}+h_{3}}$, which is the root of $(1-h_{2})(1-h_{3})(-h_{2}-h_{3})f_{1}(t)-f_{2}(t)$. It holds that $t_{1}<t_{0}<t_{2}$ and that both $f_{1}(t_{0})$ and $f_{2}(t_{0})$ are positive, so $f_{1}(t)$ and $f_{2}(t)$ are both positive at this positive intersection. We conclude that when $t=t_{0}$, the matrix $G(t_{0})$ is stable, and when $t$ is large, $G(t)$ is again stable. Yet when $t=t_{2}\in(t_{0},\infty)$, the matrix $G(t_{2})$ is not stable.

∎
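The Routh-Hurwitz test used in the proof of Lemma 8 is easy to automate for $3\times 3$ matrices: with $\det(sI-M)=s^{3}+a_{2}s^{2}+a_{1}s+a_{0}$, the matrix $M$ is stable if and only if $a_{2}>0$, $a_{0}>0$, and $a_{2}a_{1}>a_{0}$. The sketch below applies it to a hypothetical pair $A_{3}$, $H_{3}=\operatorname*{diag}(-1,-\frac{1}{2},-\frac{3}{2})$ chosen for illustration (not the paper's $G(t)$), for which the stable values of $t$ form a disconnected set.

```python
def char_poly_coeffs_3x3(m):
    """Return (a2, a1, a0) with det(sI - M) = s^3 + a2 s^2 + a1 s + a0."""
    trace = m[0][0] + m[1][1] + m[2][2]
    # Sum of the three principal 2x2 minors.
    minors = (m[0][0] * m[1][1] - m[0][1] * m[1][0]
              + m[0][0] * m[2][2] - m[0][2] * m[2][0]
              + m[1][1] * m[2][2] - m[1][2] * m[2][1])
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
           - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
           + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    return -trace, minors, -det


def is_hurwitz_3x3(m):
    """Routh-Hurwitz test for a cubic characteristic polynomial."""
    a2, a1, a0 = char_poly_coeffs_3x3(m)
    return a2 > 0 and a0 > 0 and a2 * a1 > a0


# Hypothetical example data (assumption of this sketch, not from the paper).
A3 = [[-30.0, 32.0, 0.0],
      [-19.0, 20.0, 0.0],
      [0.0, 0.0, -1.0]]
H3_DIAG = [-1.0, -0.5, -1.5]   # non-positive, not all equal, h1 = -1


def shifted(t):
    """Return A3 + t * diag(H3_DIAG)."""
    return [[A3[i][j] + (t * H3_DIAG[i] if i == j else 0.0)
             for j in range(3)] for i in range(3)]


if __name__ == "__main__":
    for t in (0.0, 5.0, 10.0):
        print(t, is_hurwitz_3x3(shifted(t)))
```

For this example the upper-left block of $A_{3}+tH_{3}$ has determinant $\frac{1}{2}(t-2)(t-8)$, so stability is lost exactly for $t\in[2,8]$ and regained afterwards, mirroring the stable, unstable, stable pattern described in the proof.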

Bibliography

[1] Vincent D. Blondel and John N. Tsitsiklis. A survey of computational complexity results in systems and control. Automatica, 36(9):1249–1274, 2000.
[2] Stephen P. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory, volume 15. 1994.
[3] J. R. Broussard and N. Halyo. Active Flutter Control using Discrete Optimal Constrained Dynamic Compensators. In 1983 American Control Conference, pages 1026–1034, June 1983.
[4] J. C. Doyle, K. Glover, P. P. Khargonekar, and B. A. Francis. State-space solutions to standard H-2 and H-infinity control problems. IEEE Transactions on Automatic Control, 34(8):831–847, 1989.
[5] Emmanuel G. Collins Jr. and Debashis Sadhukhan. A comparison of descent and continuation algorithms for H2 optimal, reduced-order control designs. International Journal of Control, 69(5):647–662, January 1998.
[6] Maryam Fazel, Rong Ge, Sham M. Kakade, and Mehran Mesbahi. Global Convergence of Policy Gradient Methods for Linearized Control Problems. January 2018.
[7] Han Feng and Javad Lavaei. On the Exponential Number of Connected Components for the Feasible Set of Optimal Decentralized Control Problems. To appear in Proceedings of the 2019 American Control Conference, page 8.
[8] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
