A duality-based approach for distributed min-max optimization with application to demand side management

Ivano Notarnicola; Mauro Franceschelli; Giuseppe Notarstefano

arXiv:1703.08376·cs.DC·March 27, 2017


TL;DR

This paper introduces a novel distributed algorithm for min-max optimization problems with applications to demand side management in smart grids, addressing challenges of coupling and structure that hinder traditional methods.

Contribution

It develops a duality-based distributed approach for complex min-max problems with coupled constraints, not solvable by standard dual decomposition techniques.

Findings

1. Algorithm is proven correct and convergent.

2. Numerical results demonstrate effectiveness in demand management.

3. Addresses challenges of double coupling in distributed optimization.

Abstract

In this paper we consider a distributed optimization scenario in which a set of processors aims at minimizing the maximum of a collection of "separable convex functions" subject to local constraints. This set-up is motivated by peak-demand minimization problems in smart grids. Here, the goal is to minimize the peak value over a finite horizon with: (i) the demand at each time instant being the sum of contributions from different devices, and (ii) the local states at different time instants being coupled through local dynamics. The min-max structure and the double coupling (through the devices and over the time horizon) make this problem challenging in a distributed set-up (e.g., well-known distributed dual decomposition approaches cannot be applied). We propose a distributed algorithm based on the combination of duality methods and properties from min-max optimization. Specifically, we…

Equations

$$\min_{z\in Z}\ f(z)\quad\text{subj. to}\quad g(z)\preceq 0$$

$$\max_{\mu}\ q(\mu)\quad\text{subj. to}\quad \mu\succeq 0$$

$$\inf_{z\in Z}\,\sup_{\mu\succeq 0}\,\mathcal{L}(z,\mu)\ \geq\ \sup_{\mu\succeq 0}\,\inf_{z\in Z}\,\mathcal{L}(z,\mu)$$

$$\mathcal{L}(z^{\star},\mu)\leq\mathcal{L}(z^{\star},\mu^{\star})\leq\mathcal{L}(z,\mu^{\star})$$

$$\inf_{z\in Z}\,\sup_{w\in W}\,\phi(z,w)\ \geq\ \sup_{w\in W}\,\inf_{z\in Z}\,\phi(z,w)$$

$$\sup_{w\in W}\,\inf_{z\in Z}\,\phi(z,w)=\inf_{z\in Z}\,\sup_{w\in W}\,\phi(z,w)$$

$$\min_{z\in Z}\ f(z)$$

$$z(t+1)=P_{Z}\Big(z(t)-\gamma(t)\widetilde{\nabla}f(z(t))\Big)$$

$$\lim_{t\to\infty}\gamma(t)=0,\qquad \sum_{t=1}^{\infty}\gamma(t)=\infty,\qquad \sum_{t=1}^{\infty}\gamma(t)^{2}<\infty$$

$$\min_{x^{1},\ldots,x^{N}}\ \max_{s\in\{1,\ldots,S\}}\ \sum_{i=1}^{N}g_{is}(x^{i}_{s})\quad\text{subj. to}\quad x^{i}\in X_{i},\ i\in\{1,\ldots,N\}$$

$$\min_{x^{1},\ldots,x^{N},P}\ P\quad\text{subj. to}\quad x^{i}\in X_{i},\ i\in\{1,\ldots,N\},\qquad \sum_{i=1}^{N}g_{is}(x^{i}_{s})\leq P,\ s\in\{1,\ldots,S\}$$

$$\min_{x^{i},\rho^{i}}\ \rho^{i}\quad\text{subj. to}\quad x^{i}\in X_{i},\qquad g_{is}(x^{i}_{s})+\sum_{j\in\mathcal{N}_{i}}\big(\lambda^{ij}(t)-\lambda^{ji}(t)\big)_{s}\leq\rho^{i},\ s\in\{1,\ldots,S\}$$

$$\lambda^{ij}(t+1)=\lambda^{ij}(t)-\gamma(t)\big(\mu^{i}(t+1)-\mu^{j}(t+1)\big)$$

$$\max_{\mu\in\mathbb{R}^{S}}\ \sum_{i=1}^{N}q_{i}(\mu)\quad\text{subj. to}\quad \mathbf{1}^{\top}\mu=1,\ \mu\succeq 0$$

$$q_{i}(\mu)$$

$$\max_{\mu^{1},\ldots,\mu^{N}}\ \sum_{i=1}^{N}q_{i}(\mu^{i})\quad\text{subj. to}\quad \mathbf{1}^{\top}\mu^{i}=1,\ \mu^{i}\succeq 0,\ i\in\{1,\ldots,N\},\qquad \mu^{i}=\mu^{j}\ \text{for all }(i,j)\in\mathcal{E}$$

$$\mathcal{L}_{2}\big(\mu^{1},\ldots,\mu^{N},\{\lambda^{ij}\}_{(i,j)\in\mathcal{E}}\big)=\sum_{i=1}^{N}\Big(q_{i}(\mu^{i})+\sum_{j\in\mathcal{N}_{i}}{\lambda^{ij}}^{\top}(\mu^{i}-\mu^{j})\Big)$$

$$\mathcal{L}_{2}\big(\mu^{1},\ldots,\mu^{N},\{\lambda^{ij}\}_{(i,j)\in\mathcal{E}}\big)=\sum_{i=1}^{N}\Big(q_{i}(\mu^{i})+{\mu^{i}}^{\top}\sum_{j\in\mathcal{N}_{i}}(\lambda^{ij}-\lambda^{ji})\Big)$$

$$\min_{\{\lambda^{ij}\}_{(i,j)\in\mathcal{E}}}\ \eta\big(\{\lambda^{ij}\}_{(i,j)\in\mathcal{E}}\big)=\sum_{i=1}^{N}\eta_{i}\big(\{\lambda^{ij},\lambda^{ji}\}_{j\in\mathcal{N}_{i}}\big)$$

$$\eta_{i}\big(\{\lambda^{ij},\lambda^{ji}\}_{j\in\mathcal{N}_{i}}\big)=\max_{\mathbf{1}^{\top}\mu^{i}=1,\ \mu^{i}\succeq 0}\ q_{i}(\mu^{i})+{\mu^{i}}^{\top}\sum_{j\in\mathcal{N}_{i}}(\lambda^{ij}-\lambda^{ji})$$

$$\frac{\tilde{\partial}\,\eta\big(\{\lambda^{ij}\}_{(i,j)\in\mathcal{E}}\big)}{\partial\lambda^{ij}}={\mu^{i}}^{\star}-{\mu^{j}}^{\star}$$

$${\mu^{k}}^{\star}\in\operatorname*{argmax}_{\mathbf{1}^{\top}\mu^{k}=1,\ \mu^{k}\succeq 0}\Big(q_{k}(\mu^{k})+{\mu^{k}}^{\top}\sum_{h\in\mathcal{N}_{k}}(\lambda^{kh}-\lambda^{hk})\Big)$$

$$\max_{\mu^{i}}\ q_{i}(\mu^{i})+{\mu^{i}}^{\top}\sum_{j\in\mathcal{N}_{i}}\big(\lambda^{ij}(t)-\lambda^{ji}(t)\big)\quad\text{subj. to}\quad \mathbf{1}^{\top}\mu^{i}=1,\ \mu^{i}\succeq 0$$

$$\max_{\mathbf{1}^{\top}\mu^{i}=1,\ \mu^{i}\succeq 0}\bigg(\min_{x^{i}\in X_{i}}\sum_{s=1}^{S}\mu^{i}_{s}\Big(g_{is}(x^{i}_{s})+\sum_{j\in\mathcal{N}_{i}}\big(\lambda^{ij}(t)-\lambda^{ji}(t)\big)_{s}\Big)\bigg)$$

$$\phi(x^{i},\mu^{i}):=\sum_{s=1}^{S}\mu^{i}_{s}\Big(g_{is}(x^{i}_{s})+\sum_{j\in\mathcal{N}_{i}}\big(\lambda^{ij}(t)-\lambda^{ji}(t)\big)_{s}\Big)$$

$$\max_{\mathbf{1}^{\top}\mu^{i}=1,\ \mu^{i}\succeq 0}\bigg(\min_{x^{i}\in X_{i}}\phi(x^{i},\mu^{i})\bigg)=\min_{x^{i}\in X_{i}}\bigg(\max_{\mathbf{1}^{\top}\mu^{i}=1,\ \mu^{i}\succeq 0}\phi(x^{i},\mu^{i})\bigg)$$

$$\max_{\mu^{i}}\ \sum_{s=1}^{S}\mu^{i}_{s}\Big(g_{is}(x^{i}_{s})+\sum_{j\in\mathcal{N}_{i}}\big(\lambda^{ij}(t)-\lambda^{ji}(t)\big)_{s}\Big)\quad\text{subj. to}\quad \mathbf{1}^{\top}\mu^{i}=1,\ \mu^{i}\succeq 0$$

$$\min_{\rho^{i}}\ \rho^{i}\quad\text{subj. to}\quad g_{is}(x^{i}_{s})+\sum_{j\in\mathcal{N}_{i}}\big(\lambda^{ij}(t)-\lambda^{ji}(t)\big)_{s}\leq\rho^{i},\ s\in\{1,\ldots,S\}$$

$$\dot{T}^{i}(\tau)=-\alpha\big(T^{i}(\tau)-T^{i}_{out}(\tau)\big)+Q\,x^{i}(\tau)$$


Taxonomy

Topics: Smart Grid Energy Management · Advanced Queuing Theory Analysis · Age of Information Optimization

Full text

A duality-based approach for distributed min-max optimization with application to demand side management

Ivano Notarnicola (1), Mauro Franceschelli (2), Giuseppe Notarstefano (1)

(1) Ivano Notarnicola and Giuseppe Notarstefano are with the Department of Engineering, Università del Salento, Via Monteroni, 73100 Lecce, Italy, [email protected]

(2) Mauro Franceschelli (corresponding author) is with the Department of Electrical and Electronic Engineering, University of Cagliari, Piazza D’Armi, 09123 Cagliari, Italy, [email protected].

The research leading to these results has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 638992 - OPT4SMART) and from the Italian grant SIR “Scientific Independence of young Researchers”, project CoNetDomeSys, code RBSI14OF6H, funded by the Italian Ministry of Research and Education (MIUR).

Abstract

In this paper we consider a distributed optimization scenario in which a set of processors aims at minimizing the maximum of a collection of “separable convex functions” subject to local constraints. This set-up is motivated by peak-demand minimization problems in smart grids. Here, the goal is to minimize the peak value over a finite horizon with: (i) the demand at each time instant being the sum of contributions from different devices, and (ii) the local states at different time instants being coupled through local dynamics. The min-max structure and the double coupling (through the devices and over the time horizon) make this problem challenging in a distributed set-up (e.g., well-known distributed dual decomposition approaches cannot be applied). We propose a distributed algorithm based on the combination of duality methods and properties from min-max optimization. Specifically, we derive a series of equivalent problems by introducing ad-hoc slack variables and by going back and forth between primal and dual formulations. On the resulting problem we apply a dual subgradient method, which turns out to be a distributed algorithm. We prove the correctness of the proposed algorithm and show its effectiveness via numerical computations.

I Introduction

The addition of processing, measurement, communication and control capability to the electric power grid is leading to smart grids, in which smart generators, accumulators and loads can cooperate to execute Demand Side Management (DSM) programs [1]. The goal is to reduce the hourly and daily variations and peaks of electric demand by optimizing generation, storage and consumption. A widely adopted objective in DSM programs is the Peak-to-Average Ratio (PAR), defined as the ratio between peak-daily and average-daily power demands. PAR minimization gives rise to a min-max optimization problem if the average daily electric load is assumed not to be affected by the demand response strategy.

In [2] the authors propose a game-theoretic model for PAR minimization and provide a distributed energy-cost-based strategy for the users. A noncooperative-game approach is also proposed in [3], where optimal strategies are characterized and a distributed scheme is designed based on a proximal decomposition algorithm. A key difference of the set-up in [2, 3], compared to the one proposed in our paper, is that in those works each agent needs to know the total load and tariffs in the power distribution system. Moreover, the agents do not cooperate to compute the strategy. In [4] a Model Predictive Control scheme is proposed to optimize micro-grid operations while satisfying a time-varying request and operation constraints using a mixed-integer linear model.

In this paper we propose a novel distributed optimization framework for min-max optimization problems commonly found in DSM problems. Differently from the references above, we consider a cooperative, distributed computation model in which the agents in the network do not have knowledge of aggregate quantities, communicate only with neighboring agents and perform local computations (with no central coordinator) to solve the optimization problem.

Duality is a widely used tool for distributed optimization algorithms as shown, e.g., in the tutorials [5, 6]. These standard approaches do not apply to the framework considered in this paper. In [7] a distributed consensus-based primal-dual algorithm is proposed to solve optimization problems with coupled global cost function and inequality constraints.

Min-max optimization is closely related to saddle-point problems. In [8] the authors propose a subgradient method to generate approximate saddle points. A min-max problem is also considered in [9], where a distributed algorithm based on a suitable penalty approach is proposed. Another class of algorithms exploits the exchange of active constraints among the network nodes to solve constrained optimization problems, which include min-max problems [10, 11]. Although these algorithms work under asynchronous, directed communication, they do not scale in set-ups such as the one in this paper, in which the terms of the max function are coupled. Very recently, in [12] the authors proposed a distributed projected subgradient method to solve constrained saddle-point problems with agreement constraints. Although our problem set-up fits in those considered in [12], our algorithmic approach and the analysis are different. In [13, 14] saddle-point dynamics are used to design distributed algorithms for standard separable optimization problems.

The contribution of this paper is twofold. First, we propose a novel distributed optimization framework which is strongly motivated by peak power-demand minimization in DSM. The optimization problem has a min-max structure with local constraints at each node. Each term in the max function represents a daily cost (so that the maximum over a given horizon needs to be minimized), while the local constraints are due to the local dynamics and input bounds of the subsystems in the smart grid. The problem is challenging when approached in a distributed way since it is doubly coupled (each term of the max function is coupled among the agents, while the local constraints impose a coupling between different “days” in the time-horizon).

Second, as the main contribution of the paper, we propose a distributed algorithm to solve this class of min-max optimization problems. The algorithm has a very simple and clean structure in which a primal minimization and a dual ascent step are performed. The primal problem has a similar structure to the centralized one. Despite this simple structure, which resembles standard distributed dual methods, the algorithm is not a standard decomposition scheme and its derivation is non-obvious. Specifically, the algorithm is derived by heavily resorting to duality theory and properties of min-max optimization (or saddle-point) problems. In particular, a sequence of equivalent problems is derived in order to decompose the originally coupled problem into locally-coupled subproblems, which makes it possible to design a distributed algorithm. An interesting feature of the algorithm is its expression in terms of dual variables of two different problems and of the original primal variables. Since we apply duality more than once and on different problems, this property, although apparently intuitive, was not obvious a priori. Another appealing feature of the algorithm is that every limit point of the primal sequence at each node is a (feasible) optimal solution of the original optimization problem (although this is only convex and not strictly convex). This property is obtained by the minimizing sequence of the local primal subproblems without resorting to averaging schemes [15]. Finally, since each node only computes the decision variable of interest, our algorithm can solve both large-scale (many agents are present) and big-data (a large horizon is considered) problems.

The paper is structured as follows. In Section II we provide some useful preliminaries on optimization, duality theory and subgradient methods. In Section III we formalize our distributed min-max optimization set-up and present the main contribution of the paper, a novel, duality-based distributed optimization method. In Section IV we characterize its convergence properties. Finally, in Section V we corroborate the theoretical results with a numerical example involving peak power minimization in a smart-grid scenario.

Due to space constraints all proofs are omitted in this paper and will be provided in a forthcoming document.

II Preliminaries

II-A Optimization and Duality

Consider a constrained optimization problem, referred to as the primal problem, having the form

$$\begin{aligned}\min_{z\in Z}\ & f(z)\\ \text{subj. to}\ & g(z)\preceq 0\end{aligned}\tag{1}$$

where $Z\subseteq\mathbb{R}^{N}$ is a convex and compact set, $f:\mathbb{R}^{N}\rightarrow\mathbb{R}$ is a convex function and each component $g_{s}:\mathbb{R}^{N}\rightarrow\mathbb{R}$, $s\in\{1,\ldots,S\}$, of $g$ is a convex function.

The following optimization problem

$$\begin{aligned}\max_{\mu}\ & q(\mu)\\ \text{subj. to}\ & \mu\succeq 0\end{aligned}\tag{2}$$

is called the dual of problem (1), where $q:\mathbb{R}^{S}\rightarrow\mathbb{R}$ is obtained by minimizing with respect to $z\in Z$ the Lagrangian function $\mathcal{L}(z,\mu):=f(z)+\mu^{\top}g(z)$, i.e., $q(\mu)=\min_{z\in Z}\mathcal{L}(z,\mu)$. Problem (2) is well posed since the domain of $q$ is convex and $q$ is concave on its domain.

It can be shown that the following inequality holds

$$\inf_{z\in Z}\,\sup_{\mu\succeq 0}\,\mathcal{L}(z,\mu)\ \geq\ \sup_{\mu\succeq 0}\,\inf_{z\in Z}\,\mathcal{L}(z,\mu),\tag{3}$$

which is called weak duality. When equality holds in (3), we say that strong duality holds and, thus, solving the primal problem (1) is equivalent to solving its dual formulation (2). In this case the right-hand-side problem in (3) is referred to as the saddle-point problem of (1).

Definition II.1.

A pair $(z^{\star},\mu^{\star})$ is called a primal-dual optimal solution of problem (1) if $z^{\star}\in Z$ and $\mu^{\star}\succeq 0$, and $(z^{\star},\mu^{\star})$ is a saddle point of the Lagrangian, i.e.,

$$\mathcal{L}(z^{\star},\mu)\leq\mathcal{L}(z^{\star},\mu^{\star})\leq\mathcal{L}(z,\mu^{\star})$$

for all $z\in Z$ and $\mu\succeq 0$. $\square$

A more general min-max property can be stated. Let $Z\subseteq\mathbb{R}^{N}$ and $W\subseteq\mathbb{R}^{S}$ be nonempty convex sets. Let $\phi:Z\times W\to\mathbb{R}$, then the following inequality

$$\inf_{z\in Z}\,\sup_{w\in W}\,\phi(z,w)\ \geq\ \sup_{w\in W}\,\inf_{z\in Z}\,\phi(z,w)$$

holds true and is called the max-min inequality. When the equality holds, then we say that $\phi$, $Z$ and $W$ satisfy the strong max-min property or the saddle-point property.

The following proposition gives a sufficient condition for the strong max-min property to hold.

Proposition II.2 ([16, Proposition 4.3]).

Let $\phi$ be such that (i) $\phi(\cdot,w):Z\to\mathbb{R}$ is convex and closed for each $w\in W$, and (ii) $-\phi(z,\cdot):W\to\mathbb{R}$ is convex and closed for each $z\in Z$. Assume further that $W$ and $Z$ are convex and compact sets. Then

$$\sup_{w\in W}\,\inf_{z\in Z}\,\phi(z,w)=\inf_{z\in Z}\,\sup_{w\in W}\,\phi(z,w)$$

and the set of saddle points is nonempty and compact. $\square$
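As an illustration that is not part of the original paper, the role of convexity and compactness in Proposition II.2 can be seen on the matching-pennies game. The sketch below assumes the bilinear payoff $\phi(z,w)=(2z-1)(2w-1)$; the names `phi`, `min_max` and `max_min` are hypothetical helpers, and the interval $[0,1]$ is approximated by a uniform grid. On the non-convex set $\{0,1\}$ the max-min inequality is strict, while on the (discretized) convex compact set $[0,1]$ the two values coincide.

```python
def phi(z, w):
    """Bilinear payoff of matching pennies with mixing probabilities z, w."""
    return (2.0 * z - 1.0) * (2.0 * w - 1.0)


def min_max(Z, W):
    """inf over z in Z of sup over w in W of phi(z, w), on finite sets."""
    return min(max(phi(z, w) for w in W) for z in Z)


def max_min(Z, W):
    """sup over w in W of inf over z in Z of phi(z, w), on finite sets."""
    return max(min(phi(z, w) for z in Z) for w in W)


# Pure strategies: a non-convex set, so the proposition does not apply.
pure = [0.0, 1.0]
# Grid approximation of the convex, compact interval [0, 1].
mixed = [k / 100 for k in range(101)]

gap_pure = min_max(pure, pure) - max_min(pure, pure)
gap_mixed = min_max(mixed, mixed) - max_min(mixed, mixed)
print(gap_pure, gap_mixed)
```

The grid search is only a stand-in for the infimum and supremum over $[0,1]$; it happens to be exact here because the saddle point $z=w=0.5$ lies on the grid.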

II-B Subgradient Method

Consider the following (constrained) optimization problem

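$$\min_{z\in Z}\ f(z)\tag{4}$$

with $Z\subseteq\mathbb{R}^{N}$ a closed convex set and $f:\mathbb{R}^{N}\rightarrow\mathbb{R}$ convex. The (projected) subgradient method is the iterative algorithm

$$z(t+1)=P_{Z}\Big(z(t)-\gamma(t)\widetilde{\nabla}f(z(t))\Big)\tag{5}$$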

where $t\in\mathbb{N}$ denotes the iteration index, $\gamma(t)$ is the step-size, $\widetilde{\nabla}f(z(t))$ denotes a subgradient of $f$ at $z(t)$ , and $P_{Z}(\cdot)$ is the Euclidean projection onto $Z$ .

Assumption 1.

The step-size $\gamma(t)\geq 0$ satisfies the following diminishing condition

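$$\lim_{t\to\infty}\gamma(t)=0,\qquad \sum_{t=1}^{\infty}\gamma(t)=\infty,\qquad \sum_{t=1}^{\infty}\gamma(t)^{2}<\infty.$$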

Proposition II.3 ([17, Proposition 3.2.6]).

Assume that the subgradients $\widetilde{\nabla}f(z)$ are bounded for all $z\in Z$ and the set of optimal solutions is nonempty. Let the step-size $\gamma(t)\geq 0$ satisfy the diminishing condition in Assumption 1. Then the subgradient method in (5) applied to problem (4) converges in objective value and the sequence $z(t)$ converges to an optimal solution. $\square$
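The iteration (5) can be sketched in a few lines. The example below is a minimal illustration under assumptions that are not in the paper: the cost $f(z)=|z_{1}-2|+|z_{2}+0.5|$, the box $Z=[-1,1]^{2}$, and the helper names are all hypothetical. The step-size $\gamma(t)=t^{-0.8}$ is the one used later in the numerical section and satisfies Assumption 1.

```python
def project_box(z, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n (componentwise clipping)."""
    return [min(max(v, lo), hi) for v in z]


def sign(v):
    """Sign of v; 0 at the kink, which is a valid subgradient of |.| there."""
    return (v > 0) - (v < 0)


def f(z):
    """Toy nonsmooth convex cost (an assumption made for this sketch)."""
    return abs(z[0] - 2.0) + abs(z[1] + 0.5)


def subgrad_f(z):
    """One subgradient of f at z."""
    return [sign(z[0] - 2.0), sign(z[1] + 0.5)]


def projected_subgradient(z0, iters=2000):
    """Run z(t+1) = P_Z(z(t) - gamma(t) * subgradient) with gamma(t) = t^-0.8."""
    z = list(z0)
    for t in range(1, iters + 1):
        gamma = t ** -0.8
        g = subgrad_f(z)
        z = project_box([zi - gamma * gi for zi, gi in zip(z, g)], -1.0, 1.0)
    return z


z_final = projected_subgradient([0.3, 0.7])
print(z_final, f(z_final))
```

On this toy instance the constrained minimizer is $(1,-0.5)$ with cost $1$: the first coordinate is pinned to the boundary by the projection, while the second approaches $-0.5$ with an error that shrinks with the step-size.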

III Problem Set-up and Distributed Optimization Algorithm

In this section we set up the distributed min-max optimization problem and propose a distributed algorithm to solve it.

III-A Distributed min-max optimization set-up

We consider a network of $N$ processors which communicate according to a connected, undirected graph $\mathcal{G}=(\{1,\ldots,N\},\mathcal{E})$, where $\mathcal{E}\subseteq\{1,\ldots,N\}\times\{1,\ldots,N\}$ is the set of edges. That is, the edge $(i,j)$ models the fact that nodes $i$ and $j$ exchange information. We denote by $\mathcal{N}_{i}$ the set of neighbors of node $i$ in the fixed graph $\mathcal{G}$, i.e., $\mathcal{N}_{i}:=\left\{j\in\{1,\ldots,N\}\mid(i,j)\in\mathcal{E}\right\}$.

Motivated by applications in Demand Side Management of Smart Grids, we introduce a min-max optimization problem to be solved by the network processors in a distributed way. Specifically, we associate to each processor $i$ a decision vector $x^{i}=[x^{i}_{1},\ldots,x^{i}_{S}]^{\top}\in\mathbb{R}^{S}$, a constraint set $X_{i}\subseteq\mathbb{R}^{S}$ and local cost functions $g_{is}$, $s\in\{1,\ldots,S\}$, and set up the following optimization problem

$$\begin{aligned}\min_{x^{1},\ldots,x^{N}}\ & \max_{s\in\{1,\ldots,S\}}\ \sum_{i=1}^{N}g_{is}(x^{i}_{s})\\ \text{subj. to}\ & x^{i}\in X_{i},\quad i\in\{1,\ldots,N\}\end{aligned}\tag{6}$$

where for each $i\in\{1,\ldots,N\}$ the set $X_{i}\subseteq\mathbb{R}^{S}$ is nonempty, convex and compact, and the functions $g_{is}:\mathbb{R}\to\mathbb{R}$, $s\in\{1,\ldots,S\}$, are convex.

Note that we use the superscript $i\in\{1,\ldots,N\}$ to indicate that a vector $x^{i}\in\mathbb{R}^{S}$ belongs to node $i$, while we use the subscript to identify a vector component, i.e., $x^{i}_{s}$, $s\in\{1,\ldots,S\}$, is the $s$-th component of $x^{i}$.

Using a standard approach for min-max problems, we introduce an auxiliary variable $P$ to write the so-called epigraph representation of problem (6), given by

$$\begin{aligned}\min_{x^{1},\ldots,x^{N},P}\ & P\\ \text{subj. to}\ & x^{i}\in X_{i},\quad i\in\{1,\ldots,N\}\\ & \sum_{i=1}^{N}g_{is}(x^{i}_{s})\leq P,\quad s\in\{1,\ldots,S\}.\end{aligned}\tag{7}$$

Notice that this problem is convex, but not strictly convex. This means that it is not guaranteed to have a unique solution. This affects dual approaches when trying to recover a primal optimal solution, see, e.g., [15] and references therein.

III-B Algorithm description

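Next, we introduce our distributed optimization algorithm. Informally, the algorithm consists of a two-step procedure. First, each node $i\in\{1,\ldots,N\}$ stores a set of variables $((x^{i},\rho^{i}),\mu^{i})$ obtained as a primal-dual optimal solution pair of a local min-max optimization problem with a structure similar to the centralized problem. The coupling with the other nodes in the original formulation is replaced by a term depending on neighboring variables ${\lambda^{ij}}$, $j\in\mathcal{N}_{i}$. These variables are updated in the second step according to a suitable linear law weighting the difference of neighboring $\mu^{i}$. Nodes use a diminishing step-size denoted by $\gamma(t)$ and can initialize the variables ${\lambda^{ij}}$, $j\in\mathcal{N}_{i}$, to zero. The following table formally states our Primal Min-Max Dual Subgradient distributed algorithm from the perspective of node $i$.

Distributed Algorithm: Primal Min-Max Dual Subgradient

Processor states: $(x^{i},\rho^{i})$, $\mu^{i}$ and ${\lambda^{ij}}$ for $j\in\mathcal{N}_{i}$

Evolution:

Gather ${\lambda^{ji}}(t)$ from $j\in\mathcal{N}_{i}$

Compute $\big((x^{i}(t+1),\rho^{i}(t+1)),\mu^{i}(t+1)\big)$ as a primal-dual optimal solution pair of

$$\begin{aligned}\min_{x^{i},\rho^{i}}\ & \rho^{i}\\ \text{subj. to}\ & x^{i}\in X_{i}\\ & g_{is}(x^{i}_{s})+\sum_{j\in\mathcal{N}_{i}}\big({\lambda^{ij}}(t)-{\lambda^{ji}}(t)\big)_{s}\leq\rho^{i},\quad s\in\{1,\ldots,S\}\end{aligned}\tag{8}$$

Gather $\mu^{j}(t+1)$ from $j\in\mathcal{N}_{i}$

Update, for all $j\in\mathcal{N}_{i}$,

$${\lambda^{ij}}(t+1)={\lambda^{ij}}(t)-\gamma(t)\big(\mu^{i}(t+1)-\mu^{j}(t+1)\big)\tag{9}$$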

The structure of the algorithm and the meaning of the updates will be clear in the constructive analysis carried out in the next section. At this point we want to point out that although problem (8) has the same min-max structure as problem (7), $\rho^{i}$ is not a copy of the centralized cost $P$, but rather a local contribution to that cost. That is, as we will see, the total cost $P$ will be the sum of the $\rho^{i}$'s.
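To make the two-step structure concrete, here is a minimal sketch of the iteration on a toy instance. Everything specific in it is an assumption made for illustration and is not taken from the paper: $S=2$, $g_{is}(x)=x$, local sets of the form $X_{i}=\{(a,e_{i}-a)\mid lo_{i}\leq a\leq hi_{i}\}$ (a fixed energy $e_{i}$ split over two slots), and the function names `solve_local` and `run`. For this special structure the local min-max problem and its multipliers $\mu^{i}$ have a closed form, so no solver is needed.

```python
def solve_local(e, lo, hi, c):
    """Closed-form primal-dual solution of the local min-max problem for the
    toy set X_i = {(a, e - a) : lo <= a <= hi}, g_is(x) = x, S = 2.
    c[s] is the neighbor coupling term sum_j (lam_ij - lam_ji)_s."""
    a_tie = 0.5 * (e + c[1] - c[0])   # split at which both slot costs are equal
    if a_tie < lo:                    # bound active, slot 0 is the strict max
        a, mu = lo, (1.0, 0.0)
    elif a_tie > hi:                  # bound active, slot 1 is the strict max
        a, mu = hi, (0.0, 1.0)
    else:                             # both constraints active
        a, mu = a_tie, (0.5, 0.5)
    x = (a, e - a)
    rho = max(x[0] + c[0], x[1] + c[1])
    return x, rho, mu


def run(agents, neighbors, iters=200):
    """agents[i] = (e_i, lo_i, hi_i); neighbors[i] = list of neighbors of i."""
    n = len(agents)
    lam = {(i, j): [0.0, 0.0] for i in range(n) for j in neighbors[i]}
    history = []                      # sum_i rho_i at every iteration
    for t in range(1, iters + 1):
        gamma = t ** -0.8             # diminishing step-size
        sols = []
        for i, (e, lo, hi) in enumerate(agents):
            c = [sum(lam[i, j][s] - lam[j, i][s] for j in neighbors[i])
                 for s in range(2)]
            sols.append(solve_local(e, lo, hi, c))
        history.append(sum(rho for _, rho, _ in sols))
        # Linear update driven by the disagreement of neighboring multipliers.
        for (i, j), l in lam.items():
            for s in range(2):
                l[s] -= gamma * (sols[i][2][s] - sols[j][2][s])
    return sols, history


# Two agents on one edge; agent 0 must place at least 0.8 in slot 0.
agents = [(1.0, 0.8, 1.0), (1.0, 0.0, 1.0)]
neighbors = {0: [1], 1: [0]}
sols, history = run(agents, neighbors)
peak = max(sum(x[s] for x, _, _ in sols) for s in range(2))
print(history[0], history[-1], peak)
```

For this instance the centralized optimal peak is $1$ (the total energy $2$ spread evenly over two slots). With zero initial multipliers the sum of the local costs starts at $1.3$ and, by weak duality, never drops below the optimal peak along the iterations.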

IV Algorithm Analysis

The analysis of the proposed Primal Min-Max Dual Subgradient distributed algorithm is constructive and heavily relies on duality theory tools.

We start by deriving the equivalent dual problem of (7) which is formally stated in the next lemma.

Lemma IV.1.

The optimization problem

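$$\begin{aligned}\max_{\mu\in\mathbb{R}^{S}}\ & \sum_{i=1}^{N}q_{i}(\mu)\\ \text{subj. to}\ & \mathbf{1}^{\top}\mu=1,\ \mu\succeq 0\end{aligned}\tag{10}$$

where $\mathbf{1}:=[1,\ldots,1]^{\top}\in\mathbb{R}^{S}$ and

$$q_{i}(\mu):=\min_{x^{i}\in X_{i}}\ \sum_{s=1}^{S}\mu_{s}\,g_{is}(x^{i}_{s})\tag{11}$$

is the dual of problem (7) and strong duality holds. $\square$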

In order to make problem (10) amenable for a distributed solution, we can rewrite it in an equivalent form. To this end, we introduce copies of the common optimization variable $\mu$ and coherence constraints having the sparsity of the connected graph $\mathcal{G}$ , obtaining

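$$\begin{aligned}\max_{\mu^{1},\ldots,\mu^{N}}\ & \sum_{i=1}^{N}q_{i}(\mu^{i})\\ \text{subj. to}\ & \mathbf{1}^{\top}\mu^{i}=1,\ \mu^{i}\succeq 0,\quad i\in\{1,\ldots,N\}\\ & \mu^{i}=\mu^{j},\quad \text{for all }(i,j)\in\mathcal{E}.\end{aligned}\tag{12}$$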

Notice that we have also duplicated the simplex constraint so that it becomes a local constraint for each node.

To solve this problem we can use a dual decomposition approach by designing a dual subgradient algorithm. This can be done since the constraints are convex and the cost function is concave. A dual subgradient algorithm applied to problem (12) would immediately result in a distributed algorithm if the functions $q_{i}$ were available in closed form.

Intuition suggests that deriving the dual of a dual problem would somehow bring back to a primal formulation. However, we want to stress that:

(i) problem (12) is dualized rather than problem (10);

(ii) different constraints are dualized, namely the coherence constraints rather than the simplex ones.

We start deriving the dual subgradient algorithm by dualizing only the coherence constraints. Thus, we write the partial Lagrangian

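$$\mathcal{L}_{2}\big(\mu^{1},\ldots,\mu^{N},\{{\lambda^{ij}}\}_{(i,j)\in\mathcal{E}}\big)=\sum_{i=1}^{N}\Big(q_{i}(\mu^{i})+\sum_{j\in\mathcal{N}_{i}}{\lambda^{ij}}^{\top}(\mu^{i}-\mu^{j})\Big)\tag{13}$$

where ${\lambda^{ij}}\in\mathbb{R}^{S}$ for all $(i,j)\in\mathcal{E}$ are Lagrange multipliers associated to the constraints $\mu^{i}-\mu^{j}=0$. By exploiting the undirected nature and the connectivity of the communication graph $\mathcal{G}$, after some algebraic manipulations, we get

$$\mathcal{L}_{2}\big(\mu^{1},\ldots,\mu^{N},\{{\lambda^{ij}}\}_{(i,j)\in\mathcal{E}}\big)=\sum_{i=1}^{N}\Big(q_{i}(\mu^{i})+{\mu^{i}}^{\top}\sum_{j\in\mathcal{N}_{i}}({\lambda^{ij}}-{\lambda^{ji}})\Big),\tag{14}$$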

which is separable with respect to $\mu^{i}$ , $i\in\{1,\ldots,N\}$ .
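The algebraic step behind this separable form is a relabeling of the edge sum: because the graph is undirected, $\sum_{i}\sum_{j\in\mathcal{N}_{i}}{\lambda^{ij}}^{\top}\mu^{j}=\sum_{i}\sum_{j\in\mathcal{N}_{i}}{\lambda^{ji}}^{\top}\mu^{i}$. The sketch below checks this identity numerically on a small graph with random data; the function names, the graph and the random values are hypothetical and serve only as a sanity check of the rearrangement.

```python
import random


def edge_form(mu, lam, neighbors):
    """Coupling term written edge-wise: sum_i sum_{j in N_i} lam_ij^T (mu_i - mu_j)."""
    return sum(
        lam[i, j][s] * (mu[i][s] - mu[j][s])
        for i in neighbors for j in neighbors[i] for s in range(len(mu[i]))
    )


def node_form(mu, lam, neighbors):
    """Same term written node-wise: sum_i mu_i^T sum_{j in N_i} (lam_ij - lam_ji)."""
    return sum(
        mu[i][s] * sum(lam[i, j][s] - lam[j, i][s] for j in neighbors[i])
        for i in neighbors for s in range(len(mu[i]))
    )


def random_instance(neighbors, S, seed=0):
    """Random multipliers on every node and every directed edge (illustrative data)."""
    rng = random.Random(seed)
    mu = {i: [rng.uniform(-1, 1) for _ in range(S)] for i in neighbors}
    lam = {(i, j): [rng.uniform(-1, 1) for _ in range(S)]
           for i in neighbors for j in neighbors[i]}
    return mu, lam


# Undirected graph given as symmetric neighbor lists.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
mu, lam = random_instance(neighbors, S=3)
print(edge_form(mu, lam, neighbors), node_form(mu, lam, neighbors))
```

The node-wise form is what lets each node $i$ evaluate its own term using only ${\lambda^{ij}}$ and ${\lambda^{ji}}$ for $j\in\mathcal{N}_{i}$.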

The dual of problem (12) is thus

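$$\min_{\{{\lambda^{ij}}\}_{(i,j)\in\mathcal{E}}}\ \eta\big(\{{\lambda^{ij}}\}_{(i,j)\in\mathcal{E}}\big)=\sum_{i=1}^{N}\eta_{i}\big(\{{\lambda^{ij}},{\lambda^{ji}}\}_{j\in\mathcal{N}_{i}}\big),\tag{15}$$

where for all $i\in\{1,\ldots,N\}$

$$\eta_{i}\big(\{{\lambda^{ij}},{\lambda^{ji}}\}_{j\in\mathcal{N}_{i}}\big)=\max_{\mathbf{1}^{\top}\mu^{i}=1,\ \mu^{i}\succeq 0}\ q_{i}(\mu^{i})+{\mu^{i}}^{\top}\sum_{j\in\mathcal{N}_{i}}({\lambda^{ij}}-{\lambda^{ji}}).$$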

In order to apply a subgradient method to problem (15), we recall, [18, Section 6.1], that

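$$\frac{\tilde{\partial}\,\eta\big(\{{\lambda^{ij}}\}_{(i,j)\in\mathcal{E}}\big)}{\partial{\lambda^{ij}}}={\mu^{i}}^{\star}-{\mu^{j}}^{\star},$$

where $\frac{\tilde{\partial}\eta(\cdot)}{\partial{\lambda^{ij}}}$ denotes the component associated to the variable ${\lambda^{ij}}$ of a subgradient of $\eta$, and

$${\mu^{k}}^{\star}\in\operatorname*{argmax}_{\mathbf{1}^{\top}\mu^{k}=1,\ \mu^{k}\succeq 0}\bigg(q_{k}(\mu^{k})+{\mu^{k}}^{\top}\sum_{h\in\mathcal{N}_{k}}({\lambda^{kh}}-{\lambda^{hk}})\bigg),$$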

for $k=i,j$ . The dual subgradient algorithm for problem (12) can be summarized as follows, for each node $i\in\{1,\ldots,N\}$ :

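(S1) receive ${\lambda^{ji}}(t)$, $j\in\mathcal{N}_{i}$, and compute a subgradient $\mu^{i}(t+1)$ by solving

$$\begin{aligned}\max_{\mu^{i}}\ & q_{i}(\mu^{i})+{\mu^{i}}^{\top}\sum_{j\in\mathcal{N}_{i}}\big({\lambda^{ij}}(t)-{\lambda^{ji}}(t)\big)\\ \text{subj. to}\ & \mathbf{1}^{\top}\mu^{i}=1,\ \mu^{i}\succeq 0.\end{aligned}\tag{17}$$

(S2) exchange with neighbors the updated $\mu^{j}(t+1)$, $j\in\mathcal{N}_{i}$, and update ${\lambda^{ij}}$, $j\in\mathcal{N}_{i}$, via

$${\lambda^{ij}}(t+1)={\lambda^{ij}}(t)-\gamma(t)\big(\mu^{i}(t+1)-\mu^{j}(t+1)\big),$$

where $\gamma(t)$ denotes the step-size.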

It is worth noting that in (17) the values of ${\lambda^{ij}}(t)$ and ${\lambda^{ji}}(t)$, for $j\in\mathcal{N}_{i}$, are fixed, as highlighted by the index $t$. Moreover, we want to stress, once again, that the algorithm is not implementable as it is written, since the functions $q_{i}$ are not available in closed form. In this regard, we slightly abuse notation here, since in (S1)-(S2) we use $\mu^{i}(t)$ as in the Primal Min-Max Dual Subgradient algorithm, but we have not proven the equivalence yet. Since we will prove it in the next lemmas, we preferred not to overload the notation.

Lemma IV.2.

The dual subgradient updates (S1)-(S2), with step-size $\gamma(t)$ satisfying Assumption 1, generate sequences $\{{\lambda^{ij}}(t)\}$ , $(i,j)\in\mathcal{E}$ that converge in objective value to $\eta^{\star}=q^{\star}=P^{\star}$ , optimal costs of (15), (10) and (7), respectively. $\square$

We can explicitly rephrase update (17) by plugging in the definition of $q_{i}$ , given in (11), thus obtaining the following max-min optimization problem

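$$\max_{\mathbf{1}^{\top}\mu^{i}=1,\ \mu^{i}\succeq 0}\bigg(\min_{x^{i}\in X_{i}}\sum_{s=1}^{S}\mu^{i}_{s}\Big(g_{is}(x^{i}_{s})+\sum_{j\in\mathcal{N}_{i}}\big({\lambda^{ij}}(t)-{\lambda^{ji}}(t)\big)_{s}\Big)\bigg).\tag{18}$$

Notice that this is a local problem at each node $i$ once the values of ${\lambda^{ij}}(t)$ and ${\lambda^{ji}}(t)$ for all $j\in\mathcal{N}_{i}$ are given.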

Lemma IV.3.

Max-min optimization problem (18) is the saddle point problem associated to problem (8). Moreover, a primal-dual optimal solution pair of (8), call it $\{(x^{i}(t+1),\rho^{i}(t+1)),\mu^{i}(t+1)\}$ , exists and $(x^{i}(t+1),\mu^{i}(t+1))$ is a solution of (18).

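Proof: We give a constructive proof which clarifies how problem (8) is derived from (18). Define

$$\phi(x^{i},\mu^{i}):=\sum_{s=1}^{S}\mu^{i}_{s}\Big(g_{is}(x^{i}_{s})+\sum_{j\in\mathcal{N}_{i}}\big({\lambda^{ij}}(t)-{\lambda^{ji}}(t)\big)_{s}\Big)\tag{19}$$

and note that (i) $\phi(\cdot,\mu^{i})$ is closed and convex for all $\mu^{i}\succeq 0$ and (ii) $\phi(x^{i},\cdot)$ is closed and concave (in fact linear, over the compact set $\mathbf{1}^{\top}\mu^{i}=1$, $\mu^{i}\succeq 0$), for all $x^{i}\in\mathbb{R}^{S}$. Thus we can invoke the saddle-point Proposition II.2, which allows us to switch the max and min operators, and write

$$\begin{aligned}&\max_{\mathbf{1}^{\top}\mu^{i}=1,\ \mu^{i}\succeq 0}\bigg(\min_{x^{i}\in X_{i}}\sum_{s=1}^{S}\mu^{i}_{s}\Big(g_{is}(x^{i}_{s})+\sum_{j\in\mathcal{N}_{i}}\big({\lambda^{ij}}(t)-{\lambda^{ji}}(t)\big)_{s}\Big)\bigg)\\ &\quad=\min_{x^{i}\in X_{i}}\bigg(\max_{\mathbf{1}^{\top}\mu^{i}=1,\ \mu^{i}\succeq 0}\sum_{s=1}^{S}\mu^{i}_{s}\Big(g_{is}(x^{i}_{s})+\sum_{j\in\mathcal{N}_{i}}\big({\lambda^{ij}}(t)-{\lambda^{ji}}(t)\big)_{s}\Big)\bigg).\end{aligned}\tag{20}$$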

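Since the inner maximization problem depends nonlinearly on $x^{i}$ (which is itself an optimization variable), the solution cannot be obtained without considering the optimization also on $x^{i}$. We overcome this issue by substituting the inner maximization problem with its equivalent dual. Notice that the inner problem is a linear program when $x^{i}$ is kept fixed, and thus strong duality can be exploited. Introducing a scalar multiplier $\rho^{i}$ associated to the simplex constraint, we have that

$$\begin{aligned}\max_{\mu^{i}}\ & \sum_{s=1}^{S}\mu^{i}_{s}\Big(g_{is}(x^{i}_{s})+\sum_{j\in\mathcal{N}_{i}}\big({\lambda^{ij}}(t)-{\lambda^{ji}}(t)\big)_{s}\Big)\\ \text{subj. to}\ & \mathbf{1}^{\top}\mu^{i}=1,\ \mu^{i}\succeq 0\end{aligned}\tag{21}$$

is equivalent to its dual

$$\begin{aligned}\min_{\rho^{i}}\ & \rho^{i}\\ \text{subj. to}\ & g_{is}(x^{i}_{s})+\sum_{j\in\mathcal{N}_{i}}\big({\lambda^{ij}}(t)-{\lambda^{ji}}(t)\big)_{s}\leq\rho^{i},\quad s\in\{1,\ldots,S\}\end{aligned}\tag{22}$$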

where the $S$ inequality constraints follow from the minimization of the partial Lagrangian of (21) with respect to $\mu^{i}\succeq 0$ . Plugging formulation (22) in place of the inner maximization in (20), we can write a joint minimization, i.e., minimize simultaneously with respect to $x^{i}$ and $\rho^{i}$ , which leads to (8).

To prove the second part, notice that problem (8) is convex. Moreover, the problem satisfies Slater's constraint qualification (the inequality constraints can be made strictly feasible by taking $\rho^{i}$ large enough) and, thus, strong duality holds. Therefore, a primal-dual optimal solution pair $(x^{i}(t+1),\rho^{i}(t+1),\mu^{i}(t+1))$ exists and from the previous arguments the proof follows. $\square$
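The linear-programming duality step used in the proof, from (21) to (22), can be spelled out as follows. This is a sketch of the standard argument rather than the authors' omitted proof; the shorthand $v_{s}$ is introduced here only for brevity and $x^{i}$ is kept fixed throughout.

```latex
% Shorthand (not in the paper):
%   v_s := g_{is}(x^i_s) + \sum_{j\in\mathcal{N}_i} (\lambda^{ij}(t)-\lambda^{ji}(t))_s
\begin{align*}
\max_{\mathbf{1}^{\top}\mu^{i}=1,\ \mu^{i}\succeq 0}\ \sum_{s=1}^{S}\mu^{i}_{s}v_{s}
  &= \min_{\rho^{i}\in\mathbb{R}}\ \sup_{\mu^{i}\succeq 0}
     \Big(\sum_{s=1}^{S}\mu^{i}_{s}v_{s}
          +\rho^{i}\big(1-\mathbf{1}^{\top}\mu^{i}\big)\Big)
     && \text{(strong duality for linear programs)}\\
  &= \min_{\rho^{i}\in\mathbb{R}}\ \Big(\rho^{i}
     +\sup_{\mu^{i}\succeq 0}\sum_{s=1}^{S}\mu^{i}_{s}\,(v_{s}-\rho^{i})\Big)\\
  &= \min_{\rho^{i}\in\mathbb{R}}\ \big\{\rho^{i}\ :\ v_{s}\leq\rho^{i},\
     s\in\{1,\ldots,S\}\big\}
   \;=\; \max_{s\in\{1,\ldots,S\}} v_{s}.
\end{align*}
```

The inner supremum is $0$ when $v_{s}\leq\rho^{i}$ for every $s$ and $+\infty$ otherwise, which is where the $S$ inequality constraints of (22) come from.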

We point out that the previous lemma shows that performing minimization in (8) turns out to be equivalent to performing step (S1).

We are now ready to state the main result of the paper, namely the convergence of the Primal Min-Max Dual Subgradient distributed algorithm.

Theorem IV.4.

Let $\{(x^{i}(t),\rho^{i}(t))\}$ , $i\in\{1,\ldots,N\}$ , be the sequence generated by the Primal Min-Max Dual Subgradient distributed algorithm, with $\gamma(t)$ satisfying Assumption 1. Then, the sequence $\{\sum_{i=1}^{N}\rho^{i}(t)\}$ converges to the optimal cost $P^{\star}$ of (6) and every limit point of the sequence $\{x^{i}(t)\}$ , $i\in\{1,\ldots,N\}$ , is an optimal (feasible) solution of (6). $\square$

Remark IV.5.

From condition (20) it can be shown that each $\rho^{i}(t)$ is equal to $\eta_{i}(\{{\lambda^{ij}},{\lambda^{ji}}\}_{j\in\mathcal{N}_{i}})$ for all $t\geq 0$. Since the optimal cost of (15) is equal to the optimal primal cost $P^{\star}$, then we have that $\lim_{t\to\infty}\sum_{i}\rho^{i}(t)=\lim_{t\to\infty}\sum_{i}\eta_{i}(t)=P^{\star}$. $\square$

V Numerical Simulations

In this section we present a numerical example in which we apply the proposed method to a network of Thermostatically Controlled Loads (TCLs), such as air conditioners, heat pumps, and electric water heaters [19].

The dynamical model of the $i$ -th device is given by

[TABLE]

where $T_{i}(\tau)\geq 0$ is the temperature, $\alpha>0$ is a parameter depending on geometric and thermal characteristics, $T^{i}_{out}(\tau)$ is the air temperature outside the device, $x^{i}(\tau)\in\left[0,1\right]$ is the control input, and $Q>0$ is a scaling factor.

We consider a discretized version of the system with constant input over the sampling interval $\Delta\tau$ , i.e., $x^{i}(\tau)=x^{i}_{s}$ for $\tau\in\left[s\Delta\tau,(s+1)\Delta\tau\right)$ , and sampled state $T^{i}_{s}$ ,

[TABLE]

We assume that the power consumption $g_{is}(x^{i}_{s})$ of the $i$ -th device in the $s$ -th slot $[s\Delta\tau,(s+1)\Delta\tau)$ is directly proportional to $x^{i}_{s}$ . For simplicity, we take $g_{is}(x^{i}_{s})=x^{i}_{s}$ in the numerical example of this section. Thus, optimization problem (6) for this scenario is

[TABLE]

where $X_{i}:=\{x^{i}\in\mathbb{R}^{S}\mid A_{i}x^{i}\preceq b_{i}\text{ and }x^{i}\in[0,1]^{S}\}$ , with $A_{i}$ and $b_{i}$ obtained by enforcing the dynamics constraints (24) and temperature constraints $T^{i}_{s}\in\left[T_{min},T_{max}\right]$ .
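To illustrate how the pair $(A_{i},b_{i})$ can arise, the sketch below unrolls a discrete-time recursion and turns the temperature bounds into linear inequalities in $x^{i}$. It assumes a generic affine recursion $T_{s+1}=a\,T_{s}+d_{s}+c\,x_{s}$; the coefficients `a`, `c`, `d`, the initial temperature `T0`, and all numerical values are hypothetical and are not claimed to match equation (24). The box constraint $x^{i}\in[0,1]^{S}$ is kept separate, as in the definition of $X_{i}$.

```python
def build_constraints(a, c, d, T0, T_min, T_max):
    """Return (A, b) such that A x <= b encodes T_min <= T_s <= T_max, s = 1..S,
    for the assumed recursion T_{s+1} = a*T_s + d_s + c*x_s."""
    S = len(d)
    A, b = [], []
    for s in range(1, S + 1):
        # Unrolled recursion: T_s = free + sum_k row[k] * x_k
        row = [c * a ** (s - 1 - k) if k < s else 0.0 for k in range(S)]
        free = a ** s * T0 + sum(a ** (s - 1 - k) * d[k] for k in range(s))
        A.append(row)
        b.append(T_max - free)            # T_s <= T_max
        A.append([-r for r in row])
        b.append(free - T_min)            # T_s >= T_min
    return A, b

def simulate(a, c, d, T0, x):
    """Roll the assumed recursion forward and return [T_1, ..., T_S]."""
    temps, T = [], T0
    for s in range(len(d)):
        T = a * T + d[s] + c * x[s]
        temps.append(T)
    return temps

def is_feasible(A, b, x, tol=1e-9):
    """Check A x <= b row by row."""
    return all(
        sum(r * xi for r, xi in zip(row, x)) <= bi + tol
        for row, bi in zip(A, b)
    )

# Hypothetical cooling device: the input lowers the temperature (c < 0).
a, c, T0, T_min, T_max = 0.9, -2.0, 22.0, 20.0, 25.0
d = [3.0] * 5                              # assumed ambient contribution per slot
A, b = build_constraints(a, c, d, T0, T_min, T_max)
half_power_ok = is_feasible(A, b, [0.5] * 5)   # stays within the bounds
always_off_ok = is_feasible(A, b, [0.0] * 5)   # eventually exceeds T_max
```

With $S$ slots this produces $2S$ inequality rows, one upper and one lower bound per sampled temperature.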

In the proposed numerical example we consider $N=15$ agents communicating according to an undirected connected Erdős-Rényi random graph $\mathcal{G}$ with parameter $0.2$ . We consider a horizon of $S=50$ . Finally, a diminishing step-size sequence $\gamma(t)=(\frac{1}{t})^{0.8}$ at iteration $t$ , which satisfies Assumption 1, is used.
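A set-up of this kind can be reproduced along the following lines. This is a sketch under stated assumptions: the source does not say how connectivity of the random graph was ensured, so resampling until the graph is connected is an assumption, as are the seed and all function names. The step-size check further assumes that Assumption 1 is the usual diminishing-step-size condition (non-summable, square-summable), whose statement lies outside this section.

```python
import random

def erdos_renyi(n, p, rng):
    """Sample an undirected G(n, p) graph as an adjacency dictionary."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def is_connected(adj):
    """Depth-first search from node 0; connected iff every node is reached."""
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)

def sample_connected_graph(n=15, p=0.2, seed=0):
    """Resample until connected (an assumed procedure, not from the source)."""
    rng = random.Random(seed)
    while True:
        adj = erdos_renyi(n, p, rng)
        if is_connected(adj):
            return adj

def step_size(t, exponent=0.8):
    """Diminishing step size gamma(t) = (1/t)^exponent for t >= 1."""
    return (1.0 / t) ** exponent

graph = sample_connected_graph()
# Partial sums: the first keeps growing with the horizon, the second levels off.
sum_gamma = sum(step_size(t) for t in range(1, 10001))
sum_gamma_sq = sum(step_size(t) ** 2 for t in range(1, 10001))
```

Because the exponent $0.8$ lies in $(0.5,1]$, the series $\sum_{t}\gamma(t)$ diverges while $\sum_{t}\gamma(t)^{2}$ converges.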

In Figure 1 we show the evolution at each algorithm iteration $t$ of the local objective functions $\rho^{i}(t)$ , $i\in\{1,\ldots,N\}$ , (solid lines) which converge to stationary values. We also plot their sum $\sum_{i=1}^{N}\rho^{i}(t)$ (dotted line) that asymptotically converges to the centralized optimal cost $P^{\star}$ of problem (25) (see Remark IV.5).

Figure 2 shows the profile of an optimal aggregate consumption of the devices, i.e., $\sum_{i=1}^{N}{x^{i}_{s}}^{\star}$ , over the horizon $s=1,\ldots,S$ . It can be seen that the proposed method effectively shaves off the peak power demand. The same figure also shows an optimal consumption strategy ${x^{i}}^{\star}$ that each single device locally computes.

Finally, Figure 3 shows the convergence rate of the distributed algorithm, i.e., the difference between the centralized optimal cost $P^{\star}$ and the sum of the local costs $\sum_{i=1}^{N}\rho^{i}(t)$ , in logarithmic scale. The proposed algorithm converges to the optimal cost at a sublinear rate $O(1/\sqrt{t})$ , as expected for a subgradient method.
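One way to read a rate off an error curve like this is to fit a straight line to the error in log-log coordinates; a slope near $-1/2$ is consistent with $O(1/\sqrt{t})$. The sketch below shows the fitting step only. The error sequence is synthetic and generated for illustration; it is not the data behind Figure 3, and the helper name `loglog_slope` is hypothetical.

```python
import math

def loglog_slope(errors, t_start=1):
    """Least-squares slope of log(error) against log(t), skipping non-positive errors."""
    pts = [
        (math.log(t), math.log(e))
        for t, e in enumerate(errors, start=t_start)
        if e > 0
    ]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx

# Synthetic cost-error sequence decaying exactly like 1/sqrt(t) (illustration only).
synthetic_errors = [3.0 / math.sqrt(t) for t in range(1, 1001)]
slope = loglog_slope(synthetic_errors)
```

On real iterates the fit is typically restricted to the tail of the sequence, since the early transient does not follow the asymptotic rate.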

VI Conclusions

In this paper we have introduced a novel distributed min-max optimization framework motivated by peak minimization problems in Demand Side Management. Standard distributed optimization algorithms cannot be applied to this problem set-up due to a highly nontrivial coupling in the objective function and in the constraints. We proposed a distributed algorithm based on the combination of duality methods and properties from min-max optimization. We proved the correctness of the proposed algorithm and corroborated the theoretical results with a numerical example.

Bibliography

[1] M. Alizadeh, X. Li, Z. Wang, A. Scaglione, and R. Melton, “Demand-side management in the smart grid: Information processing for the power switch,” IEEE Signal Processing Magazine, vol. 29, no. 5, pp. 55–67, 2012.
[2] A.-H. Mohsenian-Rad, V. W. Wong, J. Jatskevich, R. Schober, and A. Leon-Garcia, “Autonomous demand-side management based on game-theoretic energy consumption scheduling for the future smart grid,” IEEE Transactions on Smart Grid, vol. 1, no. 3, pp. 320–331, 2010.
[3] I. Atzeni, L. G. Ordóñez, G. Scutari, D. P. Palomar, and J. R. Fonollosa, “Demand-side management via distributed energy generation and storage optimization,” IEEE Transactions on Smart Grid, vol. 4, no. 2, pp. 866–876, 2013.
[4] A. Parisio, E. Rikos, and L. Glielmo, “A model predictive control approach to microgrid operation optimization,” IEEE Transactions on Control Systems Technology, vol. 22, no. 5, pp. 1813–1827, 2014.
[5] D. P. Palomar and M. Chiang, “A tutorial on decomposition methods for network utility maximization,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 1439–1451, 2006.
[6] B. Yang and M. Johansson, “Distributed optimization and games: A tutorial overview,” in Networked Control Systems. Springer, 2010, pp. 109–148.
[7] T.-H. Chang, A. Nedić, and A. Scaglione, “Distributed constrained optimization by consensus-based primal-dual perturbation method,” IEEE Transactions on Automatic Control, vol. 59, no. 6, pp. 1524–1538, 2014.
[8] A. Nedić and A. Ozdaglar, “Subgradient methods for saddle-point problems,” Journal of Optimization Theory and Applications, vol. 142, no. 1, pp. 205–228, 2009.
