Building Temperature Control: A Distributed Escort Dynamical Approach

M. Sawant, J. Moyalan, J. Koonamparampath, A. Sheikh, S. Wagh, and N. Singh

arXiv:1908.05048·math.OC·August 15, 2019

Open Access

TL;DR

This paper introduces a distributed escort dynamical approach for building temperature control, optimizing resource allocation among rooms with constraints, and ensuring robustness and smooth temperature regulation.

Contribution

It develops a novel distributed escort dynamical model that incorporates local constraints and reduces central dependency in multi-agent resource allocation.

Findings

1. DED provides smooth temperature trajectories.

2. DED outperforms the distributed interior point method.

3. Robustness is characterized by evolutionary stable strategies.

Abstract

The constrained multi-agent optimization problem of distributed resource allocation is addressed using the evolutionary game theoretic framework. The issue of building temperature control is analyzed in which the controller is to devise a scheme to distribute available scarce power to every room to regulate their temperature as per the comfort of user in the best possible manner. The paper correlates the global constraint of fixed resource amount with the constant population size. The respective population game is evaluated by means of a dynamical model of the evolutionary game theory to find the necessary control action. The robustness of optimal solution with respect to minor fluctuations in the temperature distribution is characterized using evolutionary stable strategy ( ESS ). Along with the global constraint, the problem formulation constitutes local constraint over an individual…

Tables

Table I: List of Symbols

$N$	Total number of zones within the building (rooms and walls)
$\mathcal{R}$	Set comprising the rooms
$\mathcal{W}$	Set comprising the walls
$\mathcal{Z}$	Set representing all the zones
$\alpha_{i,j}$	Thermal conductance between the $i^{th}$ and $j^{th}$ entity
$u_{i}$	Output power of the $i^{th}$ actuator unit
$U$	Total power generation at every instance
$k$	Total number of rooms
$\boldsymbol{t^{set}}$	Setpoint temperature profiles
$\boldsymbol{t^{a}}$	Ambient temperature profile
$\boldsymbol{d^{i}}$	Disturbance profile causing temperature deviations at the $i^{th}$ instance
$\boldsymbol{u^{lo}}$	Lower bounds on the individual actuator outputs (local constraints)
$\boldsymbol{u^{up}}$	Upper bounds on the individual actuator outputs (local constraints)
$\mathcal{G}$	Graph ensuring information transfer among the neighbouring actuator mechanisms
$\boldsymbol{t^{i}}$	Actual temperature profiles of the zones at the $i^{th}$ instance
$\boldsymbol{x^{i}}$	Population distribution at the $i^{th}$ instance
$f_{j}^{i}$	Payoff obtained for employing the $j^{th}$ strategy at the $i^{th}$ instance
$\boldsymbol{x^{lo}}$	Lower bounds on the population proportions
$\boldsymbol{x^{up}}$	Upper bounds on the population proportions
$\boldsymbol{\eta}$	Escort function accommodating $\boldsymbol{x^{lo}}$
$\boldsymbol{\xi}$	Escort function accommodating $\boldsymbol{x^{up}}$

Equations

$$l_{ij}=\begin{cases}\sum_{j\in\mathcal{S}}\rho_{ij}, & \text{if } i=j\\ -\rho_{ij}, & \text{if } i\neq j\end{cases}\tag{1}$$

$$\theta_{i}\dot{t}_{i}=\sum_{j=1}^{N}\alpha_{i,j}(t_{j}-t_{i})+\alpha_{i,a}(t_{i}^{a}-t_{i})+v_{i}(u_{i}+d_{i}),\quad\forall i\in\mathcal{Z}\tag{2}$$

$$\sum_{i=1}^{k}u_{i}=U\tag{3}$$

$$u_{i}^{lo}\leq u_{i}\leq u_{i}^{up},\quad\forall i\in\mathcal{R}\tag{4}$$

$$\sum_{i=1}^{k+1}u_{i}=U\tag{5}$$

$$\text{minimize}\ \sum_{i=1}^{k}\frac{(t_{i}-t_{i}^{set})^{2}}{2},\quad\forall i\in\mathcal{A}\tag{6}$$

$$\text{minimize}\ \frac{(t_{i}-t_{i}^{set})^{2}}{2},\quad\forall i\in\mathcal{A}\tag{7}$$

$$D_{i}^{P}:\begin{cases}\dot{t}_{i}=g(\boldsymbol{t},u_{i})\\ f_{i}=h(\boldsymbol{t},u_{i})\end{cases}\tag{8}$$

$$D_{i}^{C}:\ \dot{x}_{i}=q(f_{i},f_{j}),\quad\forall j\in\mathcal{N}_{i}\tag{9}$$

$$ErrD_{i}^{P}:\begin{cases}\dot{e}_{t_{i}}=e_{g}(\boldsymbol{e_{t}},\boldsymbol{e_{u}})\\ e_{f_{i}}=e_{h}(\boldsymbol{e_{t}},\boldsymbol{e_{u}})\end{cases}\tag{10}$$

$$ErrD_{i}^{C}:\ \dot{e}_{u_{i}}=e_{q}(\boldsymbol{e_{f}},\boldsymbol{e_{u}})\tag{11}$$

$$\dot{x}_{i}^{next}=D(x_{i}^{present},f_{i}^{present})\tag{12}$$

$$x_{i}^{*}f_{i}^{*}>x_{i}f_{i}\quad\text{for all } i=1,2,...,k\tag{13}$$

$$\Delta_{k}=\Big\{\boldsymbol{x}\in\mathbb{R}^{k}\ \Big|\ \sum_{i=1}^{k}x_{i}=1,\ x_{i}\geq 0\Big\}\tag{14}$$

$$\Gamma=\{x_{i}^{lo}\leq x_{i}\leq x_{i}^{up}\},\quad\text{for all } i=1,2,...,k\tag{15}$$

$$\dot{x}_{i}=\phi(x_{i})\big(f_{i}(\boldsymbol{x})-f_{\phi}\big)\quad\text{for all } i=1,2,...,k\tag{16}$$

$$f_{\phi}=\frac{1}{\Phi(\boldsymbol{x})}\sum_{i=1}^{k}\phi(x_{i})f_{i}(\boldsymbol{x})\tag{17}$$

$$\hat{\phi}(\boldsymbol{x})=\frac{1}{\Phi(\boldsymbol{x})}[\phi(x_{1}),\phi(x_{2}),...,\phi(x_{k})]\tag{18}$$

$$\sum_{i=1}^{k}\dot{x}_{i}=\sum_{i=1}^{k}x_{i}f_{i}(\boldsymbol{x})-f_{\phi}\sum_{i=1}^{k}x_{i}=0\tag{19}$$

$$\dot{x}_{i}=\phi(x_{i})\sum_{j\in\mathcal{N}_{i}}\phi(x_{j})[f_{j}(\boldsymbol{x})-f_{i}(\boldsymbol{x})]\quad\text{for all } i=1,2,...,k\tag{20}$$

$$\dot{x}_{i}=\sum_{j=1}^{k}\rho_{i,j}[f_{j}(\boldsymbol{x})-f_{i}(\boldsymbol{x})]\quad\text{for all } i=1,2,...,k\tag{21}$$

$\rho_{i,j}$

$$\begin{aligned}\sum_{i=1}^{k}\dot{x}_{i}&=\sum_{i=1}^{k}\bigg[\sum_{j=1}^{k}\rho_{i,j}[f_{j}(\boldsymbol{x})-f_{i}(\boldsymbol{x})]\bigg]\\&=\rho_{1,1}(f_{1}-f_{1})+\rho_{1,2}(f_{2}-f_{1})+...+\rho_{1,k}(f_{k}-f_{1})\\&\quad+\rho_{2,1}(f_{1}-f_{2})+\rho_{2,2}(f_{2}-f_{2})+...+\rho_{2,k}(f_{k}-f_{2})\\&\quad\ \ \vdots\\&\quad+\rho_{k,1}(f_{1}-f_{k})+\rho_{k,2}(f_{2}-f_{k})+...+\rho_{k,k}(f_{k}-f_{k})\end{aligned}\tag{23}$$

$$\sum_{i=1}^{k}\dot{x}_{i}=0\tag{24}$$

$\Delta_{k}^{lo}$, $\Delta_{k}^{up}$, $S^{lo}$, $S^{up}$

Taxonomy

Topics: Building Energy and Comfort Optimization · Energy Efficiency and Management · Greenhouse Technology and Climate Control

Full text

M. Sawant, J. Moyalan, J. Koonamparampath, A. Sheikh, S. Wagh, and N. Singh are with the Electrical Engineering Department (EED), Veermata Jijabai Technological Institute (VJTI), Mumbai 400019, India.

Abstract

The constrained multi-agent optimization problem of distributed resource allocation is addressed using the evolutionary game theoretic framework. Building temperature control is analyzed as a problem in which the controller must devise a scheme to distribute the available scarce power among the rooms so as to regulate their temperatures according to the users' comfort in the best possible manner. The paper correlates the global constraint of a fixed resource amount with the constant population size. The respective population game is evaluated by means of a dynamical model of evolutionary game theory to find the necessary control action. The robustness of the optimal solution with respect to minor fluctuations in the temperature distribution is characterized using the evolutionary stable strategy (ESS). Along with the global constraint, the problem formulation includes a local constraint on the individual control unit located in every room. The classical dynamical models of evolutionary game theory, such as replicator dynamics and logit dynamics, fail to incorporate these constraints. With the escort evolutionary dynamical (ED) model it is possible to address the local constraints through the concept of the intersection of simplices. However, the evaluation of these classical dynamics, as well as of ED, is driven by the expected payoff obtained by the overall population, which forces a centralized implementation. To mitigate this central dependency, a distributed version of the ED model, referred to as distributed escort dynamics (DED), is proposed. The control action devised using the DED approach is shown to provide smooth trajectory tracking along with low start-up transience when compared with the distributed interior point (DIP) method.

Index Terms:

Consensus problems, Distributed algorithm, Escort dynamics, Graph theory, Resource allocation.

I Introduction

The rapid pace of urbanization has resulted in an ever-increasing number of commercial as well as residential buildings. The energy-efficient operation of these buildings is crucial not only from a financial but also from an environmental perspective [1],[2]. A variety of building maintenance and regulation processes are handled by heating, ventilation and air conditioning (HVAC) systems [3],[4]. Of these, thermal regulation accounts for the majority of the energy consumption.

In the building temperature control (BTC) formulation [5], the main goal of an HVAC system is to regulate the temperature within the rooms according to the respective residents' comfort while optimizing total energy consumption. The limited availability of energy makes BTC a multi-objective, multi-agent resource allocation problem, in which the control mechanism installed in every room acts as a separate agent whose objective is to regulate the respective room temperature around the resident's desired setpoint. There are two main ways to address such multi-agent optimization problems: centralized and distributed.

Although the centralized approach, in which a central entity gathers system information, provides an exact allocation scheme, it is very time-consuming. The dependency on a central authority also raises security and privacy concerns. Moreover, it is not always possible to satisfy the requirement of a central authority in a distributed scenario [6]. The distributed approach, on the other hand, does not depend on a global state; it solves the optimization problem using the information locally available [7] in the respective agent's neighbourhood. The distributed method provides a consensus-based approach [5] in which fair play is established by ensuring equal payoff to every agent.

The paper exploits the interacting multi-agent perspective to propose an evolutionary game theory based escort dynamical (ED) approach [8] to the BTC problem. Unlike classical evolutionary dynamics, the ED model is shown to accommodate bounds on the individual control dynamics [9] along with the global resource constraint through the intersection of simplices. Moreover, based on the distributed version of the classical evolutionary model discussed in [10], a consensus-based distributed implementation of the ED model, referred to as the distributed escort dynamics (DED) model, is proposed. This model, which has the property of positive invariantness, is shown to converge to an equilibrium state corresponding to equal payoff for every agent.

The performance of the DED implementation is evaluated through a comparative analysis with a widely used resource allocation protocol [5], the distributed interior point (DIP) method based implementation of the BTC problem. It is shown that the DED approach provides smooth trajectory tracking along with low start-up transience as compared to the DIP approach. The major contributions of the proposed research work are:

• Implementation of an evolutionary game theory based approach to address the resource allocation problem of BTC.

• A distributed version of the ED model is proposed, and its convergence analysis is provided.

The rest of the paper is structured as follows. Section II introduces the mathematical prerequisites and notation. Section III describes the BTC process and states the control objective. Section IV casts BTC as a dynamic distributed resource allocation problem. Section V introduces the ED and proposes its distributed version, referred to as DED. Section VI analyzes the implementation of the BTC problem using DED. Representative case studies and concluding remarks are presented thereafter.

II Mathematical Prerequisite and Notations

Vector quantities are represented using bold lowercase symbols. The symbol $x_{j}^{i}$ , where $i,j$ are whole numbers, represents the $j^{th}$ component of vector $\boldsymbol{x}$ at the $i^{th}$ instance. The identity matrix is represented as $I$ . The notation $|\cdot|$ represents the Euclidean norm.

II-A Graph Theory

In the multi-agent system considered in this paper, modeling the communication network as a graph allows the agents to coordinate their decisions, as given in [11]. A graph is described by the triplet $\mathcal{C}=(\mathcal{S},\mathcal{L},\mathcal{A})$ . $\mathcal{S}={\{1,...,k}\}$ represents the set of nodes, $\mathcal{L}\subseteq\mathcal{S}\times\mathcal{S}$ represents the set of edges connecting the nodes, and $\mathcal{A}$ is a $k\times k$ nonnegative matrix whose elements satisfy: $\rho_{ij}=1$ if and only if $(i,j)\in\mathcal{L}$ , and $\rho_{ij}=0$ if and only if $(i,j)\notin\mathcal{L}$ . The nodes and edges of the graph correspond to the agents and communication channels of the multi-agent system, respectively. The neighbours of node $i$ , i.e. the set of nodes that are able to receive/send information from/to node $i$ , are represented by $\mathcal{N}_{i}={\{j\in\mathcal{S}:(i,j)\in\mathcal{L}}\}$ .

The graphs used throughout the text are assumed to satisfy the following properties.

Assumption 1

The graph does not contain self-loops, i.e. $\rho_{ii}=0\text{ }\forall i\in\mathcal{S}$ .

Assumption 2

The graph is undirected, i.e. $\rho_{ij}=\rho_{ji}\text{ }\forall i,j\in\mathcal{S}$ .

Assumption 3

The graph is connected, i.e. for every pair of nodes in $\mathcal{S}$ there exists a direct or an indirect path which connects the two.

For the graph $\mathcal{C}$ , the Laplacian matrix $L(\mathcal{C})$ is a $k\times k$ matrix with entries defined as,

$$l_{ij}=\begin{cases}\sum_{j\in\mathcal{S}}\rho_{ij}, & \text{if } i=j\\ -\rho_{ij}, & \text{if } i\neq j\end{cases}\tag{1}$$
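The graph definitions above can be checked numerically. The following is a minimal sketch, assuming NumPy and a hypothetical 4-node line graph; the helper names `laplacian` and `satisfies_assumptions` are illustrative and do not come from the paper. Connectivity is tested through the second-smallest Laplacian eigenvalue, a standard fact for undirected graphs, and the check assumes at least two nodes.

```python
import numpy as np

def laplacian(adjacency):
    """Graph Laplacian: l_ii = sum_j rho_ij, l_ij = -rho_ij for i != j."""
    a = np.asarray(adjacency, dtype=float)
    degree = np.diag(a.sum(axis=1))
    return degree - a

def satisfies_assumptions(adjacency):
    """Check Assumptions 1-3: no self-loops, undirected, connected (k >= 2)."""
    a = np.asarray(adjacency, dtype=float)
    no_self_loops = bool(np.allclose(np.diag(a), 0.0))
    undirected = bool(np.allclose(a, a.T))
    if not (no_self_loops and undirected):
        return False
    # An undirected graph is connected iff the second-smallest
    # Laplacian eigenvalue is strictly positive.
    eigenvalues = np.sort(np.linalg.eigvalsh(laplacian(a)))
    return bool(eigenvalues[1] > 1e-9)

# Hypothetical communication graph: rooms 1-2-3-4 connected in a line.
A_line = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])
L_line = laplacian(A_line)

# Same nodes, but the link between rooms 2 and 3 is removed.
A_split = np.array([[0, 1, 0, 0],
                    [1, 0, 0, 0],
                    [0, 0, 0, 1],
                    [0, 0, 1, 0]])
```

Every row of `L_line` sums to zero, which is the property that later makes consensus-type dynamics conserve the total resource.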

III Building Temperature Control process

A building can be considered to be partitioned mainly into two sets, a set $\mathcal{R}$ of $k$ rooms and a set $\mathcal{W}$ of $m$ walls, together referred to as zones (Fig. 1). Let $\mathcal{Z}$ denote the union of the sets $\mathcal{R}$ and $\mathcal{W}$ , i.e. the set of all zones. Every room has a desired temperature profile $\boldsymbol{t^{set}}$ , defined over a time period by its resident and referred to as the setpoint temperature profile. The temperature within every room is controlled by a central thermal unit (CTU) [12]. In this case, a ventilation system connected to the CTU is spread throughout the building and is connected to every room. The CTU controls the airflow, which affects the room temperature through an actuator mechanism installed in the room.

III-A The Temperature Dynamics

The building temperature evaluated over a certain time duration exhibits dynamic behaviour. At any time instance, the temperature within a zone is directly affected by the temperature of the adjacent zones. The ambient temperature of the surroundings also affects the overall building temperature [13] by directly influencing the temperature of the zones that are in direct contact with it. The effect depends mainly on the thermal conductance between the two interacting bodies. In addition, the impromptu opening and closing of windows and doors and the residents' body temperature affect the temperature of the respective zone.

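The combined effect of all these factors on the temperature of a zone is represented through the dynamical equation,

$$\theta_{i}\dot{t}_{i}=\sum_{j=1}^{N}\alpha_{i,j}(t_{j}-t_{i})+\alpha_{i,a}(t_{i}^{a}-t_{i})+v_{i}(u_{i}+d_{i}),\quad\forall i\in\mathcal{Z}\tag{2}$$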

where $\alpha_{i,j}$ is the thermal conductance between the $i^{th}$ and $j^{th}$ zones. Its value is positive if the respective zones are in direct contact with each other; otherwise it is zero. Similarly, $\alpha_{i,a}$ represents the thermal conductance between the outside environment and a zone (wall) which is in direct contact with it. $\theta_{i}$ represents the thermal capacitance of the $i^{th}$ zone, while $N$ is the total number of zones.

The control parameter $u_{i}$ corresponds to the output of the actuator installed in the $i^{th}$ zone. The parameter $v_{i}$ is a binary variable taking values in $\{0,1\}$ ; $v_{i}=1$ indicates that the $i^{th}$ zone being evaluated is indeed a room. Thermal disturbances, such as the impromptu opening and closing of windows and/or the effect of the resident's body temperature on the $i^{th}$ zone, are characterized by $d_{i}$ . A detailed description of the BTC process can be found in [12].
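To make the roles of the terms in (2) concrete, the right-hand side can be evaluated for all zones at once. This is a minimal sketch, assuming NumPy and a hypothetical three-zone layout (two rooms separated by one wall that touches the outside); all numerical values and the function name `temperature_rate` are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

def temperature_rate(t, u, d, t_amb, alpha, alpha_amb, theta, v):
    """Return dt_i/dt for every zone, i.e. equation (2) divided by theta_i."""
    # sum_j alpha_ij (t_j - t_i), written with a matrix-vector product
    exchange = alpha @ t - alpha.sum(axis=1) * t
    ambient = alpha_amb * (t_amb - t)      # contact with the outside
    forcing = v * (u + d)                  # only rooms (v_i = 1) are actuated
    return (exchange + ambient + forcing) / theta

# Hypothetical layout: zone 0 = room, zone 1 = room, zone 2 = shared wall.
alpha = np.array([[0.0, 0.0, 0.5],
                  [0.0, 0.0, 0.5],
                  [0.5, 0.5, 0.0]])        # symmetric thermal conductances
alpha_amb = np.array([0.0, 0.0, 0.2])      # only the wall touches the outside
theta = np.array([1.0, 1.0, 2.0])          # thermal capacitances
v = np.array([1.0, 1.0, 0.0])              # rooms are 1, walls are 0

t_uniform = np.array([20.0, 20.0, 20.0])
zero = np.zeros(3)

# Everything at ambient temperature and no actuation: nothing changes.
rate_rest = temperature_rate(t_uniform, zero, zero, 20.0, alpha, alpha_amb, theta, v)

# Power injected into room 0 only: only room 0 reacts instantaneously.
rate_heat = temperature_rate(t_uniform, np.array([2.0, 0.0, 0.0]), zero, 20.0,
                             alpha, alpha_amb, theta, v)
```

The wall never responds directly to `u` or `d` because its `v` entry is zero; it is influenced only through conduction from the rooms and the ambient.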

III-B The Control Architecture

The CTU generates a fixed amount of power $U$ at every instance, which is to be distributed among the actuator mechanisms so as to regulate the respective rooms' temperatures around the desired setpoints.

$$\sum_{i=1}^{k}u_{i}=U\tag{3}$$

However, because of actuator output constraints, the individual actuator output $u_{i}$ is confined between lower and upper bounds.

$$u_{i}^{lo}\leq u_{i}\leq u_{i}^{up},\quad\forall i\in\mathcal{R}\tag{4}$$

Depending on the values of the constraints (4), (3) may or may not hold. To accommodate this uncertainty, a nonnegative variable $u_{k+1}$ is introduced, so that,

$$\sum_{i=1}^{k+1}u_{i}=U\tag{5}$$

In the BTC framework, the variable $u_{k+1}$ corresponds to the unused power at the respective time instance.
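The role of $u_{k+1}$ can be illustrated with a small helper. This is a sketch under stated assumptions: the function name `augment_with_unused_power` and the numbers are hypothetical, and the helper only checks the bounds (4) and appends the unused power so that (5) holds.

```python
def augment_with_unused_power(u_rooms, u_lo, u_up, total_power):
    """Append u_{k+1} so that the augmented allocation sums to U, as in (5)."""
    for u, lo, up in zip(u_rooms, u_lo, u_up):
        if not lo <= u <= up:                  # local constraint (4)
            raise ValueError("actuator output outside its bounds")
    unused = total_power - sum(u_rooms)        # this is u_{k+1}
    if unused < 0:
        raise ValueError("rooms request more power than the CTU generates")
    return list(u_rooms) + [unused]

# Three rooms, each limited to the interval [0, 4], with U = 10.
u_aug = augment_with_unused_power([2.0, 3.0, 1.5], [0.0] * 3, [4.0] * 3,
                                  total_power=10.0)
```

Here the three rooms together draw 6.5 units, so the last entry of `u_aug` records the 3.5 units left unused.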

III-C BTC Problem Definition

The building represents a global system which comprises various subsystems, i.e. the different zones. These zones interact with each other according to the dynamical model (2). The global objective is to utilize the power optimally to regulate each room temperature around its setpoint value.

$$\text{minimize}\ \sum_{i=1}^{k}\frac{(t_{i}-t_{i}^{set})^{2}}{2},\quad\forall i\in\mathcal{A}\tag{6}$$

The global objective function (6) can be decoupled and solved locally at the individual subsystem level, i.e.

$$\text{minimize}\ \frac{(t_{i}-t_{i}^{set})^{2}}{2},\quad\forall i\in\mathcal{A}\tag{7}$$

Hence, the task is to develop individual actuator implementation strategies to attain the local objective (7), while satisfying the global (5) and local constraints (4), respectively.

IV BTC as a Dynamic Distributed Resource Allocation Problem

Unlike general resource allocation problem formulations, where the objective function $f_{i}$ is a real-valued function of the resource variable $x_{i}$ , the payoff in the BTC problem depends on the state (7), i.e. the temperature values of the rooms. These states are driven by the dynamical model (2), which is affected by the resource being allocated (8).

$$D_{i}^{P}:\begin{cases}\dot{t}_{i}=g(\boldsymbol{t},u_{i})\\ f_{i}=h(\boldsymbol{t},u_{i})\end{cases}\tag{8}$$

Here $D^{P}_{i}$ represents the state dynamical model, which is driven by the instantaneous state values $\boldsymbol{t}$ and by the control parameter $u_{i}$ , which is also the allocated resource.

The system model is interconnected with the controller module $D^{C}_{i}$ , which performs the resource allocation task according to a function $q$ , (9).

$$D_{i}^{C}:\ \dot{x}_{i}=q(f_{i},f_{j}),\quad\forall j\in\mathcal{N}_{i}\tag{9}$$

Fig. 2 provides a schematic representation of the overall system. In the DED approach to the BTC formulation, the functions $g(\cdot),h(\cdot),\text{ and }q(\cdot)$ correspond to (2), (7), and (20), respectively.

The dynamical system model (8) is considered to have reached its equilibrium point when output consensus is attained [5]. A set of subsystems is said to have reached output consensus if $\lim_{t\rightarrow\infty}|f_{i}-f_{j}|=0$ , for all $i,j=0,1,2,...,k$ , where $f_{i}$ is the output of the $i^{th}$ subsystem, i.e. room.

Hence, to obtain the steady state, the controller model (9) must drive (8) to output consensus. The controller parameter must also satisfy the system-defined constraints, namely the global resource constraint (5) and the local bounds (4).
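In a simulation, output consensus can be monitored through the largest pairwise payoff difference. The sketch below assumes NumPy and takes the room output to be the local objective (7); the helper names `local_payoff` and `consensus_gap` are hypothetical, and a finite-time gap is only a numerical proxy for the limit in the definition.

```python
import numpy as np

def local_payoff(t, t_set):
    """Room outputs f_i, taken here as the local objective (7)."""
    t = np.asarray(t, dtype=float)
    t_set = np.asarray(t_set, dtype=float)
    return 0.5 * (t - t_set) ** 2

def consensus_gap(f):
    """Largest pairwise difference max_{i,j} |f_i - f_j|."""
    f = np.asarray(f, dtype=float)
    return float(f.max() - f.min())

# Rooms that are 1, 2 and 0 degrees away from their setpoints.
f_example = local_payoff([21.0, 24.0, 22.0], [22.0, 22.0, 22.0])
gap_example = consensus_gap(f_example)
gap_equal = consensus_gap([1.0, 1.0, 1.0])
```

A gap of zero means every room reports the same output, which is the equal-payoff condition referred to in the text.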

IV-A Convergence Analysis to the Output Consensus

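Let $(\boldsymbol{t^{*}},\boldsymbol{u^{*}})$ be the rest point of the dynamical model (Fig. 2). To analyze the convergence of the system to output consensus, the system dynamics (8), (9) are represented in error coordinates with respect to $(\boldsymbol{t^{*}},\boldsymbol{u^{*}})$ .

$$ErrD_{i}^{P}:\begin{cases}\dot{e}_{t_{i}}=e_{g}(\boldsymbol{e_{t}},\boldsymbol{e_{u}})\\ e_{f_{i}}=e_{h}(\boldsymbol{e_{t}},\boldsymbol{e_{u}})\end{cases}\tag{10}$$

$$ErrD_{i}^{C}:\ \dot{e}_{u_{i}}=e_{q}(\boldsymbol{e_{f}},\boldsymbol{e_{u}})\tag{11}$$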

The feedback interconnection (Fig. 2) is assumed to satisfy the following uniqueness assumption:

Assumption 4

For a feedback interconnection of the plant (8) and controller (9) dynamics, represented in error coordinates with respect to $(\boldsymbol{t^{*}},\boldsymbol{u^{*}})$ as (10), (11), if $e_{g}(0,\boldsymbol{e_{u}})=0,$ then $e_{u_{i}}=0,\forall i=1,2,...,k.$

Assumption 4 ensures the uniqueness of the rest point $(\boldsymbol{t^{*}},\boldsymbol{u^{*}})$ , i.e. the feedback interconnection dynamics (Fig. 2) has only one convergence point. For a feedback interconnection of multi-agent systems satisfying Assumption 4, Theorem 3.2.1 from [5] provides a set of sufficient conditions which ensure the convergence of the feedback interconnection (Fig. 2) to output consensus. The reformulated theorem is stated as,

Theorem 1

For a feedback interconnection of the system dynamics (8), (9) satisfying Assumption 4, let $(\boldsymbol{t^{*}},\boldsymbol{u^{*}})$ be an equilibrium point, where the control law $q(\cdot)$ is of the form (9). If the following conditions hold true:

C1. The communication graph $\mathcal{G}$ for the multi-agent system (9) is connected.

C2. The plant dynamics (8) expressed in error coordinates with respect to $(\boldsymbol{t^{*}},\boldsymbol{u^{*}})$ , (10), is strictly passive from the input $e_{u}$ to the output $e_{f}$ with a radially unbounded storage function.

C3. $\boldsymbol{u^{*}}$ and $\boldsymbol{u}(0)$ satisfy the global resource constraint (5).

Then, (8) reaches output consensus.

The condition C1 ensures that there always exists a path between two distinct agents to facilitate information transfer. The passive nature of plant dynamics is ensured by the fulfillment of condition C2. Moreover, the interconnection of two passive systems results in the convergence to an equilibrium point. In order to utilize this property of passive interconnection not only plant dynamics (8) but also controller dynamics (9) must be passive. According to the proposition provided in [5], the passive nature of controller dynamics is ensured by the fulfillment of the condition C3. The proposition is reformulated as follows,

Proposition 1

Given that the $\boldsymbol{u^{*}}$ satisfies the constant population requirement (5), if $\boldsymbol{u(0)}$ satisfies (5) and graph $\mathcal{G}$ is connected, then multi-agent dynamics in error coordinates (11) is passive and lossless from the input $\boldsymbol{e_{f}}$ to the output $\boldsymbol{-e_{u}}$ .

In this paper, an escort dynamics based distributed algorithm is proposed whose interconnection with the BTC dynamics (2), as shown in Fig. 2, satisfies the conditions listed in Theorem 1, thereby providing convergence to output consensus.

V Escort Evolutionary Game Dynamics

Evolutionary game theory analyzes the interactions among a large number of players with a comparatively small number of strategies. Instead of analyzing individual player dynamics, it evaluates the evolution of the spread of the available strategies over the entire population. During this evolution process, every player within the population chooses a certain strategy according to a classical dynamical revision protocol such as replicator dynamics, logit dynamics, or projection dynamics. These models, also referred to as classical dynamical models, are of the form

$$\dot{x}_{i}^{next}=D(x_{i}^{present},f_{i}^{present})\tag{12}$$

These dynamical models drive the population proportions towards the best population proportion, such that,

$$x_{i}^{*}f_{i}^{*}>x_{i}f_{i}\quad\text{for all } i=1,2,...,k\tag{13}$$

where, $\boldsymbol{x}=[x_{1},x_{2},...,x_{k}]^{T}$ represents population state vector. $x_{i}$ corresponds to the proportion of population employing $i^{th}$ pure strategy. Similarly, $f_{i}$ corresponds to the payoff received for employing $i^{th}$ strategy. $\boldsymbol{x^{*}}$ and $\boldsymbol{f^{*}}$ represent best possible population proportion vector and payoff vector, respectively.

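The normalized population proportion vector $\boldsymbol{x}$ can be considered as a population distribution on a $k-1$ dimensional statistical simplex manifold.

$$\Delta_{k}=\Big\{\boldsymbol{x}\in\mathbb{R}^{k}\ \Big|\ \sum_{i=1}^{k}x_{i}=1,\ x_{i}\geq 0\Big\}\tag{14}$$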

Hence, the evolution towards the best possible population distribution corresponds to the motion of the initial population state vector $\boldsymbol{x_{0}}$ over the $(k-1)$ dimensional manifold $\Delta_{k}$ , (14), as shown in Fig. 3.

From the perspective of the BTC process, the population proportion vector $\boldsymbol{x}$ corresponds to the actuator output profile $\boldsymbol{u}$ , and the global resource constraint (5) is reflected in the simplex formulation (14). In the evolutionary game theoretic framework, the local bounds on the actuator mechanisms (4) translate into lower and upper bounds on the individual strategy proportions (15).

$$\Gamma=\{x_{i}^{lo}\leq x_{i}\leq x_{i}^{up}\},\quad\text{for all } i=1,2,...,k\tag{15}$$

The task is then to determine the best possible population distribution, which corresponds to the best output profile for the individual actuator mechanisms.
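One simple way to realise this correspondence is to normalise the allocation by the total power. The sketch below assumes NumPy and the scaling $x_{i}=u_{i}/U$ applied to the augmented allocation of (5); this scaling, the helper names, and the numbers are illustrative assumptions rather than the paper's stated construction.

```python
import numpy as np

def to_population_state(u_aug, total_power):
    """Scale an allocation that sums to U into proportions that sum to 1."""
    return np.asarray(u_aug, dtype=float) / total_power

def in_simplex(x, tol=1e-9):
    """Membership test for the simplex (14)."""
    return bool(abs(x.sum() - 1.0) <= tol and np.all(x >= -tol))

def in_bounds(x, x_lo, x_up, tol=1e-9):
    """Membership test for the box constraints (15)."""
    return bool(np.all(x >= x_lo - tol) and np.all(x <= x_up + tol))

U = 10.0
u_aug = [2.0, 3.0, 1.5, 3.5]                   # three rooms plus unused power
x = to_population_state(u_aug, U)

# Actuator bounds (4) scaled by the same factor give proportion bounds (15).
x_lo = np.array([0.0, 0.0, 0.0, 0.0]) / U
x_up = np.array([4.0, 4.0, 4.0, 10.0]) / U

x_violating = np.array([0.5, 0.3, 0.1, 0.1])   # first room exceeds its cap
```

The vector `x_violating` still lies on the simplex, which illustrates why the simplex constraint alone cannot express the actuator limits.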

However, the classical dynamical models of the form (12) have no provision to accommodate the local proportion constraints (15). In [14], a mixture of classical dynamical models is considered in order to incorporate such proportion bounds. In this paper, the evolutionary game theoretic model of ED [8] is used to address this issue.

V-A Escort Dynamics Features

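The ED dictates the evolution of the normalised population distribution over $k$ pure strategies under the influence of the payoff achieved [8]. The continuous-time representation of the ED is,

$$\dot{x}_{i}=\phi(x_{i})\big(f_{i}(\boldsymbol{x})-f_{\phi}\big)\quad\text{for all } i=1,2,...,k\tag{16}$$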

where $\phi(x_{i})$ is a nonnegative function of the population proportion, referred to as the escort function. $f_{i}(\boldsymbol{x})$ and $f_{\phi}$ represent the payoff obtained for employing the $i^{th}$ pure strategy and the weighted average payoff, respectively.

The weighted average payoff $f_{\phi}$ is computed as,

$$f_{\phi}=\frac{1}{\Phi(\boldsymbol{x})}\sum_{i=1}^{k}\phi(x_{i})f_{i}(\boldsymbol{x})\tag{17}$$

where $\Phi(\boldsymbol{x})=\sum_{i=1}^{k}{\phi(x_{i})}$ is the normalising sum of the escort functions. The presence of $f_{\phi}$ within the dynamical model (16) forces a centralised implementation. For positive escort functions, (17) can be considered as the expected value of the payoff function over the probability distribution defined by the escort functions as,

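$$\hat{\phi}(\boldsymbol{x})=\frac{1}{\Phi(\boldsymbol{x})}[\phi(x_{1}),\phi(x_{2}),...,\phi(x_{k})]\tag{18}$$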

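The summation of (16) over the entire population, for all $k$ pure strategies, is computed as

$$\sum_{i=1}^{k}\dot{x}_{i}=\sum_{i=1}^{k}x_{i}f_{i}(\boldsymbol{x})-f_{\phi}\sum_{i=1}^{k}x_{i}=0\tag{19}$$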

That is, the initially defined population size, $\sum_{i=1}^{k}{x_{i}}=1$ , remains constant throughout the evolution for $t>0$ . The choice of escort function is therefore crucial for constraining the population proportions within a more restrictive region.
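The conservation property can be checked numerically. The sketch below assumes NumPy and takes the normaliser to be $\Phi(\boldsymbol{x})=\sum_{i}\phi(x_{i})$ , the choice under which the normalised escort vector is a probability distribution; the function name `escort_rate` and the sample numbers are hypothetical.

```python
import numpy as np

def escort_rate(x, f, phi):
    """Right-hand side of (16), assuming Phi(x) = sum_i phi(x_i)."""
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    w = phi(x)
    # Escort-weighted mean payoff (17): needs the payoffs of ALL strategies,
    # which is the central dependency discussed in the text.
    f_phi = np.dot(w, f) / w.sum()
    return w * (f - f_phi)

x0 = np.array([0.2, 0.3, 0.5])
f0 = np.array([1.0, 2.0, 3.0])

# phi(x) = x recovers the replicator dynamics as a special case.
rate_replicator = escort_rate(x0, f0, lambda x: x)

# Any other nonnegative escort, for example phi(x) = x**2, also conserves mass.
rate_quadratic = escort_rate(x0, f0, lambda x: x ** 2)
```

With the identity escort, the weighted mean payoff is 2.3, so strategies paying below 2.3 shrink and the one paying above it grows, while the rates sum to zero.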

However, the presence of the weighted average term $f_{\phi}$ within the ED (16) demands that, at every revision instance, a player must know the payoff received by every other player in the population. This centralised arrangement is quite restrictive. A real-life implementation would also require more communication and computation infrastructure. To mitigate this central dependency, the paper proposes a graph-theoretic distributed version of ED, referred to as DED.

V-B Distributed Escort Dynamics formulation

The central dependency of the ED model (16) is removed through a graph-theoretic approach. In this case, at every revision instance the evolution of the $i^{th}$ strategy proportion is governed by the local information obtained from its neighbourhood $\mathcal{N}_{i}$ . The proposed DED model is represented as,

$$\dot{x}_{i}=\phi(x_{i})\sum_{j\in\mathcal{N}_{i}}\phi(x_{j})[f_{j}(\boldsymbol{x})-f_{i}(\boldsymbol{x})]\quad\text{for all } i=1,2,...,k\tag{20}$$

Here, $\phi(\cdot)$ is the escort function. Instead of depending on the expected average payoff as in (16), the evolution of the $i^{th}$ proportion under DED (20) is driven by the payoffs obtained by the neighbouring proportions. The DED model can be regarded as a consensus-based algorithm, i.e. the DED dynamics (20) reaches a steady state either if the respective $\phi(\cdot)$ becomes zero or if the payoff obtained by the respective proportion equals that of its neighbouring proportions.
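A direct transcription of the neighbour-only update may help. This is a minimal sketch, assuming NumPy, a hypothetical three-node line graph, and the sign convention $f_{j}-f_{i}$ of the DED model as listed on this page; the function name `ded_rate` is illustrative.

```python
import numpy as np

def ded_rate(x, f, adjacency, phi):
    """Neighbour-only DED update: node i uses payoffs of nodes j in N_i only."""
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    w = phi(x)
    rate = np.zeros_like(x)
    for i in range(len(x)):
        for j in np.flatnonzero(adjacency[i]):   # indices of the neighbours of i
            rate[i] += w[i] * w[j] * (f[j] - f[i])
    return rate

# Hypothetical line graph 0 - 1 - 2: nodes 0 and 2 never exchange payoffs.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
x0 = np.array([0.2, 0.3, 0.5])

rate_unequal = ded_rate(x0, [1.0, 2.0, 3.0], A, lambda x: x)
rate_equal = ded_rate(x0, [2.0, 2.0, 2.0], A, lambda x: x)
```

With equal payoffs every rate is zero, which is the consensus steady state described above; with unequal payoffs the rates are nonzero but still sum to zero on this undirected graph.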

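The DED dynamics (20) can also be represented as,

$$\dot{x}_{i}=\sum_{j=1}^{k}\rho_{i,j}[f_{j}(\boldsymbol{x})-f_{i}(\boldsymbol{x})]\quad\text{for all } i=1,2,...,k\tag{21}$$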

where, $\rho_{i,j}$ governs the connectivity between the two distinct population proportions. Its value represents the weight of the links between two adjacent population proportions. Here, the connectivity graph $\mathcal{G}$ is assumed to be undirected, hence $\rho_{i,j}$ equals $\rho_{j,i}$ . The link weightage $\rho_{i,j}$ is formulated as,

[TABLE]

V-B1 The Positive Invariantness of DED

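Similar to (19), the summation of (21) over all strategy proportions is evaluated as,

$$\begin{aligned}\sum_{i=1}^{k}\dot{x}_{i}&=\sum_{i=1}^{k}\bigg[\sum_{j=1}^{k}\rho_{i,j}[f_{j}(\boldsymbol{x})-f_{i}(\boldsymbol{x})]\bigg]\\&=\rho_{1,1}(f_{1}-f_{1})+\rho_{1,2}(f_{2}-f_{1})+...+\rho_{1,k}(f_{k}-f_{1})\\&\quad+\rho_{2,1}(f_{1}-f_{2})+\rho_{2,2}(f_{2}-f_{2})+...+\rho_{2,k}(f_{k}-f_{2})\\&\quad\ \ \vdots\\&\quad+\rho_{k,1}(f_{1}-f_{k})+\rho_{k,2}(f_{2}-f_{k})+...+\rho_{k,k}(f_{k}-f_{k})\end{aligned}\tag{23}$$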

As the connectivity graph $\mathcal{G}$ is undirected, the expression (23) equates to zero, i.e.

$$\sum_{i=1}^{k}\dot{x}_{i}=0\tag{24}$$

Equation (24) ensures that the population size remains constant throughout the evolution process, i.e. if $\sum_{i=1}^{k}x_{i}=1$ for the initial distribution, then it remains so for all $t>0$ . This property is referred to as positive invariantness.
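The cancellation behind (24) can be made explicit by pairing each term with its mirror image. The derivation below is a sketch that uses only the symmetry $\rho_{i,j}=\rho_{j,i}$ stated in the text; organising the double sum over pairs $i<j$ is one convenient choice and is not taken from the paper.

```latex
\begin{aligned}
\sum_{i=1}^{k}\dot{x}_{i}
 &= \sum_{i=1}^{k}\sum_{j=1}^{k}\rho_{i,j}\,(f_{j}-f_{i}) \\
 &= \sum_{i<j}\big[\rho_{i,j}\,(f_{j}-f_{i})+\rho_{j,i}\,(f_{i}-f_{j})\big]
    && \text{(diagonal terms vanish since } f_{i}-f_{i}=0\text{)} \\
 &= \sum_{i<j}(\rho_{i,j}-\rho_{j,i})\,(f_{j}-f_{i}) \\
 &= 0 && \text{(since } \rho_{i,j}=\rho_{j,i}\text{)}
\end{aligned}
```

The argument does not depend on the payoffs or on the particular escort function, only on the symmetry of the link weights.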

The property of positive invariantness confines the possible population distribution vectors to the manifold corresponding to $\sum_{i=1}^{k}x_{i}=1$ . However, to restrict the population dynamics within the tighter bounds (15), the escort formulation is crucial. The system-defined local constraints (4) on the actuator outputs translate into bounds on the individual proportions $x_{i}$ for all $i=1,2,...,k$ , (15); the escort function formulation that accommodates them is discussed next.

V-C Escort function formulation for BTC problem

The geometric representation of individual strategy proportion bounds (15) corresponds to the intersection of two $k-1$ dimensional simplices, $\Delta_{k}^{lo}$ and $\Delta_{k}^{up}$ , where

[TABLE]

The structures $\Delta_{k}^{lo}$ and $\Delta_{k}^{up}$ , along with $\Delta_{k}$ , are convex, which allows them to be represented by their vertices alone. Let $S^{lo}$ and $S^{up}$ be $k\times k$ matrices whose column vectors represent the vertices of $\Delta_{k}^{lo}$ and $\Delta_{k}^{up}$ , respectively. The construction of $S^{lo}$ and $S^{up}$ is given as,

[TABLE]

where column spaces of matrices $X^{lo}$ and $X^{up}$ are spanned by $\boldsymbol{x^{lo}}$ and $\boldsymbol{x^{up}}$ respectively. $\sigma^{lo}$ and $\sigma^{up}$ are scalars which are multiplied with identity matrix $I$ and added with column vectors of $X^{lo}$ and $X^{up}$ so as to form $S^{lo}$ and $S^{up}$ . This structure of matrices, $S^{lo}$ and $S^{up}$ provides an alternative way to define constraint simplices, $\Delta_{k}^{lo}$ and $\Delta_{k}^{up}$ as,

[TABLE]

Hence it is possible to represent known population state $\boldsymbol{x}$ onto $\Delta_{k}^{lo}$ and $\Delta_{k}^{up}$ in terms of $\boldsymbol{\eta}$ and $\boldsymbol{\xi}$ respectively.

[TABLE]

As $\sum_{i=1}^{k}x_{i}=1$ , (31) and (32) can be interpreted in simplified form as,

[TABLE]

Here, $\boldsymbol{\eta}$ which correlates with lower bounds $\boldsymbol{x^{lo}}$ is increasing in nature while $\boldsymbol{\xi}$ is monotonically decreasing in nature which corresponds to upper bounds $\boldsymbol{x^{up}}$ . It is so because $\sigma^{lo}>0$ and $\sigma^{up}<0$ , which ensures that intersection of $\Delta_{k}^{lo}$ and $\Delta_{k}^{up}$ is not empty, [15].

To incorporate both the lower and the upper bounds (15), the final escort function is formed as the product of (33) and (34).

[TABLE]

The escort function formulation (35) ensures that if the initial distribution estimate belongs to the interior of the constraint set (15), then every individual strategy proportion remains within its bounds throughout the evolution process.
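A minimal sketch of the escort function, following steps 2 and 3 of Algorithm 1, may help make the boundary behaviour concrete. The function name `escort`, the three-strategy bounds, and the feasibility check are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def escort(x, x_lo, x_up, x_total=1.0):
    """Escort value phi(x_j) = eta_j * xi_j, as in steps 2-3 of Algorithm 1."""
    x, x_lo, x_up = (np.asarray(a, dtype=float) for a in (x, x_lo, x_up))
    sigma_lo = x_total - x_lo.sum()   # positive when the lower bounds leave slack
    sigma_up = x_total - x_up.sum()   # negative when the upper bounds exceed the total
    if sigma_lo <= 0 or sigma_up >= 0:
        raise ValueError("bounds are infeasible for this total")
    eta = (x - x_lo) / sigma_lo       # increasing in x, zero at the lower bound
    xi = (x - x_up) / sigma_up        # decreasing in x, zero at the upper bound
    return eta * xi

# Hypothetical bounds for three strategies whose proportions sum to one.
x_lo = np.array([0.1, 0.1, 0.1])
x_up = np.array([0.6, 0.6, 0.6])

phi_interior = escort([0.3, 0.3, 0.4], x_lo, x_up)   # strictly inside the bounds
phi_boundary = escort([0.1, 0.6, 0.3], x_lo, x_up)   # first two entries sit on a bound
```

The escort value is positive in the interior and vanishes on either bound, so any dynamics whose rate is scaled by it stalls exactly where a bound would otherwise be crossed.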

VI Escort Dynamical approach for BTC problem

The BTC temperature dynamics (2) with objective function (7) constitute the plant dynamics (10), whereas the DED dynamics (20) represent the controller model $q(\cdot)$ in (9).

Assumption 4 requires the controller dynamics (9) to vanish as the system state dynamics (8) attain the equilibrium point. If $\boldsymbol{e_{t}}=0$ , i.e. $\boldsymbol{t}-\boldsymbol{t^{*}}=0$ , then according to (7) the output vector $\boldsymbol{f}=0$ . This in turn drives the controller dynamics (20) to zero, i.e. $e_{x_{i}}=0$ for all $i=1,2,...,k$ . Hence, the BTC dynamics (2) controlled by the proposed DED protocol (20) satisfy Assumption 4.

The connectivity Assumption 3 on the system's inherent communication topology ensures that C1 in Theorem 1 is fulfilled. The dynamics (2) are shown in [5] to satisfy condition C2. Moreover, by the positive invariance property (23), the DED dynamics (20) satisfy C3, which ensures that the DED model is passive. Hence, the interconnection of the two passive dynamics (2) and (20) results in the stable rest point $(t^{*},x^{*})$ , which corresponds to the attainment of output consensus.

Hence, according to Theorem 1, the BTC temperature dynamics (2) of each room with its objective (7), driven by the consensus-like DED dynamics (20), reach an equilibrium point with output consensus, i.e. a welfare scheme ensuring equal payoff to every room.

The BTC process is dynamic in nature: at every instant the temperature values are affected by the thermal disturbance $\boldsymbol{d}$ and the surrounding temperature $\boldsymbol{t^{a}}$ according to (2). The temperature is then regulated around the setpoint temperature $\boldsymbol{t^{set}}$ by an actuator mechanism driven by the DED dynamics (20). The algorithmic implementation of the BTC problem using the DED dynamics is provided in Algorithm 1.

Algorithm 1. BTC using DED approach

1: Initialize: $\boldsymbol{t^{set}},\boldsymbol{t^{a}},\boldsymbol{t^{1}},\boldsymbol{W_{z}},\boldsymbol{W_{a}},\boldsymbol{X},\boldsymbol{x^{1}},$ $\boldsymbol{x^{lo}},\boldsymbol{x^{up}},x^{total}$

2: Define: $\sigma^{lo}=x^{total}-\sum_{j=1}^{k}{x^{lo}_{j}},$ $\sigma^{up}=x^{total}-\sum_{j=1}^{k}{x^{up}_{j}}$

3: Compute: $\eta_{j}=\frac{1}{\sigma^{lo}}(x_{j}-x^{lo}_{j}),$ $\xi_{j}=\frac{1}{\sigma^{up}}(x_{j}-x^{up}_{j}),$ $\phi(x_{j})=\eta_{j}\times\xi_{j}$ for all $j=1,2,...,k$

4: Compute: $\boldsymbol{f^{i}}=[f_{1}^{i},f_{2}^{i},...,f_{k}^{i}]^{T}$ where, $f_{j}^{i}=t_{j}^{i}-t_{j}^{set_{i}}$

5: Evaluate: $\boldsymbol{\dot{x}^{i}}=[\dot{x}_{1}^{i},\dot{x}_{2}^{i},...,\dot{x}_{k}^{i}]$ where, $\dot{x}_{j}^{i}$ is evaluated according to (20)

6: Compute: $\boldsymbol{x^{i+1}}=[x_{1}^{i+1},x_{2}^{i+1},...,x_{k}^{i+1}]$ where, $x_{j}^{i+1}=x_{j}^{i}+\dot{x}_{j}^{i}$

7: Evaluate: $\boldsymbol{\dot{t}^{i}}=[\dot{t}_{1}^{i},\dot{t}_{2}^{i},...,\dot{t}_{N}^{i}]$ where, $\dot{t}_{j}^{i}$ is evaluated according to (2)

8: Compute: $\boldsymbol{t^{i+1}}=[t_{1}^{i+1},t_{2}^{i+1},...,t_{N}^{i+1}]$ where, $t_{j}^{i+1}=t_{j}^{i}+\dot{t}_{j}^{i}$

9: Assign: $i\leftarrow{i+1}$
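The loop in Algorithm 1 can be sketched as below. Equations (20) and (2) are not reproduced in this section, so both rate functions are assumed stand-ins: `ded_rate` is a graph-local escort dynamics in which a room gains power from neighbours with a higher payoff $f$ , and `thermal_rate` is a first-order heat balance with hypothetical coefficients `loss` and `gain`. The four-room ring topology, the set points, and the explicit step size `h` are likewise illustrative choices, not values from the paper.

```python
import numpy as np

def escort(x, x_lo, x_up, x_total):
    # Steps 2-3 of Algorithm 1: phi(x_j) = eta_j * xi_j.
    sigma_lo = x_total - x_lo.sum()
    sigma_up = x_total - x_up.sum()
    return ((x - x_lo) / sigma_lo) * ((x - x_up) / sigma_up)

def ded_rate(x, f, A, x_lo, x_up, x_total):
    # ASSUMED stand-in for (20):
    #   x_dot_j = phi_j * sum_m A[j, m] * phi_m * (f_m - f_j).
    # With a symmetric A the rates sum to zero, so the total resource is conserved.
    phi = escort(x, x_lo, x_up, x_total)
    return phi * (A @ (phi * f) - f * (A @ phi))

def thermal_rate(t, x, t_amb, loss=0.1, gain=1.0):
    # ASSUMED stand-in for (2): heat loss to the ambient plus actuator heating.
    return loss * (t_amb - t) + gain * x

def run_btc(steps=2000, h=0.5):
    # Step 1: initialise a hypothetical four-room ring.
    A = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)
    t_set = np.array([20.0, 21.0, 22.0, 20.0])
    t_amb = 10.0
    x_lo, x_up, x_total = np.zeros(4), np.full(4, 2.0), 4.0
    t = np.full(4, 13.0)
    x = np.full(4, 1.0)
    for _ in range(steps):                                       # step 9: i <- i + 1
        f = t - t_set                                            # step 4
        x_next = x + h * ded_rate(x, f, A, x_lo, x_up, x_total)  # steps 5-6
        t = t + h * thermal_rate(t, x, t_amb)                    # steps 7-8
        x = x_next
    return x, t, t - t_set

x_final, t_final, f_final = run_btc()
```

In this toy setting the allocation keeps its total, stays inside the local bounds, and the payoffs settle to a common value; that value is non-zero here because the assumed total power is slightly too small to reach every set point.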

VII Representative case study and results

The BTC problem formulated in Section II has been addressed using the DED approach. For comparison, the same problem has been implemented using the consensus-based resource allocation protocol DIP; see Appendix-A for its mathematical formulation.

VII-A Operational Scenario

For implementation, a building with $50$ rooms ( $k=50$ ) surrounded by a comparatively cold environment is analyzed over an entire day. The ambient temperature varies as shown in Fig. 4(a). Fig. 4(b) depicts the desired room temperature trajectories throughout the day, where the $50$ rooms are divided into $3$ groups, viz. rooms $1-17$ , $18-34$ , and $35-50$ . A separate temperature profile could be defined for each individual room; the rooms are grouped only to display the performance clearly. Initially, every room is at $13^{\circ}\mathrm{C}$ and every actuator output is $0.5\text{ kWh}$ . Every actuator output is locally constrained to the range $0-3.25\text{ kWh}$ , while the global constraint restricts the cumulative output of all actuators to $130\text{ kWh}$ at every instant.
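As a quick consistency check, the scenario values can be plugged into step 2 of Algorithm 1. The snippet below is only a sketch: it assumes the bounds are used in raw kWh, with $x^{total}=130$ , since the scaling applied in the actual implementation is not stated here.

```python
import numpy as np

k = 50
x_lo = np.zeros(k)          # lower actuator bound per room, kWh
x_up = np.full(k, 3.25)     # upper actuator bound per room, kWh
x_total = 130.0             # cumulative actuator limit, kWh

# Step 2 of Algorithm 1: a non-empty intersection of simplices
# needs sigma_lo > 0 and sigma_up < 0.
sigma_lo = x_total - x_lo.sum()
sigma_up = x_total - x_up.sum()

# Room groups used for the desired temperature profiles (1-based, inclusive).
groups = {1: range(1, 18), 2: range(18, 35), 3: range(35, 51)}
group_sizes = {g: len(rooms) for g, rooms in groups.items()}
```

With these numbers $\sigma^{lo}=130$ and $\sigma^{up}=-32.5$ , so the sign conditions on $\sigma^{lo}$ and $\sigma^{up}$ hold for the stated bounds.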

VII-B Observations

The observations compare the desired trajectories with the actually obtained trajectories of the 50 rooms, divided into 3 groups. The colour notation used to represent these trajectories is shown in Fig. 5.

Results obtained for the implementation of building temperature control using DED are shown in Fig. 6(a), 7(a), and 8(a). In the initial operational phase, a fast increase in actuator outputs is observed, which results in an average temperature overshoot of about $2^{\circ}\mathrm{C}$ above the desired set points. Corresponding oscillations can also be observed in the payoff values; once the payoff values converge to zero, the temperature profiles in the respective interval attain the desired values, Fig. 8(a). Within one damped oscillatory cycle the temperature deviations are reduced, and the desired trajectories are tracked thereafter, Fig. 6(a). After the initial perturbations, the actuator profiles vary smoothly, Fig. 7(a).

Fig. 6(b), 7(b), and 8(b) show the observations for the DIP method applied to the same problem. Like DED, DIP displays oscillatory behaviour in the initial phase, also referred to as startup transience, but it persists for more than one oscillation cycle, Fig. 7(b). Similar fluctuations are reflected in the initial temperature profiles, as the temperature swings back and forth around the desired value, Fig. 6(b). The peak temperature and actuator output overshoots are higher for DIP than for DED, which results in the larger temperature deviations shown in Fig. 8(b).

Also, the temperature trajectories obtained with the DED approach are closer to the desired trajectories than those obtained with DIP. When the ambient temperature is above its average value, the corresponding actuator outputs are low, Fig. 7. It has been observed that the initial peak overshoot is directly proportional to the initial difference between the desired and actual temperatures.

VII-C Analysis

In an ideal implementation, the real-time trajectories should converge quickly to the desired temperature trajectories without any overshoot. DED and DIP, both being step-size-dependent first-order dynamical protocols, show some initial oscillatory behaviour.

VII-C1 Presence of startup transience

The oscillatory response of the DED approach is smaller than that of the DIP approach, Fig. 9 and 10. In the DIP approach, the control parameters, i.e. the actuator outputs, are constrained by means of a barrier function (38), which actively manipulates the actuator parameter to keep it near the centre of the predefined range. When the actuator value approaches one of the boundaries, the payoff value changes drastically, which in turn forces the actuator value back towards the centre. This operation introduces oscillatory behaviour over a couple of cycles, Fig. 9(b) and 10(b). On the other hand, including the constraint directly in the dynamical equation (16) through the escort function guarantees a comparatively smooth transition from the initial conditions to the desired ones without violating the constraints, Fig. 9(a) and 10(a). This is because the payoff values are not directly affected by changes in actuator output.

VII-C2 Desired trajectory tracking

In the DED approach, the deviation in temperature is defined as the payoff. As operation continues, every sub-system (room) reaches a common consensus value of the payoff function. A negative difference between the actual and required temperature increases the actuator output, and vice versa. Hence the algorithm implementing DED tracks the desired trajectory closely, Fig. 6(a). In the DIP approach, by contrast, the cost function includes the barrier function along with the temperature deviations. Hence, when DIP reaches consensus, the consensus value is biased by the barrier function value, which introduces a marginal error between the actual and desired temperature trajectories, Fig. 6(b).

The lower actuator output in a comparatively warm surrounding environment, Fig. 7, corresponds to efficient energy utilization.

VIII Conclusion

The BTC problem has been analyzed in the resource allocation framework using evolutionary game theory. The distributed version of the escort evolutionary dynamics (ED) has been utilized to address the resource allocation problem within system-defined constraints. The local bounds on the individual actuator outputs are incorporated within the DED framework through the concept of the intersection of simplices. The consensus-like DED approach is shown to attain output consensus for the dynamic resource allocation problem with a global resource constraint. A performance analysis of the BTC mechanism implemented through the DED and DIP approaches has been carried out on the metrics of smooth trajectory tracking and low startup transience. Unlike DIP, where the local bounds on individual actuator output are accommodated through a barrier formulation, the DED approach restricts the dynamics within the bounds through the escort function formulation. The escort function formulation is shown to perform better than the DIP approach in terms of mechanism longevity, owing to its lower startup transience.

This research work can be extended to dynamic resource allocation problems in which the constraint set is time-varying. Such constraints can be incorporated by letting the domain of the intersection of simplices vary.

Appendix

VIII-A Distributed Interior Point method

Considering the BTC problem described in Section II, the DIP protocol used to obtain the next control vector values is given as

[TABLE]

where the $(1\times k)$ dimensional payoff vector is constructed as

[TABLE]

Here, $b_{i}(\boldsymbol{v}_{i})$ , defined as the derivative of the barrier function $\mathcal{B}(\boldsymbol{v})$ with respect to $v_{i}$ , is added to the original objective of the $i^{th}$ sub-system to form the payoff function of that sub-system.

The barrier function restricts the control vector values to the predefined constraints. The logarithmic barrier function is one such barrier function [5], given as

[TABLE]
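To illustrate why a barrier term pushes the actuator value towards the centre of its range, the sketch below uses the textbook logarithmic barrier for box constraints. The exact form and weighting used in (38) are not reproduced in this section, so the expression, the weight `mu`, and the function names are assumptions.

```python
import numpy as np

def log_barrier(v, v_lo, v_up, mu=1.0):
    """Textbook log barrier for v_lo < v < v_up (assumed form, not the paper's (38))."""
    v = np.asarray(v, dtype=float)
    return -mu * np.sum(np.log(v - v_lo) + np.log(v_up - v))

def log_barrier_grad(v, v_lo, v_up, mu=1.0):
    """Elementwise derivative b_i = dB/dv_i of the barrier above."""
    v = np.asarray(v, dtype=float)
    return -mu * (1.0 / (v - v_lo) - 1.0 / (v_up - v))

# Actuator range taken from the operational scenario (0 to 3.25 kWh).
v_lo, v_up = 0.0, 3.25
b_centre = log_barrier_grad([1.625], v_lo, v_up)[0]   # mid-range
b_low = log_barrier_grad([0.01], v_lo, v_up)[0]       # close to the lower bound
b_high = log_barrier_grad([3.24], v_lo, v_up)[0]      # close to the upper bound
```

The derivative is zero at mid-range and grows rapidly in magnitude, with opposite signs, near either bound. A payoff that includes this term therefore changes sharply as an actuator approaches a limit.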

Bibliography


[1] P. Nejat, F. Jomehzadeh, M. M. Taheri, M. Gohari, and M. Z. A. Majid, “A global review of energy consumption, CO2 emissions and policy in the residential sector (with an overview of the top ten CO2 emitting countries),” Renewable and Sustainable Energy Reviews, vol. 43, pp. 843–862, 2015.
[2] S. Lechtenböhmer and A. Schüring, “The potential for large-scale savings from insulating residential buildings in the EU,” Energy Efficiency, vol. 4, no. 2, pp. 257–270, 2011.
[3] L. Yang, H. Yan, and J. C. Lam, “Thermal comfort and building energy consumption implications–a review,” Applied Energy, vol. 115, pp. 164–173, 2014.
[4] L. Pérez-Lombard, J. Ortiz, and C. Pout, “A review on buildings energy consumption information,” Energy and Buildings, vol. 40, no. 3, pp. 394–398, 2008.
[5] G. D. Obando Bravo, “Distributed methods for resource allocation: a passivity based approach,” Ph.D. dissertation, Nantes, Ecole des Mines, 2015.
[6] R. A. Murphey, “An approximate algorithm for a weapon target assignment stochastic program,” in Approximation and Complexity in Numerical Optimization. Springer, 2000, pp. 406–421.
[7] Y. Chevaleyre, P. E. Dunne, U. Endriss, J. Lang, M. Lemaitre, N. Maudet, J. Padget, S. Phelps, J. A. Rodriguez-Aguilar, and P. Sousa, “Issues in multiagent resource allocation,” Informatica, vol. 30, no. 1, 2006.
[8] M. Harper, “Escort evolutionary game theory,” Physica D: Nonlinear Phenomena, vol. 240, no. 18, pp. 1411–1415, 2011.