A Robust Utility Learning Framework via Inverse Optimization

Ioannis C. Konstantakopoulos; Lillian J. Ratliff; Ming Jin; S. Shankar Sastry; Costas Spanos

arXiv:1704.07933·cs.GT·April 27, 2017


TL;DR

This paper introduces a robust utility learning framework using inverse optimization, incorporating heteroskedastic inference and ensemble methods to model heterogeneous user preferences in smart infrastructure applications.

Contribution

It presents a novel correlated utility learning approach that estimates noise covariance and leverages ensemble techniques for improved forecasting in heterogeneous user settings.

Findings

01. Effective utility estimation in a toy Bertrand-Nash game

02. Successful application to social energy efficiency experiments

03. Enhanced forecasting accuracy with ensemble methods


Tables

Table 1. TABLE I: Mean Square Error (MSE) of forecasting using the proposed robust utility learning methods vs cOLS estimators for Bertrand-Nash competition. The best performing method is indicated in bold text for each of the firms.

Firm 1	bagging	boosting	bumping	cOLS
MSE	0.05	0.51	0.65	1.62
Firm 2	bagging	boosting	bumping	cOLS
MSE	1.58	0.71	0.89	2.54

Table 2. TABLE II: Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Scaled Error (MASE) [41] of forecasting using the proposed robust utility learning methods vs cOLS estimators for both data sets in default lighting setting 20. The best performing method is indicated in bold text for each of the data sets, dynamic and average.

Dynamic, ${\hat{f}}_{i}$	bagging	boosting	bumping	cOLS
RMSE	8.31	10.11	12.56	22.53
MAE	5.20	6.55	6.38	18.35
MASE	2.08	6.38	2.55	7.34
Averaged, ${\hat{f}}_{i}$	bagging	boosting	bumping	cOLS
RMSE	2.05	1.68	1.96	9.36
MAE	1.58	1.31	1.48	6.01
MASE	0.71	0.59	0.67	2.69

Table 3. TABLE III: The cFGLS estimator value and the bagging, gradient boosting and bumping ensemble methods bias approximation for the most active users. We utilized the dynamic data set from the period in which the default lighting setting was set to 20. In bold, we denote the occupants with nearly unbiased estimators.

Id	cFGLS	Bagging Bias	Boosting Bias	Bumping Bias
2	-0.7	0.11	0.17	0.02
6	0.5	1.12	1.77	0.93
8	298.1	-176.9	-370.3	120.5
14	337.5	-186.3	-400.2	149.7
20	-0.8	0.07	0.21	-0.53

Table 4. TABLE IV: Estimated covariance matrix for the most active players using the (a) dynamic data set and (b) average data set. The colored column-row pairs indicate the agents whose utilities we modify to create the correlated game; the column indicates the agent(s) whose estimated parameter is used to modify the row agent’s utility. In particular, agent 2’s utility function is modified by agent 20’s estimated parameter (red), agent 8’s utility function is modified by agent 14’s estimated parameter (green), and agent 14’s utility function is modified by agent 2’s and agent 8’s estimated parameters (blue). Note that agents 2 and 14 are anti-correlated, whereas agents 8 and 14 (resp. agents 2 and 20) are positively correlated. Agents 2 and 20 are passive players, voting more for comfort than winning, whereas agents 8 and 14 vote more aggressively.

[TABLE: sub-tables (a) and (b), each indexed by agent Id 2, 6, 8, 14, 20]

Table 7. TABLE V: Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Scaled Error (MASE) of forecasting using the estimated correlated utility functions. We estimated correlated utility functions $\hat{g}_{i}(\cdot;\{\theta_{j}\}_{j\in\mathcal{K}_{i}})$ using parameters from the bagging, bumping, boosting, and cOLS methods for both data sets in default lighting setting 20.

Dynamic, ${\hat{g}}_{i}$	bagging	boosting	bumping	cOLS
RMSE	6.38	9.58	8.82	8.44
MAE	4.59	6.81	5.52	5.58
MASE	1.84	2.72	2.21	2.23
Averaged, ${\hat{g}}_{i}$	bagging	boosting	bumping	cOLS
RMSE	2.18	1.63	2.36	2.83
MAE	1.75	1.27	1.92	2.30
MASE	0.78	0.56	0.86	1.03

Equations

f_{i} (x_{i}, x_{- i}) = f_{i}^{nom} (x_{i}, x_{- i}) + f_{i}^{inc} (x_{i}, x_{- i}) .

\max \{f_{i} (x_{i}, x_{- i}) ∣ x_{i} \in C_{i}\} .

f_{i} (x_{i}, x_{- i}) \geq f_{i} (x_{i}^{'}, x_{- i}) \forall x_{i}^{'} \in C_{i} .

f_{i} (x_{i}, x_{- i}) + ε \geq f_{i} (x_{i}^{'}, x_{- i}) \forall x_{i}^{'} \in C_{i} .

L_{i} (x_{i}, x_{- i}, μ_{i}) = f_{i} (x_{i}, x_{- i}) + \sum_{j \in A_{i} (x_{i})} μ_{i, j} h_{i, j} (x_{i})

ω (x, μ) = [D_{1} L_{1} (x, μ_{1})^{⊤} \dots D_{p} L_{p} (x, μ_{p})^{⊤}]^{⊤}

f_{i} (x; θ_{i}) = ⟨ ϕ_{i} (x_{i}, x_{- i}), θ_{i} ⟩ + \bar{f}_{i} (x)

x^{(k)} = (x_{j}^{(k)})_{j \in S^{k}} .

r_{s, i}^{(k)} (θ_{i}, μ_{i})

r_{c, i}^{j, (k)} (μ)

r_{c, i}^{(k)} (μ_{i}) = [r_{c, i}^{1, (k)} (μ_{i}) \dots r_{c, i}^{ℓ_{i}, (k)} (μ_{i})] .

\min_{μ, θ} \sum_{i = 1}^{p} \sum_{k = 1}^{n_{i}} χ_{i} (r_{s, i}^{(k)} (θ, μ), r_{c, i}^{(k)} (μ))

s.t. θ_{i} \in Θ_{i}, μ_{i} \geq 0 \forall i \in \{1, \dots, p\}

X_{i}^{(k)} = \begin{bmatrix} D_{i} h_{i} (x_{i}^{(k)}) & D_{i} ϕ_{i} (x^{(k)}) \\ \hat{h}_{i} (x_{i}^{(k)}) & 0_{ℓ_{i} \times m_{i}} \end{bmatrix},

\hat{h}_{i} (x_{i}) = diag (h_{i, 1} (x_{i}), \dots, h_{i, ℓ_{i}} (x_{i})),

D_{i} h_{i} (x_{i}) = [D_{i} h_{i, 1} (x_{i}) \dots D_{i} h_{i, ℓ_{i}} (x_{i})],

β = [μ_{1}^{1} \dots μ_{1}^{ℓ_{1}} θ_{1} \dots μ_{p}^{1} \dots μ_{p}^{ℓ_{p}} θ_{p}]^{⊤} \in R^{(ℓ_{i} + 1) p}

Y_{i} = [\bar{f}_{i} (x^{(1)}) 0_{ℓ_{i}} \dots \bar{f}_{i} (x^{(n_{i})}) 0_{ℓ_{i}}]^{⊤} .

\min\limits_{\beta}\left\{\|Y-X\beta\|_{2}\ \big|\ \beta\in\mathcal{B}\right\}

Y = X β + ϵ, β \in B

cov (ϵ ∣ X) = G ≻ 0, G \in R^{n_{d} \times n_{d}} .

(G^{- \frac{1}{2}} Y) = (G^{- \frac{1}{2}} X) β + (G^{- \frac{1}{2}} ϵ), β \in B

\hat{B}_{i, k} = [(\hat{B}_{i, k})_{l, j}]_{l, j = 1}^{ℓ_{i} + 1}

\hat{G} = diag (\frac{e_{1}^{2}}{(1 - b_{1})^{δ_{1}}}, \frac{e_{2}^{2}}{(1 - b_{2})^{δ_{2}}}, \dots, \frac{e_{n_{d}}^{2}}{(1 - b_{n_{d}})^{δ_{n_{d}}}})

\tilde{Y} = X \hat{β}_{cFGLS} + Φ (e) ε,

\hat{β}_{bag} = \frac{1}{N} \sum_{j = 1}^{N} \hat{β}_{cFGLS, j}

\hat{C}_{β} = \frac{1}{N} \sum_{j = 1}^{N} (\hat{β}_{cFGLS, j} - \hat{β}_{bag}) (\hat{β}_{cFGLS, j} - \hat{β}_{bag})^{⊤} .

\hat{β}_{bump} = \arg\min_{\hat{β}_{cFGLS, j}} ∥ \tilde{Y} - X \hat{β}_{cFGLS, j} ∥_{2}^{2}

D_{i} (p_{1}, p_{2}, ξ) = θ_{i, 1} + θ_{i, 2} p_{1} + θ_{i, 3} p_{2} + ν ξ

\hat{f}_{i} (x_{i}, x_{- i}; \hat{θ}_{i}) = \bar{f}_{i} (x_{i}, x_{- i}) + ⟨ ϕ_{i} (x_{i}, x_{- i}), \hat{θ}_{i} ⟩

\hat{g}_{i} (x_{i}, x_{- i})


Full text

A Robust Utility Learning Framework via Inverse Optimization

Ioannis C. Konstantakopoulos*, Lillian J. Ratliff*, Ming Jin, S. Shankar Sastry, Costas J. Spanos

*Authors contributed equally. I. Konstantakopoulos, M. Jin, S. Sastry, and C. Spanos are with the Electrical Engineering and Computer Sciences Department, University of California, Berkeley, Berkeley, CA 94720. email: {ioanniskon, jinming, spanos, sastry}@eecs.berkeley.edu. L. Ratliff is with the Electrical Engineering Department, University of Washington, Seattle, WA 98195. email: [email protected]

Abstract

In many smart infrastructure applications, flexibility in achieving sustainability goals can be gained by engaging end-users. However, these users often have heterogeneous preferences that are unknown to the decision-maker tasked with improving operational efficiency. Modeling user interaction as a continuous game between non-cooperative players, we propose a robust parametric utility learning framework that employs constrained feasible generalized least squares estimation with heteroskedastic inference. To improve forecasting performance, we extend the robust utility learning scheme by employing bootstrapping with bagging, bumping, and gradient boosting ensemble methods. Moreover, we estimate the noise covariance, which provides approximate correlations between players that we leverage to develop a novel correlated utility learning framework. We apply the proposed methods both to a toy example arising from Bertrand-Nash competition between two firms and to data from a social game experiment designed to encourage energy efficient behavior amongst smart building occupants. Using occupant voting data for shared resources such as lighting, we simulate the game defined by the estimated utility functions to demonstrate the performance of the proposed methods.

Index Terms:

Game Theory, Inverse Optimization, Smart Building Energy Efficiency

I Introduction

Due to pervasive utilization of Internet of Things and Cyber-Physical Systems sensing/actuating platforms, we are increasingly observing human decision-makers being integrated into operational and managerial decisions in infrastructure systems. Their actions can be leveraged to increase both resilience and sustainability thereby making smart infrastructure a worthwhile investment. Smart buildings, being no exception, are a fundamental component of emerging smart cities; their efficient design and operation enables flexibility—e.g., by automatically shifting or curtailing demand during peak hours—in making urban spaces sustainable. More abstractly, in many infrastructure systems there is often an entity acting as a planner (e.g., facility managers, departments of transportation, etc.) that introduces incentives or control policies to coordinate autonomously acting agents in the system (e.g., selfish human decision-makers) so that their collective behavior leads to system-level efficiency gains.

One approach to designing such policies is to leverage game-theoretic models of decision-making in an optimization framework to produce policies that encourage or induce behavior that optimizes an objective [1, 2]. Often the planner has at best a prior on the decision-making model of the individual agents. Such information asymmetries lead to inefficiencies [1, 3]. In this paper, we propose a framework for estimating decision-making models of self-interested decision-makers consuming a shared resource (e.g., lighting in a smart building) that can be leveraged in control or incentive design to aid in closing the efficiency gap.

To concretize ideas, consider a smart building—an example we will return to throughout the text. A facilities manager may be incentivized or even tasked to encourage energy efficient behavior if they are accountable for energy costs or are required, e.g., to maintain an operational excellence measure (see, e.g., [4, 5]). At the same time, the facilities manager generally must also ensure user comfort and productivity [6]. Beyond these motivations, demand response (DR) programs are being rolled out by utility companies and third-party solution providers with the goal of correcting for improper load forecasting. Participating consumers decide to change their consumption when DR events are called [7]. The facilities manager may be required to keep this schedule.

Smart building technologies enable new avenues for facilities managers to keep such a prescribed schedule via automation or integrating the end-user. Yet, in office buildings the occupants, as employees, typically are not responsible for paying for the energy resources they consume. Hence, there is often a misalignment between the incentives of the facilities manager and the occupants. Social games are a means to engage the occupants to address these inefficiencies. In Section VI, we describe one such social game that we designed and implemented on the UC Berkeley campus, aimed at incentivizing energy efficient consumption of shared resources by leveraging building automation.

The broader purpose of this paper is to present a general framework that leverages game-theoretic concepts to learn models of players’ decision-making in competitive environments such as the building energy social game described above. The framework supports learning agents’ preferences over shared resources as well as understanding how preferences change as a function of external stimuli such as physical control or incentives. Such a framework can be used in the design of incentive mechanisms that realign agents’ preferences with those of the planner—which often represent system-level performance criteria—through fair compensation.

More concretely, we model decision-making agents as utility maximizers and, using inverse optimization and game-theoretic techniques, we derive a robust scheme to infer their utility functions. At the core of our approach is the fact that we model the agents as non-cooperative players in a game playing according to a Nash equilibrium strategy. From this point of view, agents are strategic entities that make decisions based on their own preferences despite others. The game-theoretic framework both allows for qualitative insights to be made about the outcome of such selfish behavior—more so than a simple prescriptive model—and, more importantly, can be leveraged in designing mechanisms for incentivizing agents.

We assume a parametric form of utility function for each player that is dependent on the decisions of others. Correlations between players’ decisions are not known a priori. Assuming observations are approximately Nash equilibria, we use first– and second–order conditions on player utility functions to construct a constrained regression model. The result is a constrained Generalized Least Squares (cGLS) problem with non-spherical noise error terms. Using constrained Feasible Generalized Least Squares (cFGLS), an implementable version of cGLS, we utilize heteroskedastic inference to approximate the correlated errors.

Noting that data sets of observed decisions often may be small relative to the number of model parameters in practice, we employ bootstrapping to generate pseudo-data from which we learn additional estimators. The bootstrapping process allows us to derive an asymptotic approximation of the bias and standard error of an estimator. We utilize ensemble methods such as bagging, bumping, and gradient boosting to extract an estimator from the pseudo-data generated estimators that results in a reduced forecasting error. The ensemble methods are robust under noise and autocorrelated error terms. We apply the robust utility learning framework to a model of Bertrand-Nash competition between firms in order to illustrate the framework and its performance.

Building on the robust utility learning framework, we use the approximated standard error to derive an innovative utility learning method in which we modify players’ utility functions to create a correlated game. The resulting correlated utility learning method leverages correlations between players and the ensemble estimators to minimize the estimation error by optimizing scaling coefficients that appear in the correlated game utility functions. Applying this method results in a significant improvement over the constrained Ordinary Least Squares (cOLS) estimations and outperforms many of the ensemble methods. It also provides insights into how players interact with one another and indicates which players are potentially forming coalitions. Moreover, this technique is amenable to online implementation after an initial training period so that by using cOLS estimators in the correlated utility learning framework, our adaptive incentive design schemes, introduced in [8, 3], can be made robust.

To demonstrate the efficacy of both the robust and correlated utility learning frameworks, we apply them to data generated from the smart building social game experiment we conducted. We show that estimating the players’ utility functions via the proposed methods results in a predictive model that outperforms several other standard techniques such as Ordinary Least Squares (OLS).

The rest of the paper is organized as follows. We describe the abstracted game framework for modeling the interaction of agents as well as define equilibrium concepts in Section II. In Section III, we formulate the robust utility learning framework and provide an algorithm for implementing it. Section IV contains the Bertrand-Nash competition example and we present the correlated utility learning framework in Section V. In Section VI, we describe the social game experimental setup on the UC Berkeley campus within the CREST center (http://crest.berkeley.edu/), provide a brief literature review, and present the results of both proposed utility learning methods applied to data from the social game. We make concluding remarks and discuss future directions in Section VIII.

II Game Framework

In this section, we abstract the agents’ decision-making processes in a game–theoretic framework.

II-A Agent Decision-Making Model

Consider $p$ agents, i.e., decision-making entities (we refer to the decision-makers as agents and use the term interchangeably with players), indexed by the set $\mathcal{I}=\{1,\ldots,p\}$. Each agent is modeled as a utility maximizer that seeks to select $x_{i}\in\mathbb{R}$ by optimizing

$$f_{i}(x_{i},x_{-i})=f_{i}^{\text{nom}}(x_{i},x_{-i})+f_{i}^{\text{inc}}(x_{i},x_{-i}),$$

where $f_{i}^{\text{nom}}(x_{i},x_{-i})$ and $f_{i}^{\text{inc}}(x_{i},x_{-i})$ are the nominal and incentive components, respectively, of agent $i$’s utility function, and where $x_{-i}=(x_{1},\ldots,x_{i-1},x_{i+1},\ldots,x_{p})\in\mathbb{R}^{p-1}$ collects the choices of all agents other than agent $i$. (While for notational simplicity we assume that $x_{i}\in\mathbb{R}$, the work easily extends to a higher dimensional choice vector for each agent.)

The choice $x_{i}$ abstracts the agent’s decision; it could represent, e.g., how much of a particular resource they choose to consume. The nominal component of $f_{i}$ captures the agent’s individual preferences over $x_{i}$ and may depend on the decisions of others $x_{-i}$ . The incentive component models the portion of the agent’s utility that can be designed by the planner; it also may depend on the decisions of other agents.

Agent $i$ ’s optimization problem is also subject to constraints; the constraint set is given by $\mathcal{C}_{i}=\{x_{i}|\ h_{i,j}(x_{i})\geq 0,j=1,\ldots,\ell_{i}\}$ where each $h_{i,j}$ is assumed to be a concave function of $x_{i}$ . Such constraints may encode cyber or physical constraints arising from the underlying system—in the social game example presented in Section VI-C, we will see that these constraints are physical bounds. Thus, given $x_{-i}$ , agent $i$ faces the following optimization problem:

$$\max\{f_{i}(x_{i},x_{-i})\mid x_{i}\in\mathcal{C}_{i}\}. \tag{2}$$

II-B Game Formulation

The game $(f_{1},\ldots,f_{p})$ is a continuous game on a convex strategy space $\mathcal{C}=\mathcal{C}_{1}\times\cdots\times\mathcal{C}_{p}$ . To model the outcome of the strategic interactions of agents, we use the Nash equilibrium concept.

Definition 1.

A point $x\in\mathcal{C}$ is a Nash equilibrium for the game $(f_{1},\ldots,f_{p})$ on $\mathcal{C}$ if, for each $i\in\mathcal{I}$ ,

$$f_{i}(x_{i},x_{-i})\geq f_{i}(x_{i}^{\prime},x_{-i})\quad\forall\,x_{i}^{\prime}\in\mathcal{C}_{i}.$$

We say $x\in\mathcal{C}$ is an $\varepsilon$ –Nash equilibrium for $\varepsilon>0$ if the above inequality is relaxed:

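$$f_{i}(x_{i},x_{-i})+\varepsilon\geq f_{i}(x_{i}^{\prime},x_{-i})\quad\forall\,x_{i}^{\prime}\in\mathcal{C}_{i}.$$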

We say a point is a local Nash equilibrium (respectively, an $\varepsilon$–local Nash equilibrium) if, for each $i\in\mathcal{I}$, there exists $W_{i}\subset\mathcal{C}_{i}$ such that $x_{i}\in W_{i}$ and the above inequalities hold for all $x_{i}^{\prime}\in W_{i}$.

If each $f_{i}$ is concave in $x_{i}$ and $\mathcal{C}$ is convex, then the game is a $p$ –person concave game. In the seminal work by Rosen [9], it was shown that a (pure) Nash equilibrium exists for every concave game.
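To make Definition 1 concrete, the following minimal sketch checks the $\varepsilon$–Nash inequality by brute force over a grid of unilateral deviations. The two quadratic utilities, the strategy sets $[0,1]$, and the helper name `is_eps_nash` are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def is_eps_nash(utilities, x, grids, eps=0.0):
    """True if no player gains more than eps by deviating unilaterally on its grid."""
    x = np.asarray(x, dtype=float)
    for i, (f_i, grid_i) in enumerate(zip(utilities, grids)):
        current = f_i(x)
        for candidate in grid_i:
            deviated = x.copy()
            deviated[i] = candidate  # only player i changes its choice
            if f_i(deviated) > current + eps:
                return False
    return True

# Hypothetical two-player concave game on C_1 = C_2 = [0, 1].
utilities = [
    lambda x: -(x[0] - 0.5 * x[1]) ** 2,        # best response: x_1 = 0.5 x_2
    lambda x: -(x[1] - 0.5 * x[0] - 0.3) ** 2,  # best response: x_2 = 0.5 x_1 + 0.3
]
grids = [np.linspace(0.0, 1.0, 101)] * 2

# Solving the two best-response equations gives x = (0.2, 0.4).
at_equilibrium = is_eps_nash(utilities, [0.2, 0.4], grids, eps=1e-9)
away_from_equilibrium = is_eps_nash(utilities, [0.8, 0.1], grids, eps=1e-9)
```

A small positive `eps` is used here so that floating-point round-off on the grid is not mistaken for a profitable deviation.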

The Lagrangian of agent $i$ ’s optimization problem is given by

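$$L_{i}(x_{i},x_{-i},\mu_{i})=f_{i}(x_{i},x_{-i})+\sum_{j\in\mathcal{A}_{i}(x_{i})}\mu_{i,j}h_{i,j}(x_{i})$$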

where $\mathcal{A}_{i}(x_{i})$ is the active constraint set at $x_{i}$ and $\mu=(\mu_{1},\ldots,\mu_{p})$ with $\mu_{i}=(\mu_{i,j})_{j=1}^{\ell_{i}}$ are the Lagrange multipliers. Assuming appropriate smoothness conditions on each $f_{i}$ and $h_{i,j}$ , the differential game form [3],[10]—which characterizes the first–order conditions of the game—is given by

$$\omega(x,\mu)=[D_{1}L_{1}(x,\mu_{1})^{\top}\ \cdots\ D_{p}L_{p}(x,\mu_{p})^{\top}]^{\top}$$

where $D_{i}L_{i}$ denotes the derivative of $L_{i}$ with respect to $x_{i}$ .

Consider agent $i$’s optimization problem (2) with $x_{-i}$ fixed and where each $f_{i}$ and $h_{i,j}$ for $j\in\{1,\ldots,\ell_{i}\}$, $i\in\mathcal{I}$ are concave, twice continuously differentiable functions. Then, assuming an appropriate constraint qualification condition [11], the necessary and sufficient conditions for optimality of a point $x_{i}$ are as follows: there exists $\mu_{i}\in\mathbb{R}^{\ell_{i}}_{+}$ such that (i) $D_{i}L_{i}(x,\mu_{i})=0$; (ii) $\mu_{i,j}h_{i,j}(x_{i})=0$ for each $j\in\{1,\ldots,\ell_{i}\}$; (iii) $h_{i,j}(x_{i})\geq 0$ for each $j\in\{1,\ldots,\ell_{i}\}$. Regardless of the concavity assumption, the point $x_{i}$ is a local maximizer if, in addition to (i)-(iii), $\mu_{i,j}>0$ and $z^{\top}D_{ii}^{2}L_{i}(x,\mu_{i})z<0$ for all $z\neq 0$ such that $D_{i}h_{i,j}(x_{i})^{\top}z=0$ for $j\in\mathcal{A}_{i}(x_{i})$. Such conditions motivate the following definition.

Definition 2 (Differential Nash Equilibrium).

Consider a game $(f_{1},\ldots,f_{p})$ on $\mathcal{C}$ where $f_{i}$ and $h_{i,j}$ for each $j\in\{1,\ldots,\ell_{i}\}$ and $i\in\mathcal{I}$ are twice continuously differentiable. A point $x\in\mathcal{C}\subset\mathbb{R}^{p}$ is a differential Nash equilibrium if there is a $\mu\in\mathbb{R}^{\sum_{i=1}^{p}\ell_{i}}$ such that the pair $(x,\mu)$ satisfies (i) $\omega(x,\mu)=0$; (ii) for each $i\in\mathcal{I}$, $z^{\top}D_{ii}^{2}L_{i}(x,\mu_{i})z<0$ for all $z\neq 0$ such that $D_{i}h_{i,j}(x_{i})^{\top}z=0$, and $\mu_{i,j}>0$ for $j\in\mathcal{A}_{i}(x_{i})$. If, for a given $\varepsilon>0$, (i’) $\omega(x,\mu)=\varepsilon$ with all the other conditions being satisfied, then $x$ is an $\varepsilon$–differential Nash equilibrium.

The above definition extends the definition of a differential Nash equilibrium (if we restrict to Euclidean spaces), first appearing in [10], to constrained games on Euclidean spaces. Using this definition, we can also extend Proposition 1 of [10], again where strategy spaces are restricted to be subsets of Euclidean spaces.

Proposition 1.

A differential Nash equilibrium of the $p$ –person concave game $(f_{1},\ldots,f_{p})$ on $\mathcal{C}$ is a local Nash equilibrium.

The proof is straightforward and we leave it to Appendix A. The proposition says that the conditions of Definition 2 are sufficient for a local Nash. In contrast to single-agent optimization problems, for games, the second order conditions do not imply the equilibrium is isolated [10]. A sufficient condition guaranteeing that a Nash equilibrium $x$ is isolated is that the Jacobian of $\omega(x,\mu)$ , denoted $D\omega(x,\mu)$ , is invertible [3].
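As an illustration of the conditions above, the sketch below evaluates them on a hypothetical unconstrained two-player quadratic game, so that no multipliers appear and $\omega(x)=(D_{1}f_{1}(x),D_{2}f_{2}(x))$ is affine in $x$. The utilities, the matrix `A`, and the vector `b` are assumptions chosen only so that everything can be checked by hand.

```python
import numpy as np

# Hypothetical unconstrained game (no multipliers mu):
#   f_1(x) = -x_1^2 + x_1 x_2,   f_2(x) = -x_2^2 + 0.5 x_1 x_2 + x_2.
# The differential game form is affine: omega(x) = A x + b, with
#   D_1 f_1 = -2 x_1 + x_2   and   D_2 f_2 = 0.5 x_1 - 2 x_2 + 1.
A = np.array([[-2.0, 1.0],
              [0.5, -2.0]])
b = np.array([0.0, 1.0])

def omega(x):
    """Stacked first-order conditions of both players."""
    return A @ x + b

# Candidate equilibrium: the unique zero of omega.
x_star = np.linalg.solve(A, -b)

first_order = np.allclose(omega(x_star), 0.0)   # condition (i)
second_order = bool(np.all(np.diag(A) < 0.0))   # D_ii^2 f_i < 0 for each player
isolated = abs(np.linalg.det(A)) > 1e-12        # Jacobian D omega = A is invertible
```

Here the Jacobian of $\omega$ is the constant matrix `A`, so the isolation check reduces to a determinant test.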

We use (necessary and sufficient) optimality conditions on individual player optimization problems holding other players’ strategies fixed to formulate the utility learning framework.

III Robust Utility Learning

In previous work, we have explored utility learning and incentive design as a coupled problem both in theory [3] and in practice [8, 12, 13]. In the present work, we re-examine the utility learning problem using statistical methods that serve to improve estimation and prediction accuracy.

Looking forward, our aim is to fold the new estimation scheme into the overall incentive design framework. This goal motivates why we are interested in learning more than a simple predictive model for agents, but rather a utility-based forecasting framework that accounts for individual preferences.

We parameterize $f_{i}$ by $\theta_{i}=(\theta_{i1},\ldots,\theta_{im_{i}})\in\mathbb{R}^{m_{i}}$ and a finite set of basis functions $\{\phi_{ij}(x_{i},x_{-i})\}_{j=1}^{m_{i}}$ such that

$$f_{i}(x;\theta_{i})=\langle\phi_{i}(x_{i},x_{-i}),\theta_{i}\rangle+\bar{f}_{i}(x)$$

where $\phi_{i}=[\phi_{i,1}\ \cdots\ \phi_{i,m_{i}}]^{\top}$ and $\bar{f}_{i}(x)$ is a function that captures a priori knowledge of the agent’s utility function (e.g., the incentive component designed by the planner).

III-A Base Utility Estimation Framework

We start by describing the basic utility estimation framework using equilibrium conditions for the game played between the players. The utility learning framework we propose is quite broad in that it encompasses a wide class of continuous games. In previous works [12, 3, 13] we have shown that the utility learning problem can be formulated as a convex optimization problem by using first– and second–order conditions for Nash equilibria. Let us briefly review this formulation as it serves as the basis for the robust utility learning method.

Each observation $x^{(k)}$ is assumed to be an $\varepsilon$ –approximate differential Nash equilibrium where the superscript notation $(\cdot)^{(k)}$ indicates the $k$ –th observation. For each observation $x^{(k)}$ , it may be the case that only a subset of the players, say $\mathcal{S}^{k}\subset\mathcal{I}$ at observation $k$ , participate in the game. Then notationally each observation is such that

$$x^{(k)}=(x_{j}^{(k)})_{j\in\mathcal{S}^{k}}.$$

If player $i$ participates in $n_{i}$ instances of the game, then there are $n_{i}$ observations for that player. Let $n=\sum_{i=1}^{p}n_{i}$ be the total number of observations.

We can consider first–order optimality conditions for each player’s optimization problem and define a residual function capturing the degree of suboptimality of $x_{i}^{(k)}$ [8],[14]. Indeed, for player $i$ ’s optimization problem, let the residual of the stationarity condition be given by

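$$r_{s,i}^{(k)}(\theta_{i},\mu_{i})=D_{i}f_{i}(x^{(k)};\theta_{i})+\sum_{j=1}^{\ell_{i}}\mu_{i}^{j}D_{i}h_{i,j}(x_{i}^{(k)})$$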

and the residual of the complementary conditions be given by

$$r_{c,i}^{j,(k)}(\mu_{i})=\mu_{i}^{j}h_{i,j}(x_{i}^{(k)}),\quad j\in\{1,\ldots,\ell_{i}\}.$$

Define

$$r_{c,i}^{(k)}(\mu_{i})=[r_{c,i}^{1,(k)}(\mu_{i})\ \cdots\ r_{c,i}^{\ell_{i},(k)}(\mu_{i})].$$

Using data from the players’ decisions (e.g., lighting votes from the social game experiment which we describe in Section VI-A), the base utility learning framework consists of solving the optimization problem given by

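$$\min_{\mu,\theta}\ \sum_{i=1}^{p}\sum_{k=1}^{n_{i}}\chi_{i}\big(r_{s,i}^{(k)}(\theta,\mu),r_{c,i}^{(k)}(\mu)\big)\quad\text{s.t.}\quad\theta_{i}\in\Theta_{i},\ \mu_{i}\geq 0\ \ \forall i\in\{1,\ldots,p\} \tag{P}$$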

where $\Theta_{i}$ is a constraint set on the parameters $\theta_{i}$ that captures prior information about the objective, $\chi:\mathbb{R}^{p}\times\mathbb{R}^{\sum_{i=1}^{p}\ell_{i}}\rightarrow\mathbb{R}_{+}$ is a non-negative, convex penalty function satisfying $\chi(z_{1},z_{2})=0$ if and only if $z_{1}=0$ and $z_{2}=0$ , i.e. any norm on $\mathbb{R}^{p}\times\mathbb{R}^{\sum_{i=1}^{p}\ell_{i}}$ , and the inequality $\mu_{i}\geq 0$ is element-wise.

The goal of this optimization problem—which is a finite dimensional optimization problem in the $\theta_{i}$ ’s—is to find $\theta_{i}$ for each player such that $(\hat{f}_{i})_{i\in\mathcal{I}}$ is consistent (or approximately consistent) with the data. As is noted in [14], we also remark that it is important that the sets $\Theta_{i}$ contain enough prior information about the objectives $f_{i}$ in order to prevent trivial solutions. For example, if it is the case that $\bar{f}_{i}(x^{(k)})=0$ for each $k$ and each $\Theta_{i}=\mathbb{R}^{m_{i}}$ then the trivial solution $\theta_{i}=\boldsymbol{0}_{m_{i}}$ is feasible. For many applications some a priori knowledge on part of the utility functions of players may be encoded in each $\Theta_{i}$ (e.g., choosing $\Theta_{i}$ such that $\theta_{1i}=1$ or similarly selecting the incentive component of the utility, a design possibility for the planner [3]) or through other normalization techniques to prevent such trivial solutions. In the context of the social game application (in Section VI-C), we explicitly discuss how to construct this constraint set in such a way that we ensure the estimated utility functions are concave which in turn guarantees that there exists a Nash equilibrium to the estimated game.
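The base estimation step can be sketched on a toy problem. The example below assumes a hypothetical two-player game with one unknown parameter per player, no active constraints (so the multipliers drop out), a known component $\bar{f}_{i}$, and a squared penalty; none of these specifics come from the paper. It simulates noisy equilibria and recovers each $\theta_{i}$ by minimizing the squared stationarity residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = np.array([0.5, 0.8])  # hypothetical ground-truth parameters
n_obs = 200

# Assumed utilities: f_i(x; theta_i) = theta_i * x_i * x_{-i} + fbar_i(x),
# with known part fbar_i(x) = -x_i**2 + c_i * x_i, where c_i varies per observation.
c = rng.uniform(1.0, 2.0, size=(n_obs, 2))

# Stationarity D_i f_i = theta_i * x_{-i} - 2 x_i + c_i = 0 is linear in x.
M = np.array([[-2.0, theta_true[0]],
              [theta_true[1], -2.0]])
x_eq = np.linalg.solve(M, -c.T).T                        # exact equilibria, shape (n_obs, 2)
x_obs = x_eq + 0.01 * rng.standard_normal(x_eq.shape)    # approximate equilibria

theta_hat = np.empty(2)
for i in range(2):
    other = 1 - i
    regressor = x_obs[:, other]             # D_i phi_i(x) = x_{-i}
    target = 2.0 * x_obs[:, i] - c[:, i]    # -D_i fbar_i(x)
    # Minimizer of sum_k (theta_i * regressor_k - target_k)**2.
    theta_hat[i] = regressor @ target / (regressor @ regressor)
```

Because $\bar{f}_{i}$ is nonzero and known, the trivial solution $\theta_{i}=0$ does not fit the data, which mirrors the role of prior information discussed above.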

III-B Robust Utility Learning

Let us now formulate a robust version of the utility learning framework that reduces forecasting error and learns the noise structure. The learned noise structure can be leveraged to extract pseudo-coalitions between players, as we describe in the sequel.

Define

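$$X_{i}^{(k)}=\begin{bmatrix}D_{i}h_{i}(x_{i}^{(k)})&D_{i}\phi_{i}(x^{(k)})\\ \hat{h}_{i}(x_{i}^{(k)})&0_{\ell_{i}\times m_{i}}\end{bmatrix},$$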

where

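$$\hat{h}_{i}(x_{i})=\text{diag}(h_{i,1}(x_{i}),\ldots,h_{i,\ell_{i}}(x_{i})),$$

$$D_{i}h_{i}(x_{i})=[D_{i}h_{i,1}(x_{i})\ \cdots\ D_{i}h_{i,\ell_{i}}(x_{i})],$$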

and $n_{d}=(\ell_{i}+1)n$ is the total number of data points. The regressor matrix is then defined as $X=\text{diag}(X_{1},\cdots,X_{p})\in\mathbb{R}^{n_{d}\times(\ell_{i}+1)p}$ where $X_{i}=[(X_{i}^{(1)})^{\top}\ \cdots\ (X_{i}^{(n_{i})})^{\top}]^{\top}$ . Define the regression coefficient

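$$\beta=[\mu_{1}^{1}\ \cdots\ \mu_{1}^{\ell_{1}}\ \theta_{1}\ \cdots\ \mu_{p}^{1}\ \cdots\ \mu_{p}^{\ell_{p}}\ \theta_{p}]^{\top}\in\mathbb{R}^{(\ell_{i}+1)p}$$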

and the observation matrix $Y=[Y_{1}\ \cdots\ Y_{p}]^{\top}\in\mathbb{R}^{(\ell_{i}+1)p}$ where

$$Y_{i}=[\bar{f}_{i}(x^{(1)})\ \ 0_{\ell_{i}}\ \cdots\ \bar{f}_{i}(x^{(n_{i})})\ \ 0_{\ell_{i}}]^{\top}.$$
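The per-observation regressor block can be assembled as in the following sketch. It assumes a layout in which the first row carries the stationarity terms $[D_{i}h_{i}\ \ D_{i}\phi_{i}]$ and the remaining $\ell_{i}$ rows carry the complementarity terms $[\hat{h}_{i}\ \ 0]$, and it uses hypothetical box constraints `lo <= x_i <= hi` so that $\ell_{i}=2$. The function name and numerical values are illustrative only.

```python
import numpy as np

def regressor_block(x_i, d_phi_i, lo=0.0, hi=100.0):
    """Assemble one (l_i + 1) x (l_i + m_i) block for the bounds lo <= x_i <= hi."""
    # Constraints h_{i,1} = x_i - lo >= 0 and h_{i,2} = hi - x_i >= 0, so l_i = 2.
    h = np.array([x_i - lo, hi - x_i])
    d_h = np.array([[1.0, -1.0]])      # D_i h_i, shape (1, l_i)
    h_hat = np.diag(h)                 # diag(h_{i,1}, h_{i,2})
    d_phi = np.atleast_2d(d_phi_i)     # D_i phi_i, shape (1, m_i)
    zeros = np.zeros((h.size, d_phi.shape[1]))
    return np.block([[d_h, d_phi],
                     [h_hat, zeros]])

block = regressor_block(x_i=30.0, d_phi_i=[2.0, -1.0])
beta_i = np.array([0.5, 0.0, 1.0, 3.0])   # (mu_i^1, mu_i^2, theta_i1, theta_i2)

# Row 0: parameter-dependent part of the stationarity residual.
# Rows 1..l_i: complementarity products mu_i^j * h_{i,j}(x_i).
rows = block @ beta_i
```

With $\ell_{i}=2$ each observation contributes three rows, one for stationarity and two for complementarity.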

Using the Euclidean norm for $\chi$ in (P) leads to an OLS problem with inequality constraints—i.e. a constrained OLS (cOLS):

$$\min_{\beta}\left\{\|Y-X\beta\|_{2}\ \big|\ \beta\in\mathcal{B}\right\}$$

where $\mathcal{B}=\{\beta|\ \theta_{i}\in\Theta_{i},\mu_{i}\geq 0,\ \forall i\in\mathcal{I}\}$ . Enforcing that each of the constraint sets $\Theta_{i}$ is encoded by inequalities on $\theta_{i}$ , the above stated problem can be viewed as a classical multiple linear regression model with inequality constraints described by the data generation process

[TABLE]

where $\epsilon=(\epsilon_{1},\ldots,\epsilon_{p})$ is the error term. In the classical model, the error term satisfies: (i) $E(\epsilon|X)=0^{n_{d}\times 1}$; (ii) $\text{cov}(\epsilon|X)=\sigma^{2}I^{n_{d}\times n_{d}}$; (iii) $\{\epsilon_{i}\}_{i=1}^{p}$ independent and identically distributed (i.i.d.) with zero mean and variance $\sigma^{2}$. Here we retain (i) but relax (ii) and (iii) by allowing $\epsilon$ to be nonspherical [15]. With this more general statistical model we are able to describe a data generation process in which the error terms are correlated or lack constant variance. This fact will be leveraged in creating coalitions between players as we describe in Section V.

Mathematically the nonspherical errors are modelled by

[TABLE]

One drawback of this technique is that, given nonspherical errors, the cOLS estimator is no longer efficient; that is, it does not satisfy the Best Linear Unbiased Estimator (BLUE) property, which the Gauss–Markov theorem [15, Theorem 1, Chapter 5] guarantees only for spherical errors. However, we can recover this property by multiplying (17) on the left with $G^{-\frac{1}{2}}$, which whitens the errors. This leads to the cGLS statistical model given by

[TABLE]

which now satisfies the BLUE property. In general, the explicit form of $\text{cov}(\epsilon|X)=G$ is unknown. We therefore use the residuals from fitting (17) to infer the noise, imposing structural constraints on $G$.

We remark that many types of noise structure can be imposed on $G$; we provide two examples. The first is block diagonal structure [15, Chapter 5]; in particular, we impose that $G=\text{blkdiag}({K}_{1},\cdots,{K}_{p})\in\mathbb{R}^{n_{d}\times n_{d}}$ where ${K}_{i}=\text{blkdiag}(B_{i,1},\ldots,B_{i,n_{i}})\in\mathbb{R}^{(\ell_{i}+1)n_{i}\times(\ell_{i}+1)n_{i}}$ with each $B_{i,k}\in\mathbb{R}^{(\ell_{i}+1)\times(\ell_{i}+1)}$. Estimating $\beta$ with cOLS, we get $\hat{\beta}_{\text{cOLS}}$ with residual vector $e=Y-X\hat{\beta}_{\text{cOLS}}\in\mathbb{R}^{(\ell_{i}+1)n}$. The residual vector $e$ can be decomposed into residuals for each player by writing $e=[e_{1}^{\top}\ \cdots\ e_{p}^{\top}]^{\top}$. We use $e_{i}$ to compute an estimate $\hat{K}_{i}$ of ${K}_{i}$ which is, in turn, used to compute $\hat{G}$. The residuals come in groups of $\ell_{i}+1$ (triplets when $\ell_{i}=2$) since for each $k$, $Y_{i}^{(k)}\in\mathbb{R}^{\ell_{i}+1}$. For ease of presentation, we use a paired index for the residuals instead of a single index. For player $i$, there are $n_{i}$ instances, at each of which we have $\ell_{i}+1$ observations. Let $(e_{i})_{k,j}=(e_{i})_{(\ell_{i}+1)(k-1)+j}$ where $k\in\{1,\ldots,n_{i}\}$ and $j\in\{1,\ldots,(\ell_{i}+1)\}$. With the residuals, we can then form estimates $\hat{B}_{i,k}\in\mathbb{R}^{(\ell_{i}+1)\times(\ell_{i}+1)}$ of $B_{i,k}$ where $\hat{B}_{i,k}$ takes the form

[TABLE]

with $(\hat{B}_{i,k})_{j,j}=n_{i}^{-1}\sum_{t=1}^{n_{i}}(e_{i})_{t,j}^{2}$ and $(\hat{B}_{i,k})_{l,j}=n_{i}^{-1}\sum_{t=1}^{n_{i}}(e_{i})_{t,j}(e_{i})_{t,l}$ for $j\neq l$. We provide this noise structure as an example because our formulation allows for constraints on the players’ optimization problems, so that for each observation instance $k$ we in fact have multidimensional observations, as can be seen in (12).
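The block estimate above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the residuals of one player are stored as a flat vector ordered by the paired index $(k,j)$; the function name `block_noise_estimate` and its arguments are hypothetical and not taken from the paper.

```python
import numpy as np

def block_noise_estimate(e_i, n_i, block):
    """Estimate the repeated block B_hat and the block diagonal K_hat for one player.

    e_i   : flat residual vector of length n_i * block, ordered so that entry
            block*(k-1)+j (1-based) is the paired-index residual (e_i)_{k,j}
    n_i   : number of observation instances for the player
    block : size of each block, i.e. ell_i + 1
    """
    # Row k of E holds the residuals (e_i)_{k,1}, ..., (e_i)_{k,block}.
    E = np.asarray(e_i, dtype=float).reshape(n_i, block)
    # Entry (l, j) is n_i^{-1} * sum_t (e_i)_{t,j} (e_i)_{t,l}; the diagonal
    # entries are the averaged squared residuals.
    B_hat = E.T @ E / n_i
    # The same block is repeated for every instance k along the diagonal.
    K_hat = np.kron(np.eye(n_i), B_hat)
    return B_hat, K_hat
```

Because the entries of $\hat{B}_{i,k}$ as written do not depend on $k$, the sketch computes one block and repeats it; $\hat{G}$ would then be assembled by placing the per-player `K_hat` matrices on a block diagonal.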

The second noise structure we consider is adapted from the $\text{HC}_{4}$ estimator [16] and is given by

[TABLE]

where $\delta_{i}=\text{min}\left\{4,n_{d}b_{i}/(\sum_{j=1}^{n_{d}}b_{j})\right\}$ and the $b_{i}$’s are the diagonal elements of $B=X(X^{\top}X)^{-1}X^{\top}$. With this structure, the penalty for each residual increases with $b_{i}/\sum_{j=1}^{n_{d}}b_{j}$. As with the previous noise structure, we use the fitted cOLS estimator $\hat{\beta}_{\text{cOLS}}$ and its residuals to get an initial $\hat{G}$. We chose to present this noise structure because it is computationally efficient compared to many other noise structures.

In both cases, we substitute the inferred noise, $\hat{G}$, into the cGLS statistical model (19) to get the one-step constrained Feasible GLS (cFGLS) estimator $\hat{\beta}_{\text{cFGLS}}$. We then iterate between the estimation of $\hat{G}$ and $\hat{\beta}_{\text{cFGLS}}$, either until convergence or for a fixed number of iterations to prevent overfitting. To resolve this trade-off and choose the number of iterations, we adopt a simple cross validation method.
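The alternation between noise inference and re-estimation can be sketched as follows. This is an illustrative simplification, not the paper's procedure: it drops the inequality constraints (so it is plain FGLS rather than cFGLS), it assumes the standard HC4 form $\hat{G}=\text{diag}\big(e_{i}^{2}/(1-b_{i})^{\delta_{i}}\big)$ rather than the adapted structure in (21), and the names `hc4_variances`, `fgls_hc4`, and the `floor` safeguard are hypothetical.

```python
import numpy as np

def hc4_variances(X, resid):
    """Diagonal of an HC4-style noise estimate: e_i^2 / (1 - b_i)^delta_i."""
    # Leverages b_i are the diagonal of the hat matrix B = X (X^T X)^{-1} X^T.
    XtX_inv = np.linalg.inv(X.T @ X)
    b = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    n_d = X.shape[0]
    delta = np.minimum(4.0, n_d * b / b.sum())
    return resid**2 / (1.0 - b) ** delta

def fgls_hc4(X, Y, n_iter=3, floor=1e-8):
    """Unconstrained feasible GLS with a diagonal HC4-style noise estimate."""
    # Start from ordinary least squares (the unconstrained analogue of cOLS).
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    for _ in range(n_iter):
        # Infer the diagonal of G_hat from the current residuals; the floor
        # keeps the whitening weights finite when a residual is close to zero.
        g = np.maximum(hc4_variances(X, Y - X @ beta), floor)
        w = 1.0 / np.sqrt(g)
        # Whiten with G_hat^{-1/2} and refit by least squares.
        beta = np.linalg.lstsq(X * w[:, None], Y * w, rcond=None)[0]
    return beta
```

In practice `n_iter` would be chosen by cross validation as described above, and the least squares step would be replaced by a constrained solver that enforces $\beta\in\mathcal{B}$.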

III-C Boosting with Ensemble Methods

In this subsection, we describe several ensemble methods. Combined with a bootstrapping process, ensemble methods not only boost the size of what can often be a small data set in practice but also allow us to improve the estimator performance and explore the bias–variance tradeoff.

III-C1 Bootstrapping and Bagging

Bootstrapping is a technique for asymptotic approximation of the bias and standard error of an estimator in a complex and noisy statistical model [15],[17]. We employ wild bootstrapping to generate a pseudo-data set from which we generate several weak estimators that we then combine using bagging. While we assume that $E(Y|X)=X\beta$ , we also allow for heteroskedasticity by conditioning on the residual transformations that we imposed in the noise structure. Wild bootstrapping is a technique of parametric bootstrapping that is consistent with heteroskedastic inference and cFGLS data generation.

The bootstrapping process can be described in two steps: First, we fit our cFGLS model which gives us $\hat{\beta}_{\text{cFGLS}}$ . Then, generate $N$ replicates of pseudo–data using the data generation process

[TABLE]

where $\tilde{Y}\in\mathbb{R}^{n_{d}}$ is the new observation vector (pseudo-observations), $\hat{\beta}_{\text{cFGLS}}\in\mathbb{R}^{(\ell_{i}+1)p}$ is the cFGLS estimator, $\varepsilon\sim N(0,I^{n_{d}\times n_{d}})$, $e\in\mathbb{R}^{n_{d}}$ is the residual vector given by $e=Y-X\hat{\beta}_{\text{cFGLS}}$ and $\Phi:\mathbb{R}^{n_{d}}\rightarrow\mathbb{R}^{n_{d}\times n_{d}}$ is a nonlinear transformation such that $\Phi(e)=\hat{G}^{\frac{1}{2}}$. Since $E(\Phi(e)\varepsilon|X)=\Phi(e)E(\varepsilon|X)=\Phi(e)E(\varepsilon)=\mathbf{0}_{n_{d}}$, using the data generation process in (24), we resample from i.i.d. variables.

Bagging in regression models and trees is a technique for reducing the overall variance [17]. Using the $N$ replicates of pseudo–data generated by wild bootstrapping, we train $N$ different models. We combine the resulting bootstrapped estimators by averaging:

[TABLE]

where $\hat{\beta}_{\text{cFGLS},j}$ is the estimator using the $j$ –th pseudo–data sample. Bagging works efficiently with high variance models and does not hurt the overall performance of the statistical model. We refer to the bagged estimates as bagged mega-learners since they combine several weak learners/estimators. Using wild bootstrapping, the empirical covariance matrix of $\hat{\beta}$ is an asymptotic approximation of the covariance matrix and is given by

[TABLE]

Asymptotic estimation of the empirical covariance matrix reveals hidden structures between players and is what we leverage in the correlated utility learning procedure.
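The two-step bootstrapping process and the bagging average can be sketched as below. This is a hedged illustration under stated assumptions: pseudo-data are drawn as $\tilde{Y}=X\hat{\beta}+\Phi(e)\varepsilon$ with a precomputed `G_sqrt` standing in for $\Phi(e)=\hat{G}^{\frac{1}{2}}$, the estimator is passed in as a generic callable `fit` (any function mapping `(X, Y)` to a coefficient vector), and the empirical covariance uses NumPy's default $N-1$ normalization, which may differ from the paper's expression. All names are hypothetical.

```python
import numpy as np

def wild_bootstrap_bagging(X, Y, fit, G_sqrt, n_rep=200, seed=0):
    """Wild bootstrap replicates, their bagged average, and empirical covariance."""
    rng = np.random.default_rng(seed)
    # Step 1: fit the model once on the original data.
    beta_hat = fit(X, Y)
    fitted = X @ beta_hat
    betas = np.empty((n_rep, beta_hat.size))
    for j in range(n_rep):
        # Step 2: pseudo-observations keep the fitted mean and rescale
        # standard normal draws by the inferred noise structure.
        eps = rng.standard_normal(Y.size)
        Y_tilde = fitted + G_sqrt @ eps
        betas[j] = fit(X, Y_tilde)
    # Bagging: average the N bootstrapped estimators.
    beta_bag = betas.mean(axis=0)
    # Empirical covariance of the bootstrapped estimators (rows = replicates).
    C_beta = np.cov(betas, rowvar=False)
    return beta_bag, C_beta, betas
```

The off-diagonal blocks of `C_beta` that couple the coefficients of different players are the quantities the correlated utility learning step would draw on.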

III-C2 Bootstrapping and Bumping

In a similar fashion as the bagging ensemble method, we combine bumping—a method for fitting cFGLS estimators by using a random search over the model space [18]—with the wild bootstrapping generated pseudo-data. In particular, we apply a stochastic search over several different statistical models coming from a similar data process—i.e. the data process in (24).

We add the original training data sample to the $N$ replicates of pseudo-data generated by the wild bootstrapping process and we use this data to estimate $N+1$ cFGLS estimators. We evaluate these estimators on the training set and select the one with the least training error. The cFGLS bumping estimator is given by

[TABLE]

where the $\hat{\beta}_{\text{cFGLS},j}$’s are the cFGLS estimators derived from the bootstrapped data.
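The selection rule for bumping can be sketched as follows, assuming the pseudo-observation vectors have already been generated by the wild bootstrap and that training error is measured by mean squared error on the original data (the paper does not state the error measure here). The function `bumping` and the callable `fit` are hypothetical names.

```python
import numpy as np

def bumping(X, Y, pseudo_Ys, fit):
    """Fit on the original data and on each pseudo-data set; keep the best fit.

    pseudo_Ys : iterable of pseudo-observation vectors (same length as Y)
    fit       : callable mapping (X, Y) to a coefficient vector
    """
    # N + 1 candidates: index 0 is the fit on the original training sample.
    candidates = [fit(X, Y)] + [fit(X, Y_tilde) for Y_tilde in pseudo_Ys]
    # Every candidate is scored on the original training set.
    errors = [float(np.mean((Y - X @ b) ** 2)) for b in candidates]
    best = int(np.argmin(errors))
    return candidates[best], best, errors
```

One caveat about the sketch: if `fit` is plain least squares, the fit on the original data minimizes the training error by construction, so index 0 is always selected. The search only becomes informative when the estimator optimizes a different criterion, such as a constrained or reweighted objective.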

III-C3 Gradient Boosting

We combine $L_{2}$ –gradient boosting—which is a repeated least squares fitting of residuals [19]—with cFGLS. Gradient boosting is a boosting technique that uses an $L_{2}$ loss function combined with a gradient descent update method for combining weak learners at each iteration. Boosting estimators are trained in sequence using a weighted version of the original data set. In general, boosting methods are extremely useful for combining models by incrementally training each new model by emphasizing the errors of the previous training instances. They are used extensively in classification methods such as logistic regression and support vector machines.

Repeated residual fitting is applied until we reach iteration $m_{\text{stop}}$, a stopping criterion selected using the Akaike Information Criterion (AIC) to avoid overfitting [20]. The procedure is detailed in Algorithm 1.
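Since Algorithm 1 is not reproduced in this section, the following is only a generic sketch of $L_{2}$ boosting as repeated least squares fitting of residuals, not the paper's algorithm. It assumes a shrinkage factor `nu` and a fixed `m_stop` in place of the AIC-based stopping rule, and the base learner `fit` is a hypothetical callable that would be a cFGLS fit in the paper's setting.

```python
import numpy as np

def l2_boost(X, Y, fit, m_stop=100, nu=0.1):
    """L2 boosting: repeatedly fit the base learner to the current residuals."""
    beta = np.zeros(X.shape[1])
    for _ in range(m_stop):
        # Negative gradient of the squared-error loss is the residual vector.
        resid = Y - X @ beta
        # Fit the base learner to the residuals and take a shrunken step.
        beta = beta + nu * fit(X, resid)
    return beta
```

With a least squares base learner, the iterate after $m$ steps equals $(1-(1-\nu)^{m})$ times the least squares solution, so stopping early acts as shrinkage toward zero; this is the bias–variance trade-off that the choice of $m_{\text{stop}}$ controls.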

IV Application to Bertrand-Nash Competition

Before moving on, let us illustrate the robust utility learning framework and its performance by applying it to estimate market demand functions under Bertrand-Nash equilibrium (see, e.g., [21, 22, 23]). The toy model can be thought of as an abstraction of Bertrand price setting for commodities such as oil, gas, and coal [24, 25].

Consider two firms competing to sell their product by setting the prices $p_{1}$ and $p_{2}$ for firm $1$ and firm $2$, respectively. The firms’ utility functions are their revenues, i.e. $f_{i}(p_{1},p_{2})=p_{i}D_{i}(p_{1},p_{2},\xi)$ where $D_{i}$ is the demand function for firm $i$ and $\xi\sim\mathcal{N}(1.5,0.5)$ is a random variable that captures the fact that demand depends on economic indicators in addition to the prices set by the firms. In this stylized example, we consider linear demand functions given by

[TABLE]

where $\theta_{i}=(\theta_{i,j})_{j=1}^{3}$ are unknown parameters to be estimated and $\nu=1.5$ is a known parameter. The prices are constrained to be in the interval $[0,\bar{p}]$ where $\bar{p}\in\mathbb{R}_{+}$ is the upper bound. We let $\theta_{1}=(-1.0,0.5,-1)$ and $\theta_{2}=(0.3,-1,0.3)$ be the ground truth values for the parameters we wish to estimate. Thus, $\bar{f}_{i}(p_{1},p_{2})=\nu\xi$ and examining the marginal revenue functions $D_{i}f_{i}(p_{1},p_{2})$ we have that $\phi_{1}(p_{1},p_{2})=[1\ 2p_{1}\ p_{2}]^{\top}$ , and $\phi_{2}=[1\ p_{1}\ 2p_{2}]^{\top}$ .

In order to generate the data set we add a noise term $\varepsilon\sim\mathcal{N}(0,0.5)$ to the marginal revenue functions, i.e. $D_{i}f_{i}(p_{1},p_{2})+\varepsilon$, and solve for the Bertrand-Nash equilibrium. We simulate the game between the firms $600$ times. In the robust utility learning framework, for this example, we employ the HC4 noise structure and compute the cOLS, cFGLS, bagging, boosting and bumping estimators. We use a $10$-fold cross validation procedure to prevent overfitting. Table I contains the error using two metrics for both firms. Figure 1 shows the forecast for part of the testing set using cOLS and each of the ensemble methods as compared to the ground truth. While bagging performed best for firm $1$ and boosting for firm $2$ in this particular instantiation of the toy example, the performance more generally depends on the noise structure in the demand and marginal revenue functions, the sample size, and the dynamics between the two firms. However, it is interesting to point out that as we increase the variance of $\xi$, the performance of each ensemble method stays relatively the same while the cOLS error increases significantly.

V Correlated Utility Learning

In this section, we describe how learned correlations between players can be leveraged to boost estimator performance. We add a second step to the estimation procedure in which we craft a new game where players’ utilities are composed of their original estimated utility plus some combination of other players’ utilities weighted by the estimated correlation between players.

When the correlations between players are positive, we are creating what we refer to as pseudo-coalitions, since players are not explicitly agreeing to collude in the game but rather are doing so implicitly. The degree of coalition is discovered by the robust utility learning process through estimating the empirical covariance $\hat{C}_{\beta}$ (i.e., the asymptotic approximation of the covariance matrix) of $\hat{\beta}_{\text{est}}$, where we use the notation $\hat{\beta}_{\text{est}}$ to abstractly denote the estimator derived from whichever of the methods described in Section III is employed. On the other hand, when the correlations between players are negative, by combining their utilities we aim to take advantage of active players’ richer data sets in predicting the behavior of players with less variation and frequency in their observed actions.

We refer to the learned utility— $\hat{f}_{i}$ for player $i$ —from the robust utility learning framework as the base utility and it is given by

[TABLE]

where $\hat{\theta}_{i}$ is extracted from $\hat{\beta}_{\text{est},i}$ .

Using the correlations we learn when we estimate $\hat{f}_{i}$, we construct a new utility $\hat{g}_{i}$ by combining scaled versions of a subset (potentially all) of the other agents’ utilities that are correlated with agent $i$. We formulate an optimization problem to determine the scaling coefficients. The correlated utility $\hat{g}_{i}$ for player $i$ is given by

[TABLE]

where $\mathcal{K}_{i}\subset\mathcal{I}_{i}$ is a subset of the players correlated with player $i$, $\sigma_{i,i}$ is the estimated variance of player $i$ determined by the empirical covariance matrix, $\sigma_{i,j}$ is the covariance between the parameter estimates for players $i$ and $j$, also determined by the empirical covariance matrix, and $z_{i,j}$ are scaling constants to be optimized. We refer to the resulting game as an approximated correlation game. (Footnote 4: We remark that there exists an equilibrium concept called correlated equilibrium [26] which generalizes a Nash equilibrium by characterizing correlations between randomized strategies; we mention this only to alleviate any potential confusion. The equilibrium concept we utilize for the approximated correlation game is still a pure Nash equilibrium and there is no coordinating mechanism.)

Given the form of $\hat{g}_{i}$ , our goal is to select the scaling constants $z_{i,j}$ in order to reduce the forecasting error. Analogous to the base utility learning framework presented in Section III-A, using our training data, we formulate a convex optimization problem using optimality conditions on each player’s individual optimization problem where we assume that player $i$ is optimizing $\hat{g}_{i}$ with respect to its own choice variable $x_{i}$ . In particular, we solve a convex optimization problem formulated as follows. Define the vector $z_{i}\in\mathbb{R}^{|\mathcal{K}_{i}|}$ by $z_{i}=({z_{i,j}})_{j\in\mathcal{K}_{i}}$ and let $z=(z_{i})_{i\in\mathcal{I}}$ . For player $i$ ’s optimization problem $\max\{\hat{g}_{i}(x_{i},x_{-i})|\ x_{i}\in\mathcal{C}_{i}\}$ , let the residual of the stationarity condition be given by

[TABLE]

and the residual of the complementary conditions be given by

[TABLE]

As before, let $r_{\text{c},i}^{(k)}(\mu_{i})=[r_{\text{c},i}^{1,(k)}(\mu_{i})\ \cdots\ r_{\text{c},i}^{\ell_{i},(k)}(\mu_{i})]$ . Define $Q_{i}\in\mathbb{R}^{n_{i}\times|\mathcal{K}_{i}|}$ by

[TABLE]

and $q_{i}\in\mathbb{R}^{n_{i}}$ by

[TABLE]

Then, we have the following convex optimization problem to determine the scaling factors $z_{i,j}$ :

[TABLE]

Solving P’ gives us estimated correlated utilities $\hat{g}_{i}$ for each $i\in\mathcal{I}$ that we then use to forecast the players’ decisions.

VI Application to Smart Building Social Game

We now specialize the robust and correlated utility learning frameworks to the smart building social game.

VI-A Social Game Experimental Set-Up

Our experimental setup is in a collaboratory space—an open, shared work space with cubicles—within the CREST center on the UC Berkeley campus. We crafted a social game such that occupants in this collaboratory freely vote according to their usage preferences of shared resources and are rewarded with points based on how energy efficient their strategy is in comparison with the other occupants. We employ a lottery mechanism consisting of three Amazon gift cards executed bi-weekly to reward occupants; occupants with more points are more likely to win the lottery.

The office is divided into five lighting zones and two heating, ventilating, and air conditioning (HVAC) zones. In this space, there are a total of $20$ occupants who are eligible to participate in the social game. If the occupants are not present in the office, they are excluded from the game at that time instant. When they arrive at the office, they can rejoin the game. To enforce the rule that those who are not present in the space cannot vote remotely, we executed a simple presence detection algorithm based on their power usage [27, 28].

We have installed a Lutron system (Footnote 5: http://www.lutron.com/en-US/Pages/default.aspx) for precise control of the lighting setting (dim level of the lights) in the office as well as desk-level energy monitoring devices (i.e. ACME wireless sensors [29]) to meter the energy usage of each occupant. In addition, we have modified the HVAC system so that it can be precisely controlled. We have verified prior to our experiment that the implemented control of these systems results in the expected performance.

We have developed a platform to interface with the occupants as well as manage and process collected data. The platform includes a web portal and mobile app that the occupants may use to participate in the game. It also allows for occupants to visualize different aspects of the social game—e.g., the lighting setting and the energy efficiency level of different occupants or the entire building—as well as view the point level and historical voting record of other occupants among many other statistics. Figure 2 shows the user interface for viewing points and logging votes. Figure 3a shows a visualization of the current light level using a green–to–red color scale with green being more energy efficient. The current temperature is also displayed. Figure 3b shows a visualization of each present and participating occupant’s energy efficiency level.

In this paper, we report on a social game experiment based only on the shared lighting resource. (Footnote 6: We remark that while our experimental platform is capable of conducting a social game that includes lights, HVAC, and personal energy consumption, we only report on an experiment that focuses on lighting in order to isolate combined effects from these different resources. In on-going experiments, we are examining all aspects jointly.) Prior to the start of the social game experiment, the lighting setting was $90$ % of the maximum possible lighting setting. At the start of the social game experiment, we set a default lighting setting which acts as the suggested lighting setting and is the dim level setting in the office if, e.g., no occupants are participating in the game. Throughout the game, we adjust the default lighting setting as well as the points. The lottery mechanism coupled with the points we distribute compose the incentive component of the feedback to the participants while the default lighting level is the physical control component of the feedback. These two mechanisms act as our control inputs and our feedback mechanism to the participants. We seek to design them by taking into consideration the preferences of the participants. In this way, these mechanisms close the loop around the participant and, with our proposed utility learning scheme, they can be modified to encourage more energy efficient resource consumption.

The game is designed to leverage interactions amongst occupants, who win points based on how energy efficient their lighting vote is compared to others. An occupant’s vote is for the lighting setting in their zone as well as for neighboring zones. The occupants select their desired lighting setting in the continuous interval $[0,100]$ where each value represents the percentage of the maximum lighting setting possible in the space. The occupants can vote as frequently as they like and the average of all the occupants’ current votes sets the implemented lighting setting in the collaboratory. An occupant can leave the lighting setting as the default level after logging in or they can change it depending on their preferences and other environmental factors that may affect their choice.

The experimental trials reported on in this paper were conducted over a period of $285$ days. (Footnote 7: The period of the experiment was $2014/3/3$ – $2014/12/14$.) Experiments with $4$ different default levels, $\{10\%,20\%,60\%,90\%\}$, were conducted, covering a spectrum of lighting conditions. Since occupants were allowed to vote whenever they chose, their response rate per day varies. The data set we collected consists of occupant votes (meaning the lighting level they select) over the period of investigation as well as the points that were distributed to each occupant. We collected 6,885 votes over the period of the experiment.

VI-B Brief Background

In order to place the work pertaining to building energy efficiency in the context of the state of the art, we briefly overview existing approaches.

Recognizing that HVAC systems are responsible for a large portion of building energy consumption, many control theoretic approaches such as [30, 31] derive model predictive and distributed control policies for HVAC systems. While these control theoretic approaches make efforts to account for the presence of occupants, they tend to ignore occupant behaviors and, more importantly, their heterogeneous preferences.

There are other works that make strides towards incorporating behavioral models of occupants; e.g., the authors of [32] employ a multi-agent systems approach to develop a framework for incorporating occupant comfort preferences and the authors of [33] develop behavioral models for lighting usage. In a more active approach, the authors of [34] develop a collaborative setting definition paradigm in which occupants and facilities managers submit preferences and requirements and a rule engine tries to resolve them in order to create a universal control policy. While occupants’ preferences are taken as inputs to the building control design, it is not clear that it is possible to satisfy all the occupants’ comfort preferences simultaneously with those of the facilities manager; hence, the misalignment between preferences and incentives remains.

In our approach, on the other hand, we leverage a social game that creates (friendly) competition between users and employs incentives to resolve conflicting preferences by compensating users. Within the energy application domain, gamification has been largely used for education or awareness (see, e.g., [35, 36]). There are works that are closely related to ours in the sense that they also recognize that occupants are self-interested participants in smart buildings and try to account for their strategic behavior. For example, in [37], the authors develop an interesting scheme for engaging occupants directly in demand response (DR). Analogous to our approach, occupants are modeled as utility maximizers in a game theoretic context where they are incentivized to curtail their consumption in response to an event. Our approach differs in that we focus on shared resources such as lighting and HVAC instead of personal devices (e.g., desk appliances). Furthermore, it is assumed in [37] that the type space of the users (i.e. their preferences) is a known finite set of two possible values. We do not assume the facility manager knows the utility function or the type of the users and we propose an algorithm for learning this utility function from observations of decisions.

While incorporating occupant preferences into building automation is not novel in and of itself, we propose an innovative algorithm for learning occupant preferences in competitive environments and, moreover, learn how their actions are correlated. Such correlations can be leveraged in improving incentive mechanisms to shape users’ preferences thereby providing more flexibility. Our method is applied to real-world data from experimental trials we conducted as opposed to simulations as is the case with many existing works. Furthermore, it is agnostic to the application and could be applied in general to other scenarios in which users are competing for constrained but shared resources. For example, the utility learning method can be easily adapted to learning preferences of individual buildings interacting with an aggregator or learning preferences of drivers seeking on-street parking [7]. In each of these cases, there exists a planner—the aggregator or department of transportation—tasked with managing a resource being consumed by self-interested users.

VI-C Occupant Decision-Making Model

Each agent’s vote $x_{i}$ is constrained to be in the interval $[0,100]\subset\mathbb{R}$. Let $\bar{x}$ denote the average of the lighting votes, which is also the setting that is implemented; e.g., at the observation instance indexed by $k$, $\bar{x}^{(k)}=\frac{1}{|\mathcal{S}^{k}|}\sum_{j\in\mathcal{S}^{k}}x_{j}^{(k)}$. We model each agent’s utility as being composed of two basis functions that capture the tradeoff between desired lighting (satisfaction) and desire to win. The lighting satisfaction an occupant feels may be a function of several factors including their productivity (ability to perform their job) as well as physical comfort. We abstractly model their desired lighting level using a Taguchi loss function, $\psi_{i}(x_{i},x_{-i})=-\left(\bar{x}-x_{i}\right)^{2}$, which is interpreted as modeling occupant dissatisfaction in such a way that it is increasing as variation increases from their reported desired lighting setting (their vote) [38].

We acknowledge that an agent may have some internal desired lighting level that is different than its vote; e.g., the agent may realize that voting an extreme value pushes the average toward a more desirable setting. This type of gaming results in moral hazard type issues which can be addressed in the incentive design step [1, 2]. Thus, we set this type of gaming aside for the time being, and focus instead on the unknown preferences—a different kind of asymmetric information that leads to adverse selection—between lighting and winning.

Points are distributed by the planner using the relationship $\rho(x_{b}-x_{i})(p(x_{b}-\bar{x}))^{-1}$ where $x_{b}$ is the baseline setting for the lights. For the experiment $x_{b}=90$ %, i.e. the lighting setting used before the implementation of the social game. However, we model each occupant as having a winning basis function given by $\phi_{i}(x_{i},x_{-i})=-\rho c\left({x_{i}}\right)^{2}$ where $\rho$ is the total number of points distributed by the planner and $c$ is a scaling factor that is used primarily to scale the two terms of the utility function, given that we artificially inflate the points offered in order to increase their appeal to players and thus induce greater participation. (Footnote 8: Inflating the points is a process of framing [39]; that is, how the reward system is presented to agents greatly impacts their participation. Framing is routinely used in rewards programs for credit cards among many other point-based programs. The scaling factor $c$ in the winning function removes the framing effect from the estimation procedure. It is selected to ensure the scales of the two basis functions are similar.) The form of the winning function can be interpreted as capturing the perception that by voting zero, the occupant is selecting the action that will provide the greatest return of points given that points are awarded based on how energy efficient their vote is compared to others. (Footnote 9: We explored other forms of the winning function including the $\log$ function, a quasi-concave function that is typically used to represent how individuals value money since it represents the diminishing returns property well [12]. However, the quadratic form of the function we report on here significantly outperformed other choices so that, for the purpose of a prescriptive model, it captures the agents’ perceptions about the point distribution mechanism and their value more accurately.)
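As a small worked illustration of the point distribution rule $\rho(x_{b}-x_{i})(p(x_{b}-\bar{x}))^{-1}$, the sketch below assumes that $p$ is the number of currently participating occupants and that $\bar{x}$ is the mean of their votes; the function name `distribute_points` is hypothetical.

```python
import numpy as np

def distribute_points(votes, rho, x_b=90.0):
    """Points per occupant: rho * (x_b - x_i) / (p * (x_b - x_bar))."""
    votes = np.asarray(votes, dtype=float)
    p = votes.size
    x_bar = votes.mean()
    # The rule is undefined when the average vote equals the baseline x_b.
    return rho * (x_b - votes) / (p * (x_b - x_bar))
```

For votes of $10$, $30$ and $50$ with $\rho=1000$, the average is $30$ and the shares are $80/180$, $60/180$ and $40/180$ of the pool. Under this reading the points always sum to $\rho$, which is consistent with $\rho$ being the total number of points distributed, and lower votes earn more points.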

Hence, the utility functions for the social game are modeled as $f_{i}(x_{i},x_{-i};\theta_{i})=\theta_{i}\phi_{i}(x_{i},x_{-i})+\psi_{i}(x_{i},x_{-i})$ . The constraint sets $\mathcal{C}_{i}$ for each player are determined by the box constraints on the lighting vote for that player, i.e. $\mathcal{C}_{i}=\{x_{i}\in\mathbb{R}|\ h_{i,j}(x_{i})\geq 0,\ j\in\{1,2\}\}$ where $h_{i,1}(x_{i})=100-x_{i}$ and $h_{i,2}(x_{i})=x_{i}$ .

In order to formulate (P) for the social game application, we need to determine the admissible parameter sets $\Theta_{i}$, $i\in\mathcal{I}$, in such a way that the estimated utility functions are concave and the equilibria of the estimated game are isolated. We derive a lower bound $\bar{\theta}_{\text{LB}}$ such that all $\theta_{i}\in\Theta_{i}=\{\theta_{i}\in\mathbb{R}|\ \theta_{i}>\bar{\theta}_{\text{LB}}\}$, $i\in\mathcal{I}$, induce games with these characteristics. To this end, we utilize the second derivative condition on players’ utility functions; that is, if for each $i\in\mathcal{I}$, $D_{i,i}^{2}f_{i}(x)<0$, then the game is concave. Computing $D_{i,i}^{2}f_{i}=-2c\rho\theta_{i}-2(1-p^{-1})^{2}$ and requiring it to be negative, we have that $\theta_{i}>-(c\rho)^{-1}(1-p^{-1})^{2}$ where the right-hand side is a negative non-increasing function of $p$. Thus, concavity is ensured regardless of the number of players by setting $p=2$, the minimum number of players in a non-cooperative game, for which the bound is most restrictive. Then, given fixed $\rho$ and $0<\zeta\ll 1$, the lower bound $\bar{\theta}_{\text{LB}}=-(4c\rho)^{-1}+\zeta$ will guarantee the estimated game is concave.

If $D\omega(x,\mu)$ is invertible, we know that differential Nash equilibria are isolated [10]. Hence, we can augment the constraint sets $\Theta_{i}$ to encode this condition. Given the structure of the utility functions, $D\omega(x,\mu)$ is simply the game Hessian $H=[H_{i,j}]_{i,j=1}^{p}$ with $H_{i,i}=D_{i,i}^{2}f_{i}$ and $H_{i,j}=D_{i,j}^{2}f_{i}$. Hence, if $H$ is invertible, then the differential Nash equilibria are isolated; this is guaranteed for $p\geq 4$ under the constraint defined by $\bar{\theta}_{\text{LB}}=-(4c\rho)^{-1}+\zeta$ with $\zeta=10^{-2}$. Indeed, let $H(p)$ denote the game Hessian as a function of the number of players and note that for a particular $p$, with some simple algebra, $H(p)$ can be written as a diagonal matrix plus a constant matrix, such that $H_{i,i}=d_{i}+\alpha$ and $H_{i,j}=\alpha$ for $i\neq j$, where $d_{i}=-2(1-1/p)-2c\rho\theta_{i}$ and $\alpha=2(p-1)/p^{2}$. It is straightforward to verify, by determining the eigenvalues of $H$ as $p$ varies via the method described in [40], that for $p\geq 4$, $H$ will be invertible. For the social game data, at each observation indexed by $k$, the number of participating players is at least $4$. Thus, to ensure concavity and isolated equilibria of the estimated social game, we define $\Theta_{i}=\{\theta_{i}\in\mathbb{R}|\ \theta_{i}>\bar{\theta}_{\text{LB}}\}$ with $\bar{\theta}_{\text{LB}}=-(4c\rho)^{-1}+\zeta$ and $\zeta=10^{-2}$.
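The structure of the game Hessian can be checked numerically with a short sketch. It builds $H(p)$ from the stated $d_{i}$ and $\alpha$ and evaluates the lower bound $\bar{\theta}_{\text{LB}}$; the function names are hypothetical, and the values of $c$ and $\rho$ used in any call are placeholders rather than the experimental ones.

```python
import numpy as np

def theta_lower_bound(c, rho, zeta=1e-2):
    """Lower bound -(4 c rho)^{-1} + zeta on each theta_i."""
    return -1.0 / (4.0 * c * rho) + zeta

def game_hessian(theta, c, rho):
    """Game Hessian H(p): diagonal d_i + alpha, off-diagonal alpha."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    alpha = 2.0 * (p - 1) / p**2
    d = -2.0 * (1.0 - 1.0 / p) - 2.0 * c * rho * theta
    return np.diag(d) + alpha * np.ones((p, p))
```

The diagonal of the returned matrix equals $-2c\rho\theta_{i}-2(1-1/p)^{2}$, the second derivative used in the concavity condition. For a given parameter estimate, `np.linalg.matrix_rank(game_hessian(theta, c, rho))` offers a direct numerical check of invertibility.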

VII Utility Learning Results

We now present the results of the proposed robust utility learning method applied to data collected from the social game experiment.

As we previously described, our data set consists of the votes logged by the players, who vote throughout the day. We present estimation results for the complete data set of all the votes, which we refer to as the dynamic data set, and estimation results for an aggregated data set constructed by taking the average of each player’s votes over the course of each day in the experiment, which we refer to as the average data set. While this aggregation significantly reduces the size of our data set, it smooths the players’ voting profiles and increases the number of active players in each game (occupants may arrive at or leave the office when they so choose). The average data set also reduces the computational load, which may be beneficial to a facilities manager in the incentive design process, especially if the incentive scheme is quasi-static and uses historical data to generate the next incentive. The dynamic data set is much richer, being composed of every vote (a total of $6,885$ votes) the occupants made throughout the duration of the experiment ($285$ days). The time from one vote to the next may be several minutes to hours depending on the activity of the occupants. This data set is much larger and thus increases the computational load. However, it allows us to extract more distinct player profiles and can support real-time incentive design schemes.

We present results for both data sets using data from the period of the experiment in which the default lighting setting was $20\%$; the results for the other default lighting settings are similar. This period consisted of $42$ days, and thus the size of the averaged data set is $42$. Over this period there were $220$ votes by occupants, which is the size of the dynamic data set. We divide each of the data sets into training ($80\%$ of the data) and testing ($20\%$ of the data) sets and apply each of the methods discussed in Section III. We apply a $10$-fold cross validation [17] procedure to limit overfitting.
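The split and cross-validation procedure can be sketched as follows for the dynamic data set of $220$ observations. This is an illustration only: whether the observations are shuffled before splitting is an assumption made here, and the helper names are hypothetical.

```python
import numpy as np

def train_test_split_indices(n, train_frac=0.8, seed=0):
    """Return shuffled index arrays for the training and testing sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)            # assumption: random shuffle before splitting
    n_train = int(round(train_frac * n))
    return perm[:n_train], perm[n_train:]

def k_fold_indices(train_idx, k=10):
    """Yield (fit, validation) index pairs for k-fold cross validation."""
    folds = np.array_split(train_idx, k)
    for j in range(k):
        val = folds[j]
        fit = np.concatenate([folds[m] for m in range(k) if m != j])
        yield fit, val

train_idx, test_idx = train_test_split_indices(220)   # 220 votes in the dynamic set
cv_pairs = list(k_fold_indices(train_idx, k=10))
```

Each pair in `cv_pairs` would be used to fit an estimator on `fit` and score it on `val`, with the held-out `test_idx` reserved for the final forecasting comparison.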

VII-A Forecasting via Robust Utility Learning

We estimate the parameters using cFGLS and the ensemble methods bagging, bumping, and boosting for both the average and dynamic data sets. For gradient boosting, we use the HC4 noise structure (see (21)) since the leverage values $b_{ii}$ of $B$ are larger [16]; in each of the other methods, we used the block diagonal noise structure (see (20)).
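As a rough illustration of how leverage enters an HC4-type noise structure, the sketch below computes leverage values and the standard HC4 diagonal weights $e_{i}^{2}/(1-b_{ii})^{\delta_{i}}$ with $\delta_{i}=\min\{4,\,n\,b_{ii}/m\}$, where $m$ is the number of regressors. It assumes $B$ is the usual hat matrix of the regressor matrix and uses unconstrained least squares on synthetic data; whether (21) uses exactly this parameterization is an assumption.

```python
import numpy as np

def hc4_weights(X, resid):
    """Return HC4 diagonal weights and leverage values for regressor matrix X."""
    n, m = X.shape
    Q, _ = np.linalg.qr(X)               # reduced QR; columns of Q span col(X)
    b = np.sum(Q**2, axis=1)             # leverages: diagonal of the hat matrix
    delta = np.minimum(4.0, n * b / m)   # high-leverage points get larger exponents
    omega = resid**2 / (1.0 - b)**delta
    return omega, b

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)

theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ theta_ols
omega, b = hc4_weights(X, resid)
```

Observations with large leverage have their squared residuals inflated more strongly, which is why this structure is a natural choice when the leverage values are large.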

Using the estimated utility functions, we simulate the game using a projected gradient descent algorithm, which is known to converge for concave games [42]. In Figures 4a and 4b, we compare the ground truth voting data to the predictions for each of the learning schemes using the dynamic and averaged data sets, respectively. Our proposed robust models (i.e., those using the estimated parameters obtained via bagging, bumping, and boosting) capture most of the variation in the true votes in both data sets and significantly outperform cOLS. In Table II, we report the forecasting error for each of the methods using three metrics: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Scaled Error (MASE).
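A minimal sketch of simulating a concave game by projected gradient play is given below. Each player takes a gradient step on its own utility (equivalently, a descent step on its negative utility) and the joint action is projected back onto a box. The two-player quadratic utilities, the targets `A`, the step size, and the action bounds $[0,100]$ are all hypothetical stand-ins, not the estimated social game utilities.

```python
import numpy as np

A = np.array([20.0, 60.0])   # hypothetical preferred settings of the two players

def toy_grad(x):
    """Gradient of f_i = -(x_i - a_i)^2 - 0.5 * (x_i - mean of others)^2 w.r.t. x_i."""
    others_mean = (x.sum() - x) / (x.size - 1)
    return -2.0 * (x - A) - (x - others_mean)

def projected_gradient_play(grad, x0, lo=0.0, hi=100.0, step=0.1, iters=500):
    """Simultaneous gradient steps on each player's utility, projected onto [lo, hi]."""
    x = np.clip(np.asarray(x0, dtype=float), lo, hi)
    for _ in range(iters):
        x = np.clip(x + step * grad(x), lo, hi)   # projection onto a box is a clip
    return x

x_star = projected_gradient_play(toy_grad, x0=[0.0, 100.0])
```

For this toy game the first-order conditions $3x_{1}-x_{2}=40$ and $3x_{2}-x_{1}=120$ give the interior equilibrium $(30, 50)$, which the iteration approaches from the chosen starting point.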

The estimated models using our robust utility learning methods significantly reduce the forecasting error as compared to cOLS. The cOLS method has particularly poor forecasting performance on the dynamic data set since it does not capture the correlated error terms describing the interactions between users. Moreover, our robust methods perform better than cOLS with the averaged data set even though the sample size is small.
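The three error metrics reported in Table II can be computed as in the sketch below. The MASE shown here scales the forecast MAE by the in-sample MAE of a naive one-step forecast on the training series, which is the usual definition; the exact scaling used for Table II is an assumption. The numbers in the example are made up.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))

def mase(y, y_hat, y_train):
    """MAE scaled by the in-sample MAE of the naive forecast y_t = y_{t-1}."""
    scale = float(np.mean(np.abs(np.diff(np.asarray(y_train, float)))))
    return mae(y, y_hat) / scale

# Made-up votes and forecasts, for illustration only.
y_train = [0.0, 10.0, 20.0, 40.0]
y_test = [10.0, 20.0, 30.0]
y_pred = [12.0, 18.0, 33.0]

scores = (rmse(y_test, y_pred), mae(y_test, y_pred), mase(y_test, y_pred, y_train))
```

A MASE below one indicates that the forecast beats the naive previous-value predictor on average.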

As for the ensemble methods, bagging outperforms the other three methods when using the dynamic data set. On the other hand, for the averaged data set, gradient boosting gives the least forecasting error. This is in large part due to the fact that we use the HC4 noise structure. Since the average data set has been smoothed, we expect less correlation between players and the HC4 noise structure captures this.

VII-B Estimated Utility Functions

Figure 5 shows the estimated utility functions and their contour plots for occupants $2$ and $8$ (passive and aggressive occupants, respectively) using the parameters obtained via the bagging ensemble method with the dynamic data set. We remark that we do not observe the actual value of agents’ utilities; we observe only the agents’ decisions. The purpose of the figures is to show the estimated utility shapes for players with significantly different voting profiles (the observable we have). The particular occupants we selected represent players that prefer winning to lighting satisfaction (occupant $8$) and players that prefer lighting satisfaction to winning (occupant $2$). In particular, occupant $2$’s estimated utility function appears to be higher at greater lighting settings. Exactly the opposite occurs for occupant $8$, whose estimated utility function indicates that, despite changes in the average lighting vote of other players, occupant $8$ aggressively votes for a zero lighting setting, which returns the most points.

For comparison, and to highlight the improvement that the robust utility learning framework offers, Figure 6 shows the estimated utility function for occupant $8$ using cOLS. We see a very different utility function, one indicating that occupant $8$ cares more about lighting satisfaction than winning, since its utility is not maximized at zero. This is misleading because occupant $8$ predominantly votes for zero. The discrepancy matters because incentive/control design based on such an erroneous utility function may lead to very poor performance and occupant dissatisfaction.

VII-C Bias Approximation and Bias–Variance Tradeoff

Forecasting accuracy can be enhanced by allowing for a small amount of bias if it results in a large reduction in variance. For a process $Y=X\theta+\epsilon$ , the Mean Square Error (MSE) characterizes the bias–variance tradeoff:

[TABLE]
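For reference, a standard form of the bias–variance decomposition for an estimator $\hat{\theta}$ of $\theta$ is sketched below. The notation is an assumption made here and need not match (33) symbol for symbol.

```latex
\mathrm{MSE}(\hat{\theta})
  = \mathbb{E}\big[\|\hat{\theta}-\theta\|^{2}\big]
  = \underbrace{\operatorname{tr}\,\mathrm{Var}(\hat{\theta})}_{\text{variance}}
  + \underbrace{\big\|\mathbb{E}[\hat{\theta}]-\theta\big\|^{2}}_{\text{squared bias}}
```

The identity follows by adding and subtracting $\mathbb{E}[\hat{\theta}]$ inside the norm and noting that the cross term has zero expectation.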

Introducing bias in exchange for reduced variance is widely used in ridge regression and in lasso techniques in the form of a priori knowledge [17]. In our robust utility learning framework, we introduce noise structures that approximate the true data process so that we can fit cFGLS estimators that are nearly unbiased for those players whose historical voting record has a large amount of variation.

We approximate the bias for each of the estimators. In Table III, we present cFGLS estimates obtained using the dynamic data during the time window in which the default lighting setting was $20\%$ (the results for the other default lighting settings are similar) for selected occupants, namely the most active players, as well as the approximated bias for the estimates generated by bagging, bumping, and boosting.
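One common way to approximate such a bias is via the bootstrap: re-estimate on resampled data and compare the average of the bootstrap estimates with the original estimate. The sketch below illustrates this with unconstrained ordinary least squares on synthetic data as a stand-in for cFGLS; the estimator, the data, and the number of resamples are assumptions for illustration, not the procedure behind Table III.

```python
import numpy as np

def ols(X, y):
    """Unconstrained least squares; a stand-in for the cFGLS estimator."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def bootstrap_estimates(X, y, n_boot=500, seed=0):
    """Re-estimate on n_boot resamples of the rows drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    out = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        out[b] = ols(X[idx], y[idx])
    return out

# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(size=200)

theta_hat = ols(X, y)                 # estimate from the original data
boot = bootstrap_estimates(X, y)      # one row per bootstrap resample
theta_bag = boot.mean(axis=0)         # bagging: average of bootstrap estimates
bias_approx = theta_bag - theta_hat   # bootstrap approximation of the bias
boot_var = boot.var(axis=0, ddof=1)   # spread of the bootstrap estimates
```

The rows of `boot` are also what a histogram of bootstrapped estimators would be drawn from.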

Figures 8 and 7 contain histograms of the cFGLS estimators obtained using the bootstrapped average and dynamic data, respectively. In each of these histograms, dashed vertical lines indicate the original cFGLS (red), bagging (blue), bumping (green), and boosting (orange) estimators. Here, the original cFGLS estimator is the one produced using the original average and dynamic data sets rather than the bootstrapped data sets.

The histogram in Figure 8 contains the cFGLS estimators for occupant $2$ . This histogram is representative of the other occupants for the average data set. We see that the original cFGLS, bagging, bumping, and boosting estimators each show some amount of bias. This is largely due to the fact that the average data set has a small sample size.

On the other hand, Figure 7a shows the histogram of cFGLS estimators for occupant $2$ produced via bootstrapped dynamic data. The original cFGLS estimator (vertical red line) is nearly unbiased, as indicated by the approximately Gaussian distribution around the cFGLS estimate. This is generally true for the occupants with the most variation and frequency in their voting record. However, bagging, bumping, and boosting produce estimates that are slightly biased in exchange for a reduction in estimator variance; see (33).

Occupant $2$ is representative of players who prefer to focus on lighting satisfaction as opposed to winning, whereas occupant $8$ is representative of players who prefer winning to lighting satisfaction. Although occupant $8$ is a very active voter, frequently participating in the game, occupant $8$’s voting record has little variation (the majority of the time $x_{8}=0$). Figure 7b contains the cFGLS estimators for occupant $8$, and we see that each of the estimators is slightly biased. Again, these estimators introduce bias in exchange for a reduction in variance.

VII-D Forecasting via Approximated Correlated Game

We now show the results for the correlated utility learning method. Let us use the notation

[TABLE]

where, recall, $\mathcal{K}_{i}\subset\mathcal{I}$ is the index set for the players whose parameters are used to modify player $i$’s utility function in generating the correlated game, and $\hat{\theta}_{j}$ is the estimated parameter from the utility learning methods, including cOLS, cFGLS, bagging, bumping, and boosting. We use the notation $\hat{g}_{i}(\cdot;\{\hat{\theta}_{j}\}_{j\in\mathcal{K}_{i}})$ as short-hand.

In Table IV, we show a subset of the estimated covariance matrices obtained using the dynamic and average data sets. Using these values, we construct the following correlated game. Player $2$’s utility function is modified by player $20$’s:

[TABLE]

where $\mathcal{K}_{2}=\{2,20\}$. Players $2$ and $20$ are passive players in that their votes tend to be strongly related to their lighting satisfaction as opposed to increasing their chances of winning. They are also very active players, with a lot of variation in their voting records. These two players are positively correlated with one another (see the red cells in Table IV).

On the other hand, players $8$ and $14$ are aggressive players in that their votes tend to be much lower, indicating a greater desire to win points. These players are also positively correlated (see the green cells in Table IV). With this in mind, we modify player $8$’s utility function by player $14$’s:

[TABLE]

where $\mathcal{K}_{8}=\{8,14\}$ .

Player $14$ is also negatively correlated with player $2$. Hence, player $14$’s utility function is modified by the utilities of players $2$ and $8$. That is, with $\mathcal{K}_{14}=\{2,8,14\}$, we have

[TABLE]

All the other players’ utilities in the correlated game remain unchanged; that is, they are taken to be $\hat{g}_{i}=\hat{f}_{i}$, $i\in\mathcal{I}\setminus\{2,8,14\}$.

These player combinations were selected since, through the correlated game, we aim to improve our estimators by leveraging correlations between players. In particular, the goal is to utilize information learned from players with the most variation in their votes in improving the estimates of players who consistently vote the same value or have a limited participation record.
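To illustrate how an estimated noise covariance can point to candidate player pairings, the sketch below converts a covariance matrix to correlations and lists the pairs whose absolute correlation exceeds a threshold. The thresholding rule, the threshold value, the matrix entries, and the player labels are all hypothetical; the pairings used in this section were chosen from Table IV as described in the text, not by this rule.

```python
import numpy as np

def correlation_from_covariance(S):
    """Normalize a covariance matrix to a correlation matrix."""
    s = np.sqrt(np.diag(S))
    return S / np.outer(s, s)

def correlated_pairs(S, labels, threshold=0.5):
    """List (label_i, label_j, correlation) for pairs with |correlation| >= threshold."""
    R = correlation_from_covariance(np.asarray(S, dtype=float))
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if abs(R[i, j]) >= threshold:
                pairs.append((labels[i], labels[j], float(R[i, j])))
    return pairs

# Made-up covariance estimate for three hypothetical players A, B, C.
S_hat = np.array([[4.0, 4.2, -0.2],
                  [4.2, 9.0, 0.1],
                  [-0.2, 0.1, 1.0]])

pairs = correlated_pairs(S_hat, labels=["A", "B", "C"])
```

The sign of each reported correlation distinguishes positively from negatively correlated pairs, which matters when deciding how one player's parameters should modify another's utility.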

In Table V, we present the RMSE, MAE, and MASE for the estimated correlated game $\{\hat{g}_{i}(\cdot;\{\hat{\theta}_{j}\}_{j\in\mathcal{K}_{i}})\}_{i\in\mathcal{I}}$, where the $\hat{\theta}_{j}$’s are taken to be the cOLS, bagging, boosting, and bumping estimators. Comparing these results to those in Table II, we see that correlated estimation schemes applied to the dynamic data set reduce the estimation error for almost every method. Moreover, correlated bagging outperforms bagging, the best performing ensemble method, by all three metrics. For the average data set, correlated boosting outperforms the best performing ensemble method, boosting, again by all three metrics.

In Figure 9, we show the forecast produced by the correlated utility learning method using the cOLS, bagging, bumping, and boosting estimators, together with the ground truth test data. Figures 9a and 9b show the forecasts for the dynamic and average data sets, respectively.

What is perhaps most interesting is that, for both data sets, the correlated cOLS results improve the forecasting error as compared to cOLS, and the results are not significantly different from those of the ensemble methods. This can be seen in Table V and Figure 9. The importance of this finding is that correlated cOLS has the potential to be integrated into an online algorithm. The classical cOLS can be performed online and is thus amenable to an online incentive design framework [3, 8]. However, as we have seen, the ensemble methods significantly outperform cOLS. Determining the estimated covariance matrix requires solving a generalized least squares (GLS) and noise covariance estimation problem [44]. Given that the estimated correlated game using cOLS parameters provides nearly the same estimation error as the ensemble methods, these methods can be adapted to estimate the correlated game parameters and then introduced into an adaptive incentive design framework. We are currently exploring this extension, as the ultimate objective is to utilize the learned utilities in an incentive design framework, preferably one that can be executed in an adaptive/online manner. This will support a more robust online utility learning and incentive design algorithm.

VIII Discussion

We presented a general framework for robust utility learning using a heteroskedastic inference adaptation to cGLS and we leveraged learned correlations between players in constructing a correlated utility learning framework that matches the robust utility learning errors while also being amenable to online implementation. The latter is important for integrating the proposed utility learning techniques with adaptive control or online incentive design. For example, it has been shown that static programs for encouraging energy efficiency are subject to the rebound effect in which participants often return to less efficient behavior after some time [45, 46]. By integrating our utility learning framework with incentive design, we will be able to create an adaptive model that learns how users’ preferences change over time and thus, generate the appropriate incentives to ensure active participation.

To demonstrate the utility learning methods, we applied them to data collected from a smart building social game we conducted in which occupants vote for shared resources and participate in a lottery. We were able to obtain nearly unbiased estimators for several agent profiles and significantly reduce the forecasting error as compared to cOLS. The robust utility learning framework enables us to effectively close the loop around smart building occupants by providing the foundation for learning a decision-making model that can be integrated into the incentive or control design process. While we apply the method to smart building social game data, it can be applied more generally to scenarios that require inverse modeling of competitive agents, and it provides a useful tool for many smart infrastructure applications where learning decision-making behavior is crucial.

Acknowledgment

We thank Mr. Christopher Hsu, Applications Programmer at CREST laboratory, who developed and deployed the web portal application of the social game at UC Berkeley.

Appendix A Proof of Proposition 1

Proof of Proposition 1.

Suppose the assumptions hold. The constraints for each player do not depend on other players’ choice variables. We can hold $x_{-i}^{\ast}$ fixed and apply Proposition 3.3.2 [11] to the $i$ -th player’s optimization problem $\max\left\{f_{i}(x_{i},x_{-i}^{\ast})\ |\ x_{i}\in\mathcal{C}_{i}\right\}$ . Since each $f_{i}$ is concave and each $\mathcal{C}_{i}$ is a convex set, $x_{i}^{\ast}$ is a global optimum of the $i$ -th player’s optimization problem under the assumptions. Since this is true for each of the $i\in\{1,\ldots,n\}$ players, $x^{\ast}$ is a Nash equilibrium. ∎

Bibliography

[1] P. Bolton and M. Dewatripont, Contract Theory. MIT Press, 2005.
[2] J.-J. Laffont and D. Martimort, The Theory of Incentives: The Principal–Agent Model. Princeton University Press, 2002.
[3] L. J. Ratliff, “Incentivizing efficiency in societal-scale cyber-physical systems,” Ph.D. dissertation, University of California, Berkeley, 2015.
[4] O. E. P. Office, “Operational excellence program office progress report—toward a sustainable future,” University of California Berkeley, Tech. Rep., April 2015.
[5] A. Aswani and C. Tomlin, “Incentive design for efficient building quality of service,” in Proc. 50th Annu. Allerton Conf. Communication, Control, and Computing, 2012, pp. 90–97.
[6] M. Jin, N. Bekiaris-Liberis, K. Weekly, C. J. Spanos, and A. M. Bayen, “Occupancy detection via environmental sensing,” IEEE Transactions on Automation Science and Engineering, pp. 1–13, 2016.
[7] M. Jin, W. Feng, P. Liu, C. Marnay, and C. Spanos, “Mod-dr: Microgrid optimal dispatch with demand response,” Applied Energy, vol. 187, pp. 758–776, 2017.
[8] L. J. Ratliff, R. Dong, H. Ohlsson, and S. S. Sastry, “Incentive design and utility learning via energy disaggregation,” in Proc. 19th World Congress of the Int. Federation of Automatic Control, 2014.
