Bifurcation Mechanism Design -- From Optimal Flat Taxes to Improved Cancer Treatments

Ger Yang; Georgios Piliouras; David Basanta

arXiv:1704.08754·cs.GT·May 1, 2017


TL;DR

This paper explores how bifurcation phenomena can be harnessed to design mechanisms that improve system performance, demonstrated through flat taxes for social welfare and innovative cancer treatments.

Contribution

It introduces bifurcation-based mechanisms that leverage system instabilities to achieve better outcomes in social and medical applications.

Findings

1. Flat taxation can enhance social welfare by destabilizing inefficient equilibria.
2. Bifurcation mechanisms can lead to improved cancer treatment strategies.
3. Transient parameter changes can induce permanent beneficial system states.




Ger Yang

University of Texas at Austin

Department of Electrical and Computer Engineering

Georgios Piliouras

Singapore University of Technology and Design

Engineering Systems and Design (ESD)

David Basanta

Integrated Mathematical Oncology

H. Lee Moffitt Cancer Center and Research Institute

Abstract

Small changes to the parameters of a system can lead to abrupt qualitative changes of its behavior, a phenomenon known as bifurcation. Such instabilities are typically considered problematic; however, we show that their power can be leveraged to design novel types of mechanisms. Hysteresis mechanisms use transient changes of system parameters to induce a permanent improvement to its performance via optimal equilibrium selection. Optimal control mechanisms induce convergence to states whose performance is better than even the best equilibrium. We apply these mechanisms in two different settings that illustrate the versatility of bifurcation mechanism design. In the first, we explore how introducing flat taxation can improve social welfare, despite decreasing agent “rationality”, by destabilizing inefficient equilibria. From there we move on to consider a well-known game of tumor metabolism and use our approach to derive novel cancer treatment strategies.

This work is supported by the National Science Foundation, under grant CNS-0435060, grant CCR-0325197 and grant EN-CS-0329609.

1 Introduction

The term bifurcation, which means splitting in two, is used to describe abrupt qualitative changes in system behavior due to smooth variation of its parameters. Bifurcations are ubiquitous in natural phenomena. Effectively, they produce discrete events (e.g., rain breaking out) out of smoothly varying, continuous systems (e.g., small changes in humidity or temperature). They are typically studied through bifurcation diagrams, multi-valued maps that prescribe how each parameter configuration translates to possible system behaviors (e.g., Figure 1).

Bifurcations arise naturally in game theory. Games are typically studied through their Nash correspondences, multi-valued maps connecting the parameters of the game (i.e., payoff matrices) to system behavior, in this case Nash equilibria. As we slowly vary the parameters of the game, the Nash equilibria typically also vary smoothly, except at bifurcation points where, for example, the number of equilibria changes abruptly as some equilibria appear or disappear altogether. Such singularities may have a huge impact on both system behavior and system performance. For example, if the system state was at an equilibrium that disappeared during the bifurcation, then a turbulent transitional period ensues in which the system tries to reorganize itself at one of the remaining equilibria. Moreover, the quality of all remaining equilibria may be significantly worse than the original. Even more disturbingly, it is not a priori clear that the system will equilibrate at all. Successive bifurcations that lead to increasingly complicated recurrent behavior are a standard route to chaos Devaney (1992), which may have devastating effects on system performance.

Game theorists are particularly aware of the need to produce “robust” predictions that are not inherently bound to a specific, exact instantiation of the payoff parameters of the game Roughgarden (2009). The typical way to approach this problem has been to focus on more expansive solution concepts, e.g., $\epsilon$-approximate Nash equilibria or even outcomes approximately consistent with regret-minimizing learning. These approaches, however, do not really address the problem at its core, as any solution concept defines a map from parameter space to behavioral space and no such map is immune to bifurcations. If pushed hard enough, any system will destabilize. The question is: what happens next?

Well, a lot of things may happen. It is intuitively clear that if we are allowed to play around arbitrarily with the payoffs of the agents then we can reproduce any game and no meaningful analysis is possible. Using payoff entries as controlling parameters is problematic for another reason. It is not clear that there exists a compelling parametrization of the payoff space that captures how real life decision makers deviate from the Platonic ideal of the payoff matrix. Instead, we focus on another popular aspect of economic theory, agent “rationality”.

We adopt a standard model of boundedly rational learning agents. Boltzmann Q-learning dynamics Watkins (1989); Watkins and Dayan (1992); Tan (1993) is a well-studied behavioral model in which agents are parameterized by a temperature/rationality term $T$. Each agent keeps track of the collective past performance of his actions (i.e., learns from experience) and chooses an action according to a Boltzmann/Gibbs distribution with parameter $T$. When applied to a multi-agent game, the behavioral fixed points of Q-learning are known as quantal response equilibria (QRE) McKelvey and Palfrey (1995). Naturally, QREs depend on the temperature $T$. As $T\rightarrow 0$ players become perfectly rational and play approaches a Nash equilibrium,¹ whereas as $T\rightarrow\infty$ all agents use uniformly random strategies. As we vary the temperature, the QRE($T$) correspondence moves between these two extremes, producing bifurcations along the way at critical points where the number of QREs changes (Figure 1).

¹ Mixed strategies in the QRE model are sometimes interpreted as frequency distributions of deterministic actions in a large population of users. This population interpretation of mixed strategies is standard and dates back to Nash (1950). Depending on context, we will use either the probabilistic interpretation or the population one.
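For concreteness, the Boltzmann/Gibbs choice rule can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the function name `boltzmann` and the payoff values are hypothetical, and the payoffs are held fixed rather than learned. It shows the two limits described above: near-deterministic choice of the better action as $T\rightarrow 0$ and near-uniform choice as $T\rightarrow\infty$.

```python
import math

def boltzmann(payoffs, temperature):
    """Boltzmann/Gibbs choice probabilities: p_a proportional to exp(payoff_a / T)."""
    m = max(payoffs)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp((u - m) / temperature) for u in payoffs]
    total = sum(weights)
    return [w / total for w in weights]

payoffs = [10.0, 5.0]  # hypothetical expected payoffs of two actions
for T in (0.1, 1.0, 10.0, 1000.0):
    # low T: almost all mass on the first action; high T: close to (0.5, 0.5)
    print(T, boltzmann(payoffs, T))
```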

Our goal in this paper is to quantify the effects of these rationality-driven bifurcations on the social welfare of two-player, two-strategy games. At this point a moment of pause is warranted. Why is this a worthy goal? Games of small size ($2\times 2$ games in particular) hardly seem like a subject worthy of serious scientific investigation. This, however, could not be further from the truth.

First, the correct way to interpret this setting is from the point of view of population games, where each agent is better understood as a large homogeneous population (e.g., men and women, attackers and defenders, cells of type A and cells of type B). Each of a handful of different types of users has only a few meaningful actions available to them. In fact, from the perspective of applied game theory, only games with a small number of parameters are practically meaningful. The reason should be clear by now. Any game-theoretic model of a real-life scenario is invariably noisy and inaccurate. For game-theoretic predictions to be practically binding, they have to be robust to these uncertainties. If the system intrinsically has a large number of independent parameters (e.g., 20), then this parameter space will almost certainly encode a vast number of bifurcations, which invalidate any theoretical prediction. Practically useful models need to be small.

Second, game-theoretic models applied for scientific purposes are typically small. Specifically, the exact setting studied here, Boltzmann Q-learning dynamics applied in $2\times 2$ games, has been used to model the effects of taxation on agent rationality Wolpert et al. (2012) (see Section 6.1 for a more extensive discussion) as well as the effects of treatments that trigger phase transitions in cancer dynamics Kianercy et al. (2014) (see Section 6.2). Our approach yields insights into explicit open questions in both of these application areas. In fact, direct application of our analysis can address similar questions for any other phenomenon modeled by Q-learning dynamics applied in $2\times 2$ games.

Finally, the analysis itself is far from straightforward, as it requires combining sets of tools and techniques that have so far been developed in isolation from each other. On one hand, we need to understand the behavior of these dynamical systems using tools from the topology of dynamical systems, whose implications are largely qualitative (e.g., proving the absence of cyclic trajectories). On the other hand, we need to leverage these tools to quantify the exact parameter values at which bifurcations occur and to produce price-of-anarchy-type guarantees, which by definition are quantitative. As far as we know, this is the first instance of a fruitful combination of these tools. In fact, not only do we show how to analyze the effects of bifurcations on system efficiency, we also show how to leverage this understanding (e.g., knowledge of the geometry of the bifurcation diagrams) to design novel types of mechanisms with good performance guarantees.

Our contribution.

We introduce two different types of mechanisms, hysteresis and optimal control mechanisms.

Hysteresis mechanisms use transient changes to the system parameters to induce permanent improvements to its performance via optimal (Nash) equilibrium selection. The term hysteresis is derived from an ancient Greek word that means “to lag behind”. It reflects a time-based dependence between the system’s present output and its past inputs. For example, assume that we start from a game-theoretic system of Q-learning agents with temperature $T=0$ and that the system has converged to an equilibrium. By increasing the temperature beyond some critical threshold and then bringing it back to zero, we can force the system to provably converge to another equilibrium, e.g., the best (Nash) equilibrium (Figure 1, Theorem 4). Thus, we can ensure performance equivalent to that of the price of stability instead of the price of anarchy. One attractive feature of this mechanism is that, from the perspective of the central designer, it is rather “cheap” to implement. Whereas typical mechanisms require the designer to intervene continuously (e.g., by paying the agents) to offset their greedy tendencies, this mechanism is transient and requires only a finite amount of total effort from the designer. Further, the idea that game-theoretic systems effectively have systemic memory is rather interesting and could find other applications within algorithmic game theory.

Optimal control mechanisms induce convergence to states whose performance is better than even the best Nash equilibrium. Thus, we can at times even beat the price of stability (Theorem 5). Specifically, we show that by controlling the exploration/exploitation tradeoff we can achieve strictly better states than those achievable by perfectly rational agents. In order to implement such a mechanism it does not suffice to identify the right set of agents’ parameters/temperatures so that the system has some QRE whose social welfare is better than the best Nash. We need to design a trajectory through the parameter space so that this optimal QRE becomes the final resting point.

2 Preliminaries

2.1 Game Theory Basics: $2\times 2$ games

In this paper, we focus on $2\times 2$ games, i.e., games with two players, each of whom has two actions. We write the payoff matrices of the two players as

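$$\bm{A}=\left(\begin{array}{cc}a_{11}&a_{12}\\ a_{21}&a_{22}\end{array}\right),\quad\bm{B}=\left(\begin{array}{cc}b_{11}&b_{12}\\ b_{21}&b_{22}\end{array}\right)$$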

respectively. The entry $a_{ij}$ denotes the payoff for Player $1$ when he chooses action $i$ and his opponent chooses action $j$; similarly, $b_{ij}$ denotes the payoff for Player $2$ when he chooses action $i$ and his opponent chooses action $j$. We define $x$ as the probability that Player $1$ chooses his first action, and $y$ as the probability that Player $2$ chooses his first action. We also define the column vectors $\bm{x}=(x,1-x)^{T}$ and $\bm{y}=(y,1-y)^{T}$ as the strategies of the two players. For simplicity, we denote the $i$-th entry of vector $\bm{x}$ by $x_{i}$. We call the tuple $(x,y)$ the system state or the strategy profile.

An important solution concept in game theory is the Nash equilibrium, in which no player can increase his payoff by unilaterally changing his strategy, that is:

Definition 1 (Nash equilibrium).

A strategy profile $(x_{NE},y_{NE})$ is a Nash equilibrium (NE) if

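$$x_{NE}\in\arg\max_{x\in[0,1]}\bm{x}^{T}\bm{A}\bm{y}_{NE},\qquad y_{NE}\in\arg\max_{y\in[0,1]}\bm{y}^{T}\bm{B}\bm{x}_{NE}.$$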
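The NE condition can be checked directly at the four pure profiles of a $2\times 2$ game. The sketch below is our own illustration (the helper name `pure_nash_equilibria` is hypothetical). It follows the indexing convention of this section, in which `B[i][j]` is Player 2's payoff when he plays $i$ and Player 1 plays $j$, and it uses the payoff matrices of Example 1 (Section 4.1).

```python
import itertools

def pure_nash_equilibria(A, B):
    """Pure NE of a 2x2 game.

    A[i][j]: Player 1's payoff when he plays i and Player 2 plays j.
    B[i][j]: Player 2's payoff when he plays i and Player 1 plays j.
    Returns profiles (x, y), where x = 1 (resp. y = 1) means the first action.
    """
    result = []
    for i, j in itertools.product(range(2), repeat=2):
        p1_best = A[i][j] >= A[1 - i][j]  # Player 1 cannot gain by switching
        p2_best = B[j][i] >= B[1 - j][i]  # Player 2 cannot gain by switching
        if p1_best and p2_best:
            result.append((1 - i, 1 - j))  # action index 0 -> probability 1
    return result

A = [[10, 0], [0, 5]]
B = [[2, 0], [0, 4]]
print(pure_nash_equilibria(A, B))  # [(1, 1), (0, 0)]
```

For Example 1 this recovers exactly the two PNEs $(0,0)$ and $(1,1)$ listed there.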

We call $(x_{NE},y_{NE})$ a pure Nash equilibrium (PNE) if both $x_{NE}\in\{0,1\}$ and $y_{NE}\in\{0,1\}$. The Nash equilibrium assumes that each user is fully rational; in the real world, however, this assumption is often unrealistic. An alternative solution concept is the quantal response equilibrium McKelvey and Palfrey (1995), which assumes that each user has bounded rationality:

Definition 2 (Quantal response equilibrium).

A strategy profile $(x_{QRE},y_{QRE})$ is a quantal response equilibrium (QRE) with respect to temperatures $T_{x}$ and $T_{y}$ if

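$$x_{QRE}=\frac{e^{\frac{1}{T_{x}}(\bm{A}\bm{y}_{QRE})_{1}}}{\sum_{j}e^{\frac{1}{T_{x}}(\bm{A}\bm{y}_{QRE})_{j}}},\qquad y_{QRE}=\frac{e^{\frac{1}{T_{y}}(\bm{B}\bm{x}_{QRE})_{1}}}{\sum_{j}e^{\frac{1}{T_{y}}(\bm{B}\bm{x}_{QRE})_{j}}}.$$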

Analogously to Nash equilibria, we can view QREs as the outcome when each player maximizes not only his expected utility but also the entropy of his strategy. Specifically, QREs are the solutions of the following programs, each of which maximizes a linear combination of expected utility and entropy:

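$$\max_{x\in[0,1]}\ \bm{x}^{T}\bm{A}\bm{y}-T_{x}\sum_{i}x_{i}\ln x_{i},\qquad\max_{y\in[0,1]}\ \bm{y}^{T}\bm{B}\bm{x}-T_{y}\sum_{i}y_{i}\ln y_{i}.$$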

This formulation is widely used in the Q-learning dynamics literature (e.g., Cominetti et al. (2010); Wolpert et al. (2012); Coucheney et al. (2013)). In this formulation, the two parameters $T_{x}$ and $T_{y}$ control the weighting between utility and entropy. We call $T_{x}$ and $T_{y}$ the temperatures; their values define the level of irrationality. If $T_{x}$ and $T_{y}$ are zero, then both players are fully rational, and the system state is a Nash equilibrium. If instead both $T_{x}$ and $T_{y}$ tend to infinity, then each player chooses his action according to a uniform distribution, which corresponds to fully irrational players.
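The equivalence between the entropy-regularized program and the logit (Boltzmann) response can be checked numerically for Player 1. In this minimal sketch, the opponent strategy $y=0.4$ and temperature $T_x=2$ are arbitrary illustrative values, the helper names are ours, and a grid search stands in for exact maximization. Setting the derivative of the strictly concave objective to zero, $u_1-u_2+T_x\ln\frac{1-x}{x}=0$, gives exactly the logit formula.

```python
import math

def logit_response(u1, u2, T):
    """Probability of the first action under the Boltzmann/logit rule."""
    return 1.0 / (1.0 + math.exp((u2 - u1) / T))

def regularized_objective(x, u1, u2, T):
    """Expected utility plus T times the Shannon entropy of (x, 1 - x)."""
    entropy = -(x * math.log(x) + (1 - x) * math.log(1 - x))
    return x * u1 + (1 - x) * u2 + T * entropy

A = [[10, 0], [0, 5]]  # Player 1's payoffs in Example 1
y, T = 0.4, 2.0        # hypothetical opponent strategy and temperature
u1 = A[0][0] * y + A[0][1] * (1 - y)  # (A y)_1
u2 = A[1][0] * y + A[1][1] * (1 - y)  # (A y)_2

grid = [k / 100000 for k in range(1, 100000)]
x_grid = max(grid, key=lambda x: regularized_objective(x, u1, u2, T))
print(x_grid, logit_response(u1, u2, T))  # the two values agree to grid precision
```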

2.2 Efficiency of an equilibrium

The performance of a system state can be measured via the social welfare. Given a system state $(x,y)$, we define the social welfare as the sum of the expected payoffs of all users in the system:

Definition 3.

Given a $2\times 2$ game with payoff matrices $\bm{A}$ and $\bm{B}$, and a system state $(x,y)$, the social welfare is defined as

$$SW(x,y)=xy(a_{11}+b_{11})+x(1-y)(a_{12}+b_{21})+y(1-x)(a_{21}+b_{12})+(1-x)(1-y)(a_{22}+b_{22})$$

In the context of algorithmic game theory, we can measure the efficiency of a game by comparing the social welfare of an equilibrium system state with the best achievable social welfare. We call the strategy profile that achieves the maximal social welfare the socially optimal (SO) strategy profile. The efficiency of a game is often described in terms of the price of anarchy (PoA) and the price of stability (PoS), defined as follows.

Definition 4.

Given a $2\times 2$ game with payoff matrices $A$ and $B$, and a set of equilibrium system states $S\subseteq[0,1]^{2}$, the price of anarchy (PoA) and the price of stability (PoS) are defined as

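$$PoA(S)=\frac{\max_{(x,y)\in[0,1]^{2}}SW(x,y)}{\min_{(x,y)\in S}SW(x,y)},\qquad PoS(S)=\frac{\max_{(x,y)\in[0,1]^{2}}SW(x,y)}{\max_{(x,y)\in S}SW(x,y)}.$$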
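As a worked instance of Definitions 3 and 4, the sketch below computes the PoA and PoS for the game of Example 1 (Section 4.1), taking $S$ to be its set of Nash equilibria: the two PNEs and the mixed equilibrium $(x,y)=(2/3,1/3)$, at which each player is indifferent between his two actions ($10y=5(1-y)$ and $2x=4(1-x)$). The function names are our own. Because $SW$ is bilinear in $(x,y)$, its maximum over $[0,1]^2$ is attained at a pure profile; the ratios assume positive social welfare.

```python
def social_welfare(A, B, x, y):
    """SW(x, y) as in Definition 3 (A[i][j], B[i][j] indexed as in Section 2.1)."""
    return (x * y * (A[0][0] + B[0][0])
            + x * (1 - y) * (A[0][1] + B[1][0])
            + y * (1 - x) * (A[1][0] + B[0][1])
            + (1 - x) * (1 - y) * (A[1][1] + B[1][1]))

def poa_pos(A, B, S):
    """(PoA(S), PoS(S)) from Definition 4; assumes SW > 0 on S."""
    # SW is bilinear, so its maximum over [0,1]^2 is attained at a corner.
    best = max(social_welfare(A, B, x, y) for x in (0, 1) for y in (0, 1))
    values = [social_welfare(A, B, x, y) for x, y in S]
    return best / min(values), best / max(values)

A = [[10, 0], [0, 5]]
B = [[2, 0], [0, 4]]
# Nash equilibria of Example 1: the two PNEs and the mixed NE (2/3, 1/3).
S = [(0, 0), (1, 1), (2 / 3, 1 / 3)]
print(poa_pos(A, B, S))
```

Here the social optimum $SW(1,1)=12$ is itself an equilibrium, so PoS $=1$, while the worst equilibrium is the mixed one with $SW=14/3$, giving PoA $=18/7\approx 2.57$.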

3 Our Model

3.1 Q-learning Dynamics

In this paper, we are particularly interested in the scenario when both players’ strategies are evolving under Q-learning dynamics:

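$$\dot{x}_{i}=x_{i}\Big[(\bm{A}\bm{y})_{i}-\bm{x}^{T}\bm{A}\bm{y}+T_{x}\sum_{j}x_{j}\ln(x_{j}/x_{i})\Big],\qquad\dot{y}_{i}=y_{i}\Big[(\bm{B}\bm{x})_{i}-\bm{y}^{T}\bm{B}\bm{x}+T_{y}\sum_{j}y_{j}\ln(y_{j}/y_{i})\Big]\qquad(2)$$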

The Q-learning dynamics has been studied because of its connection with multi-agent learning problems. For example, it has been shown in Sato and Crutchfield (2003); Tuyls et al. (2003) that the Q-learning dynamics captures the system evolution of a repeated game, where each player learns his strategy through Q-learning and Boltzmann selection rules. More details are provided in Appendix A.

An important observation about the dynamics (2) is that they exhibit the exploration/exploitation tradeoff Tuyls et al. (2003). The right-hand side of equation (2) is composed of two parts. The first part, $x_{i}[(\bm{A}\bm{y})_{i}-\bm{x}^{T}\bm{A}\bm{y}]$, is exactly the vector field of the replicator dynamics Sandholm (2009). Basically, the replicator dynamics drive the system toward states of higher utility for both players. As a result, we can regard this part as a selection process from the perspective of population evolution, or as an exploitation process from the perspective of a learning agent. For the second part, $x_{i}[T_{x}\sum_{j}x_{j}\ln(x_{j}/x_{i})]$, we show in the appendix that if the time derivative of $\bm{x}$ consisted of this part alone, the system entropy would increase.

The system entropy is a function that captures the randomness of the system. From the perspective of population evolution, the system entropy corresponds to the diversity of the population. As a result, this term can be considered a mutation process, whose level is controlled by the temperature parameters $T_{x}$ and $T_{y}$. In reinforcement-learning terms, this term can be considered an exploration process, as it gives the agent the opportunity to gain information about actions that do not look best so far.
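The entropy claim can be checked numerically. Along the exact flow $\dot{x}_i=x_iT[\sum_j x_j\ln x_j-\ln x_i]$ (the temperature part of (2) alone), a short calculation gives $\dot{H}=T\big(\sum_i x_i(\ln x_i)^2-(\sum_i x_i\ln x_i)^2\big)\geq 0$, i.e., $T$ times the variance of $\ln x_i$ under $\bm{x}$. The sketch below is a forward-Euler discretization with an arbitrary three-action starting point, step size, and helper names of our choosing; it tracks $H$ and shows it rising to $\ln 3$ as the strategy approaches the uniform distribution.

```python
import math

def entropy(x):
    """Shannon entropy H(x) = -sum_i x_i ln x_i."""
    return -sum(p * math.log(p) for p in x)

def exploration_step(x, T, dt):
    """One Euler step of x_i' = x_i * T * (sum_j x_j ln x_j - ln x_i)."""
    s = sum(p * math.log(p) for p in x)
    return [p + dt * p * T * (s - math.log(p)) for p in x]

x = [0.7, 0.2, 0.1]  # hypothetical starting mixed strategy
history = [entropy(x)]
for _ in range(3000):
    x = exploration_step(x, T=1.0, dt=0.01)
    history.append(entropy(x))
# entropy increases step by step and approaches ln 3 (uniform strategy)
print(x, history[0], history[-1], math.log(3))
```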

3.2 Convergence of the Q-learning dynamics

Inspecting the Q-learning dynamics (2), we see that their interior rest points are exactly the QREs of the $2\times 2$ game. It is claimed without proof in Kianercy and Galstyan (2012) that the Q-learning dynamics for a $2\times 2$ game converge to interior rest points of the probability simplexes for any positive temperatures $T_{x}>0$ and $T_{y}>0$. We provide a formal proof in Appendix B. The idea is that for positive temperatures the system is dissipative, and by leveraging the planar nature of the system it can be argued that it converges to fixed points.

3.3 Rescaling the Payoff Matrix

We conclude this section by discussing transformations of the payoff matrices that preserve the dynamics (2). This idea is proposed in Hofbauer and Hopkins (2005) and Hofbauer and Sigmund (1998), where the rescaling of a matrix is defined as follows.

Definition 5 (Hofbauer and Sigmund (1998)).

$\bm{A}^{\prime}$ and $\bm{B}^{\prime}$ are said to be a rescaling of $\bm{A}$ and $\bm{B}$ if there exist constants $c_{j},d_{i}$, and $\alpha>0$, $\beta>0$ such that $a_{ij}^{\prime}=\alpha a_{ij}+c_{j}$ and $b_{ji}^{\prime}=\beta b_{ji}+d_{i}$.

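Rescaling the game payoff matrices is equivalent to updating the temperature parameters of the two agents in (2): the additive constants $c_{j}$ and $d_{i}$ cancel out of (2), and the factors $\alpha$ and $\beta$ can be absorbed into $T_{x}$ and $T_{y}$. Hence, without loss of generality, it suffices to study the dynamics under the assumption that the $2\times 2$ payoff matrices $\bm{A}$ and $\bm{B}$ are in the following diagonal form.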

Definition 6.

Given $2\times 2$ matrices $\bm{A}$ and $\bm{B}$ , their diagonal form is defined as

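$$\bm{A}_{D}=\left(\begin{array}{cc}a_{11}-a_{21}&0\\ 0&a_{22}-a_{12}\end{array}\right),\quad\bm{B}_{D}=\left(\begin{array}{cc}b_{11}-b_{21}&0\\ 0&b_{22}-b_{12}\end{array}\right)$$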

Note that although rescaling the payoff matrices to their diagonal form preserves the equilibria, it does not preserve social optimality, i.e., the socially optimal strategy profile in the transformed game is not necessarily the socially optimal strategy profile in the original game.
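For a $2\times 2$ game, (2) reduces to $\dot{x}=x(1-x)\big[(\bm{A}\bm{y})_{1}-(\bm{A}\bm{y})_{2}+T_{x}\ln\frac{1-x}{x}\big]$ for $x=x_1$ (and analogously for $y$), so the payoffs enter only through the gap $(\bm{A}\bm{y})_{1}-(\bm{A}\bm{y})_{2}$. The sketch below checks, on a randomly drawn matrix, that this gap is unchanged by passing to the diagonal form; the helper names are ours. Multiplying a matrix by $\alpha>0$ instead scales the gap by $\alpha$, which is why such a factor can be traded against the temperature.

```python
import random

def diagonal_form(M):
    """Diagonal form of Definition 6 for a 2x2 matrix M."""
    return [[M[0][0] - M[1][0], 0.0], [0.0, M[1][1] - M[0][1]]]

def payoff_gap(M, q):
    """(M q)_1 - (M q)_2 for the opponent strategy (q, 1 - q)."""
    return (M[0][0] * q + M[0][1] * (1 - q)) - (M[1][0] * q + M[1][1] * (1 - q))

random.seed(0)
A = [[random.uniform(-5, 5) for _ in range(2)] for _ in range(2)]  # hypothetical game
A_D = diagonal_form(A)
for _ in range(5):
    y = random.random()
    print(payoff_gap(A, y), payoff_gap(A_D, y))  # equal up to rounding
```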

4 Hysteresis Effect and Bifurcation Analysis

4.1 Hysteresis effect in Q-learning dynamics: An example

We begin our discussion with an example:

Example 1 (Hysteresis effect).

Consider a $2\times 2$ game with reward matrices

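$$A=\left(\begin{array}{cc}10&0\\ 0&5\end{array}\right),\quad B=\left(\begin{array}{cc}2&0\\ 0&4\end{array}\right)$$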

There are two PNEs in this game, $(x,y)=(0,0)$ and $(1,1)$. By fixing some $T_{y}$, we can plot the QREs with respect to $T_{x}$ as in Figure 3 and Figure 3. For simplicity, we only show the value of $x$ in the figure, since according to (4), given $x$ and $T_{y}$, the value of $y$ is uniquely determined. Assuming the system follows the Q-learning dynamics, as we slowly vary $T_{x}$, $x$ tends to stay on the branch closest to where it was; initially, this branch corresponds to a stable but inefficient fixed point. We consider the following process:

1. The initial state is $(0.05,0.14)$, where $T_{x}\approx 1$ and $T_{y}\approx 2$. We plot $x$ versus $T_{x}$ by fixing $T_{y}=2$ in Figure 3.
2. Fix $T_{y}=2$, and increase $T_{x}$ until there is only one QRE correspondence.
3. Fix $T_{y}=2$, and decrease $T_{x}$ back to $1$. Now $x\approx 0.997$.

In the above example, although the temperature parameters are eventually set back to their initial values, the system state ends up at a totally different equilibrium. This behavior is known as the hysteresis effect. In this section, we answer the question of when this happens. In the next section, we show how to take advantage of this phenomenon.
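The process in Example 1 can be reproduced numerically. For a $2\times 2$ game in diagonal form, (2) reduces to $\dot{x}=x(1-x)\big[a_{X}y-b_{X}(1-y)+T_{x}\ln\frac{1-x}{x}\big]$ and $\dot{y}=y(1-y)\big[a_{Y}x-b_{Y}(1-x)+T_{y}\ln\frac{1-y}{y}\big]$. The Python sketch below integrates these equations with forward Euler for Example 1 ($a_X=10$, $b_X=5$, $a_Y=2$, $b_Y=4$) at $T_y=2$. The step size, the ramp schedule, and the peak value $T_x=3$ are our own choices, not values from the paper; the peak is picked comfortably above the critical temperature for $T_y=2$ (roughly $1.5$ by our own numerical estimate).

```python
import math

# Diagonal form of Example 1: A = diag(10, 5), B = diag(2, 4).
A_X, B_X = 10.0, 5.0
A_Y, B_Y = 2.0, 4.0

def vector_field(x, y, t_x, t_y):
    """Dynamics (2) for a 2x2 game in terms of x = x_1 and y = y_1."""
    dx = x * (1 - x) * (A_X * y - B_X * (1 - y) + t_x * math.log((1 - x) / x))
    dy = y * (1 - y) * (A_Y * x - B_Y * (1 - x) + t_y * math.log((1 - y) / y))
    return dx, dy

def simulate(x, y, t_x_schedule, t_y=2.0, dt=0.01):
    """Forward-Euler integration; T_x follows the schedule, one value per step."""
    eps = 1e-12  # keep the state strictly inside (0, 1) so the logarithms are defined
    for t_x in t_x_schedule:
        dx, dy = vector_field(x, y, t_x, t_y)
        x = min(max(x + dt * dx, eps), 1 - eps)
        y = min(max(y + dt * dy, eps), 1 - eps)
    return x, y

def ramp(start, stop, steps):
    """Linearly spaced values from start to stop (inclusive)."""
    return [start + (stop - start) * k / (steps - 1) for k in range(steps)]

hold = [1.0] * 2000
excursion = ramp(1.0, 3.0, 2000) + [3.0] * 2000 + ramp(3.0, 1.0, 2000) + hold

print(simulate(0.05, 0.14, hold * 4))   # T_x held at 1: stays near x = 0.05
print(simulate(0.05, 0.14, excursion))  # T_x raised to 3 and back: ends near x = 0.997
```

Holding $T_x=1$ leaves the state at the inefficient QRE near $x=0.05$, whereas the temporary excursion of $T_x$ moves it to the principal branch, ending with $x$ close to $1$ even though the temperatures return to their initial values.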

4.2 Characterizing QREs

We consider the bifurcation diagrams for QREs in $2\times 2$ games. Without loss of generality, we consider a properly rescaled $2\times 2$ game with payoff matrices in the diagonal form:

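$$\bm{A}_{D}=\left(\begin{array}{cc}a_{X}&0\\ 0&b_{X}\end{array}\right),\quad\bm{B}_{D}=\left(\begin{array}{cc}a_{Y}&0\\ 0&b_{Y}\end{array}\right)$$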

Also, we can assume the action indices are ordered properly and rescaled properly so that $a_{X}>0$ and $|a_{X}|\geq|b_{X}|$ . For simplicity, we assume $a_{X}=b_{X}$ and $b_{X}=b_{Y}$ do not hold at the same time. At QRE, we have

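$$x=\frac{e^{\frac{1}{T_{x}}ya_{X}}}{e^{\frac{1}{T_{x}}ya_{X}}+e^{\frac{1}{T_{x}}(1-y)b_{X}}},\qquad y=\frac{e^{\frac{1}{T_{y}}xa_{Y}}}{e^{\frac{1}{T_{y}}xa_{Y}}+e^{\frac{1}{T_{y}}(1-x)b_{Y}}}\qquad(4)$$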

Given $T_{x}$ and $T_{y}$, there could be multiple solutions to (4). However, if we know the equilibrium states, then we can recover the temperature parameters. Solving (4) for $T_{x}$ and $T_{y}$, we get

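$$T_{X}^{I}(x,y)=\frac{-(a_{X}+b_{X})y+b_{X}}{\ln\left(\frac{1}{x}-1\right)},\qquad T_{Y}^{I}(x,y)=\frac{-(a_{Y}+b_{Y})x+b_{Y}}{\ln\left(\frac{1}{y}-1\right)}\qquad(5)$$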

We call this the first form of representation, where $T_{x}$ and $T_{y}$ are written as functions of $x$ and $y$. Here the capital subscripts in $T_{X}$ and $T_{Y}$ indicate that they are considered as functions. A direct observation from (5) is that $T_{X}^{I}$ and $T_{Y}^{I}$ are continuous on $(0,1)\times(0,1)$ except at $x=1/2$ and $y=1/2$, respectively.
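Equation (5) can be sanity-checked against (4): plugging $T_{X}^{I}(x,y)$ back into the first equation of (4) should return $x$. The sketch below does this at the initial state $(0.05,0.14)$ of Example 1 and also shows that (5) yields temperatures close to the values $T_x\approx 1$ and $T_y\approx 2$ quoted there. The function names are our own.

```python
import math

A_X, B_X = 10.0, 5.0  # diagonal form of Example 1
A_Y, B_Y = 2.0, 4.0

def qre_x(y, t_x):
    """First equation of (4): Player 1's logit response to y."""
    return 1.0 / (1.0 + math.exp(((1 - y) * B_X - y * A_X) / t_x))

def t_first_form(u, v, a, b):
    """(-(a + b) v + b) / ln(1/u - 1).

    With (u, v) = (x, y) and (a, b) = (A_X, B_X) this is T_X^I(x, y);
    with (u, v) = (y, x) and (a, b) = (A_Y, B_Y) it is T_Y^I(x, y).
    """
    return (-(a + b) * v + b) / math.log(1 / u - 1)

x, y = 0.05, 0.14  # initial state of Example 1
t_x = t_first_form(x, y, A_X, B_X)
t_y = t_first_form(y, x, A_Y, B_Y)
print(t_x, t_y, qre_x(y, t_x))  # roughly 1, roughly 2, and x recovered
```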

An alternative way to describe the QREs is to write $T_{x}$ and $y$ as functions of $x$, parameterized by $T_{y}$, in the following second form of representation. This is the form we use to prove many useful properties of QREs.

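$$y^{II}(x,T_{y})=\frac{1}{1+e^{\frac{1}{T_{y}}\left(b_{Y}-(a_{Y}+b_{Y})x\right)}},\qquad T_{X}^{II}(x,T_{y})=T_{X}^{I}\left(x,y^{II}(x,T_{y})\right)=\frac{-(a_{X}+b_{X})\,y^{II}(x,T_{y})+b_{X}}{\ln\left(\frac{1}{x}-1\right)}$$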

In this way, given $T_{y}$, we can analyze how $T_{x}$ changes with $x$. This helps us determine the QREs of the system for given $T_{x}$ and $T_{y}$.
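Numerically, the second form makes it easy to enumerate QREs: for fixed $T_y$, the QREs at a given $T_x$ are the solutions of $T_{X}^{II}(x,T_{y})=T_{x}$. The sketch below, our own illustration for the game of Example 1 with $T_y=2$, locates them by sign changes on a uniform grid. The grid size is an arbitrary choice, and the halves $(0,1/2)$ and $(1/2,1)$ are scanned separately because the denominator $\ln(1/x-1)$ vanishes at $x=1/2$, where $T_{X}^{II}$ jumps from $-\infty$ to $+\infty$ in this example.

```python
import math

A_X, B_X = 10.0, 5.0  # diagonal form of Example 1
A_Y, B_Y = 2.0, 4.0

def y_second_form(x, t_y):
    """y^{II}(x, T_y): Player 2's logit response to x."""
    return 1.0 / (1.0 + math.exp((B_Y - (A_Y + B_Y) * x) / t_y))

def t_x_second_form(x, t_y):
    """T_X^{II}(x, T_y) = T_X^I(x, y^{II}(x, T_y))."""
    y = y_second_form(x, t_y)
    return (-(A_X + B_X) * y + B_X) / math.log(1 / x - 1)

def qre_x_values(t_x, t_y, n=20000):
    """Approximate x-coordinates of all QREs via sign changes of T_X^{II} - T_x."""
    roots = []
    for lo, hi in ((0.0, 0.5), (0.5, 1.0)):  # avoid the pole at x = 1/2
        xs = [lo + (hi - lo) * k / n for k in range(1, n)]
        vals = [t_x_second_form(x, t_y) - t_x for x in xs]
        for k in range(len(xs) - 1):
            if vals[k] == 0 or vals[k] * vals[k + 1] < 0:
                roots.append(xs[k])
    return roots

print(qre_x_values(1.0, 2.0))  # three QREs at T_x = 1
print(qre_x_values(3.0, 2.0))  # one QRE at T_x = 3
```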

We also want to analyze the stability of the QREs. From dynamical systems theory (e.g., Perko (1991)), a fixed point of a dynamical system is asymptotically stable if all of the eigenvalues of its Jacobian matrix have negative real parts; if at least one eigenvalue has a positive real part, then it is unstable. It turns out that, under the second form of representation, we can determine whether a segment of the diagram is stable.

Lemma 1.

Given $T_{y}$, the system state $\left(x,y^{II}(x,T_{y})\right)$ is a stable equilibrium if and only if

1. $\frac{\partial T_{X}^{II}}{\partial x}(x,T_{y})>0$ if $x\in(0,1/2)$;
2. $\frac{\partial T_{X}^{II}}{\partial x}(x,T_{y})<0$ if $x\in(1/2,1)$.

Proof.

The given condition is equivalent to the case that both eigenvalues of the Jacobian matrix of the dynamics (2) are negative. ∎
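Lemma 1 turns stability into a sign check on $\partial T_{X}^{II}/\partial x$. The sketch below, again for Example 1 with $T_y=2$ and $T_x=1$, finds the three QREs by bisection on brackets we picked by inspecting $T_{X}^{II}$, estimates the derivative by a central difference, and applies the lemma. The brackets, the difference step, and the helper names are our assumptions.

```python
import math

A_X, B_X = 10.0, 5.0  # diagonal form of Example 1
A_Y, B_Y = 2.0, 4.0
T_Y = 2.0

def t_x_second_form(x, t_y=T_Y):
    """T_X^{II}(x, T_y) for the diagonal-form game above."""
    y = 1.0 / (1.0 + math.exp((B_Y - (A_Y + B_Y) * x) / t_y))
    return (-(A_X + B_X) * y + B_X) / math.log(1 / x - 1)

def bisect(f, lo, hi, iters=100):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def is_stable(x, h=1e-6):
    """Lemma 1: stable iff dT/dx > 0 on (0, 1/2) or dT/dx < 0 on (1/2, 1)."""
    slope = (t_x_second_form(x + h) - t_x_second_form(x - h)) / (2 * h)
    return slope > 0 if x < 0.5 else slope < 0

t_x = 1.0
brackets = [(0.01, 0.25), (0.25, 0.45), (0.6, 0.9999)]
qres = [bisect(lambda x: t_x_second_form(x) - t_x, lo, hi) for lo, hi in brackets]
print([(round(x, 4), is_stable(x)) for x in qres])  # stability pattern: True, False, True
```

In this example the low-$x$ QRE and the principal-branch QRE are stable, while the middle one is unstable.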

Finally, we define the principal branch. In Example 1, we call the branch on $x\in(0.5,1)$ the principal branch given $T_{y}=2$, since for any $T_{x}>0$ there is some $x\in(0.5,1)$ such that $T_{X}^{II}(x,T_{y})=T_{x}$. More generally, we define it formally with the help of the second form of representation.

Definition 7.

Given $T_{y}$, the region $(a,b)\subset(0,1)$ contains the principal branch of the QRE correspondence if it satisfies the following conditions:

1. $T_{X}^{II}(x,T_{y})$ is continuous and differentiable for $x\in(a,b)$.
2. $T_{X}^{II}(x,T_{y})>0$ for $x\in(a,b)$.
3. For any $T_{x}>0$, there exists $x\in(a,b)$ such that $T_{X}^{II}(x,T_{y})=T_{x}$.

Further, for a region $(a,b)$ that contains the principal branch, $x\in(a,b)$ is on the principal branch if it satisfies the following conditions:

1. The equilibrium state $(x,y^{II}(x,T_{y}))$ is stable.
2. There is no $x^{\prime}\in(a,b)$, $x^{\prime}<x$, such that $T_{X}^{II}(x^{\prime},T_{y})=T_{X}^{II}(x,T_{y})$.

4.3 Coordination Games

We begin our analysis with the class of coordination games, in which $a_{X}$, $b_{X}$, $a_{Y}$, and $b_{Y}$ are all positive. Also, without loss of generality, we assume $a_{X}\geq b_{X}$. In this case, neither player has a dominant strategy, and there are two PNEs.

Revisiting Example 1, we can make the following observations from Figure 3 and Figure 3:

1. Given $T_{y}$, there are three branches. One is the principal branch, while the other two appear in pairs and occur only when $T_{x}$ is less than some value.
2. For small $T_{y}$, the principal branch goes toward $x=0$, while for large $T_{y}$, the principal branch goes toward $x=1$.

We now show that these observations hold in general for coordination games. The proofs in this section are deferred to Appendix D, where we give a detailed discussion of the proof techniques.

The first idea we are going to introduce is the inverting temperature, which is the threshold of $T_{y}$ in Observation (2). We define it as

$$T_{I}=\max\left\{0,\frac{b_{Y}-a_{Y}}{2\ln(a_{X}/b_{X})}\right\}$$

We note that $T_{I}$ is positive only if $b_{Y}>a_{Y}$, i.e., when the two players have different preferences. When $T_{y}<T_{I}$, as the first player increases his rationality from fully irrational, i.e., as $T_{x}$ decreases from infinity, he is likely to be influenced by the second player’s preference. If $T_{y}$ is greater than $T_{I}$, then the first player prefers to follow his own preference, making the principal branch go toward $x=1$. We formalize this idea in the following theorem:

Theorem 1 (Direction of the principal branch).

Given a $2\times 2$ coordination game and $T_{y}$, the following statements are true:

1. If $T_{y}>T_{I}$, then $(0.5,1)$ contains the principal branch.

2. If $T_{y}<T_{I}$, then $(0,0.5)$ contains the principal branch.

The second quantity is the critical temperature $T_{C}(T_{y})$, which is a function of $T_{y}$. It is defined as the infimum of $T_{x}$ such that for any $T_{x}>T_{C}(T_{y})$, there is a unique QRE correspondence under $(T_{x},T_{y})$. In general, the critical temperature has no closed form; however, it can still be computed efficiently, as we show in Theorem 2. Another notable value of $T_{y}$ is $T_{B}=\frac{b_{Y}}{\ln(a_{X}/b_{X})}$, the largest value of $T_{y}$ for which QREs not on the principal branch exist. Intuitively, once $T_{y}$ exceeds $T_{B}$, the first player ignores the decision of the second player and plays what he considers better. We formalize the ideas of $T_{C}$ and $T_{B}$ in the following theorem:
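A short sketch for evaluating $T_{I}$ and $T_{B}$, assuming $a_{X}>b_{X}>0$ so that $\ln(a_{X}/b_{X})>0$. The hypothetical helper `off_principal_interval_end` returns the $x$ at which the second player's logit response equals $b_{X}/(a_{X}+b_{X})$, under the assumption that the payoff advantages in diagonal form are $a_{X}y-b_{X}(1-y)$ and $a_{Y}x-b_{Y}(1-x)$; QREs with $x<0.5$ can only lie below this point. The endpoint is positive exactly when $T_{y}<T_{B}$ and lies below $0.5$ exactly when $T_{y}>T_{I}$, consistent with the roles the two thresholds play in Theorems 1 and 2. The game values are hypothetical.

```python
import math


def inverting_temperature(aX, bX, aY, bY):
    # T_I = (b_Y - a_Y) / (2 ln(a_X / b_X)); assumes a_X > b_X > 0.
    return (bY - aY) / (2.0 * math.log(aX / bX))


def branching_temperature(aX, bX, bY):
    # T_B = b_Y / ln(a_X / b_X); assumes a_X > b_X > 0.
    return bY / math.log(aX / bX)


def off_principal_interval_end(Ty, aX, bX, aY, bY):
    # Hypothetical helper (assumed diagonal-form payoffs): the x solving
    # (a_Y x - b_Y (1 - x)) / Ty = ln(b_X / a_X), i.e. where the first
    # player's payoff gap a_X y - b_X (1 - y) changes sign.
    return (Ty * math.log(bX / aX) + bY) / (aY + bY)


g = (2.0, 1.0, 1.0, 2.0)                  # hypothetical (a_X, b_X, a_Y, b_Y)
TI = inverting_temperature(*g)
TB = branching_temperature(g[0], g[1], g[3])
for Ty in (0.5, 1.0, 3.0):                # below T_I, between, above T_B
    print(Ty, round(off_principal_interval_end(Ty, *g), 4))
```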

Theorem 2 (Properties about the second QRE).

Given a $2\times 2$ coordination game and $T_{y}$, the following statements are true:

1. For almost every $T_{x}>0$, all QREs not lying on the principal branch appear in pairs.

2. If $T_{y}>T_{B}$, then there is no QRE correspondence in $x\in(0,0.5)$.

3. If $T_{y}>T_{I}$, then there is no QRE correspondence for $T_{x}>T_{C}(T_{y})$ in $x\in(0,0.5)$.

4. If $T_{y}<T_{I}$, then there is no QRE correspondence for $T_{x}>T_{C}(T_{y})$ in $x\in(0.5,1)$.

5. $T_{C}(T_{y})$ is given by $T_{X}^{II}(x_{L},T_{y})$, where $x_{L}$ is the solution to the equality

[TABLE]

6. $x_{L}$ can be found using binary search.
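The equality defining $x_{L}$ is not reproduced here, so the sketch below takes a different but related route. It assumes $T_{I}<T_{y}<T_{B}$, so that the off-principal QREs lie in $(0,x_{0})$ with $x_{0}=\frac{T_{y}\ln(b_{X}/a_{X})+b_{Y}}{a_{Y}+b_{Y}}$ (the point where the first player's payoff gap changes sign under the assumed diagonal-form payoffs), and that $T_{X}^{II}(\cdot,T_{y})$ is unimodal on that interval (it vanishes at both ends). Under these assumptions $T_{C}(T_{y})$ is the maximum of $T_{X}^{II}$ there, which ternary search (a close relative of binary search) locates. The game values are hypothetical.

```python
import math


def T_X_II(x, Ty, aX, bX, aY, bY):
    # Second-form representation under the assumed diagonal-form payoffs:
    # y is the second player's logit response to x, and T_x solves
    # ln(x / (1 - x)) = (a_X y - b_X (1 - y)) / T_x.
    y = 1.0 / (1.0 + math.exp(-(aY * x - bY * (1.0 - x)) / Ty))
    return (aX * y - bX * (1.0 - y)) / math.log(x / (1.0 - x))


def critical_temperature(Ty, aX, bX, aY, bY, iters=100):
    """Estimate (x_L, T_C(Ty)) for T_I < Ty < T_B by ternary search.

    Assumption: T_X^II(., Ty) is unimodal on (0, x0)."""
    x0 = (Ty * math.log(bX / aX) + bY) / (aY + bY)
    if not 0.0 < x0 < 0.5:
        raise ValueError("this sketch only covers T_I < Ty < T_B")
    lo, hi = 1e-12, x0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if T_X_II(m1, Ty, aX, bX, aY, bY) < T_X_II(m2, Ty, aX, bX, aY, bY):
            lo = m1        # the maximum lies to the right of m1
        else:
            hi = m2        # the maximum lies to the left of m2
    x_L = 0.5 * (lo + hi)
    return x_L, T_X_II(x_L, Ty, aX, bX, aY, bY)


g = (2.0, 1.0, 1.0, 2.0)          # hypothetical (a_X, b_X, a_Y, b_Y)
x_L, T_C = critical_temperature(1.0, *g)
```

For this hypothetical game and $T_{y}=1$ the search gives $T_{C}\approx 0.30$ near $x_{L}\approx 0.26$; for $T_{x}$ above that value, the only remaining QRE is on the principal branch.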

We next consider the stability of the QRE correspondences. By Lemma 1, the stability of the QREs can also be inspected conveniently through the second form representation, by analyzing $\frac{\partial T_{X}^{II}}{\partial x}$. We state the results in the following theorem:

Theorem 3 (Stability).

Given a $2\times 2$ coordination game and $T_{y}$, the following statements are true:

1. If $a_{Y}\geq b_{Y}$, then the principal branch is continuous.

2. If $T_{y}<T_{I}$, then the principal branch is continuous.

3. If $T_{y}>T_{I}$ and $a_{Y}<b_{Y}$, then the principal branch may not be continuous.

4. For fixed $T_{x}$, in each pair of QREs not lying on the principal branch, the one closer to $x=0.5$ is unstable, while the other one is stable.
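Stability can also be checked directly from the Jacobian. The sketch assumes the $2\times 2$ Q-learning dynamics take the form $\dot{x}=x(1-x)\left[a_{X}y-b_{X}(1-y)-T_{x}\ln\frac{x}{1-x}\right]$ and symmetrically for $\dot{y}$ (whose log-odds form has the divergence $-(T_{x}+T_{y})$ used in Appendix B). At an interior QRE both brackets vanish, so the trace $-(T_{x}+T_{y})$ is always negative and stability reduces to $T_{x}T_{y}>x(1-x)\,y(1-y)(a_{X}+b_{X})(a_{Y}+b_{Y})$. The game, the sample points, and the helper names are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def jacobian_at_qre(x, y, Tx, Ty, aX, bX, aY, bY):
    # Jacobian of the assumed dynamics
    #   x' = x(1-x)[a_X y - b_X(1-y) - Tx ln(x/(1-x))]
    #   y' = y(1-y)[a_Y x - b_Y(1-x) - Ty ln(y/(1-y))]
    # at an interior rest point, where both brackets are zero.
    return [[-Tx, x * (1 - x) * (aX + bX)],
            [y * (1 - y) * (aY + bY), -Ty]]


def is_stable(J):
    # A real 2x2 matrix has both eigenvalues in the open left half-plane
    # iff its trace is negative and its determinant is positive.
    tr = J[0][0] + J[1][1]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return tr < 0 and det > 0


def qre_on_second_form(x, Ty, aX, bX, aY, bY):
    # Hypothetical helper: (y, Tx) such that (x, y) is a QRE at (Tx, Ty).
    y = logistic((aY * x - bY * (1 - x)) / Ty)
    Tx = (aX * y - bX * (1 - y)) / math.log(x / (1 - x))
    return y, Tx


g = (2.0, 1.0, 1.0, 2.0)      # hypothetical (a_X, b_X, a_Y, b_Y), with Ty = 1
results = {}
for x in (0.15, 0.40, 0.80):
    y, Tx = qre_on_second_form(x, 1.0, *g)
    results[x] = (Tx, is_stable(jacobian_at_qre(x, y, Tx, 1.0, *g)))
```

For this hypothetical game the fold of the off-principal branch sits near $x\approx 0.26$; the sample point farther from $x=0.5$ ($x=0.15$) is stable, the one closer to it ($x=0.40$) is not, and the principal-branch point $x=0.8$ is stable, in line with the theorem.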

Note that part 3 of Theorem 3 implies that there may be an unstable segment between two segments of the principal branch. This phenomenon is illustrated in Figure 5. Although this case is less well behaved than the others, it does not prevent us from designing a control mechanism, as we do in Section 5.3.

4.4 Non-coordination games

Due to space constraints, the analysis of non-coordination games is deferred to Appendix C.

5 Mechanism Design

5.1 Hysteresis Mechanism: Select the Best Nash Equilibrium via QRE Dynamics

In this section, we consider coordination games in which the socially optimal state is one of the PNEs. Our main task is to determine when and how we can reach the socially optimal PNE. In Example 1, by sequentially changing $T_{x}$, we move the equilibrium state from around $(0,0)$ to around $(1,1)$, which is the socially optimal state. We formalize this idea as the hysteresis mechanism and present it in Theorem 4. The hysteresis mechanism takes advantage of the hysteresis effect discussed in Section 4: we use transient changes of system parameters to induce a permanent improvement in system performance via optimal equilibrium selection.

Theorem 4 (Hysteresis Mechanism).

Given a $2\times 2$ game that satisfies the following properties:

1. Its diagonal form satisfies $a_{X},b_{X},a_{Y},b_{Y}>0$.

2. Exactly one of its pure Nash equilibria is the socially optimal state.

Without loss of generality, assume $a_{X}\geq b_{X}$. Then there is a mechanism that controls the system to the social optimum by sequentially changing $T_{x}$ and $T_{y}$, provided that the conditions 1) $a_{Y}\geq b_{Y}$ and 2) the socially optimal state is $(0,0)$ do not both hold.

Proof.

First, note that if $a_{Y}\geq b_{Y}$ , by Theorem 1 the principal branch is always in the region $x>0.5$ . As a result, once $T_{y}$ is increased beyond the critical temperature, the system state will no longer return to $x<0.5$ at any positive temperature. Therefore, $(0,0)$ cannot be approached from any state in $x>0.5$ through the QRE dynamics.

On the other hand, if $a_{Y}\geq b_{Y}$ and the socially optimal state is the PNE $(1,1)$ , then we can approach that state by first getting onto the principal branch. The mechanism can be described as

(C1)

(a) Raise $T_{x}$ to some value above the critical temperature $T_{C}(T_{y})$.

(b) Reduce $T_{x}$ and $T_{y}$ to [math].

Although in this case the initial choice of $T_{y}$ does not affect the result, a social designer who takes into account the cost of assigning large $T_{x}$ and $T_{y}$ must trade off $T_{C}$ against $T_{y}$, since a smaller $T_{y}$ typically induces a larger $T_{C}$.

Next, consider $a_{Y}<b_{Y}$ . If we are aiming for state $(0,0)$ , then we can do the following:

(D1)

(a) Keep $T_{y}$ at some value below $T_{I}=\frac{b_{Y}-a_{Y}}{2\ln(a_{X}/b_{X})}$. The principal branch now lies in $(0,0.5)$.

(b) Raise $T_{x}$ to some value above the critical temperature $T_{C}(T_{y})$.

(c) Reduce $T_{x}$ to [math].

(d) Reduce $T_{y}$ to [math].

On the other hand, if we are aiming for state $(1,1)$ , then the following procedure suffices:

(D2)

(a) Keep $T_{y}$ at some value above $T_{I}=\frac{b_{Y}-a_{Y}}{2\ln(a_{X}/b_{X})}$. The principal branch now lies in $(0.5,1)$.

(b) Raise $T_{x}$ to some value above the critical temperature $T_{C}(T_{y})$.

(c) Reduce $T_{x}$ to [math].

(d) Reduce $T_{y}$ to [math].

Note that in the last two steps, the state stays around $x=1$ only if $T_{y}$ is reduced after $T_{x}$. We refer the reader to Figure 12 for illustrations of cases (D1) and (D2). ∎
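The hysteresis mechanism can be illustrated by simulation. The sketch below is an illustration under stated assumptions, not the authors' code: it integrates the assumed $2\times 2$ Q-learning dynamics in log-odds coordinates (the change of variables used in Appendix B) with forward Euler steps, for a hypothetical game with $a_{X}\geq b_{X}$ and $a_{Y}<b_{Y}$, and follows procedure (D2) by slowly ramping the temperatures. Starting near the inefficient PNE $(0,0)$, the transient increase of $T_{x}$ above $T_{C}(T_{y})$ (about $0.3$ here, by numerical estimate) moves the state onto the principal branch, which then leads to $(1,1)$; lowering $T_{y}$ alone leaves the state near $(0,0)$.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def relax(u, v, Tx, Ty, g, steps, dt=0.05):
    """Forward Euler steps of the assumed 2x2 Q-learning dynamics written in
    log-odds coordinates u = ln(x/(1-x)), v = ln(y/(1-y)):
        u' = a_X y - b_X (1 - y) - Tx u,   v' = a_Y x - b_Y (1 - x) - Ty v."""
    aX, bX, aY, bY = g
    for _ in range(steps):
        x, y = logistic(u), logistic(v)
        u, v = (u + dt * (aX * y - bX * (1.0 - y) - Tx * u),
                v + dt * (aY * x - bY * (1.0 - x) - Ty * v))
    return u, v


def ramp(a, b, n=20):
    # n + 1 evenly spaced values from a to b, both ends included.
    return [a + (b - a) * k / n for k in range(n + 1)]


def follow(schedule, u, v, g, steps_per_point=200):
    # Change the temperatures slowly, letting the state settle at each value.
    for Tx, Ty in schedule:
        u, v = relax(u, v, Tx, Ty, g, steps_per_point)
    return u, v


g = (2.0, 1.0, 1.0, 2.0)    # hypothetical game: a_X >= b_X, a_Y < b_Y
T_lo, Ty_hi = 0.1, 1.0      # Ty_hi exceeds T_I = 1 / (2 ln 2), about 0.72

# Settle on the low branch near the inefficient PNE (0, 0).
u0, v0 = relax(logit(0.01), logit(0.01), T_lo, Ty_hi, g, steps=4000)

# (D2): keep T_y above T_I, raise T_x above T_C(T_y) (about 0.3 here),
# bring T_x back down, and only then lower T_y.
mechanism = ([(Tx, Ty_hi) for Tx in ramp(T_lo, 1.0)]
             + [(Tx, Ty_hi) for Tx in ramp(1.0, T_lo)]
             + [(T_lo, Ty) for Ty in ramp(Ty_hi, T_lo)])
u, v = follow(mechanism, u0, v0, g)
x_mech, y_mech = logistic(u), logistic(v)

# Baseline: lower T_y directly, without the transient increase of T_x.
u, v = follow([(T_lo, Ty) for Ty in ramp(Ty_hi, T_lo)], u0, v0, g)
x_base, y_base = logistic(u), logistic(v)
```

Working in log-odds coordinates keeps the state strictly inside $(0,1)^{2}$ and makes the Euler steps well behaved even when $x$ or $y$ is extremely close to $0$ or $1$.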

5.2 Efficiency of QREs: An example

A natural question about the QRE solution concept is whether QREs can improve social welfare. Here we show that the answer is yes, beginning with an example:

Example 2.

Consider a standard coordination game with the payoff matrices of the form

[TABLE]

where $\epsilon>\epsilon^{\prime}>0$ are small numbers. Note that this game has two PNEs, $(x,y)=(1,1)$ and $(x,y)=(0,0)$, with social welfare $1+2\epsilon$ and $1+2\epsilon^{\prime}$, respectively. For small $\epsilon$ and $\epsilon^{\prime}$, the socially optimal state is $(x,y)=(1,0)$, with social welfare $2$, while $(x,y)=(1,1)$ is the PNE with the best social welfare. However, QRE dynamics can reach states with better social welfare than any NE. Figure 6 shows the social welfare of the QREs of this example at different temperatures. At the PNE, which corresponds to $T_{x}=T_{y}=0$, the social welfare is $1+2\epsilon$; however, we can increase the social welfare by increasing $T_{y}$. In Section 5.3 we give a general algorithm to find the appropriate temperatures, as well as a mechanism, which we refer to as the optimal control mechanism, that drives the system to the desired state.

5.3 Optimal Control Mechanism: Better Equilibrium with Irrationality

Here we present a general approach, based on QREs and Q-learning dynamics, to improve on the PoS bound attained by Nash equilibria in coordination games. Let $QRE(T_{x},T_{y})$ denote the set of QREs with respect to $T_{x}$ and $T_{y}$, and let $QRE$ denote the union of $QRE(T_{x},T_{y})$ over all positive $T_{x}$ and $T_{y}$. Also, let $NE$ denote the set of pure Nash equilibrium system states. Since the set $NE$ is the limit of $QRE(T_{x},T_{y})$ as $T_{x}$ and $T_{y}$ approach zero, we have the bounds:

[TABLE]

Then, we define QRE achievable states:

Definition 8.

A state $(x,y)\in[0,1]^{2}$ is a QRE achievable state if for every $\epsilon>0$ , there exist positive finite $T_{x}$ and $T_{y}$ and $(x^{\prime},y^{\prime})$ such that $|(x^{\prime},y^{\prime})-(x,y)|<\epsilon$ and $(x^{\prime},y^{\prime})\in QRE(T_{x},T_{y})$ .

Note that with this definition, pure Nash equilibria are QRE achievable states. However, the socially optimal states need not be QRE achievable. For example, Figure 8 shows the set of QRE achievable states for Example 2: the socially optimal state $(x,y)=(1,0)$ is not QRE achievable. Nevertheless, Figure 8 also shows that we can achieve a higher social welfare at $(x,y)=(1,0.5)$, which is a QRE achievable state. Formally, we can describe the set of QRE achievable states as the positive support of $T_{X}^{I}$ and $T_{Y}^{I}$:

[TABLE]

The region for an example game with $a_{Y}\geq b_{Y}$ is illustrated in Figure 8; the case $a_{Y}<b_{Y}$ is shown in Figure 10.

In the following theorem, we propose the optimal control mechanism, a general process for reaching an equilibrium that improves on the PoS bound attained by Nash equilibria.

Theorem 5 (Optimal Control Mechanism).

Given a $2\times 2$ game that satisfies the following properties:

1. Its diagonal form satisfies $a_{X},b_{X},a_{Y},b_{Y}>0$.

2. None of its pure Nash equilibria is the socially optimal state.

Without loss of generality, assume $a_{X}\geq b_{X}$. Then:

1. there is a stable QRE achievable state whose social welfare is better than that of any Nash equilibrium;

2. there is a mechanism that controls the system to this state from the best Nash equilibrium by sequentially changing $T_{x}$ and $T_{y}$.

Proof.

Note that under these properties, there are two PNEs, $(0,0)$ and $(1,1)$. Since neither of them is socially optimal, the socially optimal state must be either $(0,1)$ or $(1,0)$.

First, consider $a_{Y}\geq b_{Y}$. In this case, we know from Theorem 3 that every state with $x\in(0.5,1)$ belongs to the principal branch for some $T_{y}>0$ and is stable, whereas for $x<0.5$ not all states are stable. We illustrate the region of stable QRE achievable states in Figure 10. By Theorem 2 and Theorem 3, the states near the border $x=0$ are stable. Therefore, we select the following target states:

(A1) If $(1,1)$ is the best NE and $(0,1)$ is the SO state, then we select $(0.5,1)$.

(A2) If $(1,1)$ is the best NE and $(1,0)$ is the SO state, then we select $(1,0.5)$.

(A3) If $(0,0)$ is the best NE and $(0,1)$ is the SO state, then we select $\left(0,\frac{b_{X}}{a_{X}+b_{X}}\right)$.

(A4) If $(0,0)$ is the best NE and $(1,0)$ is the SO state, then we select $\left(\frac{b_{Y}}{a_{Y}+b_{Y}},0\right)$.

These choices of states clearly improve the social welfare. It is known that for the class of games we are considering, the price of stability is at most $2$. In cases A1 and A2, we reduce this factor to $4/3$. In case A3, we reduce it to $\left(\frac{1}{2}+\frac{b_{X}/2}{a_{X}+b_{X}}\right)^{-1}$, and analogously in case A4, with $\frac{b_{Y}}{a_{Y}+b_{Y}}$ in place of $\frac{b_{X}}{a_{X}+b_{X}}$.
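These factors follow from the linearity of expected welfare in one player's mixed strategy when the other player's strategy is fixed. A short derivation, assuming positive welfare and writing $W(x,y)$ for the expected social welfare and $W^{*}$ for its optimum, so that the PoS bound gives $W(\text{best NE})\geq W^{*}/2$: in case A1, $(0,1)$ is optimal and $(1,1)$ is the best NE, hence

$$W\left(\tfrac{1}{2},1\right)=\tfrac{1}{2}W(0,1)+\tfrac{1}{2}W(1,1)\geq\tfrac{1}{2}W^{*}+\tfrac{1}{4}W^{*}=\tfrac{3}{4}W^{*},$$

and in case A3, with $p=\frac{b_{X}}{a_{X}+b_{X}}$, $(0,1)$ is optimal and $(0,0)$ is the best NE, hence

$$W(0,p)=(1-p)\,W(0,0)+p\,W(0,1)\geq\tfrac{1-p}{2}W^{*}+pW^{*}=\left(\tfrac{1}{2}+\tfrac{p}{2}\right)W^{*}.$$

Taking reciprocals gives the ratios $4/3$ and $\left(\frac{1}{2}+\frac{b_{X}/2}{a_{X}+b_{X}}\right)^{-1}$.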

The next step is to give the mechanism that drives the system to the desired state. By symmetry, we only discuss cases A1 and A3; cases A2 and A4 are analogous. For case A1, the state corresponds to the temperatures $T_{x}\rightarrow\infty$ and $T_{y}\rightarrow 0$. For any small $\delta>0$, we can always find the state $(0.5+\delta,1-\delta)$ on the principal branch of some $T_{y}$. This means that we can reach this state from any initial state, not only from the NEs. Using the first form representation of the QREs in (5), for any QRE achievable system state $(x,y)$ we can recover the corresponding temperatures through $T_{X}^{I}$ and $T_{Y}^{I}$. The mechanism can be described as follows:

(A1)

(a) From any initial state, raise $T_{x}$ to $T_{X}^{I}(0.5+\delta,1-\delta)$.

(b) Decrease $T_{y}$ to $T_{Y}^{I}(0.5+\delta,1-\delta)$.
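Procedure (A1) amounts to recovering the temperatures of a target state from the first form representation. The following sketch assumes the diagonal-form payoff advantages $a_{X}y-b_{X}(1-y)$ and $a_{Y}x-b_{Y}(1-x)$, a hypothetical game with $a_{Y}\geq b_{Y}$, and $\delta=0.05$; it then checks, by forward Euler integration of the assumed dynamics in log-odds coordinates, that the state settles at $(0.5+\delta,1-\delta)$ after steps (a) and (b). The helper names are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def first_form(x, y, aX, bX, aY, bY):
    # Assumption: T_X^I and T_Y^I built from the diagonal-form payoff gaps
    # a_X y - b_X (1 - y) and a_Y x - b_Y (1 - x).
    return ((aX * y - bX * (1.0 - y)) / logit(x),
            (aY * x - bY * (1.0 - x)) / logit(y))


def relax(x, y, Tx, Ty, g, steps=20000, dt=0.01):
    # Forward Euler steps of the assumed dynamics in log-odds coordinates.
    aX, bX, aY, bY = g
    u, v = logit(x), logit(y)
    for _ in range(steps):
        x, y = logistic(u), logistic(v)
        u, v = (u + dt * (aX * y - bX * (1.0 - y) - Tx * u),
                v + dt * (aY * x - bY * (1.0 - x) - Ty * v))
    return logistic(u), logistic(v)


g = (2.0, 1.0, 2.0, 1.0)            # hypothetical game with a_Y >= b_Y
delta = 0.05
target = (0.5 + delta, 1.0 - delta)
Tx_star, Ty_star = first_form(*target, *g)   # about 9.22 and 0.22

x, y = 0.2, 0.7                     # arbitrary initial state
Ty_init = 1.0                       # arbitrary initial T_y, above Ty_star
x, y = relax(x, y, Tx_star, Ty_init, g)      # step (a): raise T_x
x, y = relax(x, y, Tx_star, Ty_star, g)      # step (b): decrease T_y
```

Since $T_{X}^{I}(0.55,0.95)\approx 9.2$ is far above the critical temperature here, the QRE at the final temperatures is unique, so the arbitrary initial state does not matter, in line with the remark that this state can be reached from any initial state.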

For case A3, the selected state is not on the principal branch. This means that we cannot increase the temperatures too much; otherwise the system state will move to the principal branch and never return. We assume that the system state is initially at $(\delta,\delta)$ for some small $\delta>0$, a state close to the best NE, and that the initial temperatures are $T_{x}=T_{X}^{I}(\delta,\delta)$ and $T_{y}=T_{Y}^{I}(\delta,\delta)$. Our goal is to arrive at the state $\left(\delta_{1},\frac{b_{X}}{a_{X}+b_{X}}-\delta_{2}\right)$ for some small $\delta_{1}>0$ and $\delta_{2}>0$ such that this state is stable. We present the mechanism in the following:

(A3)

(a) From the initial state $(\delta,\delta)$, move $T_{x}$ to $T_{X}^{I}\left(\delta_{1},\frac{b_{X}}{a_{X}+b_{X}}-\delta_{2}\right)$.

(b) Increase $T_{y}$ to $T_{Y}^{I}\left(\delta_{1},\frac{b_{X}}{a_{X}+b_{X}}-\delta_{2}\right)$.

Note that Step (b) should not be carried out before Step (a): if we increase $T_{y}$ first, we risk moving the system onto the principal branch.
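The ordering constraint can be checked by simulation. This sketch assumes the diagonal-form payoff advantages $a_{X}y-b_{X}(1-y)$ and $a_{Y}x-b_{Y}(1-x)$, integrates the assumed dynamics in log-odds coordinates with forward Euler steps, and ramps temperatures slowly. For a hypothetical game with $a_{Y}\geq b_{Y}$, it starts at the QRE $(\delta,\delta)$ with $\delta=0.01$ and targets $\left(\delta_{1},\frac{b_{X}}{a_{X}+b_{X}}-\delta_{2}\right)=(0.001,0.3)$, chosen close enough to $x=0$ to be stable. Applying step (a) before step (b) keeps the state on its branch and ends at the target; swapping the steps pushes the state onto the principal branch, from which it ends near $x=1$. All helper names are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def first_form(x, y, g):
    # (T_X^I(x, y), T_Y^I(x, y)) under the assumed diagonal-form payoff gaps.
    aX, bX, aY, bY = g
    return ((aX * y - bX * (1.0 - y)) / logit(x),
            (aY * x - bY * (1.0 - x)) / logit(y))


def relax(u, v, Tx, Ty, g, steps, dt=0.1):
    # Forward Euler steps of the assumed dynamics in log-odds coordinates.
    aX, bX, aY, bY = g
    for _ in range(steps):
        x, y = logistic(u), logistic(v)
        u, v = (u + dt * (aX * y - bX * (1.0 - y) - Tx * u),
                v + dt * (aY * x - bY * (1.0 - x) - Ty * v))
    return u, v


def ramp(a, b, n=20):
    return [a + (b - a) * k / n for k in range(n + 1)]


def follow(schedule, u, v, g, steps_per_point=200):
    # Change temperatures slowly, letting the state settle at each value.
    for Tx, Ty in schedule:
        u, v = relax(u, v, Tx, Ty, g, steps_per_point)
    return u, v


g = (2.0, 1.0, 2.0, 1.0)                      # hypothetical, a_Y >= b_Y
start = (0.01, 0.01)                          # delta = 0.01
target = (0.001, g[1] / (g[0] + g[1]) - 1.0 / 30.0)   # delta_2 = 1/30
Tx0, Ty0 = first_form(*start, g)
Tx1, Ty1 = first_form(*target, g)
u0, v0 = logit(start[0]), logit(start[1])     # a QRE at (Tx0, Ty0)


def settle(u, v):
    # Long final run at the target temperatures (the slowest rate is small).
    u, v = relax(u, v, Tx1, Ty1, g, steps=30000)
    return logistic(u), logistic(v)


# Step (a) before step (b): move T_x first, then raise T_y.
u, v = follow([(Tx, Ty0) for Tx in ramp(Tx0, Tx1)], u0, v0, g)
u, v = follow([(Tx1, Ty) for Ty in ramp(Ty0, Ty1)], u, v, g)
x_ok, y_ok = settle(u, v)

# Steps swapped: raising T_y first removes the low branch at T_x = Tx0.
u, v = follow([(Tx0, Ty) for Ty in ramp(Ty0, Ty1)], u0, v0, g)
u, v = follow([(Tx, Ty1) for Tx in ramp(Tx0, Tx1)], u, v, g)
x_bad, y_bad = settle(u, v)
```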

Next, consider the case $a_{Y}<b_{Y}$. As in the previous case, we know from Theorem 2 and Theorem 3 that states near the borders $x=0,0.5,1$ and $y=0,0.5,1$ are essentially stable states. Hence, we select the following target states:

(B1) If $(1,1)$ is the best NE and $(0,1)$ is the SO state, then we select $\left(\frac{b_{Y}}{a_{Y}+b_{Y}},1\right)$.

(B2) If $(1,1)$ is the best NE and $(1,0)$ is the SO state, then we select $(1,0.5)$.

(B3) If $(0,0)$ is the best NE and $(0,1)$ is the SO state, then we select $\left(0,\frac{b_{X}}{a_{X}+b_{X}}\right)$.

(B4) If $(0,0)$ is the best NE and $(1,0)$ is the SO state, then we select $\left(0.5,0\right)$.

These choices of states clearly improve the social welfare. Interestingly, in this case the desired states can essentially be reached from any initial state. By symmetry, we demonstrate the mechanisms for cases (B3) and (B4); the remaining ones are analogous.

For case (B3), we are aiming for the state $\left(\delta_{1},\frac{b_{X}}{a_{X}+b_{X}}-\delta_{2}\right)$ for some small $\delta_{1}>0$ and $\delta_{2}>0$ . We propose the following mechanism:

(B3)

Phase 1: Getting to the principal branch.

(a) From any initial state, fix $T_{y}$ at some value less than $T_{I}=\frac{b_{Y}-a_{Y}}{2\ln(a_{X}/b_{X})}$.

(b) Increase $T_{x}$ above the critical temperature $T_{C}(T_{y})$.

(c) Decrease $T_{x}$ to $T_{X}^{I}\left(\delta_{1},\frac{b_{X}}{a_{X}+b_{X}}-\delta_{2}\right)$.

Phase 2: Staying on the current branch.

(d) Increase $T_{y}$ to $T_{Y}^{I}\left(\delta_{1},\frac{b_{X}}{a_{X}+b_{X}}-\delta_{2}\right)$.

This process is illustrated in Figure 12. In phase 1, we keep $T_{y}$ low, meaning that the second player is more rational. As the first player becomes more rational, he is more likely to be influenced by the second player's preference, and the system eventually reaches a Nash equilibrium. In phase 2, we make the second player more irrational to increase the social welfare. The level of irrationality added in phase 2 should be capped to prevent the first player from changing his decision.

For case (B4), since our desired state is on the principal branch, the mechanism will be similar to case (A1).

(B4)

(a) From any initial state, raise $T_{x}$ to $T_{X}^{I}(0.5+\delta,\delta)$.

(b) Decrease $T_{y}$ to $T_{Y}^{I}(0.5+\delta,\delta)$.

∎

As a remark, in cases (A3) and (A4), if we start not from $(\delta,\delta)$ but from some other state on the principal branch, we can instead aim for the state $(0.5,1)$. This state is not better than the best Nash equilibrium, but it still improves on the initial state. The process can be modified as follows:

(A3’)

(a) From any initial state, raise $T_{x}$ to $T_{X}^{I}(0.5+\delta,1-\delta)$ (above $T_{C}(T_{y})$).

(b) Reduce $T_{y}$ to $T_{Y}^{I}(0.5+\delta,1-\delta)$.

6 Applications

6.1 Taxation

A direct application of the QRE solution concept is to analyze the effect of taxation, which has been discussed in Wolpert et al. (2012). Unlike Nash equilibria, QREs do change if we multiply the payoff matrices by some factor $\alpha>0$, because multiplying the payoffs by $\alpha$ is equivalent to dividing the temperature parameters by $\alpha$. Hence, if the players are charged a flat tax rate $1-\alpha$, the QREs will differ. Formally, we define the base temperature $T_{0}$ as the temperature when no tax is applied to either player. Then the tax rates of the two players are $\alpha_{x}=1-T_{0}/T_{x}$ and $\alpha_{y}=1-T_{0}/T_{y}$, respectively.

We now demonstrate how the hysteresis mechanism can be applied via taxation using Example 1. Assuming the base temperature is $T_{0}=1$, we can rewrite the process in Example 1 as follows:

1. The initial state is $(0.05,0.14)$, where $\alpha_{x}\approx 0$ and $\alpha_{y}\approx 0.5$ (i.e., $T_{x}\approx 1$ and $T_{y}\approx 2$).

2. Fix $\alpha_{y}=0.5$ ($T_{y}=2$) and increase $\alpha_{x}$ to $0.8$, where $T_{x}=5$ and there is only one QRE correspondence.

3. Fix $\alpha_{y}=0.5$ ($T_{y}=2$) and decrease $\alpha_{x}$ back to $0$ ($T_{x}=1$). Now $x\approx 0.997$.
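The conversion between temperatures and tax rates used in this example is a one-line formula. The sketch assumes, as in the text, that a flat tax at rate $r$ leaves each player a fraction $1-r$ of his payoff, which is equivalent to raising the temperature from $T_{0}$ to $T_{0}/(1-r)$; solving for $r$ gives $\alpha_{x}=1-T_{0}/T_{x}$. The function names are hypothetical.

```python
def tax_rate(T, T0=1.0):
    # Assumption: keeping a fraction (1 - rate) of payoffs turns temperature
    # T0 into T0 / (1 - rate); solving T0 / (1 - rate) = T gives this rate.
    return 1.0 - T0 / T


def temperature(rate, T0=1.0):
    # Inverse map: effective temperature under a flat tax at the given rate.
    return T0 / (1.0 - rate)


# (T_x, T_y) in the three steps of the taxation example above, with T0 = 1.
steps = [(1.0, 2.0), (5.0, 2.0), (1.0, 2.0)]
rates = [(tax_rate(Tx), tax_rate(Ty)) for Tx, Ty in steps]
```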

6.2 Evolution of metabolic phenotypes in cancer

Evolutionary Game Theory has been instrumental in studying evolutionary aspects of the somatic evolution that characterizes cancer progression. Tomlinson and Bodmer were the first to explore the role of cell-cell interactions in cancer. This pioneering work was followed by others that expanded on those initial ideas to study key aspects of cancer evolution, such as the role of space Kaznatcheev et al. (2015), treatment Basanta et al. (2012); Kaznatcheev et al. (2016), or metabolism Basanta et al. (2008); Kianercy et al. (2014). In particular, Kianercy et al. (2014) show how microenvironmental heterogeneity impacts somatic evolution, in this case by optimizing genetic instability to better tune cell metabolism to the dynamic microenvironment.

Our techniques (the hysteresis mechanism and the optimal control mechanism) can be applied to the cancer game Kianercy et al. (2014) with two types of tumor phenotypic strategies: hypoxic cells and oxygenated cells. These cells inhabit regions where oxygen could be abundant or lacking. In the former, oxygenated cells with regular metabolism thrive but in the latter, hypoxic cells whose metabolism is less reliant on the presence of oxygen (but more on the presence of glucose) have higher fitness.

7 Connection to previous works

Recently, there has been a growing interplay between game theory, dynamical systems, and computer science. Examples include the integration of replicator dynamics and topological tools Piliouras et al. (2014); Papadimitriou and Piliouras (2016); Panageas and Piliouras (2016) in algorithmic game theory, and Q-learning dynamics Watkins and Dayan (1992) in multi-agent systems Tan (1993). Q-learning dynamics have been studied extensively in game settings, e.g., by Sato et al. in Sato and Crutchfield (2003) and Tuyls et al. in Tuyls et al. (2003). In Coucheney et al. (2013), Q-learning dynamics are considered as an extension of replicator dynamics driven by a combination of payoffs and entropy. Recent advances in our understanding of evolutionary dynamics in multi-agent learning are surveyed in Bloembergen et al. (2015).

We are particularly interested in the connection between Q-learning dynamics and the concept of QRE McKelvey and Palfrey (1995) in game theory. In Cominetti et al. (2010), Cominetti et al. study this connection in traffic congestion games. The hysteresis effect of Q-learning dynamics was first identified in 2012 by Wolpert et al. (2012). Kianercy et al. in Kianercy and Galstyan (2012) observed the same phenomenon and discussed the bifurcation diagrams in $2\times 2$ games. The hysteresis effect has also been highlighted in recent follow-up work by Kianercy et al. (2014) as a design principle for future cancer treatments. It was also studied in Romero (2015) in the context of minimum-effort coordination games. However, our current understanding is still mostly qualitative, and in this work we push towards a more practically applicable quantitative, algorithmic analysis.

Analyzing the characteristics of various dynamical systems has also been attracting the attention of the computer science community in recent years. For example, besides the Q-learning dynamics, the (simpler) replicator dynamics has been studied extensively due to its connections Kleinberg et al. (2011); Papadimitriou and Piliouras (2016); Piliouras and Shamma (2014) to the multiplicative weight update (MWU) algorithm in Kleinberg et al. (2009).

Finally, much attention has also been devoted to biological systems and their connections to game theory and computation. In recent work by Mehta et al. (2016), the connection with genetic diversity was discussed in terms of the complexity of predicting whether genetic diversity persists in the long run under evolutionary pressures. This paper builds upon a rapid sequence of related results Livnat et al. (2008); Chastain et al. (2013, 2014); Livnat et al. (2014); Meir and Parkes (2015); Mehta et al. (2015). The key result is that of Chastain et al. (2013, 2014), which made clear that there is a strong connection between studying replicator dynamics in games and standard models of evolution. Follow-up work shows how to analyze dynamics that incorporate errors (i.e., mutations) Mehta et al. (2017) and how these mutations can be critical to ensuring survival in dynamically changing environments. Our paper makes progress along these lines by examining how noisy dynamics can give rise to phenomena such as bifurcations.

We were inspired by recent work by Kianercy et al. establishing a connection between cancer dynamics and cancer treatment and studying Q-learning dynamics in games. This is analogous to the connections Livnat and Papadimitriou (2016); Chastain et al. (2013, 2014) between MWU and evolution detailed above. It is our hope that by starting a quantitative analysis of these systems we can kickstart similarly rapid developments in our understanding of the related questions.

8 Conclusion

In this paper, we perform a quantitative analysis of bifurcation phenomena connected to Q-learning dynamics in the class of $2\times 2$ games. Based on this analysis, we introduce two novel mechanisms, the hysteresis mechanism and the optimal control mechanism. Hysteresis mechanisms use transient changes to the system parameters to induce permanent improvements to its performance via optimal (Nash) equilibrium selection. Optimal control mechanisms induce convergence to states whose performance is better than even the best Nash equilibrium, showing that by controlling the exploration/exploitation tradeoff we can achieve strictly better states than those achievable by perfectly rational agents.

We believe that these new classes of mechanisms could lead to interesting and new questions within game theory as well as a more thorough understanding of cancer biology.

9 Supplementary materials

Appendix A From Q-learning to Q-learning Dynamics

In this section, we provide a brief sketch of how the Q-learning dynamics arise from Q-learning agents. We start with an introduction to the Q-learning rule. Then, we discuss the multi-agent model in which there are multiple learners in the system. The goal of this section is to identify the dynamics of a system in which two learning agents play a $2\times 2$ game repeatedly over time.

A.1 Q-learning Introduction

Q-learning Watkins and Dayan [1992], Watkins [1989] is a value-iteration method for computing optimal strategies in Markov decision processes. It can be used as a model of how users learn their optimal strategy when facing uncertainty. Consider a system with a finite number of states and a single player with a finite number of actions. The player decides his strategy over an infinite time horizon. In Q-learning, at each time $t$, the player stores a value estimate $Q_{(s,a)}(t)$ for the payoff of each state-action pair $(s,a)$. Then, given that the system state is $s_{t}$ at time $t$, he chooses the action $a_{t+1}$ for time $t+1$ that maximizes the $Q$-value $Q_{(s_{t},\cdot)}(t)$. In the next time step, if the agent plays action $a_{t+1}$, he receives a reward $r(t+1)$, and the value estimate is updated according to the rule:

[TABLE]

where $\alpha$ is the step size, and $\gamma$ is the discount factor.

A.2 Joint-learning Model

Next, we consider the joint learning model as in Kianercy and Galstyan [2012]. Suppose there are multiple players in the system that are learning concurrently. Denote the set of players as $P$. We assume the system state is a function of the action each player is playing, and the reward observed by each player is a function of the system state. Each player's learning behavior is modeled by a simplified version of the Q-learning algorithm described above. More precisely, we consider the case in which each player assumes the system has only one state, which corresponds to a player with very limited memory, and uses discount factor $\gamma=0$. The reward observed by player $i\in P$ when he plays action $a$ at time $t$ is denoted as $r_{a}^{i}(t)$. We can write the updating rule of the $Q$-value for agent $i$ as follows:

[TABLE]

For the selection process, we consider the mechanism that each player $i\in P$ selects his action according to the Boltzmann distribution with temperature $T_{i}$ :

[TABLE]

where $x_{a}^{i}(t)$ is the probability that agent $i$ chooses action $a$ at time $t$. The intuition behind this mechanism is that the temperature parameter $T_{i}$ models the irrationality of the players: smaller $T_{i}$ corresponds to more rational agents. For $T_{i}\rightarrow 0$, (9) corresponds to the best-response rule, that is, each agent selects the action with the highest $Q$-value with probability one. On the other hand, for $T_{i}\rightarrow\infty$, (9) corresponds to selecting each action uniformly at random, which models fully irrational agents.
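A minimal sketch of the selection rule, assuming (9) has the standard Boltzmann form $x_{a}^{i}(t)\propto\exp\left(Q_{a}^{i}(t)/T_{i}\right)$. Subtracting the largest $Q$-value before exponentiating leaves the probabilities unchanged but avoids overflow at small temperatures; the function name is hypothetical.

```python
import math


def boltzmann(q, T):
    """Selection probabilities proportional to exp(Q_a / T) (assumed form of (9)).

    Subtracting max(q) first leaves the probabilities unchanged but avoids
    overflow when T is small."""
    m = max(q)
    w = [math.exp((qa - m) / T) for qa in q]
    total = sum(w)
    return [wa / total for wa in w]


q = [1.0, 0.0]
near_best_response = boltzmann(q, 1e-3)   # T -> 0: mass on the best action
near_uniform = boltzmann(q, 1e6)          # T -> infinity: nearly uniform
moderate = boltzmann(q, 1.0)              # equals the logistic function of 1
```

With two actions, the rule reduces to the logistic function of the $Q$-value difference divided by $T_{i}$, which is the form behind the $2\times 2$ QRE conditions used in the main text.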

A.3 Continuous-time dynamics

This underlying Q-learning model has been studied over the past decades. It is known that if we take the time interval to be infinitely small, this sequential joint learning process can be approximated by a continuous-time model (Tuyls et al. [2003], Sato and Crutchfield [2003]). To see this, consider the $2\times 2$ game described in Section 2.1. The expected payoff for the first player at time $t$ given he chooses action $a$ can be written as $r_{a}^{x}(t)=[\bm{A}\bm{y}(t)]_{a}$, and similarly, the expected payoff for the second player at time $t$ given he chooses action $a$ is $r_{a}^{y}(t)=[\bm{B}\bm{x}(t)]_{a}$. The continuous-time limit for the evolution of the $Q$-value for each player can be written as

[TABLE]

Then, we take the time derivative of (9) for each player to get the evolution of the strategy profile:

[TABLE]

Putting these together, and rescaling the time horizon to $\alpha t/T_{x}$ and $\alpha t/T_{y}$ respectively, we obtain the continuous-time dynamics:

[TABLE]

A.4 The exploration term increases entropy

We now show that the exploration term in the Q-learning dynamics increases the entropy:

Lemma 2.

Suppose $A=\bm{0}$ and $B=\bm{0}$ . The system entropy

[TABLE]

for the dynamics (2) increases with time, i.e.

[TABLE]

if $\bm{x}$ and $\bm{y}$ are not uniformly distributed.

Proof of Lemma 2.

Equivalently, we consider the single-agent dynamics:

[TABLE]

Taking the derivative of the entropy $H(\bm{x})$ , we have

[TABLE]

and since we have $\sum_{i}x_{i}=1$ , by Jensen’s inequality, we can find that

[TABLE]

where equality holds if and only if $\bm{x}$ is a uniform distribution. Consequently, if we have $x_{i}\in(0,1)$ , and $\bm{x}$ is not a uniform distribution, $\dot{H}(\bm{x})$ is strictly positive, which means that the system entropy increases with time. ∎
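The argument can also be checked numerically. This sketch assumes the payoff-free single-agent dynamics take the form $\dot{x}_{i}=T\,x_{i}\left(\sum_{j}x_{j}\ln x_{j}-\ln x_{i}\right)$ (our reading of the exploration term), for which $\dot{H}(\bm{x})=T\left[\sum_{i}x_{i}(\ln x_{i})^{2}-\left(\sum_{i}x_{i}\ln x_{i}\right)^{2}\right]\geq 0$, i.e., $T$ times the variance of $\ln x_{i}$ under $\bm{x}$. Forward Euler steps with a small step size then produce a nondecreasing entropy sequence that approaches the uniform distribution. The initial distribution and step size are illustrative.

```python
import math


def entropy(x):
    return -sum(p * math.log(p) for p in x if p > 0.0)


def exploration_step(x, T, dt):
    """One forward Euler step of the assumed payoff-free dynamics
        x_i' = T * x_i * (sum_j x_j ln x_j - ln x_i).
    The increments sum to zero, so the step stays on the simplex."""
    s = sum(p * math.log(p) for p in x)
    return [p + dt * T * p * (s - math.log(p)) for p in x]


x = [0.7, 0.2, 0.1]
H = [entropy(x)]
for _ in range(500):
    x = exploration_step(x, T=1.0, dt=0.01)
    H.append(entropy(x))
```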

Appendix B Convergence of dissipative learning dynamics in $2\times 2$ games

Liouville’s formula

Liouville’s formula can be applied to any system of autonomous differential equations with a continuously differentiable vector field $V$ on an open domain $\mathcal{S}\subset\mathbb{R}^{k}$. The divergence of $V$ at $x\in\mathcal{S}$ is defined as the trace of the corresponding Jacobian at $x$, i.e., $\text{div}[V(x)]\equiv\sum_{i=1}^{k}\frac{\partial V_{i}}{\partial x_{i}}(x)=tr(DV(x))$. Since divergence is a continuous function, we can compute its integral over measurable sets $A\subset\mathcal{S}$ (with respect to the Lebesgue measure $\mu$ on $\mathbb{R}^{k}$). Given any such set $A$, let $\phi_{t}(A)=\{\phi(x_{0},t):x_{0}\in A\}$ be the image of $A$ under the flow $\phi$ at time $t$. $\phi_{t}(A)$ is measurable and its measure is $\mu(\phi_{t}(A))=\int_{\phi_{t}(A)}dx$. Liouville’s formula states that the time derivative of the volume of $\phi_{t}(A)$ exists and is equal to the integral of the divergence over $\phi_{t}(A)$: $\frac{d}{dt}\mu(\phi_{t}(A))=\int_{\phi_{t}(A)}\text{div}[V(x)]dx.$ Equivalently:

Theorem 6 (Sandholm [2010], page 356).

$\frac{d}{dt}\mu(\phi_{t}(A))=\int_{\phi_{t}(A)}tr(DV(x))d\mu(x)$

A vector field is called divergence free if its divergence is zero everywhere. Liouville’s formula trivially implies that volume is preserved in such flows.

This theorem extends in a straightforward manner to systems where the vector field $V:X\rightarrow TX$ is defined on an affine set $X\subset\mathbb{R}^{n}$ with tangent space $TX$. In this case, $\mu$ represents the Lebesgue measure on the affine hull of $X$. Note that the derivative of $V$ at a state $x\in X$ must be represented using the derivative matrix $DV(x)\in\mathbb{R}^{n\times n}$, which by definition has rows in $TX$. If $\hat{V}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$ is a $C^{1}$ extension of $V$, then $DV(x)=D\hat{V}(x)P_{TX}$, where $P_{TX}\in\mathbb{R}^{n\times n}$ is the orthogonal projection of $\mathbb{R}^{n}$ onto the subspace $TX$. (To find the matrix of the orthogonal projection onto $TX$, or onto any subspace $Y$ of $\mathbb{R}^{n}$, it suffices to find a basis $\vec{v_{1}},\vec{v_{2}},\dots,\vec{v_{m}}$. Let $B$ be the matrix with columns $\vec{v_{i}}$; then $P=B(B^{T}B)^{-1}B^{T}$.)
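For example, the tangent space of the simplex over three actions is $TX=\{z\in\mathbb{R}^{3}:z_{1}+z_{2}+z_{3}=0\}$. The sketch below applies $P=B(B^{T}B)^{-1}B^{T}$ to the basis $(1,-1,0)$, $(0,1,-1)$ using exact rational arithmetic; the result should be the familiar projection $I-\frac{1}{3}\mathbf{1}\mathbf{1}^{T}$. The small matrix helpers are written by hand (hypothetical names) to keep the example dependency-free.

```python
from fractions import Fraction


def matmul(A, B):
    # Product of two matrices stored as lists of rows.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def inv2(M):
    # Inverse of a 2x2 matrix (enough for a 2-dimensional tangent space).
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Columns of B form a basis of TX = {z in R^3 : z1 + z2 + z3 = 0}.
B = [[Fraction(1), Fraction(0)],
     [Fraction(-1), Fraction(1)],
     [Fraction(0), Fraction(-1)]]
Bt = transpose(B)
P = matmul(matmul(B, inv2(matmul(Bt, B))), Bt)   # P = B (B^T B)^{-1} B^T
```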

Poincaré-Bendixson theorem

The Poincaré-Bendixson theorem is a powerful result implying that two-dimensional systems cannot exhibit chaos: the limit behavior is either an equilibrium, a periodic orbit, or a closed loop punctuated by one (or more) fixed points. Formally, we have:

Theorem 7 (Bendixson [1901], Teschl [2012]).

Given a differentiable real dynamical system defined on an open subset of the plane, then every non-empty compact $\omega$ -limit set of an orbit, which contains only finitely many fixed points, is either a fixed point, a periodic orbit, or a connected set composed of a finite number of fixed points together with homoclinic and heteroclinic orbits connecting these.

Bendixson-Dulac theorem

By excluding the possibility of closed loops (i.e., periodic orbits, homoclinic cycles, and heteroclinic cycles), we can establish global convergence to equilibrium. The following criterion, first established by Bendixson in 1901 and further refined by the French mathematician Dulac in 1933, allows us to do that. It is typically referred to as the Bendixson-Dulac negative criterion. It applies to planar systems in which the measure of a set of initial conditions always shrinks (or always expands) with time, i.e., dynamical systems whose vector fields have divergence that is always negative (or always positive).

Theorem 8 (Müller and Kuttler [2015], page 210).

Let $D\subset\mathbb{R}^{2}$ be a simply connected region and $f,g\in C^{1}(D,\mathbb{R})$ such that $div(f,g)=\frac{\partial f}{\partial x}+\frac{\partial g}{\partial y}$ is not identically zero and does not change sign in $D$. Then the system

$$\dot{x}=f(x,y),\qquad\dot{y}=g(x,y)$$

has no loops lying entirely in $D$.

Remark: This criterion can be generalized. Specifically, it also holds for the system

$$\dot{x}=\rho(x,y)f(x,y),\qquad\dot{y}=\rho(x,y)g(x,y)$$

if $\rho(x,y)>0$ is continuously differentiable. Effectively, we are allowed to rescale the vector field by a scalar function (as long as this function has no zeros) before proving that the divergence is positive (or negative); the rescaling changes the speed along orbits but not the orbits themselves. That is, it suffices to find a continuously differentiable $\rho(x,y)>0$ such that $(\rho(x,y)f(x,y))_{x}+(\rho(x,y)g(x,y))_{y}$ has a fixed sign. The function $\rho(x,y)$ is typically called the Dulac function.
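To illustrate how a Dulac function is used, the sketch below takes a textbook competitive Lotka-Volterra system (an illustrative example, not one from this paper), $\dot{x}=x(3-x-2y)$, $\dot{y}=y(2-x-y)$, with the candidate $\rho=1/(xy)$ on the open positive quadrant, which is simply connected. Analytically, $(\rho f)_{x}+(\rho g)_{y}=-(1/x+1/y)<0$ there, so no closed orbit lies in the quadrant. The central-difference evaluation on a grid is only a sanity check of that sign, not a proof.

```python
def dulac_divergence(f, g, rho, x, y, h=1e-5):
    """Central-difference estimate of (rho*f)_x + (rho*g)_y at the point (x, y)."""
    def rf(a, b):
        return rho(a, b) * f(a, b)

    def rg(a, b):
        return rho(a, b) * g(a, b)

    return ((rf(x + h, y) - rf(x - h, y)) + (rg(x, y + h) - rg(x, y - h))) / (2.0 * h)


# Illustrative competitive Lotka-Volterra system (not taken from the paper).
def f(x, y):
    return x * (3.0 - x - 2.0 * y)


def g(x, y):
    return y * (2.0 - x - y)


def rho(x, y):
    """Candidate Dulac function, positive and C^1 on the open quadrant x, y > 0."""
    return 1.0 / (x * y)


grid = [0.1 * k for k in range(1, 31)]  # sample points in (0, 3]
values = [dulac_divergence(f, g, rho, x, y) for x in grid for y in grid]
print(max(values))  # stays negative on every sample point
```

With $\rho\equiv 1$ the plain divergence of this system changes sign in the quadrant, which is exactly why the rescaling freedom in the remark is useful.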

By Kianercy and Galstyan [2012], after the change of variables $u_{k}=\ln\frac{x_{k+1}}{x_{1}}$, $v_{k}=\ln\frac{y_{k+1}}{y_{1}}$ for $k=1,\dots,n-1$, the replicator system transforms to the following system:

[TABLE]

where $\hat{a}_{kj}=a_{k+1,j+1}-a_{1,j+1}$ and $\hat{b}_{kj}=b_{k+1,j+1}-b_{1,j+1}$.

In the case of $2\times 2$ games, the resulting dynamical system is planar, so we can apply both the Poincaré-Bendixson theorem and the Bendixson-Dulac theorem, since $\frac{\partial\dot{u}_{1}}{\partial u_{1}}+\frac{\partial\dot{v}_{1}}{\partial v_{1}}=-(T_{x}+T_{y})<0$. Hence, system (II) converges to equilibria from every initial condition. The flow of the original replicator system in the $2\times 2$ game is diffeomorphic³ to the flow of system (II); thus replicator dynamics with positive temperatures $T_{x},T_{y}$ also converge to equilibria for all initial conditions.

³ A function $f$ between two smooth manifolds is called a diffeomorphism if $f$ is a bijection, $f$ is continuously differentiable, and $f$ has a continuously differentiable inverse. Two flows $\Phi^{t}:A\rightarrow A$ and $\Psi^{t}:B\rightarrow B$ are diffeomorphic if there exists a diffeomorphism $g:A\rightarrow B$ such that $g(\Phi^{t}(x))=\Psi^{t}(g(x))$ for each $x\in A$ and $t\in\mathbb{R}$. If two flows are diffeomorphic, then their vector fields are related by the derivative of the conjugacy; that is, we obtain precisely the same result as if we had simply transformed the coordinates in their differential equations Meiss [2007].
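The convergence argument can be illustrated numerically. The sketch below assumes the standard Boltzmann Q-learning form $\dot{x}_{i}=x_{i}[(Ay)_{i}-x^{T}Ay]-T_{x}x_{i}[\ln x_{i}-\sum_{k}x_{k}\ln x_{k}]$ (and symmetrically for $y$), which in the log-ratio coordinates $u=\ln(x_{2}/x_{1})$, $v=\ln(y_{2}/y_{1})$ becomes $\dot{u}=(Ay)_{2}-(Ay)_{1}-T_{x}u$. The convention that `B[k][j]` is the second player's payoff for action $k$ against the first player's action $j$, the coordination game, and the temperatures are illustrative assumptions. The script integrates the planar system, checks that it settles at a rest point, and confirms that the divergence equals $-(T_{x}+T_{y})$.

```python
import math


def strategy(w):
    """Map the log-ratio w = ln(p2/p1) back to a mixed strategy (p1, p2)."""
    p2 = 1.0 / (1.0 + math.exp(-w))
    return (1.0 - p2, p2)


def rhs(u, v, A, B, Tx, Ty):
    """Assumed Q-learning dynamics of a 2x2 game in log-ratio coordinates."""
    x, y = strategy(u), strategy(v)
    du = sum((A[1][j] - A[0][j]) * y[j] for j in range(2)) - Tx * u
    dv = sum((B[1][j] - B[0][j]) * x[j] for j in range(2)) - Ty * v
    return du, dv


def simulate(u, v, A, B, Tx, Ty, t_end=100.0, dt=0.01):
    """Integrate the planar system with classical RK4 and return the final (u, v)."""
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(u, v, A, B, Tx, Ty)
        k2 = rhs(u + 0.5 * dt * k1[0], v + 0.5 * dt * k1[1], A, B, Tx, Ty)
        k3 = rhs(u + 0.5 * dt * k2[0], v + 0.5 * dt * k2[1], A, B, Tx, Ty)
        k4 = rhs(u + dt * k3[0], v + dt * k3[1], A, B, Tx, Ty)
        u += dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
        v += dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
    return u, v


def divergence(u, v, A, B, Tx, Ty, h=1e-6):
    """Central-difference estimate of d(du)/du + d(dv)/dv at (u, v)."""
    d_u = (rhs(u + h, v, A, B, Tx, Ty)[0] - rhs(u - h, v, A, B, Tx, Ty)[0]) / (2.0 * h)
    d_v = (rhs(u, v + h, A, B, Tx, Ty)[1] - rhs(u, v - h, A, B, Tx, Ty)[1]) / (2.0 * h)
    return d_u + d_v


A = B = [[1.0, 0.0], [0.0, 1.0]]  # illustrative symmetric coordination game
Tx = Ty = 0.25
u, v = simulate(1.0, 0.5, A, B, Tx, Ty)
print(u, v, rhs(u, v, A, B, Tx, Ty), divergence(0.3, -0.7, A, B, Tx, Ty))
```

Because $\dot{u}$ depends on $u$ only through $-T_{x}u$ (and likewise for $v$), the divergence is exactly $-(T_{x}+T_{y})$ at every point, which is the Bendixson-Dulac condition with $\rho\equiv 1$.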

Appendix C Bifurcation Analysis for Games with Only One Nash Equilibrium

In this section, we present the results for the class of games with only one Nash equilibrium, which can be either pure or mixed. A mixed Nash equilibrium is defined as follows.

Definition 9 (mixed Nash equilibrium).

A strategy profile $(x_{NE},y_{NE})$ is a mixed Nash equilibrium if

[TABLE]

This corresponds to the case in which at least one of $b_{X}$, $a_{Y}$, or $b_{Y}$ is negative. As before, our analysis is based on the second form representation described in (6) and (7), which takes the first player’s perspective.

C.1 No dominating strategy for the first player

More specifically, this is the case in which the first player has no dominating strategy, i.e., both $a_{X}$ and $b_{X}$ are positive. From (7), we expect the characteristics of the bifurcation diagrams to depend on the sign of $a_{Y}+b_{Y}$, since it determines whether $y^{II}$ is increasing in $x$. The discussion below also reveals some interesting phenomena.

First, we consider the case $a_{Y}+b_{Y}>0$. This generalizes the case discussed in Section 4.3; in fact, the statements of Theorem 1, Theorem 2, and Theorem 3 apply to this case. However, there are some subtle differences. If $a_{Y}>b_{Y}$, where we can assume $b_{Y}<0$, then by the second part of Theorem 2 there is no QRE in $x\in(0,0.5)$, since $T_{B}$ is now negative. This means that only the principal branch exists. On the other hand, if $a_{Y}<b_{Y}$, where we can assume $a_{Y}<0$, then, similar to the example in Figure 5, there can still be two branches. However, we expect the second branch to vanish before $T_{y}$ reaches zero, as the state $(1,1)$ is not a Nash equilibrium.

Theorem 9.

Given a $2\times 2$ game in which the diagonal form has $a_{X},b_{X}>0$, $a_{Y}+b_{Y}>0$, and $a_{Y}<b_{Y}$, and given $T_{y}$: if $T_{y}<T_{A}$, where $T_{A}=\frac{-a_{Y}}{\ln(a_{X}/b_{X})}$, then there is no QRE with $x\in(0.5,1)$.

The proof of the above theorem follows directly from Proposition 4 in the appendix. Interestingly, we can still drive the first player to his desired state by raising $T_{y}$ to a value greater than $T_{A}$.

Next, we consider $a_{Y}+b_{Y}\leq 0$. The bifurcation diagram is illustrated in Figure 14. In this case, the principal branch goes directly toward the unique Nash equilibrium. We state the result formally in the following theorem, whose proof is given in Section D.1.2 (and in Section D.1.3 for $a_{Y}+b_{Y}=0$).

Theorem 10.

Given a $2\times 2$ game in which the diagonal form has $a_{X},b_{X}>0$ and $a_{Y}+b_{Y}\leq 0$, the QRE is unique for given $T_{x}$ and $T_{y}$.

C.2 Dominating strategy for the first player

Finally, we consider the case in which the first player has a dominating strategy, i.e., $b_{X}<0$. As Figure 16 shows, the principal branch always appears to go toward $x=1$; that is, the first player always prefers his dominating strategy. We formalize this observation, together with some important characteristics of this case, in the theorem below; the proof can be found in Section D.2 in the appendix.

Theorem 11.

Given a $2\times 2$ game in which the diagonal form has $a_{X}>0$, $b_{X}<0$, $a_{X}+b_{X}>0$, and given $T_{y}$, the following statements are true:

1. The region $(0.5,1)$ contains the principal branch.

2. There is no QRE with $x\in(0,0.5)$.

3. If $a_{Y}+b_{Y}<0$ or $a_{Y}>b_{Y}$, then the principal branch is continuous.

4. If $a_{Y}+b_{Y}>0$ and $b_{Y}>a_{Y}$, then the principal branch may not be continuous.

As Theorem 11 shows, the principal branch is continuous in most cases. The exception is $a_{Y}+b_{Y}>0$ with $b_{Y}>a_{Y}$. This case can be seen as the dual (obtained by flipping the roles of the two players) of the case discussed in part 3 of Theorem 9, where for $T_{y}$ between $T_{A}$ and $T_{I}$ there can be three QREs.

Appendix D Detailed Bifurcation Analysis for General $2\times 2$ Game

In this section, we provide technical details for the results stated in Section 4.3 and Section C. Before going into details, we collect in the following lemma some results that are used throughout the analysis. The proof of this lemma is straightforward and is omitted.

Lemma 3.

The following statements are true.

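1. The derivative of $T_{X}^{II}$ is given as

$$\frac{\partial T_{X}^{II}}{\partial x}=\frac{b_{X}-(a_{X}+b_{X})\,L(x,T_{y})}{x(1-x)\ln^{2}\left(\frac{1}{x}-1\right)},\tag{12}$$

where

$$L(x,T_{y})=y^{II}(x,T_{y})+x(1-x)\ln\left(\frac{1}{x}-1\right)\frac{\partial y^{II}}{\partial x}(x,T_{y}).$$

2. The derivative of $y^{II}$ is given as

$$\frac{\partial y^{II}}{\partial x}=\frac{a_{Y}+b_{Y}}{T_{y}}\,y^{II}(x,T_{y})\left(1-y^{II}(x,T_{y})\right).\tag{13}$$

3. For $x\in(0,1/2)\cup(1/2,1)$, $\frac{\partial T_{X}^{II}}{\partial x}>0$ if and only if $L(x,T_{y})<\frac{b_{X}}{a_{X}+b_{X}}$; on the other hand, $\frac{\partial T_{X}^{II}}{\partial x}<0$ if and only if $L(x,T_{y})>\frac{b_{X}}{a_{X}+b_{X}}$.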
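To make Lemma 3 concrete, the sketch below evaluates $T_{X}^{II}$, $L$, and the closed-form derivative, and compares the latter with a central finite difference. It assumes the logit forms $y^{II}(x,T_{y})=[1+\exp((b_{Y}-(a_{Y}+b_{Y})x)/T_{y})]^{-1}$ and $T_{X}^{II}(x,T_{y})=(b_{X}-(a_{X}+b_{X})y^{II}(x,T_{y}))/\ln(1/x-1)$ for (7) and (6). These forms are our reconstruction, chosen because they reproduce the thresholds $T_{I}$, $T_{A}$, $T_{B}$ and the Case 1c value $y^{II}=(1+e^{b_{Y}/T_{y}})^{-1}$; they are not copied from the paper. The function names and parameter values are illustrative.

```python
import math


def y_II(x, Ty, aY, bY):
    """Assumed logit response of the second player to the first player's x."""
    return 1.0 / (1.0 + math.exp((bY - (aY + bY) * x) / Ty))


def dy_II(x, Ty, aY, bY):
    """Derivative of y_II in x: (a_Y + b_Y) / T_y * y * (1 - y)."""
    y = y_II(x, Ty, aY, bY)
    return (aY + bY) / Ty * y * (1.0 - y)


def T_X_II(x, Ty, aX, bX, aY, bY):
    """Assumed temperature T_x at which x is a QRE strategy of the first player."""
    return (bX - (aX + bX) * y_II(x, Ty, aY, bY)) / math.log(1.0 / x - 1.0)


def L(x, Ty, aY, bY):
    """L(x, T_y) = y + x(1-x) ln(1/x - 1) dy/dx."""
    return y_II(x, Ty, aY, bY) + x * (1.0 - x) * math.log(1.0 / x - 1.0) * dy_II(x, Ty, aY, bY)


def dT_X_II(x, Ty, aX, bX, aY, bY):
    """Closed-form dT_X^II/dx obtained by differentiating the assumed T_X_II."""
    ell = math.log(1.0 / x - 1.0)
    return (bX - (aX + bX) * L(x, Ty, aY, bY)) / (x * (1.0 - x) * ell ** 2)


params = dict(Ty=0.5, aX=2.0, bX=1.0, aY=1.0, bY=1.0)  # illustrative values
h = 1e-6
for x in (0.1, 0.3, 0.7, 0.9):
    fd = (T_X_II(x + h, **params) - T_X_II(x - h, **params)) / (2.0 * h)
    print(x, fd, dT_X_II(x, **params))
```

The sign of the printed derivative flips exactly where $L$ crosses $b_{X}/(a_{X}+b_{X})=1/3$ for these parameters, as part 3 of the lemma states.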

D.1 Case 1: $b_{X}\geq 0$

First, we consider the case $b_{X}\geq 0$. As we show in Proposition 1, the direction of the principal branch depends on $y^{II}(1/2,T_{y})$, the second player’s strategy when the first player is indifferent between his two actions. Intuitively, a large $y^{II}(1/2,T_{y})$ means that the second player puts more weight on the action that the first player considers better. This is more likely to happen when the second player is less rational, i.e., at a high temperature $T_{y}$. Conversely, if the second player puts more weight on the other action, the first player is driven toward that action, since it yields a higher expected payoff.

We show that for $T_{y}>T_{I}$ , the principal branch lies on $x\in\left(\frac{1}{2},1\right)$ , otherwise the principal branch lies on $x\in\left(0,\frac{1}{2}\right)$ . This result follows from the following proposition:

Proposition 1.

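In Case 1, if $T_{y}>T_{I}$, then $y^{II}(1/2,T_{y})>\frac{b_{X}}{a_{X}+b_{X}}$, and hence

$$\lim_{x\rightarrow\frac{1}{2}^{+}}T_{X}^{II}(x,T_{y})=+\infty,\qquad\lim_{x\rightarrow\frac{1}{2}^{-}}T_{X}^{II}(x,T_{y})=-\infty.$$

On the other hand, if $T_{y}<T_{I}$, then $y^{II}(1/2,T_{y})<\frac{b_{X}}{a_{X}+b_{X}}$, and hence

$$\lim_{x\rightarrow\frac{1}{2}^{-}}T_{X}^{II}(x,T_{y})=+\infty,\qquad\lim_{x\rightarrow\frac{1}{2}^{+}}T_{X}^{II}(x,T_{y})=-\infty.$$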

Proof.

First, consider the case $b_{Y}>a_{Y}$. For $T_{y}>T_{I}=\frac{b_{Y}-a_{Y}}{2\ln(a_{X}/b_{X})}$, we have

[TABLE]

Then, for the case that $a_{Y}>b_{Y}$ , we can see that

[TABLE]

For the case that $a_{Y}=b_{Y}$ , since we assumed $a_{X}\not=b_{X}$ , we have

[TABLE]

As a result, the numerator of (6) at $x=\frac{1}{2}$ is negative for $T_{y}>T_{I}$, which proves the first two limits.

For the remaining two limits, we only need to consider the case $b_{Y}>a_{Y}$, since otherwise $T_{I}=0$ and the condition $T_{y}<T_{I}$ is vacuous. For $b_{Y}>a_{Y}$ and $T_{y}<T_{I}$, we have

[TABLE]

This makes the numerator of (6) at $x=\frac{1}{2}$ positive and proves the last two limits. ∎
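The thresholds used throughout this appendix can be computed directly. The sketch below evaluates $T_{A}$, $T_{B}$, and $T_{I}$ as defined in the text (assuming $a_{X}>b_{X}>0$) and checks the content of Proposition 1 numerically: $y^{II}(1/2,T_{I})=b_{X}/(a_{X}+b_{X})$, with $y^{II}(1/2,T_{y})$ above this value for $T_{y}>T_{I}$ and below it for $T_{y}<T_{I}$. The logit form of $y^{II}$ is our assumed reconstruction of (7), and the game parameters and helper names are illustrative.

```python
import math


def y_II(x, Ty, aY, bY):
    """Assumed second-player response y^II(x, T_y) in logit form."""
    return 1.0 / (1.0 + math.exp((bY - (aY + bY) * x) / Ty))


def temperatures(aX, bX, aY, bY):
    """T_A, T_B, T_I as defined in Appendix D for Case 1 (assumes aX > bX > 0)."""
    ln_ratio = math.log(aX / bX)
    T_A = max(0.0, -aY / ln_ratio)
    T_B = bY / ln_ratio
    T_I = max(0.0, (bY - aY) / (2.0 * ln_ratio))  # T_I = 0 when b_Y <= a_Y
    return T_A, T_B, T_I


def principal_side(Ty, aX, bX, aY, bY):
    """Half-interval holding the principal branch, per Theorem 1 / Proposition 1."""
    _, _, T_I = temperatures(aX, bX, aY, bY)
    return "(1/2, 1)" if Ty > T_I else "(0, 1/2)"


aX, bX, aY, bY = 2.0, 1.0, -0.2, 1.5  # illustrative Case 1a game with b_Y > a_Y
T_A, T_B, T_I = temperatures(aX, bX, aY, bY)
target = bX / (aX + bX)
for Ty in (0.9 * T_I, T_I, 1.1 * T_I):
    print(round(Ty, 4), y_II(0.5, Ty, aY, bY) - target, principal_side(Ty, aX, bX, aY, bY))
```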

D.1.1 Case 1a: $b_{X}\geq 0$ , $a_{Y}+b_{Y}>0$

In this section, we consider a relaxed version of the class of coordination games of Section 4.3. We prove the theorems presented in Section 4.3 and show that these results in fact extend to the case $a_{Y}+b_{Y}>0$, instead of requiring $a_{Y}>0$ and $b_{Y}>0$.

First, since $a_{Y}+b_{Y}>0$, $y^{II}$ is an increasing function of $x$:

$$\frac{\partial y^{II}}{\partial x}=\frac{a_{Y}+b_{Y}}{T_{y}}\,y^{II}(x,T_{y})\left(1-y^{II}(x,T_{y})\right)>0.$$

This implies that both players tend to agree with each other. Intuitively, if $a_{Y}\geq b_{Y}$, then both players agree that the first action is the better one. In this case, we can show that the principal branch lies on $x\in\left(\frac{1}{2},1\right)$ regardless of $T_{y}$. In fact, this extends to any $T_{y}>T_{I}$, which is the first part of Theorem 1.

Proof of Part 1 of Theorem 1.

By Proposition 1, for $T_{y}>T_{I}$ we have $y^{II}(1/2,T_{y})>\frac{b_{X}}{a_{X}+b_{X}}$ and $\lim_{x\rightarrow\frac{1}{2}^{+}}T_{X}^{II}=+\infty$. Since $y^{II}$ is monotonically increasing in $x$, we have $y^{II}>\frac{b_{X}}{a_{X}+b_{X}}$ for $x>1/2$. This means that $T_{X}^{II}>0$ for any $x\in(1/2,1)$. Also, it is easy to see that $\lim_{x\rightarrow 1^{-}}T_{X}^{II}=0$. As a result, $(0.5,1)$ contains the principal branch. ∎

For Case 1a with $a_{Y}\geq b_{Y}$, we observe that on the principal branch, the lower $T_{x}$ is, the closer $x$ is to $1$. Proposition 2 establishes this monotonicity, which, by Lemma 1, can be used to justify stability.

Proposition 2.

In Case 1a, if $a_{Y}\geq b_{Y}$ , then $\frac{\partial T_{X}^{II}}{\partial x}<0$ for $x\in\left(\frac{1}{2},1\right)$ .

Proof.

It suffices to show that $L(x,T_{y})>\frac{b_{X}}{a_{X}+b_{X}}$ for $x\in\left(\frac{1}{2},1\right)$. By Proposition 1, if $a_{Y}\geq b_{Y}$, then

[TABLE]

Since $y^{II}(x,T_{y})$ is monotonic increasing when $a_{Y}+b_{Y}>0$ , $y^{II}(x,T_{y})>\frac{1}{2}$ for $x\in\left(\frac{1}{2},1\right)$ . As a result, we have $1-2y^{II}<0$ , and hence we can see that for $x\in\left(\frac{1}{2},1\right)$ ,

[TABLE]

Consequently we have that for $x\in\left(\frac{1}{2},1\right)$ , $L(x,T_{y})>\frac{b_{X}}{a_{X}+b_{X}}$ , and hence $\frac{\partial T_{X}^{II}}{\partial x}<0$ according to Lemma 3. ∎

Proof of Part 1 of Theorem 3.

By Lemma 1, Proposition 2 implies that every $x\in(0.5,1)$ lies on the principal branch. This directly yields part 1 of Theorem 3. ∎

Next, in the region $x\in(0,1/2)$, QREs appear only when $T_{x}$ and $T_{y}$ are low. This observation is formalized in the proposition below, which directly proves parts 2 and 3 of Theorem 2, as well as part 2 of Theorem 3.

Proposition 3.

Consider Case 1a. Let $x_{1}=\min\left\{\frac{1}{2},\frac{-T_{y}\ln\left(\frac{a_{X}}{b_{X}}\right)+b_{Y}}{a_{Y}+b_{Y}}\right\}$ and $T_{B}=\frac{b_{Y}}{\ln(a_{X}/b_{X})}$ . The following statements are true for $x\in(0,1/2)$ :

1. If $T_{y}>T_{B}$, then $T_{X}^{II}<0$.

2. If $T_{y}<T_{B}$, then $T_{X}^{II}>0$ if and only if $x\in(0,x_{1})$.

3. $\frac{\partial L}{\partial x}>0$ for $x\in(0,x_{1})$.

4. If $T_{y}<T_{I}$, then $\frac{\partial T_{X}^{II}}{\partial x}>0$.

5. If $T_{y}>T_{I}$, then there is a nonnegative critical temperature $T_{C}(T_{y})$ such that $T_{X}^{II}(x,T_{y})\leq T_{C}(T_{y})$ for $x\in(0,1/2)$. If $T_{y}<T_{B}$, then $T_{C}(T_{y})=T_{X}^{II}(x_{L},T_{y})$, where $x_{L}\in(0,x_{1})$ is the unique solution to $L(x,T_{y})=\frac{b_{X}}{a_{X}+b_{X}}$.

Proof.

For the first and second parts, consider any $x\in(0,1/2)$; we have

[TABLE]

Note that for $T_{y}>\frac{b_{Y}}{\ln(a_{X}/b_{X})}=T_{B}$, we have $x_{1}<0$, and hence $T_{X}^{II}<0$.

From the above derivation we can see that for all $x\in(0,1/2)$ such that $T_{X}^{II}(x,T_{y})>0$ , we have $y^{II}<1/2$ since $\frac{b_{X}}{a_{X}+b_{X}}<1/2$ . Then, we can easily find that

[TABLE]

Further, when $T_{y}<T_{I}$ , we have $y^{II}(1/2,T_{y})<\frac{b_{X}}{a_{X}+b_{X}}$ . This implies that for $x\in(0,1/2)$ , $y^{II}(x,T_{y})<\frac{b_{X}}{a_{X}+b_{X}}$ . Since $\frac{\partial L}{\partial x}>0$ , and $L$ is continuous, we can see that $L(x,T_{y})<\frac{b_{X}}{a_{X}+b_{X}}$ for $x\in(0,1/2)$ . This implies the fourth part of the proposition.

Next, if we look at the derivative of $T_{X}^{II}$ ,

[TABLE]

we can see that any critical point in $x\in(0,1/2)$ must satisfy $L(x,T_{y})=\frac{b_{X}}{a_{X}+b_{X}}$. When $T_{y}>T_{I}$, we have $x_{1}<1/2$, and $L(x_{1},T_{y})>y^{II}(x_{1},T_{y})=\frac{b_{X}}{a_{X}+b_{X}}$. If $T_{y}<\frac{b_{Y}}{\ln(a_{X}/b_{X})}$, then $\lim_{x\rightarrow 0^{+}}L(x,T_{y})=y^{II}(0,T_{y})<\frac{b_{X}}{a_{X}+b_{X}}$. Since $L$ is increasing on $(0,x_{1})$, there is exactly one critical point of $T_{X}^{II}$ in $(0,x_{1})$, and it is a local maximum of $T_{X}^{II}$. If $T_{y}>\frac{b_{Y}}{\ln(a_{X}/b_{X})}$, then $T_{X}^{II}$ is always negative, in which case the critical temperature is zero. ∎

The results in Proposition 3 apply not only to the case $a_{Y}\geq b_{Y}$ but also describe the behavior on $(0,1/2)$ in general. From this proposition, we conclude the following for the case $a_{Y}\geq b_{Y}$, as well as for the case $a_{Y}<b_{Y}$ with $T_{y}>T_{I}$:

1. The temperature $T_{B}=\frac{b_{Y}}{\ln(a_{X}/b_{X})}$ determines whether a branch appears in $x\in(0,1/2)$.

2. There is a critical temperature $T_{C}$: if we raise $T_{x}$ above $T_{C}$, the system is always on the principal branch.

3. The critical temperature is $T_{C}=T_{X}^{II}(x_{L},T_{y})$, where $x_{L}$ is the solution to $L(x,T_{y})=\frac{b_{X}}{a_{X}+b_{X}}$.

When the critical temperature is positive, it has no closed-form expression, but we can find it by binary search for the $x\in(0,x_{1})$ that satisfies $L(x,T_{y})=\frac{b_{X}}{a_{X}+b_{X}}$; this works because $L$ is increasing on $(0,x_{1})$ by part 3 of Proposition 3.
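The binary search described above can be sketched as follows. As elsewhere, we assume the logit forms $y^{II}(x,T_{y})=[1+\exp((b_{Y}-(a_{Y}+b_{Y})x)/T_{y})]^{-1}$ and $T_{X}^{II}=(b_{X}-(a_{X}+b_{X})y^{II})/\ln(1/x-1)$ as our reconstruction of (6) and (7). The hypothetical helper `critical_temperature_left` assumes Case 1a with $T_{I}<T_{y}<T_{B}$, so that $L-\frac{b_{X}}{a_{X}+b_{X}}$ is negative near $0$ and positive at $x_{1}$. The example game ($a_{X}=2$, $b_{X}=1$, $a_{Y}=b_{Y}=1$, $T_{y}=0.5$, giving $T_{I}=0$ and $T_{B}\approx 1.44$) is illustrative.

```python
import math


def y_II(x, Ty, aY, bY):
    """Assumed logit response of the second player."""
    return 1.0 / (1.0 + math.exp((bY - (aY + bY) * x) / Ty))


def L(x, Ty, aY, bY):
    """L(x, T_y) = y + x(1-x) ln(1/x - 1) dy/dx, with dy/dx = (a_Y+b_Y)/T_y * y(1-y)."""
    y = y_II(x, Ty, aY, bY)
    dy = (aY + bY) / Ty * y * (1.0 - y)
    return y + x * (1.0 - x) * math.log((1.0 - x) / x) * dy


def T_X_II(x, Ty, aX, bX, aY, bY):
    """Assumed temperature T_x at which x is a QRE strategy of the first player."""
    return (bX - (aX + bX) * y_II(x, Ty, aY, bY)) / math.log((1.0 - x) / x)


def critical_temperature_left(Ty, aX, bX, aY, bY, tol=1e-14):
    """Return (x_L, T_C) on (0, x_1) by bisection on L(x, T_y) = b_X / (a_X + b_X)."""
    target = bX / (aX + bX)
    x1 = min(0.5, (bY - Ty * math.log(aX / bX)) / (aY + bY))
    if x1 <= 0.0:
        raise ValueError("T_y >= T_B: no QRE branch in (0, 1/2)")
    lo, hi = 1e-12, x1
    if not (L(lo, Ty, aY, bY) < target < L(hi, Ty, aY, bY)):
        raise ValueError("no sign change on (0, x_1); is T_y in (T_I, T_B)?")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if L(mid, Ty, aY, bY) < target:
            lo = mid  # L is increasing on (0, x_1), so the root lies to the right
        else:
            hi = mid
    xL = 0.5 * (lo + hi)
    return xL, T_X_II(xL, Ty, aX, bX, aY, bY)


xL, TC = critical_temperature_left(0.5, 2.0, 1.0, 1.0, 1.0)
print(xL, TC)
```

Because $T_{X}^{II}$ is positive only on $(0,x_{1})$ and has a single interior maximum there, the returned $T_{C}$ should dominate $T_{X}^{II}$ on all of $(0,1/2)$; the test below checks this on a grid.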

Another result we are able to obtain from Proposition 3 is that the principal branch for Case 1a when $T_{y}<T_{I}$ lies on $(0,1/2)$ .

Proof of Part 2 of Theorem 1.

First, note that $T_{y}<T_{I}$ is meaningful only when $b_{Y}>a_{Y}$, in which case we always have $T_{I}<T_{B}$. By Proposition 3, for $T_{y}<T_{I}$ we have $x_{1}=1/2$, and hence $T_{X}^{II}>0$ for $x\in(0,1/2)$. By Proposition 1, $\lim_{x\rightarrow\frac{1}{2}^{-}}T_{X}^{II}=\infty$. Also, it is easy to see that $\lim_{x\rightarrow 0^{+}}T_{X}^{II}=0$. As a result, since $T_{X}^{II}$ is continuously differentiable on $(0,0.5)$, for any $T_{x}>0$ there exists $x\in(0,0.5)$ such that $T_{X}^{II}(x,T_{y})=T_{x}$. ∎

It remains to characterize the side $(1/2,1)$ when $b_{Y}>a_{Y}$. Figure 5 shows that for low $T_{y}$, the branch on $(1/2,1)$ behaves similarly to what Proposition 3 shows for the side $(0,1/2)$. However, for high $T_{y}$, although $(1/2,1)$ still contains the principal branch, the principal branch is not continuous. These observations are formalized in the following proposition, from which the proof of part 4 of Theorem 2 follows directly.

Proposition 4.

Consider Case 1a with $b_{Y}>a_{Y}$. Let $x_{2}=\max\left\{\frac{1}{2},\frac{-T_{y}\ln\left(\frac{a_{X}}{b_{X}}\right)+b_{Y}}{a_{Y}+b_{Y}}\right\}$ and $T_{A}=\max\left\{0,\frac{-a_{Y}}{\ln(a_{X}/b_{X})}\right\}$. The following statements are true for $x\in(1/2,1)$:

1. If $T_{y}<T_{A}$, then $T_{X}^{II}<0$.

2. If $T_{y}>T_{A}$, then $T_{X}^{II}>0$ if and only if $x\in(x_{2},1)$.

3. For $x\in\left[\frac{b_{Y}}{a_{Y}+b_{Y}},1\right)$, we have $\frac{\partial L}{\partial x}>0$.

4. If $T_{y}\in(T_{A},T_{I})$, then there is a positive critical temperature $T_{C}(T_{y})$ such that $T_{X}^{II}(x,T_{y})\leq T_{C}(T_{y})$ for $x\in(1/2,1)$, given by $T_{C}(T_{y})=T_{X}^{II}(x_{L},T_{y})$, where $x_{L}\in(1/2,1)$ is the unique solution of $L(x,T_{y})=\frac{b_{X}}{a_{X}+b_{X}}$.

Proof.

For the first and second parts, consider $x\in(1/2,1)$; we have

[TABLE]

Note that for $T_{y}>T_{I}$ , we get $x_{2}=1/2$ . Also, if $T_{y}<T_{A}$ , then $T_{X}^{II}<0$ for all $x\in(1/2,1)$ .

For the third part, note that $y^{II}\geq\frac{1}{2}$ for all $x\geq\frac{b_{Y}}{a_{Y}+b_{Y}}$, and that $\frac{b_{Y}}{a_{Y}+b_{Y}}>\frac{1}{2}$. Then we have

[TABLE]

For the fourth part, note that any critical point of $L(x,T_{y})$ in $(0,1)$ is either $x=\frac{1}{2}$ or a solution of the equation

$$(1-2x)+x(1-x)\left(1-2y^{II}(x,T_{y})\right)\frac{a_{Y}+b_{Y}}{T_{y}}=0.\tag{15}$$

Consider $G(x,T_{y})=(1-2x)+x(1-x)(1-2y^{II})\frac{a_{Y}+b_{Y}}{T_{y}}$ . For $b_{Y}>a_{Y}$ , $y^{II}(1/2,T_{y})$ is strictly less than $1/2$ . Also, we can see that $\frac{b_{Y}}{a_{Y}+b_{Y}}>1/2$ . Now, we can observe that $G(1/2,T_{y})>0$ and $G(\frac{b_{Y}}{a_{Y}+b_{Y}},T_{y})<0$ . Next, we can see that $G(x,T_{y})$ is monotonic decreasing with respect to $x$ for $x\in\left(\frac{1}{2},\frac{b_{Y}}{a_{Y}+b_{Y}}\right)$ by looking at its derivative:

[TABLE]

As a result, there is a unique $x^{*}\in\left(\frac{1}{2},\frac{b_{Y}}{a_{Y}+b_{Y}}\right)$ such that $G(x^{*},T_{y})=0$, so $L(x,T_{y})$ has exactly one critical point in this interval. Moreover, $\frac{\partial L}{\partial x}<0$ where $G(x,T_{y})>0$, and $\frac{\partial L}{\partial x}>0$ where $G(x,T_{y})<0$. Therefore, $x^{*}$ is a local minimum of $L$.

From the above arguments, we can conclude that the shape of $L(x,T_{y})$ for $T_{y}<T_{I}$ is as follows:

1. There is a local maximum at $x=1/2$, where $L(1/2,T_{y})=y^{II}(1/2,T_{y})<\frac{b_{X}}{a_{X}+b_{X}}$.

2. $L$ is decreasing on the interval $\left(\frac{1}{2},x^{*}\right)$, where $x^{*}$ is the unique solution to (15).

3. $L$ is increasing on the interval $(x^{*},1)$. If $T_{y}>T_{A}$, then $\lim_{x\rightarrow 1^{-}}L(x,T_{y})=y^{II}(1,T_{y})>\frac{b_{X}}{a_{X}+b_{X}}$.

Hence, $L(x,T_{y})=\frac{b_{X}}{a_{X}+b_{X}}$ has a unique solution on $(1/2,1)$, and this point is a local maximum of $T_{X}^{II}$. ∎

The above proposition shows that for $T_{y}\in(T_{A},T_{I})$, we can use binary search to find the critical temperature. For $T_{y}>T_{I}$, unfortunately, an argument similar to that of Proposition 4 shows only that $T_{X}^{II}$ has at most two critical points on $(1/2,1)$, as shown in Figure 5; these may induce an unstable segment between two stable segments. This also proves part 3 of Theorem 3.

We now have enough material to prove the remaining statements in Section 4.3.

Proof of parts 1, 5, and 6 of Theorem 2, and part 4 of Theorem 3.

For $T_{y}>T_{I}$, Proposition 3 shows that $\frac{\partial T_{X}^{II}}{\partial x}>0$ for $x\in(0,x_{L})$, so the QREs there are stable by Lemma 1. By a similar argument, the QREs on $x\in(x_{L},x_{1})$ are unstable. Moreover, for a given $T_{x}$, the stable QRE $x_{a}\in(0,x_{L})$ and the unstable QRE $x_{b}\in(x_{L},x_{1})$ satisfying $T_{X}^{II}(x_{a},T_{y})=T_{X}^{II}(x_{b},T_{y})=T_{x}$ appear in pairs. For $T_{y}<T_{I}$, the same technique together with Proposition 4 shows that the QREs in $x\in(x_{2},x_{L})$ are unstable, while the QREs in $x\in(x_{L},1)$ are stable. This proves the first part of Theorem 2 and part 4 of Theorem 3.

Parts 5 and 6 of Theorem 2 are corollaries of part 5 of Proposition 3 and part 4 of Proposition 4. ∎
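The pairing of QREs described in this proof can be illustrated numerically on the side $(1/2,1)$ for $T_{A}<T_{y}<T_{I}$. The sketch below locates $x_{L}$ by bisection on $L$, then, for a chosen $T_{x}\in(0,T_{C})$, finds the QRE in $(x_{2},x_{L})$ (unstable according to the argument above) and the one in $(x_{L},1)$ (stable). It again assumes the logit forms $y^{II}=[1+\exp((b_{Y}-(a_{Y}+b_{Y})x)/T_{y})]^{-1}$ and $T_{X}^{II}=(b_{X}-(a_{X}+b_{X})y^{II})/\ln(1/x-1)$; helper names are hypothetical, and the illustrative game $a_{X}=2$, $b_{X}=1$, $a_{Y}=-0.2$, $b_{Y}=1.5$, $T_{y}=0.6$ satisfies $T_{A}\approx 0.29<T_{y}<T_{I}\approx 1.23$. The open endpoint $x=1$ is replaced by $1-10^{-12}$, so very small $T_{x}$ raise an error.

```python
import math


def y_II(x, Ty, aY, bY):
    """Assumed logit response of the second player."""
    return 1.0 / (1.0 + math.exp((bY - (aY + bY) * x) / Ty))


def L(x, Ty, aY, bY):
    """L(x, T_y) = y + x(1-x) ln(1/x - 1) dy/dx."""
    y = y_II(x, Ty, aY, bY)
    dy = (aY + bY) / Ty * y * (1.0 - y)
    return y + x * (1.0 - x) * math.log((1.0 - x) / x) * dy


def T_X_II(x, Ty, aX, bX, aY, bY):
    """Assumed first-player temperature at which x is a QRE strategy."""
    return (bX - (aX + bX) * y_II(x, Ty, aY, bY)) / math.log((1.0 - x) / x)


def bisect(fun, lo, hi, tol=1e-13):
    """Root of fun on [lo, hi], assuming fun(lo) and fun(hi) differ in sign."""
    f_lo = fun(lo)
    if f_lo * fun(hi) > 0.0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = fun(mid)
        if (f_mid < 0.0) == (f_lo < 0.0):
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
        else:
            hi = mid               # root lies in [lo, mid]
    return 0.5 * (lo + hi)


RIGHT = 1.0 - 1e-12  # numerical stand-in for the open endpoint x = 1


def critical_point_right(Ty, aX, bX, aY, bY):
    """(x_L, T_C) on (1/2, 1), assuming b_Y > a_Y and T_A < T_y < T_I."""
    target = bX / (aX + bX)
    xL = bisect(lambda x: L(x, Ty, aY, bY) - target, 0.5 + 1e-9, RIGHT)
    return xL, T_X_II(xL, Ty, aX, bX, aY, bY)


def qre_pair_right(Tx, Ty, aX, bX, aY, bY):
    """QRE in (x_2, x_L) (unstable) and QRE in (x_L, 1) (stable) for a given T_x."""
    xL, TC = critical_point_right(Ty, aX, bX, aY, bY)
    if not 0.0 < Tx < TC:
        raise ValueError("T_x must lie in (0, T_C)")
    x2 = max(0.5, (bY - Ty * math.log(aX / bX)) / (aY + bY))

    def gap(x):
        return T_X_II(x, Ty, aX, bX, aY, bY) - Tx

    x_unstable = bisect(gap, x2, xL)    # T_X^II rises from 0 to T_C here
    x_stable = bisect(gap, xL, RIGHT)   # T_X^II falls from T_C toward 0 here
    return x_unstable, x_stable


aX, bX, aY, bY, Ty = 2.0, 1.0, -0.2, 1.5, 0.6
xL, TC = critical_point_right(Ty, aX, bX, aY, bY)
x_u, x_s = qre_pair_right(0.5 * TC, Ty, aX, bX, aY, bY)
print(xL, TC, x_u, x_s)
```

Raising $T_{x}$ above the printed $T_{C}$ makes both solutions disappear, which is the mechanism behind the critical temperatures in parts 5 and 6 of Theorem 2.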

D.1.2 Case 1b: $b_{X}>0$ , $a_{Y}+b_{Y}<0$

In this case, the two players have different preferences. For games in this class, there is only one Nash equilibrium (either pure or mixed). Figure 14 presents examples; in each of them, there is only one QRE for given $T_{x}$ and $T_{y}$. The following two propositions show that this observation holds for all instances.

Proposition 5.

Consider Case 1b. Let $x_{3}=\max\left\{0,\frac{-T_{y}\ln(a_{X}/b_{X})+b_{Y}}{a_{Y}+b_{Y}}\right\}$. If $T_{y}<T_{I}$, then the following statements are true:

1. $T_{X}^{II}(x,T_{y})<0$ for $x\in(1/2,1)$.

2. $T_{X}^{II}(x,T_{y})>0$ for $x\in\left(x_{3},\frac{1}{2}\right)$.

3. $\frac{\partial T_{X}^{II}(x,T_{y})}{\partial x}>0$ for $x\in\left(x_{3},\frac{1}{2}\right)$.

4. $\left(x_{3},\frac{1}{2}\right)$ contains the principal branch.

Proof.

Note that if $T_{y}<T_{I}$, we have $x_{3}<1/2$. Also, according to Proposition 1, $y^{II}(1/2,T_{y})<\frac{b_{X}}{a_{X}+b_{X}}$. Since $y^{II}$ is continuous and monotonically decreasing in $x$, we have $y^{II}<\frac{b_{X}}{a_{X}+b_{X}}$ for $x>1/2$. Therefore, the numerator of (6) is always positive for $x\in(1/2,1)$, which makes $T_{X}^{II}$ negative. This proves the first part of the proposition.

For the second part, observe that for $x\in(0,1/2)$ , $T_{X}^{II}>0$ if and only if $y^{II}<\frac{b_{X}}{a_{X}+b_{X}}$ . This is equivalent to $x>\frac{-T_{y}\ln(a_{X}/b_{X})+b_{Y}}{a_{Y}+b_{Y}}$ .

For the third part, note that for $x\in(0,1/2)$ , $x(1-x)\ln(1/x-1)\frac{\partial y^{II}}{\partial x}<0$ . This implies $L(x,T_{y})<y^{II}(x,T_{y})<\frac{b_{X}}{a_{X}+b_{X}}$ for $x\in(x_{3},1/2)$ , from which we can conclude that $\frac{\partial T_{X}^{II}(x,T_{y})}{\partial x}>0$ .

Finally, we note that if $x_{3}>0$ , then $T_{X}^{II}(x_{3},T_{y})=0$ . If $x_{3}=0$ , we have $\lim_{x\rightarrow 0^{+}}T_{X}^{II}=0$ . As a result, we can conclude that $(x_{3},1/2)$ contains the principal branch. ∎

By similar arguments, we can show the following proposition for $T_{y}>T_{I}$:

Proposition 6.

Consider Case 1b. Let $x_{3}=\min\left\{1,\frac{-T_{y}\ln(a_{X}/b_{X})+b_{Y}}{a_{Y}+b_{Y}}\right\}$. If $T_{y}>T_{I}$, then the following statements are true:

1. $T_{X}^{II}(x,T_{y})<0$ for $x\in(0,1/2)$.

2. $T_{X}^{II}(x,T_{y})>0$ for $x\in\left(\frac{1}{2},x_{3}\right)$.

3. $\frac{\partial T_{X}^{II}(x,T_{y})}{\partial x}<0$ for $x\in\left(\frac{1}{2},x_{3}\right)$.

4. $\left(\frac{1}{2},x_{3}\right)$ contains the principal branch.

D.1.3 Case 1c: $a_{Y}+b_{Y}=0$

In this case, we have $T_{I}=\frac{b_{Y}}{\ln(a_{X}/b_{X})}$ , and $y^{II}$ is a constant with respect to $x$ . The proof of Theorem 10 for $a_{Y}+b_{Y}=0$ directly follows from the following proposition.

Proposition 7.

Consider Case 1c. The following statements are true:

1. If $T_{y}<T_{I}$, then $T_{X}^{II}(x,T_{y})<0$ for $x\in(0.5,1)$, and $T_{X}^{II}(x,T_{y})>0$ for $x\in(0,0.5)$.

2. If $T_{y}>T_{I}$, then $T_{X}^{II}(x,T_{y})<0$ for $x\in(0,0.5)$, and $T_{X}^{II}(x,T_{y})>0$ for $x\in(0.5,1)$.

3. If $T_{y}<T_{I}$, then $\frac{\partial T_{X}^{II}(x,T_{y})}{\partial x}>0$ for $x\in\left(0,0.5\right)$.

4. If $T_{y}>T_{I}$, then $\frac{\partial T_{X}^{II}(x,T_{y})}{\partial x}<0$ for $x\in\left(0.5,1\right)$.

Proof.

Note that $y^{II}=\left(1+e^{b_{Y}/T_{y}}\right)^{-1}$ .

First, consider the case $a_{Y}>b_{Y}$. In this case $T_{I}=0$ and $b_{Y}<0$. Therefore, $y^{II}>\frac{1}{2}>\frac{b_{X}}{a_{X}+b_{X}}$, from which we conclude that $T_{X}^{II}>0$ for $x\in(0.5,1)$ and $T_{X}^{II}<0$ for $x\in(0,0.5)$, for any positive $T_{y}$.

Now consider the case that $a_{Y}<b_{Y}$ . If $T_{y}<T_{I}$ , we have $y^{II}<\frac{b_{X}}{a_{X}+b_{X}}$ , and hence we get $T_{X}^{II}(x,T_{y})<0$ for $x\in(0.5,1)$ , and $T_{X}^{II}(x,T_{y})>0$ for $x\in(0,0.5)$ , which is the first part of the proposition statement. Similarly, if $T_{y}>T_{I}$ , we have $y^{II}>\frac{b_{X}}{a_{X}+b_{X}}$ , from which the second part of the proposition follows.

For the third part and the fourth part, note that $L(x,T_{y})=y^{II}$ in this case as $\frac{\partial y^{II}}{\partial x}=0$ by observing (13), and the sign of the derivative of $T_{X}^{II}$ can be seen from Lemma 3. ∎

D.2 Case 2: $b_{X}<0$

In this case, the first action is a dominating strategy for the first player. Note that $-(a_{X}+b_{X})<0$ and $b_{X}<0$; since $y^{II}>0$, the numerator of (6) is always negative. Because $\ln(1/x-1)$ is positive on $(0,1/2)$ and negative on $(1/2,1)$, all QREs appear on $x\in\left(\frac{1}{2},1\right)$, where $T_{X}^{II}>0$. Also, we can easily see that

$$\lim_{x\rightarrow\frac{1}{2}^{+}}T_{X}^{II}(x,T_{y})=+\infty,\qquad\lim_{x\rightarrow 1^{-}}T_{X}^{II}(x,T_{y})=0.$$

This implies that $(1/2,1)$ contains the principal branch. First, we show the result for $a_{Y}+b_{Y}<0$ in the following proposition; the corresponding bifurcation diagram is presented in Figure 16.

Proposition 8.

For Case 2, if $a_{Y}+b_{Y}<0$ , then for $x\in(1/2,1)$ , we have $\frac{\partial T_{X}^{II}}{\partial x}<0$ .

Proof.

In this case, $y^{II}$ is monotonically decreasing in $x$. We can see that

[TABLE]

since $x(1-x)\ln\left(\frac{1}{x}-1\right)\frac{\partial y^{II}}{\partial x}$ is positive for $x\in(1/2,1)$. Substituting this into (12), we have $\frac{\partial T_{X}^{II}}{\partial x}<0$. ∎
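A quick numerical check of Proposition 8, under the same assumed logit forms $y^{II}=[1+\exp((b_{Y}-(a_{Y}+b_{Y})x)/T_{y})]^{-1}$ and $T_{X}^{II}=(b_{X}-(a_{X}+b_{X})y^{II})/\ln(1/x-1)$: for an illustrative Case 2 game with $b_{X}<0<a_{X}+b_{X}$ and $a_{Y}+b_{Y}<0$, $T_{X}^{II}$ should be positive and strictly decreasing on $(1/2,1)$. Here this is also visible directly: the numerator $-(1+y^{II})$ shrinks in magnitude as $y^{II}$ decreases, while $|\ln(1/x-1)|$ grows.

```python
import math


def y_II(x, Ty, aY, bY):
    """Assumed logit response of the second player."""
    return 1.0 / (1.0 + math.exp((bY - (aY + bY) * x) / Ty))


def T_X_II(x, Ty, aX, bX, aY, bY):
    """Assumed first-player temperature at which x is a QRE strategy."""
    return (bX - (aX + bX) * y_II(x, Ty, aY, bY)) / math.log((1.0 - x) / x)


# Illustrative Case 2 game: b_X < 0 < a_X + b_X and a_Y + b_Y < 0.
aX, bX, aY, bY, Ty = 2.0, -1.0, -1.0, -0.5, 0.5
xs = [0.5 + k / 1000.0 for k in range(1, 500)]  # grid on (1/2, 1)
Ts = [T_X_II(x, Ty, aX, bX, aY, bY) for x in xs]
print(all(t > 0 for t in Ts), all(a > b for a, b in zip(Ts, Ts[1:])))
```

Since $T_{X}^{II}$ decreases monotonically from $+\infty$ to $0$, each $T_{x}>0$ corresponds to exactly one QRE, i.e., the principal branch is continuous, as stated in part 3 of Theorem 11.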

For $a_{Y}+b_{Y}>0$: if $a_{Y}>b_{Y}$, the bifurcation diagram follows a trend similar to that in Figure 16, while if $a_{Y}<b_{Y}$, the principal branch can lose its continuity.

Proposition 9.

For Case 2, if $a_{Y}+b_{Y}>0$, then for $x\in(1/2,1)$:

1. if $a_{Y}>b_{Y}$, then $\frac{\partial T_{X}^{II}}{\partial x}<0$;

2. if $a_{Y}<b_{Y}$, then $T_{X}^{II}$ has at most two local extrema.

Proof.

In this case, $y^{II}$ is monotonic increasing with $x$ . For $a_{Y}>b_{Y}$ , we can find that $y^{II}(1/2,T_{y})>0$ and $L(1/2,T_{y})=y^{II}(1/2,T_{y})>0$ . Also, we can get that $L$ is monotonic increasing for $x\in(1/2,1)$ by inspecting

[TABLE]

Hence, for $x\in(1/2,1)$ , $L(x,T_{y})>0$ . This implies $\frac{\partial T_{X}^{II}}{\partial x}<0$ for $x\in(1/2,1)$ .

For the second part, note that for $a_{Y}<b_{Y}$, $y^{II}(1/2,T_{y})<1/2$. Let $x_{2}=\min\left\{1,\frac{b_{Y}}{a_{Y}+b_{Y}}\right\}$. First, if $x_{2}<1$, then $y^{II}>1/2$ for $x>x_{2}$, and hence $\frac{\partial L(x,T_{y})}{\partial x}>0$ for $x\in(x_{2},1)$. We use the same technique as in the proof of Proposition 4. Let $G(x,T_{y})=(1-2x)+x(1-x)(1-2y^{II})\frac{a_{Y}+b_{Y}}{T_{y}}$. Note that $G(1/2,T_{y})>0$ and $G(x_{2},T_{y})<0$. Next, observe that $G(x,T_{y})$ is monotonically decreasing for $x\in\left(\frac{1}{2},x_{2}\right)$. Hence, there exists $x^{*}\in(1/2,x_{2})$ such that $G(x^{*},T_{y})=0$, and this $x^{*}$ is a local minimum of $L$. We conclude that for $x\in(1/2,1)$, $L$ has the following shape:

1. There is a local maximum at $x=1/2$, where $L(1/2,T_{y})=y^{II}(1/2,T_{y})>0$.

2. $L$ is decreasing on the interval $(1/2,x^{*})$, where $x^{*}$ is the solution to $G(x^{*},T_{y})=0$.

3. $L$ is increasing on the interval $(x^{*},x_{2})$. Note that $\lim_{x\rightarrow 1^{-}}L(x,T_{y})=y^{II}(1,T_{y})>0$.

As a result, if $L(x^{*},T_{y})>\frac{b_{X}}{a_{X}+b_{X}}$ , then $T_{X}^{II}$ is monotonic decreasing; otherwise, $T_{X}^{II}$ has a local minimum and a local maximum on $(1/2,1)$ . ∎

Bibliography

Basanta et al. [2008]: David Basanta, Matthias Simon, Haralambos Hatzikirou, and Andreas Deutsch. Evolutionary game theory elucidates the role of glycolysis in glioma progression and invasion. Cell Proliferation, 41(6):980–987, 2008.
Basanta et al. [2012]: David Basanta, Jacob G Scott, Mayer N Fishman, Gustavo Ayala, Simon W Hayward, and Alexander RA Anderson. Investigating prostate cancer tumour–stroma interactions: clinical and biological insights from an evolutionary game. British Journal of Cancer, 106(1):174–181, 2012.
Bendixson [1901]: Ivar Bendixson. Sur les courbes définies par des équations différentielles. Acta Mathematica, 24(1):1–88, 1901.
Bloembergen et al. [2015]: Daan Bloembergen, Karl Tuyls, Daniel Hennes, and Michael Kaisers. Evolutionary dynamics of multi-agent learning: a survey. Journal of Artificial Intelligence Research, 53:659–697, 2015.
Chastain et al. [2013]: Erick Chastain, Adi Livnat, Christos H. Papadimitriou, and Umesh V. Vazirani. Multiplicative updates in coordination games and the theory of evolution. In ITCS, pages 57–58, 2013.
Chastain et al. [2014]: Erick Chastain, Adi Livnat, Christos Papadimitriou, and Umesh Vazirani. Algorithms, games, and evolution. Proceedings of the National Academy of Sciences (PNAS), 111(29):10620–10623, 2014. doi: 10.1073/pnas.1406556111. URL http://www.pnas.org/content/early/2014/06/11/1406556111.abstract.
Cominetti et al. [2010]: Roberto Cominetti, Emerson Melo, and Sylvain Sorin. A payoff-based learning procedure and its application to traffic games. Games and Economic Behavior, 70(1):71–83, 2010.
Coucheney et al. [2013]: Pierre Coucheney, Bruno Gaujal, and Panayotis Mertikopoulos. Entropy-driven dynamics and robust learning procedures in games. PhD thesis, INRIA, 2013.


Bifurcation Mechanism Design – From Optimal Flat Taxes to Improved Cancer Treatments

Abstract

1 Introduction

Our contribution.

2 Preliminaries

2.1 Game Theory Basics: 2×22\times 22×2 games

Definition 1** (Nash equilibrium).**

Definition 2** (Quantal response equilibrium).**

2.2 Efficiency of an equilibrium

Definition 3**.**

Definition 4**.**

3 Our Model

3.1 Q-learning Dynamics

3.2 Convergence of the Q-learning dynamics

3.3 Rescaling the Payoff Matrix

Definition 5** (Hofbauer and Sigmund (1998)).**

Definition 6**.**

4 Hysteresis Effect and Bifurcation Analysis

4.1 Hysteresis effect in Q-learning dynamics: An example

Example 1** (Hysteresis effect).**

4.2 Characterizing QREs

Lemma 1**.**

Proof.

Definition 7**.**

4.3 Coordination Games

Theorem 1** (Direction of the principal branch).**

Theorem 2** (Properties about the second QRE).**

Theorem 3** (Stability).**

4.4 Non-coordination games

5 Mechanism Design

5.1 Hysteresis Mechanism: Select the Best Nash Equilibrium via QRE Dynamics

Theorem 4 (Hysteresis Mechanism).

Proof.

5.2 Efficiency of QREs: An example

Example 2.

5.3 Optimal Control Mechanism: Better Equilibrium with Irrationality

Definition 8.

Theorem 5 (Optimal Control Mechanism).

Proof.

6 Applications

6.1 Taxation

6.2 Evolution of metabolic phenotypes in cancer

7 Connection to previous works

8 Conclusion

9 Supplementary materials

Appendix A From Q-learning to Q-learning Dynamics

A.1 Q-learning Introduction

A.2 Joint-learning Model

A.3 Continuous-time dynamics

A.4 The exploration term increases entropy

Lemma 2.

Proof of Lemma 2.

Appendix B Convergence of dissipative learning dynamics in $2\times 2$ games

Liouville’s formula

Theorem 6 (Sandholm [2010], page 356).

Poincaré-Bendixson theorem

Theorem 7 (Bendixson [1901], Teschl [2012]).

Bendixson-Dulac theorem

Theorem 8 (Müller and Kuttler [2015], page 210).

Appendix C Bifurcation Analysis for Games with Only One Nash Equilibrium

Definition 9 (mixed Nash equilibrium).

C.1 No dominating strategy for the first player

Theorem 9.

Theorem 10.

C.2 Dominating strategy for the first player

Theorem 11.

Appendix D Detailed Bifurcation Analysis for General $2\times 2$ Game

Lemma 3.

D.1 Case 1: $b_{X}\geq 0$

Proposition 1.

Proof.

D.1.1 Case 1a: $b_{X}\geq 0$, $a_{Y}+b_{Y}>0$

Proof of Part 1 of Theorem 1.

Proposition 2.

Proposition 3.

Proposition 4.

D.1.2 Case 1b: $b_{X}>0$, $a_{Y}+b_{Y}<0$

Proposition 5.

Proposition 6.

D.1.3 Case 1c: $a_{Y}+b_{Y}=0$

Proposition 7.

D.2 Case 2: $b_{X}<0$

Proposition 8.

Proposition 9.