Terminal Ranking Games

Erhan Bayraktar; Yuchong Zhang

arXiv:1906.09628·math.OC·June 23, 2020·Math. Oper. Res.


TL;DR

This paper studies a mean field game where agents are rewarded based on project rankings, using Schrödinger bridges to explicitly find equilibria and analyze the impact of reward structures on welfare and efficiency.

Contribution

It introduces a novel application of Schrödinger bridges to explicitly compute equilibria in ranking-based mean field games and addresses mechanism design and welfare analysis.

Findings

01 Explicit equilibrium calculations using Schrödinger bridges.

02 Identification of reward functions for desired equilibria.

03 Analysis of reward inequality effects on welfare and efficiency.

Abstract

We analyze a mean field tournament: a mean field game in which the agents receive rewards according to the ranking of the terminal value of their projects and are subject to cost of effort. Using Schrödinger bridges we are able to explicitly calculate the equilibrium. This allows us to identify the reward functions which would yield a desired equilibrium and solve several related mechanism design problems. We are also able to identify the effect of reward inequality on the players' welfare as well as calculate the price of anarchy.

Equations

$$X_{i,t}=x_{0}+\int_{0}^{t}a_{i,s}\,ds+\sigma B_{i,t}.$$

$$\sup_{a}\,E\left[R_{\tilde{\mu}}(X_{T})-\int_{0}^{T}ca_{t}^{2}\,dt\right]\quad\text{where}\quad X_{t}=x_{0}+\int_{0}^{t}a_{s}\,ds+\sigma B_{t}.$$

$$dX_{t}=a_{t}\,dt+\sigma\,dB_{t}^{Q}$$

$$H(Q|P):=E^{Q}\left[\ln\left(\frac{dQ}{dP}\right)\right]=\frac{1}{2c\sigma^{2}}E^{Q}\left[\int_{0}^{T}ca_{t}^{2}\,dt\right].$$

$$V(R,\tilde{\mu}):=\sup_{Q\in\mathcal{Q}}E^{Q}\left[R_{\tilde{\mu}}(X_{T})\right]-2c\sigma^{2}H(Q|P),\quad X_{t}=x_{0}+\sigma\omega_{t}.$$

$$\underset{Q\in\mathcal{P}(\Omega)}{\text{minimize}}\ H(Q|P)\quad\text{subject to}\quad Q_{0}=\nu,\ Q_{T}=\mu,$$

$$Q^{*}(\cdot)=\int_{\mathbb{R}^{2}}P^{x,y}(\cdot)\,\pi^{*}(dx,dy),$$

$$\underset{\pi\in\mathcal{P}(\mathbb{R}^{2})}{\text{minimize}}\ H(\pi|P_{0,T})\quad\text{subject to}\quad\pi_{0}=\nu,\ \pi_{T}=\mu.$$

$$\begin{aligned}
V(R,\tilde{\mu})&\leq\sup_{\mu\in\mathcal{P}(\mathbb{R}):\,\mu\sim P_{T}}\ \sup_{Q\in\mathcal{P}(\Omega):\,Q_{0}=\nu,\,Q_{T}=\mu}\int_{\mathbb{R}}R_{\tilde{\mu}}(x)\,d\mu(x)-2c\sigma^{2}H(Q|P)\\
&=\sup_{\mu\in\mathcal{P}(\mathbb{R}):\,\mu\sim P_{T}}\int_{\mathbb{R}}R_{\tilde{\mu}}(x)\,d\mu(x)-2c\sigma^{2}\inf_{Q\in\mathcal{P}(\Omega):\,Q_{0}=\nu,\,Q_{T}=\mu}H(Q|P)\\
&=\sup_{\mu\in\mathcal{P}(\mathbb{R}):\,\mu\sim P_{T}}\int_{\mathbb{R}}R_{\tilde{\mu}}(x)\,d\mu(x)-2c\sigma^{2}\inf_{\pi\in\mathcal{P}(\mathbb{R}^{2}):\,\pi_{0}=\nu,\,\pi_{T}=\mu}H(\pi|P_{0,T})\\
&=\sup_{\mu\in\mathcal{P}(\mathbb{R}):\,\mu\sim P_{T}}\int_{\mathbb{R}}R_{\tilde{\mu}}(x)\,d\mu(x)-2c\sigma^{2}H(\mu|\mathcal{N}(x_{0},\sigma^{2}T)).
\end{aligned}$$

$$V(R,\tilde{\mu})=\sup_{\mu\in\mathcal{P}(\mathbb{R}):\,\mu\sim P_{T}}\int_{\mathbb{R}}R_{\tilde{\mu}}(x)\,d\mu(x)-2c\sigma^{2}H(\mu|\mathcal{N}(x_{0},\sigma^{2}T)).$$

$$f_{0}(x):=\frac{1}{\sigma\sqrt{T}}\varphi\left(\frac{x-x_{0}}{\sigma\sqrt{T}}\right).$$

$$V(R,\tilde{\mu})=\sup_{f_{\mu}>0:\,\int_{\mathbb{R}}f_{\mu}(x)\,dx=1}\int_{\mathbb{R}}\left\{R_{\tilde{\mu}}(x)-2c\sigma^{2}\ln\left(\frac{f_{\mu}(x)}{f_{0}(x)}\right)\right\}f_{\mu}(x)\,dx.$$

$$\beta(\tilde{\mu}):=\int_{\mathbb{R}}f_{0}(y)\exp\left(\frac{R_{\tilde{\mu}}(y)}{2c\sigma^{2}}\right)dy,$$

$$f_{\mu^{*}}(x)=\frac{1}{\beta(\tilde{\mu})}f_{0}(x)\exp\left(\frac{R_{\tilde{\mu}}(x)}{2c\sigma^{2}}\right).$$

$$V(R,\tilde{\mu})=\sup_{z(x)>0,\ \int z(x)\,d\mu_{0}(x)=1}\int_{\mathbb{R}}\left(R_{\tilde{\mu}}(x)-2c\sigma^{2}\ln z(x)\right)z(x)\,d\mu_{0}(x).$$

$$\frac{d\mu^{*}}{d\mu_{0}}(x)=\frac{\exp\left(\frac{R_{\tilde{\mu}}(x)}{2c\sigma^{2}}\right)}{\int\exp\left(\frac{R_{\tilde{\mu}}(y)}{2c\sigma^{2}}\right)d\mu_{0}(y)},$$

$$p(t,x,s,y):=\frac{1}{\sigma\sqrt{s-t}}\varphi\left(\frac{y-x}{\sigma\sqrt{s-t}}\right).$$

$$\psi(t,x):=\int_{\mathbb{R}}p(t,x,T,y)\frac{f_{\mu^{*}}(y)}{f_{0}(y)}\,dy,\quad\hat{\psi}(t,x)=p(0,x_{0},t,x).$$

$$\psi(t,x)=\int_{\mathbb{R}}p(t,x,T,y)\psi(T,y)\,dy,\quad\text{and}\quad\hat{\psi}(t,x)=\int_{\mathbb{R}}p(0,y,t,x)\hat{\psi}(0,y)\,dy.$$

$$a^{*}(t,x)=\sigma^{2}\partial_{x}\ln\psi(t,x).$$

$$\psi(t,x)=\frac{1}{\beta(\tilde{\mu})}E\left[\exp\left(\frac{R_{\tilde{\mu}}(x+\sigma\sqrt{T-t}\,Z)}{2c\sigma^{2}}\right)\right],\quad Z\sim N(0,1).$$

$$a^{*}(t,x)=\sigma^{2}\frac{u_{x}(t,x)}{u(t,x)}.$$

$$\beta(\mu)=\int_{\mathbb{R}}f_{0}(y)\exp\left(\frac{R_{\mu}(y)}{2c\sigma^{2}}\right)dy<\infty.$$

$$f_{\mu}(x)=\frac{1}{\beta(\mu)}f_{0}(x)\exp\left(\frac{R_{\mu}(x)}{2c\sigma^{2}}\right).$$

$$\mathcal{R}_{b}^{rm}:=\{R\in\mathcal{R}_{b}:R(x,r,m)\text{ is independent of }x\text{ and continuous in }m\},$$

$$q_{\mu}(r)=x_{0}+\sigma\sqrt{T}\,N^{-1}\left(\frac{\int_{0}^{r}\exp\left(-\frac{R(z,m_{\mu})}{2c\sigma^{2}}\right)dz}{\int_{0}^{1}\exp\left(-\frac{R(z,m_{\mu})}{2c\sigma^{2}}\right)dz}\right),$$

$$m=x_{0}+\sigma\sqrt{T}\int_{0}^{1}N^{-1}\left(\frac{\int_{0}^{r}\exp\left(-\frac{R(z,m)}{2c\sigma^{2}}\right)dz}{\int_{0}^{1}\exp\left(-\frac{R(z,m)}{2c\sigma^{2}}\right)dz}\right)dr.$$

$$V(R,\mu)=2c\sigma^{2}\ln\beta(\mu)=-2c\sigma^{2}\ln\left(\int_{0}^{1}\exp\left(-\frac{R(z,m_{\mu})}{2c\sigma^{2}}\right)dz\right).$$


Full text

Terminal Ranking Games

Erhan Bayraktar is supported in part by the NSF under grant DMS-1613170 and by the Susan M. Smith Professorship. We are grateful to Jakša Cvitanić for stimulating discussions.

Erhan Bayraktar, Department of Mathematics, University of Michigan, 530 Church Street, Ann Arbor, MI 48104, USA

Yuchong Zhang, Department of Statistical Sciences, University of Toronto, 100 St. George Street, Toronto, Ontario M5S 3G3, Canada

Abstract

We analyze a mean field tournament: a mean field game in which the agents receive rewards according to the ranking of the terminal value of their projects and are subject to cost of effort. Using Schrödinger bridges we are able to explicitly calculate the equilibrium. This allows us to identify the reward functions which would yield a desired equilibrium and solve several related mechanism design problems. We are also able to identify the effect of reward inequality on the players’ welfare as well as calculate the price of anarchy.

Keywords: Tournaments, rank-based rewards, mechanism design, mean field games, price of anarchy, Schrödinger bridges, Lorenz order.

2020 Mathematics Subject Classification: 91A16, 91B43, 93E20

1 Introduction

Consider the following tournament: each player (indexed by $i\in\{1,\ldots,N\}$) exerts an effort, which we denote by $a_{i}$, to move the value of her project/state, which is modeled as a drifted Brownian motion:

$$X_{i,t}=x_{0}+\int_{0}^{t}a_{i,s}\,ds+\sigma B_{i,t}.$$

We assume $B_{1},\ldots,B_{N}$ are independent. The cost of effort per unit time is assumed to be quadratic in $a_{i}$ with coefficient $c$ . The game ends at time $T>0$ , when each player receives a reward that is a deterministic function of three components:

• Her terminal project value $X_{i,T}$;

• The ranking of $X_{i,T}$ relative to other players, measured by the fraction $\frac{1}{N}\sum_{j=1}^{N}1_{\{X_{j,T}\leq X_{i,T}\}}$ of players having equal or worse performance (so that the top performer has rank one and the bottom performer has rank $1/N$);

• Statistics of the population performance, such as population mean $\frac{1}{N}\sum_{j=1}^{N}X_{j,T}$ or the $k$-th order statistic of $X_{1,T},\ldots,X_{N,T}$ or both. This allows us to cover the case when the “reward pie” is not fixed, but grows with the total production or the $k$-th best performance. For simplicity of the presentation, we only consider dependence via the population mean.

In this paper, we analyze the mean field game associated with the above $N$-player game and explicitly characterize the equilibrium (see Section 3), improving on the results of Bayraktar and Zhang [2], which dealt only with the abstract existence and uniqueness of the mean-field equilibrium. Analysis of mean-field games is useful in solving $N$-player games when $N$ is large, since it has been shown in Bayraktar and Zhang [2] that the mean-field equilibrium can be used to construct approximate Nash equilibria for the finite player games.

Our explicit characterization, which is rare in mean field games, allows us to solve tournament design problems. Specifically, we determine in Section 5 the reward functions that maximize the rank- $\alpha$ performance, the net profit (for the tournament planner), and the total effort, respectively. Moreover, in Section 6, we also compute the so-called price of anarchy which measures the efficiency loss due to decentralization; see e.g. Lacker and Ramanan [13], Carmona et al. [5], and Cardaliaguet and Rainer [3].

Mean field games, introduced simultaneously by Lasry and Lions [14, 15, 16], and Huang, Caines and Malhamé [12, 11] (see also the two-volume book of Carmona and Delarue [4] for an extensive overview), analyze games with a large number of players which are weakly interacting through their empirical distribution. The main appeal of the mean field games is the decentralized structure of their equilibria: agents compute their best response to a given population distribution, which is then determined by a fixed point problem. The best response calculation is a pure stochastic control problem. Instead of working with the Hamilton-Jacobi-Bellman equation, we perform the calculation using Schrödinger bridges which can be seen as the stochastic analogue of quadratic optimal transport. (See Léonard [18] and Chen et al. [7] for an overview of Schrödinger bridges and their connection to optimal transport.) We first introduce an auxiliary terminal distribution for the state (to convert the problem to an optimal transport problem), and then optimize over all such terminal distributions. This approach allows us to reformulate the best response problem as a static calculus of variation problem, which we then explicitly solve. This leads us to the next stage, the fixed point equation, whose solutions can be explicitly determined through its quantiles.

The distinguishing feature of our mean field game, i.e., tournament, is the rank-based feature of the reward. In particular, each player is rewarded according to the ranking of the terminal value of their project relative to the population, subject to cost of effort. This makes the analysis of the problem more difficult since the mean field interaction is non-local in the measure and the rank function is not regular. This problem was suggested by Guéant et al. [10] as a model in oil production, analyzed using abstract tools in the weak formulation by Carmona and Lacker [6] and in the strong formulation by Bayraktar and Zhang [2]. In these works continuity with respect to the rank was assumed. Related tournament games where the players are ranked according to their completion times have been considered by Bayraktar et al. [1] for controlled Brownian motion dynamics and by Nutz and Zhang [21] for one-stage Poisson dynamics with controlled jump intensity. In the Appendix we are going to construct an extension of Schrödinger bridges from space to time which can then be applied to construct the equilibrium in Bayraktar et al. [1] as well.

In economics there is a substantial literature on tournaments, going back to Lazear and Rosen [17]; see Bayraktar et al. [1] for a review. Most of these works focus on finitely many players or static models. Using such a one shot model, Fang et al. [8] analyze the discouraging effects of inequality. In our paper we observe a similar phenomenon, in that the more unequal the reward is (in the Lorenz order, see e.g. Marshall et al. [20]) the smaller the game value for each player. However, unlike in the work of Fang et al. [8], the same is not true for the effort in our set-up: the most fair distribution induces agents to put forth zero effort. Hence, one of the questions in the mechanism design section we investigate is what reward function maximizes the cumulative effort. We also analyze the case when agents have a social planner doing the optimization, which is used in computing the price of anarchy.

The rest of the paper is organized as follows: In Section 2 we consider the single player’s problem and find her best response using Schrödinger bridges. We then explicitly compute the mean field equilibrium in Section 3 and show the effect of reward inequality on the well-being of the players in Section 4. Section 5 is where we investigate the tournament design problems with respect to several criteria. In Section 6 we compute the price of anarchy. Finally, in Appendix A we show how one can adapt the Schrödinger bridge approach to the completion time ranking game of Bayraktar et al. [1].

2 A single player’s problem

Let us first describe the incentives of the player: We call $R(x,r,m):\mathbb{R}\times[0,1]\times\mathbb{R}\rightarrow\mathbb{R}\cup\{\pm\infty\}$ a reward function if it is increasing in all of its arguments (throughout the paper, increasing and decreasing are understood in the weak sense), $\mathbb{R}$-valued if $r\in(0,1)$, and satisfies $\int_{0}^{1}R(x,r,m)dr<\infty$ for all $(x,m)\in\mathbb{R}^{2}$. Denote the set of reward functions by $\mathcal{R}$ and the set of bounded reward functions by $\mathcal{R}_{b}$.

Given the distribution $\tilde{\mu}\in\mathcal{P}(\mathbb{R})$ of the terminal project value of the population, we wish to find the best response to $\tilde{\mu}$ for a representative player. For any $\mu\in\mathcal{P}(\mathbb{R})$ , write $F_{\mu}$ for the cumulative distribution function (c.d.f.) of $\mu$ and $R_{\mu}(x)$ for $R(x,F_{\mu}(x),\int_{\mathbb{R}}yd\mu(y))$ . A representative player with cost parameter $c$ solves the following stochastic control problem:

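$$\sup_{a}\,E\left[R_{\tilde{\mu}}(X_{T})-\int_{0}^{T}ca_{t}^{2}\,dt\right]\quad\text{where}\quad X_{t}=x_{0}+\int_{0}^{t}a_{s}\,ds+\sigma B_{t}.\tag{2.1}$$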

Here $a$ is admissible if it is progressively measurable and satisfies $E\int_{0}^{T}|a_{s}|ds<\infty$. Unlike Bayraktar and Zhang [2], we consider the weak formulation of the above problem, which has an interesting connection with optimal transport.

Let $\Omega=C([0,T],\mathbb{R})$ be the Wiener space and $\mathbb{W}_{x}$ be the Wiener measure starting at $x$ at $t=0$ . Under $\mathbb{W}_{0}$ , the canonical process $\omega_{t}$ is a Brownian motion, and thus $X_{t}:=x_{0}+\sigma\omega_{t}$ represents the project value process under zero effort. Let $P$ be the law of $X$ under $\mathbb{W}_{0}$ , and $(\mathcal{F}_{t})_{t\in[0,T]}$ be the filtration generated by the process $\omega_{t}$ .

For any $Q\in\mathcal{P}(\Omega)$ such that $Q\sim P$ , the Girsanov theorem implies that we can find an adapted process $a_{t}$ such that

$$dX_{t}=a_{t}\,dt+\sigma\,dB_{t}^{Q}$$

for some $Q$-Brownian motion $B^{Q}$. Conversely, given any sufficiently integrable adapted process $a_{t}$, we can define $Q\sim P$ such that the above equation holds. This means that if we restrict ourselves to sufficiently integrable effort processes, we can identify $\mathcal{Q}:=\{Q\in\mathcal{P}(\Omega):Q\sim P\}$ with the set of laws of the controlled project value process $X$. Moreover, let $H(\cdot|\cdot)$ denote the relative entropy, with the convention that $H(Q|P)=\infty$ if $Q$ is not absolutely continuous with respect to $P$. We have

$$H(Q|P):=E^{Q}\left[\ln\left(\frac{dQ}{dP}\right)\right]=\frac{1}{2c\sigma^{2}}E^{Q}\left[\int_{0}^{T}ca_{t}^{2}\,dt\right].$$

Thus, we take the following as our definition of the single player’s control problem:

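$$V(R,\tilde{\mu}):=\sup_{Q\in\mathcal{Q}}E^{Q}\left[R_{\tilde{\mu}}(X_{T})\right]-2c\sigma^{2}H(Q|P),\quad X_{t}=x_{0}+\sigma\omega_{t}.\tag{2.3}$$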

Remark 2.1.

Here to keep notation simple, we define the filtration to be the one generated by the canonical process, but similar to Bayraktar et al. [1, Remark 2.1], all results remain valid if we take $(\mathcal{F}_{t})$ to be a larger filtration for which $\omega_{t}$ remains a Brownian motion.

2.1 Reduction via Schrödinger bridges

Let $X_{t}=x_{0}+\sigma\omega_{t}$ and $P=\mathbb{W}_{0}\circ X^{-1}$ as before; $P$ will serve as our reference measure. (In the standard Schrödinger bridge problem, the reference measure $P$ is usually taken to be the stationary Wiener measure $\int\mathbb{W}_{x}dx$, but the disintegration argument works for any non-zero, non-negative, $\sigma$-finite measure on $\Omega$.) For any $Q\in\mathcal{P}(\Omega)$, write $Q_{t}$ for the time-$t$ marginal of $Q$, and $Q_{0,T}$ for the joint distribution of $Q$ at times $0$ and $T$. Given a source distribution $\nu=\delta_{x_{0}}$ and a target distribution $\mu$, the Schrödinger bridge problem looks for an entropy-minimizing transport from $\nu$ to $\mu$:

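$$\underset{Q\in\mathcal{P}(\Omega)}{\text{minimize}}\ H(Q|P)\quad\text{subject to}\quad Q_{0}=\nu,\ Q_{T}=\mu.\tag{2.4}$$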

It is known, by a simple disintegration, that the solution to the Schrödinger bridge problem is given by (see Föllmer [9] and also Léonard [18], Chen et al. [7])

$$Q^{*}(\cdot)=\int_{\mathbb{R}^{2}}P^{x,y}(\cdot)\,\pi^{*}(dx,dy),$$

where $P^{x,y}:=P(\cdot|X_{0}=x,X_{T}=y)$ is the law of a scaled Brownian bridge (scaled by $\sigma$ ), and $\pi^{*}$ is the solution to the following static optimization, assuming it exists:

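$$\underset{\pi\in\mathcal{P}(\mathbb{R}^{2})}{\text{minimize}}\ H(\pi|P_{0,T})\quad\text{subject to}\quad\pi_{0}=\nu,\ \pi_{T}=\mu.\tag{2.5}$$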

In addition, it holds that $H(Q^{*}|P)=H(\pi^{*}|P_{0,T})$ . Since $\nu=P_{0}=\delta_{x_{0}}$ , the static problem (2.5) is trivial, giving a minimum entropy of $H(\mu|\mathcal{N}(x_{0},\sigma^{2}T))$ .
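As a sanity check on the entropic cost $2c\sigma^{2}H(\mu|\mathcal{N}(x_{0},\sigma^{2}T))$, consider a target that is the zero-effort Gaussian shifted by $\Delta$. A constant effort $a=\Delta/T$ reaches this target at cost $ca^{2}T=c\Delta^{2}/T$, and the entropic cost should agree with it. The sketch below checks this numerically; the function names, grid size, and default parameters are illustrative assumptions, not part of the paper.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def entropy_cost(delta, x0=0.0, sigma=1.0, T=1.0, c=0.5, n=20001, width=12.0):
    """Riemann-sum estimate of 2 c sigma^2 H(mu | N(x0, sigma^2 T))
    for the shifted target mu = N(x0 + delta, sigma^2 T).
    Intended for moderate shifts (|delta| of a few standard deviations)."""
    s = sigma * math.sqrt(T)          # std of X_T under zero effort
    lo = x0 - width * s
    dx = 2 * width * s / (n - 1)
    h = 0.0
    for i in range(n):
        x = lo + i * dx
        f_mu = gaussian_pdf(x, x0 + delta, s)
        f_0 = gaussian_pdf(x, x0, s)
        if f_mu > 0.0:
            h += f_mu * math.log(f_mu / f_0) * dx   # integrand of H(mu | mu_0)
    return 2 * c * sigma ** 2 * h

# Compare with the cost c * delta^2 / T of the constant effort a = delta / T.
print(entropy_cost(1.0), 0.5 * 1.0 ** 2 / 1.0)
```

The agreement reflects that, for a pure shift of the Gaussian, a constant drift already attains the minimum entropy among all laws with the prescribed terminal marginal.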

Going back to our control problem (2.3), by splitting the optimization over $Q$ to a maximization over its time- $T$ marginal plus a constrained entropy minimization, we can utilize the equivalence between (2.4) and (2.5) and obtain

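$$\begin{aligned}
V(R,\tilde{\mu})&\leq\sup_{\mu\in\mathcal{P}(\mathbb{R}):\,\mu\sim P_{T}}\ \sup_{Q\in\mathcal{P}(\Omega):\,Q_{0}=\nu,\,Q_{T}=\mu}\int_{\mathbb{R}}R_{\tilde{\mu}}(x)\,d\mu(x)-2c\sigma^{2}H(Q|P)\\
&=\sup_{\mu\in\mathcal{P}(\mathbb{R}):\,\mu\sim P_{T}}\int_{\mathbb{R}}R_{\tilde{\mu}}(x)\,d\mu(x)-2c\sigma^{2}\inf_{Q\in\mathcal{P}(\Omega):\,Q_{0}=\nu,\,Q_{T}=\mu}H(Q|P)\\
&=\sup_{\mu\in\mathcal{P}(\mathbb{R}):\,\mu\sim P_{T}}\int_{\mathbb{R}}R_{\tilde{\mu}}(x)\,d\mu(x)-2c\sigma^{2}\inf_{\pi\in\mathcal{P}(\mathbb{R}^{2}):\,\pi_{0}=\nu,\,\pi_{T}=\mu}H(\pi|P_{0,T})\\
&=\sup_{\mu\in\mathcal{P}(\mathbb{R}):\,\mu\sim P_{T}}\int_{\mathbb{R}}R_{\tilde{\mu}}(x)\,d\mu(x)-2c\sigma^{2}H(\mu|\mathcal{N}(x_{0},\sigma^{2}T)).
\end{aligned}$$

Since $\int_{\mathbb{R}}P^{x_{0},y}(\cdot)\mu(dy)\in\mathcal{Q}$ for any $\mu\sim P_{T}$, the inequality is in fact an equality. Thus,

$$V(R,\tilde{\mu})=\sup_{\mu\in\mathcal{P}(\mathbb{R}):\,\mu\sim P_{T}}\int_{\mathbb{R}}R_{\tilde{\mu}}(x)\,d\mu(x)-2c\sigma^{2}H(\mu|\mathcal{N}(x_{0},\sigma^{2}T)).\tag{2.6}$$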

Let $\varphi$ be the standard normal probability density function (p.d.f.) and introduce

$$f_{0}(x):=\frac{1}{\sigma\sqrt{T}}\varphi\left(\frac{x-x_{0}}{\sigma\sqrt{T}}\right).\tag{2.7}$$

We finally arrive at a constrained calculus of variation problem over the p.d.f. of $\mu$ :

$$V(R,\tilde{\mu})=\sup_{f_{\mu}>0:\,\int_{\mathbb{R}}f_{\mu}(x)\,dx=1}\int_{\mathbb{R}}\left\{R_{\tilde{\mu}}(x)-2c\sigma^{2}\ln\left(\frac{f_{\mu}(x)}{f_{0}(x)}\right)\right\}f_{\mu}(x)\,dx,\tag{2.8}$$

which can be easily solved by the method of Lagrange multipliers. (Alternatively, one can directly drop the integral constraint, by observing that any non-negative function $f\not\equiv 0$ can be normalized to have integral equal to one.) The solution is provided below without proof. Once we find the optimal marginal $\mu^{*}$, we can recover $Q^{*}$ by $Q^{*}(\cdot)=\int_{\mathbb{R}}P^{x_{0},y}(\cdot)\mu^{*}(dy).$

Proposition 2.1.

Given $R\in\mathcal{R}$ and $\tilde{\mu}\in\mathcal{P}(\mathbb{R})$ . Let

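$$\beta(\tilde{\mu}):=\int_{\mathbb{R}}f_{0}(y)\exp\left(\frac{R_{\tilde{\mu}}(y)}{2c\sigma^{2}}\right)dy,$$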

where $f_{0}$ is defined in (2.7). Suppose $\beta(\tilde{\mu})<\infty$ . Then the optimal terminal distribution $\mu^{*}$ of the single player has p.d.f.

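$$f_{\mu^{*}}(x)=\frac{1}{\beta(\tilde{\mu})}f_{0}(x)\exp\left(\frac{R_{\tilde{\mu}}(x)}{2c\sigma^{2}}\right).\tag{2.9}$$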

The optimal value is given by $V(R,\tilde{\mu})=2c\sigma^{2}\ln\beta(\tilde{\mu}).$
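Proposition 2.1 says the best response is an exponential tilting of the zero-effort density $f_{0}$. A minimal numerical sketch follows, assuming the map $x\mapsto R_{\tilde{\mu}}(x)$ is supplied as a Python callable; the name `best_response`, the uniform grid, and the default parameters are hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist

def best_response(reward, x0=0.0, sigma=1.0, T=1.0, c=0.5, n=4001, width=10.0):
    """Discretize f_mu*(x) = f_0(x) exp(R(x) / (2 c sigma^2)) / beta on a uniform grid.

    `reward` is the map x -> R_mu~(x) for a fixed population distribution mu~.
    Returns the grid, the best-response density on it, and V = 2 c sigma^2 ln(beta).
    """
    s = sigma * math.sqrt(T)                      # std of X_T under zero effort
    lo, hi = x0 - width * s, x0 + width * s
    dx = (hi - lo) / (n - 1)
    xs = [lo + i * dx for i in range(n)]
    f0 = [math.exp(-0.5 * ((x - x0) / s) ** 2) / (s * math.sqrt(2 * math.pi)) for x in xs]
    tilt = [f * math.exp(reward(x) / (2 * c * sigma ** 2)) for f, x in zip(f0, xs)]
    beta = sum(tilt) * dx                         # Riemann sum for beta(mu~)
    density = [t / beta for t in tilt]
    value = 2 * c * sigma ** 2 * math.log(beta)
    return xs, density, value

# Example (assumed setup): the population sits at the zero-effort law N(0, 1)
# and the reward equals the rank, R_mu~(x) = F_mu~(x).
population = NormalDist(0.0, 1.0)
xs, density, value = best_response(population.cdf)
```

For a constant reward the tilt is trivial, so the density stays at $f_{0}$ and the value equals the constant, which is a convenient check of the discretization.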

Remark 2.2.

The Schrödinger bridge approach can also be adapted to the hitting time ranking game of Bayraktar et al. [1]. This calls for a variant of the Schrödinger bridge problem where the target distribution is not the time- $T$ marginal, but the law of first passage time of level zero. We detail this digression in the appendix for the interested readers.

Remark 2.3.

The model assumption that $X_{T}$ has a Gaussian $\mathbb{W}_{0}$ -density is not essential in the best-response step; what matters is that the cost of effort can be written as a relative entropy $H(Q|P)$ where $P$ and $Q$ are the laws of the state process $X$ corresponding to zero effort and general effort, respectively. Suppose the $\mathbb{W}_{0}$ -distribution of $X_{T}$ is $\mu_{0}$ . Using the equivalence between the dynamic and static Schrödinger bridge problems, equations (2.6) and (2.8) hold with $\mathcal{N}(x_{0},\sigma^{2}T)$ replaced by $\mu_{0}$ and $f_{\mu}$ replaced by $z=d\mu/d\mu_{0}$ :

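$$V(R,\tilde{\mu})=\sup_{z(x)>0,\ \int z(x)\,d\mu_{0}(x)=1}\int_{\mathbb{R}}\left(R_{\tilde{\mu}}(x)-2c\sigma^{2}\ln z(x)\right)z(x)\,d\mu_{0}(x).$$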

Using the method of Lagrange multipliers, one finds that the optimal $\mu^{*}$ is given by

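$$\frac{d\mu^{*}}{d\mu_{0}}(x)=\frac{\exp\left(\frac{R_{\tilde{\mu}}(x)}{2c\sigma^{2}}\right)}{\int\exp\left(\frac{R_{\tilde{\mu}}(y)}{2c\sigma^{2}}\right)d\mu_{0}(y)},$$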

which is similar to (2.9). The derivation of the fixed point (see the proof of Theorem 3.2), on the other hand, relies on the existence of a density, but not on the Gaussian property.

2.2 Optimal effort

The Schrödinger bridge approach allows us to compute the optimal target distribution easily, which is all we need to analyze equilibrium measures (see Section 3 for details). On the other hand, to get a more explicit description of the optimal effort, ideally as a feedback function $a^{*}(t,x)$ of time and state, we still need to go back to the dynamic control formulation of the Schrödinger bridge problem. We can utilize some existing results in, for example, Chen et al. [7].

Recall that under the reference measure $P$ , the canonical process is a scaled Brownian motion with transition density

$$p(t,x,s,y):=\frac{1}{\sigma\sqrt{s-t}}\varphi\left(\frac{y-x}{\sigma\sqrt{s-t}}\right).$$

Note that $f_{0}(y)=p(0,x_{0},T,y)$ . Define

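$$\psi(t,x):=\int_{\mathbb{R}}p(t,x,T,y)\frac{f_{\mu^{*}}(y)}{f_{0}(y)}\,dy,\quad\hat{\psi}(t,x)=p(0,x_{0},t,x).$$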

It can be easily checked that $\psi,\hat{\psi}$ satisfy $\psi(0,x)\hat{\psi}(0,x)=\delta_{x_{0}}(x)$ , $\psi(T,x)\hat{\psi}(T,x)=f_{\mu^{*}}(x)$ ,

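$$\psi(t,x)=\int_{\mathbb{R}}p(t,x,T,y)\psi(T,y)\,dy,\quad\text{and}\quad\hat{\psi}(t,x)=\int_{\mathbb{R}}p(0,y,t,x)\hat{\psi}(0,y)\,dy.$$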

By Chen et al. [7, p. 679-680], the optimal coupling $Q^{*}$ has Markovian drift $a^{*}$ given by

$$a^{*}(t,x)=\sigma^{2}\partial_{x}\ln\psi(t,x).$$

Using (2.9), we obtain

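$$\psi(t,x)=\frac{1}{\beta(\tilde{\mu})}E\left[\exp\left(\frac{R_{\tilde{\mu}}(x+\sigma\sqrt{T-t}\,Z)}{2c\sigma^{2}}\right)\right],\quad Z\sim N(0,1).$$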

Comparing with Bayraktar and Zhang [2, eq. (3.3)], we see that $u(t,x):=\beta(\tilde{\mu})\psi(t,x)$ is precisely the Cole-Hopf transformation of the value function of the original control problem (2.1). Replacing $\psi$ by $u$ , we recover the same optimal Markovian control as Bayraktar and Zhang [2]:

$$a^{*}(t,x)=\sigma^{2}\frac{u_{x}(t,x)}{u(t,x)}.$$

When $R$ is bounded, it is shown in Bayraktar and Zhang [2] that $\lim_{x\rightarrow\pm\infty}a^{*}(t,x)=0$ , meaning players show slackness when having a very big lead, and give up when falling far behind.
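The feedback effort $a^{*}(t,x)=\sigma^{2}\partial_{x}\ln u(t,x)$ can be approximated numerically. The sketch below assumes a smooth reward map $x\mapsto R_{\tilde{\mu}}(x)$ passed as a callable, evaluates the Gaussian expectation by a plain Riemann sum, and differentiates $\ln u$ by a central finite difference. The function names and step sizes are hypothetical, and the finite difference is not reliable for rewards with jumps.

```python
import math

def u_value(t, x, reward, sigma=1.0, T=1.0, c=0.5, n=2001, zmax=8.0):
    """u(t, x) = E[exp(R(x + sigma sqrt(T - t) Z) / (2 c sigma^2))], Z ~ N(0, 1),
    approximated by a Riemann sum over z in [-zmax, zmax]."""
    dz = 2 * zmax / (n - 1)
    scale = sigma * math.sqrt(T - t)
    total = 0.0
    for i in range(n):
        z = -zmax + i * dz
        weight = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        total += weight * math.exp(reward(x + scale * z) / (2 * c * sigma ** 2)) * dz
    return total

def optimal_effort(t, x, reward, sigma=1.0, T=1.0, c=0.5, h=1e-4):
    """a*(t, x) = sigma^2 d/dx ln u(t, x), via a central finite difference.
    Assumes `reward` is continuous; a jump would corrupt the difference quotient."""
    up = u_value(t, x + h, reward, sigma, T, c)
    down = u_value(t, x - h, reward, sigma, T, c)
    return sigma ** 2 * (math.log(up) - math.log(down)) / (2 * h)

# A bounded, smooth, increasing reward (illustrative choice): effort is positive
# near the middle and vanishes far out on either side.
for x in (-30.0, 0.0, 30.0):
    print(x, optimal_effort(0.5, x, math.tanh))
```

A linear reward $R(x)=kx$ is a useful check: $u$ is then proportional to $\exp(kx/(2c\sigma^{2}))$, so the sketch should return the constant effort $k/(2c)$.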

Remark 2.4.

For bounded rewards, Bayraktar and Zhang [2] also showed that the controlled diffusion $dX_{t}=a^{*}(t,X_{t})dt+\sigma dB_{t}$ in fact has a unique strong solution. From there, one can mimic the change of measure technique in Bayraktar et al. [1] to obtain the optimal terminal distribution (2.9). An advantage of the weak formulation, besides the connection to optimal transport theory, is that it avoids the hassle of having to verify the regularity of $a^{*}$ near $x=0$.

3 Characterization of equilibrium

We say $\mu\in\mathcal{P}(\mathbb{R})$ is an equilibrium (terminal distribution) if it is a fixed point of the best response mapping $\tilde{\mu}\mapsto Q_{T}$, where $Q\in\mathcal{Q}$ is the optimal control for $V(R,\tilde{\mu})$. By (2.9), we have the following characterization for general reward functions.

Theorem 3.1.

Let $R\in\mathcal{R}$ and $\mu\in\mathcal{P}(\mathbb{R})$ satisfy

$$\beta(\mu)=\int_{\mathbb{R}}f_{0}(y)\exp\left(\frac{R_{\mu}(y)}{2c\sigma^{2}}\right)dy<\infty.$$

(The above condition always holds when $R\in\mathcal{R}_{b}$ .) Then $\mu$ is an equilibrium if and only if it has a strictly positive density satisfying

$$f_{\mu}(x)=\frac{1}{\beta(\mu)}f_{0}(x)\exp\left(\frac{R_{\mu}(x)}{2c\sigma^{2}}\right).\tag{3.1}$$

The associated game value is given by $V(R,\mu)=2c\sigma^{2}\ln\beta(\mu)$ .

Specializing to the subclass of reward functions

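$$\mathcal{R}_{b}^{rm}:=\{R\in\mathcal{R}_{b}:R(x,r,m)\text{ is independent of }x\text{ and continuous in }m\},$$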

we obtain a semi-explicit characterization.

Theorem 3.2.

Suppose $R\in\mathcal{R}_{b}^{rm}$ . Then there exists at least one equilibrium. $\mu\in\mathcal{P}(\mathbb{R})$ is an equilibrium terminal distribution of the project value if and only if its quantile function $q_{\mu}$ satisfies

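$$q_{\mu}(r)=x_{0}+\sigma\sqrt{T}\,N^{-1}\left(\frac{\int_{0}^{r}\exp\left(-\frac{R(z,m_{\mu})}{2c\sigma^{2}}\right)dz}{\int_{0}^{1}\exp\left(-\frac{R(z,m_{\mu})}{2c\sigma^{2}}\right)dz}\right),\tag{3.2}$$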

where $N(\cdot)$ is the standard normal c.d.f. and $m_{\mu}=\int_{-\infty}^{\infty}yd\mu(y)$ is a solution of

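$$m=x_{0}+\sigma\sqrt{T}\int_{0}^{1}N^{-1}\left(\frac{\int_{0}^{r}\exp\left(-\frac{R(z,m)}{2c\sigma^{2}}\right)dz}{\int_{0}^{1}\exp\left(-\frac{R(z,m)}{2c\sigma^{2}}\right)dz}\right)dr.\tag{3.3}$$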

The associated game value is given by

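$$V(R,\mu)=2c\sigma^{2}\ln\beta(\mu)=-2c\sigma^{2}\ln\left(\int_{0}^{1}\exp\left(-\frac{R(z,m_{\mu})}{2c\sigma^{2}}\right)dz\right).$$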

Proof.

Since $R$ is bounded, we only need to look for solutions of the fixed point equation (3.1). Let $y(\cdot)$ be the c.d.f. of the random variable $F_{\mu}(\mathcal{N}(x_{0},\sigma^{2}T))$, i.e.

[TABLE]

Since any fixed point $\mu$ has a positive density, we can differentiate $y(r)$ and use (3.1) to get

[TABLE]

Using $y(0)=0$ and $y(1)=1$ , we find that

[TABLE]

and

[TABLE]

It follows that

[TABLE]

from which we get (3.2). To determine $m_{\mu}$ , we integrate (3.2) from $r=0$ to $r=1$ and use that $m_{\mu}=\int_{0}^{1}q_{\mu}(r)dr$ . This leads to equation (3.3). It remains to show that (3.3) has a solution.

Let $g(m)$ be the right hand side of (3.3). We want to show $g$ has a fixed point. Since $R$ is bounded, it can be shown that $C^{-1}\leq y^{\prime}(r)\leq C$ where

[TABLE]

It follows that

[TABLE]

So the range of $g$ is contained in a compact interval. Moreover, $g$ is continuous on this interval since $R$ is assumed to be continuous in $m$ . By Brouwer’s fixed point theorem, $g$ has a fixed point. ∎
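For a purely rank-based reward (no dependence on $m$), the quantile formula (3.2) and the game value of Theorem 3.2 can be evaluated directly. The following is a rough sketch under that assumption: it uses a midpoint rule for the rank integrals and the standard library's inverse normal c.d.f. for $N^{-1}$. The name `equilibrium` and the discretization are hypothetical. An $m$-dependent reward would additionally require solving the fixed point equation (3.3), which is not attempted here.

```python
import math
from statistics import NormalDist

def equilibrium(reward, x0=0.0, sigma=1.0, T=1.0, c=0.5, n=2000):
    """Equilibrium quantile function and game value for a purely rank-based reward r -> R(r)."""
    k = 2 * c * sigma ** 2
    dr = 1.0 / n
    # Midpoint rule for exp(-R(z) / (2 c sigma^2)) on each of n equal rank cells.
    cells = [math.exp(-reward((i + 0.5) * dr) / k) * dr for i in range(n)]
    total = sum(cells)
    value = -k * math.log(total)          # game value formula of Theorem 3.2
    std_normal = NormalDist()
    s = sigma * math.sqrt(T)

    def quantile(r):
        """q_mu(r) for r strictly between 0 and 1, per (3.2)."""
        pos = r * n
        full = min(int(pos), n - 1)
        # Normalized integral up to r, interpolated linearly inside the current cell.
        y = (sum(cells[:full]) + (pos - full) * cells[full]) / total
        return x0 + s * std_normal.inv_cdf(y)

    return quantile, value

# Reward equal to the rank: the median moves above the zero-effort median x0.
q, v = equilibrium(lambda r: r)
print(q(0.5), v)
```

With a constant reward the ratio of integrals reduces to $r$, so the sketch returns the zero-effort quantiles $x_{0}+\sigma\sqrt{T}N^{-1}(r)$ and a game value equal to the constant.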

Remark 3.1.

Observe that the equilibrium distribution $\mu$ does not change if we add any bounded function $\kappa(m)$ to the reward. In other words, any bounded compensation that is solely based on the mean performance of the population does not really incentivize the players.

Remark 3.2.

When $R\in\mathcal{R}^{rm}_{b}$ is further independent of $m$ (i.e. purely rank-based), the equilibrium is unique. In this case, the total effort of the population (or the expected cumulative effort of a representative player) is given by

[TABLE]

Remark 3.3.

If we confine ourselves to the subclass of equilibria which satisfy $\beta(\mu)<\infty$ , then all results in this section can be restated with $R\in\mathcal{R}^{rm}$ which is obtained from $\mathcal{R}^{rm}_{b}$ by dropping the boundedness requirement.

In the next two sections, we focus on bounded rewards that are purely rank-based:

[TABLE]

Each of these rewards induces a unique equilibrium, which facilitates the study of comparative statics and optimal reward design. In this case, we write $\mathcal{V}(R)$ for the unique game value.

4 Effect of reward inequality

Definition 4.1.

Given two reward functions $R,\tilde{R}\in\mathcal{R}_{b}^{r}$ , we say $R$ is more unequal than $\tilde{R}$ in Lorenz order (or $R$ majorizes $\tilde{R}$ ), written as $R\succ\tilde{R}$ , if $\int_{0}^{1}R(r)dr=\int_{0}^{1}\tilde{R}(r)dr$ and

[TABLE]

Theorem 4.1.

Suppose $R,\tilde{R}\in\mathcal{R}_{b}^{r}$ and $R\succ\tilde{R}$ , then the associated game values satisfy $\mathcal{V}(R)\leq\mathcal{V}(\tilde{R})$ ; that is, reward inequality decreases the game value.

Proof.

First assume $R,\tilde{R}\in\mathcal{R}^{r}_{n}$ , where $\mathcal{R}^{r}_{n}$ is the set of piecewise constant reward functions of the form

[TABLE]

In this case, the Lorenz order translates to $\sum_{i=1}^{n}R_{i}=\sum_{i=1}^{n}\tilde{R}_{i}$ and $\sum_{i=1}^{k}R_{i}\leq\sum_{i=1}^{k}\tilde{R}_{i}$ for all $k\in\{1,\ldots,n\}$; that is, $(R_{1},\ldots,R_{n})$ majorizes $(\tilde{R}_{1},\ldots,\tilde{R}_{n})$. By Marshall et al. [20, Proposition 4.B.1], $\tilde{R}\prec R$ if and only if $\sum_{i=1}^{n}g(\tilde{R}_{i})\leq\sum_{i=1}^{n}g(R_{i})$ for all continuous convex functions $g$. Taking $g(x)=\exp(-\frac{x}{2c\sigma^{2}})$, we obtain

[TABLE]

which is equivalent to $\mathcal{V}(\tilde{R})\geq\mathcal{V}(R)$ . This finishes the proof for piecewise constant reward functions.

For general $R,\tilde{R}\in\mathcal{R}_{b}^{r}$ , we approximate $\mathcal{V}(R)$ and $\mathcal{V}(\tilde{R})$ by the Riemann sums $\mathcal{V}(R^{(n)})$ and $\mathcal{V}(\tilde{R}^{(n)})$ , respectively, where $R^{(n)},\tilde{R}^{(n)}\in\mathcal{R}^{r}_{n}$ . Moreover, by the mean value theorem, $R^{(n)},\tilde{R}^{(n)}$ can always be chosen to satisfy $\sum_{i=1}^{k}R^{(n)}_{i}=\int_{0}^{k/n}R(r)dr$ and $\sum_{i=1}^{k}\tilde{R}^{(n)}_{i}=\int_{0}^{k/n}\tilde{R}(r)dr$ for all $k\in\{1,\ldots,n\}$ . This ensures that the discretization preserves the Lorenz order. The result then follows from the previous step and passing to the limit. ∎

Remark 4.1.

The maximum game value is attained by the most equal reward function, namely, the uniform reward. This can also be seen directly from Jensen’s inequality:

[TABLE]

with equality attained if and only if $R$ is constant. From another perspective, the expected reward in equilibrium is always equal to $\int_{0}^{1}R(r)dr$ by symmetry, while the expected cost of effort is minimized to zero under the uniform reward, when nobody exerts any effort. Since uniform reward induces zero effort, the expected total effort clearly does not have the same monotonicity as the game value with respect to reward inequality (cf. Section 5.4).
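Theorem 4.1 and the Jensen argument can be illustrated with piecewise constant rewards. The sketch assumes the reward takes the value $R_{i}$ on the $i$-th of $n$ equal-width rank cells, in which case the game value formula of Theorem 3.2 reduces to $-2c\sigma^{2}\ln\big(\frac{1}{n}\sum_{i}\exp(-R_{i}/(2c\sigma^{2}))\big)$. The three reward vectors are made-up examples with the same total, ordered from most equal to most unequal.

```python
import math

def game_value(rewards, sigma=1.0, c=0.5):
    """Game value for a rank-based reward that equals rewards[i] on the i-th of
    n equal-width rank cells (an assumed parametrization of the step rewards)."""
    k = 2 * c * sigma ** 2
    n = len(rewards)
    return -k * math.log(sum(math.exp(-r / k) for r in rewards) / n)

# Same budget (sum = 4), increasing inequality in the Lorenz order.
uniform = [1.0, 1.0, 1.0, 1.0]
moderate = [0.5, 0.5, 1.5, 1.5]
winner_takes_all = [0.0, 0.0, 0.0, 4.0]

for name, rewards in [("uniform", uniform), ("moderate", moderate),
                      ("winner_takes_all", winner_takes_all)]:
    print(name, game_value(rewards))
```

The partial sums of the three vectors are ordered cell by cell, so each vector majorizes the one before it, and the printed values decrease accordingly.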

5 Tournament design

Denote the mapping from $R\in\mathcal{R}_{b}^{r}$ to the unique equilibrium $\mu$ by

[TABLE]

From (3.2), we see that $\mathcal{E}$ is translation invariant, i.e. $\mathcal{E}(R+C)=\mathcal{E}(R)$ for any constant $C$ . Let $\mathcal{P}^{+}(\mathbb{R})$ be the set of probability measures on $\mathbb{R}$ that have strictly positive density. For $\mu\in\mathcal{P}^{+}(\mathbb{R})$ , define the normalized density

[TABLE]

5.1 Realizing a target equilibrium distribution

Suppose the principal has in mind a target distribution $\mu$ of the terminal project value in equilibrium. He wants to know whether that is feasible via a purely rank-based reward, and if yes, how should he design the reward to achieve it? The following theorem completely characterizes the set of feasible equilibria and the reward functions that induce them.

Theorem 5.1.

(i)

The set of equilibria attainable by a purely rank-based reward is given by

[TABLE]

(ii)

If $\mu\in\mathcal{E}(\mathcal{R}_{b}^{r})$ , then

[TABLE]

(iii)

Suppose we impose an additional reservation “utility” constraint $\mathcal{V}(R)\geq V_{0}$ and a budget constraint $\int_{0}^{1}R(r)dr\leq K$. Then the constant $C$ in (ii) is restricted to

[TABLE]

where

[TABLE]

In particular, such a $C$ exists if and only if

[TABLE]

Proof.

(i) From Theorem 3.1, we know that the normalized density $\zeta_{\mu}$ of any equilibrium $\mu$ is increasing and log-bounded. Conversely, given any $\mu\in\mathcal{P}^{+}(\mathbb{R})$ with such properties, it is easy to check that $\mu$ satisfies (3.2) with purely rank-based reward function $R_{0}(r)=2c\sigma^{2}\ln\zeta_{\mu}(q_{\mu}(r))$ :

[TABLE]

(ii) If $R(r)$ is another function in $\mathcal{R}_{b}^{r}$ that attains $\mu$ in equilibrium, then

[TABLE]

by (3.2). Differentiating both sides with respect to $y$ and setting $y=q_{\mu}(r)$ , we obtain

[TABLE]

Since the left hand side is independent of $r$ , $R-R_{0}$ must be constant.

(iii) Let $R(r)=R_{0}(r)+C$ be a reward function realizing $\mu$ in equilibrium. By Theorem 3.2, the game value $\mathcal{V}(R)=\mathcal{V}(R_{0})+C=C$ . Hence $\mathcal{V}(R)\geq V_{0}$ if and only if $C\geq V_{0}$ . We also have

[TABLE]

So $\int_{0}^{1}R(r)dr\leq K$ if and only if $C\leq K-2c\sigma^{2}H\left(\mu|\mathcal{N}(x_{0},\sigma^{2}T)\right)$ . ∎

Theorem 5.1 allows us to convert many optimal reward design problems into problems about finding the optimal target equilibrium distribution. We give three solvable examples below.

5.2 Maximizing rank- $\alpha$ performance

Fix a number $\alpha\in(0,1)$ , a reservation utility $V_{0}$ and a budget $K\geq V_{0}$ . We look for a reward function $R\in\mathcal{R}^{r}_{b}$ which meets both the reservation utility requirement and the budget constraint, and which maximizes the $\alpha$ -quantile of $\mathcal{E}(R)$ . Define the set of feasible reward functions by

[TABLE]

The optimization problem reads

[TABLE]

Theorem 5.2.

The optimal quantile $Q(\alpha)$ is uniquely attained (up to a.e. equivalence) by the step function

[TABLE]

where $x_{\alpha}$ is the unique solution in $[\alpha,1)$ to the equation

[TABLE]

Let $\mu^{*}=\mathcal{E}(R^{*})$ and $f_{0}$ be given by (2.7). We have

[TABLE]

and

[TABLE]

Proof.

By Theorem 5.1, $\mu\in\mathcal{E}(\mathcal{H})$ if and only if $\mu\in\mathcal{E}(\mathcal{R}_{b}^{r})$ and

[TABLE]

So we can equivalently formulate the optimization problem as one having $\mu$ as the decision variable, $q_{\mu}(\alpha)$ as the objective function, and $\mu\in\mathcal{E}(\mathcal{R}_{b}^{r})$ and (5.1) as the constraints.

Maximizing $q_{\mu}(\alpha)$ is equivalent to maximizing

[TABLE]

For any feasible equilibrium distribution $\mu$ , let $h:=1/(\zeta_{\mu}\circ q_{\mu})$ which implies $\zeta_{\mu}=1/(h\circ F_{\mu})$ , $\int_{0}^{1}h(r)dr=1$ , and $\mu=\mathcal{E}(-2c\sigma^{2}\ln h)$ . In particular, the mapping from $\mu$ to $h$ is one-to-one. Further rewrite the optimization problem as

[TABLE]

where $h$ is also constrained to be positive, decreasing, bounded and bounded away from zero, as translated from $\mu\in\mathcal{E}(\mathcal{R}_{b}^{r})$ . Each feasible $\mu$ clearly induces a feasible $h$ . Conversely, for any feasible $h$ , define $\mu=\mathcal{E}(-2c\sigma^{2}\ln h)$ . Then $-2c\sigma^{2}\ln h(r)=2c\sigma^{2}\ln\zeta_{\mu}(q_{\mu}(r))+C$ for some constant $C$ by Theorem 5.1. Together with the constraints in (5.2), we find that $h=1/(\zeta_{\mu}\circ q_{\mu})$ and that $\mu$ is feasible. Thus, the mapping from feasible $\mu$ to feasible $h$ is in fact bijective, which implies that it suffices for us to solve problem (5.2). Any optimal $h$ induces an optimal $\mu=\mathcal{E}(-2c\sigma^{2}\ln h)$ which can be realized by the reward function

[TABLE]

Here we have added the constant $V_{0}$ to $-2c\sigma^{2}\ln h$ to ensure that $R\in\mathcal{H}$ . The rest of the proof is devoted to solving the equivalent problem (5.2).

We first show that the constant $x_{\alpha}$ given in the theorem statement is well-defined. Let

[TABLE]

It can be shown that $g(x)$ is strictly decreasing on $(0,\alpha)$ and strictly increasing on $(\alpha,1)$ , hence has a global minimum at $x=\alpha$ with $g(\alpha)=0$ . Moreover, $g(x)\rightarrow\infty$ as $x\rightarrow 0$ or $1$ . Since $K\geq V_{0}$ , by the intermediate value theorem, the equation

[TABLE]

has at least one solution. When $K=V_{0}$ , $x=\alpha$ is the unique solution. When $K>V_{0}$ , there are two solutions: one in $(0,\alpha)$ and the other in $(\alpha,1)$ . In both cases, $x_{\alpha}\in[\alpha,1)$ is well-defined.

Next, we show that

[TABLE]

is the unique optimizer of problem (5.2). Since $0<\alpha\leq x_{\alpha}<1$ , it is clear that $h^{*}$ is decreasing, bounded and bounded away from zero. Straightforward calculation also shows that

[TABLE]

Therefore, $h^{*}$ satisfies all the feasibility constraints. Given any other feasible $h$ , we have, by repeated application of Jensen’s inequality, that

[TABLE]

That is

[TABLE]

We claim that $\mathcal{J}(h)\leq x_{\alpha}=\mathcal{J}(h^{*})$ . Suppose on the contrary that $\mathcal{J}(h)>x_{\alpha}$ . Then since $x_{\alpha}\geq\alpha$ and $g$ is strictly increasing on $(\alpha,1)$ , we must have $g(\mathcal{J}(h))>g(x_{\alpha})$ , which is a contradiction. Thus, we have proved that $h^{*}$ is optimal. In fact, $h^{*}$ is the unique optimizer, since $\mathcal{J}(h)=\mathcal{J}(h^{*})$ would imply all Jensen’s inequalities above are equalities. This holds if and only if $h$ is constant on $[0,\alpha)$ and $(\alpha,1]$ . We then use $\mathcal{J}(h)=\mathcal{J}(h^{*})=x_{\alpha}$ and $\int_{0}^{1}h(r)dr=1$ to deduce that $h=h^{*}$ .

Finally, we argue that the optimal reward function $R^{*}=V_{0}-2c\sigma^{2}\ln h^{*}$ induced by $h^{*}$ is also unique. Because of the bijection between $\mu$ and $h$ , we know that $\mu^{*}=\mathcal{E}(-2c\sigma^{2}\ln h^{*})$ is the unique optimal equilibrium distribution. Note that $H\left(\mu^{*}|\mathcal{N}(x_{0},\sigma^{2}T)\right)=\int_{0}^{1}-\ln h^{*}(r)dr=\frac{K-V_{0}}{2c\sigma^{2}}$ . By Theorem 5.1,

[TABLE]

The remaining theorem statements follow from direct calculation. ∎

Remark 5.1.

One can also replace the reservation utility constraint by the hard constraint $R\geq R_{0}$ . Similar to Bayraktar et al. [1, Theorem 6.2], the optimal reward function in this case is the equal reward with cutoff rank $\alpha$ , i.e. $R(r)=R_{0}+\frac{K-R_{0}}{1-\alpha}1_{[\alpha,1]}(r)$ .

5.3 Maximizing net profit

Suppose each terminal output $y$ generates a profit $g(y)$ for the principal, where $g$ is a bounded increasing function. The goal is to find $R\in\mathcal{R}_{b}^{r}$ such that $\mathcal{V}(R)\geq V_{0}$ and the net profit

[TABLE]

is maximized.

Theorem 5.3.

The optimal net profit is given by

[TABLE]

and is uniquely attained by

[TABLE]

where $f_{0}$ is given by (2.7), and

[TABLE]

Proof.

By Theorem 5.1, it suffices for us to look for the optimal $\mu\in\mathcal{E}(\mathcal{R}_{b}^{r})$ which can then be realized by $R(r)=2c\sigma^{2}\ln\zeta_{\mu}(q_{\mu}(r))+C$ for any $C\geq V_{0}$ . It is clear that the principal should pick $C=V_{0}$ to minimize the cost. Write $R^{\mu}(r)=2c\sigma^{2}\ln\zeta_{\mu}(q_{\mu}(r))+V_{0}$ . We then have

[TABLE]

The optimization problem over $\mu$ is given by

[TABLE]

To solve problem (5.3), we define

[TABLE]

For each fixed $\lambda\in\mathbb{R}$ , the integrand above attains its pointwise maximum at

[TABLE]

Clearly, since $g$ is bounded and increasing, so is $\ln(f_{\mu}/f_{0})$ . We then find $\lambda$ by

[TABLE]

giving

[TABLE]

The formulas for $f_{\mu^{*}}$ and $R^{*}=R^{\mu^{*}}$ then follow. ∎
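The pointwise maximization above can be checked on a grid. The following Python sketch is illustrative only and rests on assumptions that are not displayed above: the net profit of a target $\mu$ is taken to be $\int g\,d\mu-V_{0}-2c\sigma^{2}H(\mu|\mathcal{N}(x_{0},\sigma^{2}T))$ , the optimizer is taken to be the exponential tilt $f_{\mu^{*}}\propto f_{0}e^{g/(2c\sigma^{2})}$ , and the optimal value is taken to be $2c\sigma^{2}\ln\int e^{g/(2c\sigma^{2})}f_{0}\,dy-V_{0}$ . The function name, the grid parameters and the choice $g=\tanh$ are hypothetical.

```python
import math


def net_profit_check(g, x0=0.0, sigma=1.0, T=1.0, c=1.0, V0=0.0, n=4001, width=10.0):
    std = sigma * math.sqrt(T)
    beta = 2 * c * sigma ** 2
    dy = 2 * width * std / (n - 1)
    ys = [x0 - width * std + i * dy for i in range(n)]
    # Reference density f0 of N(x0, sigma^2 T) on the grid.
    f0 = [math.exp(-0.5 * ((y - x0) / std) ** 2) / (std * math.sqrt(2 * math.pi)) for y in ys]

    # Candidate optimizer: exponential tilt of f0 by g / (2 c sigma^2), normalized on the grid.
    weights = [f * math.exp(g(y) / beta) for y, f in zip(ys, f0)]
    Z = sum(weights) * dy
    f_star = [w / Z for w in weights]

    def profit(f):
        # E_mu[g] - V0 - 2 c sigma^2 H(mu | N(x0, sigma^2 T)), computed by a Riemann sum.
        gain = sum(g(y) * fy for y, fy in zip(ys, f)) * dy
        entropy = sum(fy * math.log(fy / f0y) for fy, f0y in zip(f, f0)) * dy
        return gain - V0 - beta * entropy

    # Returns: profit of the tilted density, assumed closed form, profit of the untilted f0.
    return profit(f_star), beta * math.log(Z) - V0, profit(f0)


if __name__ == "__main__":
    print(net_profit_check(math.tanh))
```

The first two returned values should agree up to rounding, and both should dominate the third, which corresponds to the uniform reward $R\equiv V_{0}$ that induces $\mathcal{N}(x_{0},\sigma^{2}T)$ .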

5.4 Maximizing total effort

Let $K\geq V_{0}$ be given. We look for a purely rank-based reward function $R$ which maximizes the total effort

[TABLE]

subject to the reservation utility constraint $\mathcal{V}(R)\geq V_{0}$ and the budget constraint $\int_{0}^{1}R(r)dr\leq K$ .

Theorem 5.4.

$\sup_{R\in\mathcal{R}_{b}^{r}}A(R)=\sqrt{(K-V_{0})T/c}$ . When $K=V_{0}$ , the unique optimal reward is given by $R^{*}(r)\equiv V_{0}$ . When $K>V_{0}$ , as $M\rightarrow\infty$ , an $O(e^{-M/\lambda})$ -optimal reward is

[TABLE]

where

[TABLE]

Proof.

By Theorem 5.1, it suffices for us to look for an optimal target distribution $\mu$ satisfying

[TABLE]

Such a $\mu$ , if it lies in $\mathcal{E}(\mathcal{R}_{b}^{r})$ , can be realized by the reward function $R(r)=2c\sigma^{2}\ln\zeta_{\mu}(q_{\mu}(r))+V_{0}$ . We shall assume that we are in the nontrivial case $K>V_{0}$ ; otherwise the only attainable equilibrium is $\mu=\mathcal{N}(x_{0},\sigma^{2}T)$ , which is induced by the uniform reward. We first relax the boundedness requirement of $\ln(f_{\mu}/f_{0})$ ; it turns out that the relaxed optimizer fails to be in $\mathcal{E}(\mathcal{R}_{b}^{r})$ . We then construct an approximate optimizer by truncation.

The relaxed optimization problem over $\mu$ reads

[TABLE]

Any candidate optimizer $f_{\mu^{*}}$ necessarily satisfies the Kuhn–Tucker conditions (see e.g. Luenberger [19])

[TABLE]

The above implies

[TABLE]

and

[TABLE]

where $\lambda_{2}>0$ is determined by the complementary slackness

[TABLE]

giving $\lambda_{2}=\sigma^{2}\sqrt{cT/(K-V_{0})}$ . We then have

[TABLE]

In other words, $\mu^{*}=\mathcal{N}(x_{0}+\sqrt{(K-V_{0})T/c},\sigma^{2}T)$ . It is also clear that $\ln f_{\mu^{*}}(y)/f_{0}(y)$ is increasing. Since the objective and the equality constraints are linear in $f_{\mu}$ , and the inequality constraint is convex in $f_{\mu}$ , it can also be shown that these conditions, together with the monotonicity of $\ln f_{\mu^{*}}/f_{0}$ , are sufficient for optimality. The relaxed optimal value equals $\sqrt{(K-V_{0})T/c}\geq\sup_{R\in\mathcal{R}_{b}^{r}}A(R)$ .

Since $\ln(f_{\mu^{*}}/f_{0})$ is unbounded, $\mu^{*}\notin\mathcal{E}(\mathcal{R}^{r}_{b})$ . Consider the truncated $\mu_{M}$ defined in the theorem statement. We have $R_{M}\in\mathcal{R}_{b}^{r}$ and $\mu_{M}=\mathcal{E}(R_{M})$ for all $M$ . Moreover, let

[TABLE]

We can show that

[TABLE]

and

[TABLE]

as $M\rightarrow\infty$ . It follows that

[TABLE]

∎
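The constants in the relaxed problem can be verified with a few lines of arithmetic. The Python sketch below is an illustration, not the authors' computation. It assumes that the relaxed optimizer is the exponential tilt $f_{\mu^{*}}\propto f_{0}e^{y/\lambda_{2}}$ , so that the mean shifts by $\sigma^{2}T/\lambda_{2}$ , and it uses the standard relative entropy $\delta^{2}/(2\sigma^{2}T)$ between two Gaussians with common variance $\sigma^{2}T$ and means differing by $\delta$ . The function name and parameter values are hypothetical.

```python
import math


def relaxed_optimizer(K, V0, c, sigma, T):
    # Multiplier stated in the proof.
    lam2 = sigma ** 2 * math.sqrt(c * T / (K - V0))
    # Mean shift of the tilted Gaussian (assumed exponential-tilt form of the optimizer).
    shift = sigma ** 2 * T / lam2
    # Relative entropy of N(x0 + shift, sigma^2 T) with respect to N(x0, sigma^2 T).
    entropy = shift ** 2 / (2 * sigma ** 2 * T)
    # Entropy expressed in reward units; the budget constraint requires this to be K - V0.
    cost = 2 * c * sigma ** 2 * entropy
    return lam2, shift, cost


if __name__ == "__main__":
    print(relaxed_optimizer(K=3.0, V0=1.0, c=2.0, sigma=0.5, T=4.0))
```

With these inputs the shift equals $\sqrt{(K-V_{0})T/c}$ and the cost equals $K-V_{0}$ , so the entropy constraint binds, in line with complementary slackness.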

Remark 5.2.

It can be verified using Theorem 3.1 that $\mu^{*}=\mathcal{N}(x_{0}+\sqrt{(K-V_{0})T/c},\sigma^{2}T)$ is an equilibrium induced by the unbounded reward

[TABLE]

The optimal effort process associated with $\mu^{*}$ is constant: $a^{*}_{s}\equiv\sqrt{\frac{K-V_{0}}{cT}}$ , by straightforward calculation using (2.10). This can also be seen by directly substituting

[TABLE]

into the control problem, yielding a linear-quadratic optimization:

[TABLE]

However, it is not clear whether $\mu^{*}$ is the unique equilibrium under $R^{*}$ .
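The constant optimal effort in Remark 5.2 can also be recovered by brute force. The Python sketch below is illustrative and makes two assumptions that are not displayed above: after substituting the Gaussian equilibrium, the reward is taken to be affine in the terminal value with slope $2\sqrt{c(K-V_{0})/T}$ , and the search is restricted to constant deterministic efforts $a$ , for which the objective is $\text{slope}\cdot(x_{0}+aT)-ca^{2}T$ up to an additive constant. The function name and the grid are hypothetical.

```python
import math


def best_constant_effort(K, V0, c, T, x0=0.0, step=1e-4, a_max=5.0):
    # Assumed marginal reward per unit of terminal value under the Gaussian equilibrium.
    slope = 2 * math.sqrt(c * (K - V0) / T)
    best_a, best_val = 0.0, -math.inf
    n = int(a_max / step)
    for i in range(n + 1):
        a = i * step
        # Expected reward (up to a constant) minus the quadratic effort cost.
        val = slope * (x0 + a * T) - c * a * a * T
        if val > best_val:
            best_a, best_val = a, val
    return best_a


if __name__ == "__main__":
    K, V0, c, T = 3.0, 1.0, 2.0, 4.0
    print(best_constant_effort(K, V0, c, T), math.sqrt((K - V0) / (c * T)))
```

The grid maximizer should agree with $\sqrt{(K-V_{0})/(cT)}$ up to the grid spacing, since the objective is a concave quadratic in $a$ with maximizer $\text{slope}/(2c)$ .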

6 Price of anarchy

For a fixed reward function $R$ , the price of anarchy (PoA) is defined as the ratio between the optimal centralized welfare $V_{c}$ and the worst equilibrium welfare/game value. By centralized, we mean that the principal can prescribe and enforce the effort, or equivalently, the law of the controlled process, for the agents. We only consider a symmetric effort prescription, i.e. the same terminal law for all players. To avoid triviality, we consider $R$ that is not purely rank-based; otherwise the optimal centralized welfare is always equal to $\int_{0}^{1}R(r)dr$ , which is attained by prescribing zero effort for all.

The optimal centralized welfare $V_{c}$ is defined as

[TABLE]

This is a control problem of McKean-Vlasov type. Similar to the derivation of (2.6), we can reformulate the centralized problem as

[TABLE]

When $R(x,r,m)$ is independent of individual performance $x$ , the inner optimization over $\mu$ is explicitly solvable. Specifically, letting $\Pi(m):=\int_{0}^{1}R(r,m)dr$ , we have

[TABLE]

Here and in the sequel, we omit the underlying assumption that $m\in\mathbb{R}$ and $\mu\in\mathcal{P}(\mathbb{R})$ . Using the Lagrange method, we find that the mean-constrained entropy minimization has optimal value $\frac{(m-x_{0})^{2}}{2\sigma^{2}T}$ , attained by the normal distribution $\mu^{*}=\mathcal{N}(m,\sigma^{2}T)$ . It follows that

[TABLE]

We see that $V_{c}<\infty$ if $\Pi(m)$ has sub-quadratic growth. As one would expect for a symmetric game, the centralized solution does not depend on the rank-order allocation of rewards.
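As a numerical sanity check of the entropy value used above, the following Python sketch integrates $f\ln(f/f_{0})$ for $f=\mathcal{N}(m,\sigma^{2}T)$ and $f_{0}=\mathcal{N}(x_{0},\sigma^{2}T)$ and then evaluates the resulting centralized welfare on a grid. It is a sketch under assumptions: the centralized welfare is taken to be $\sup_{m}\{\Pi(m)-\frac{c}{T}(m-x_{0})^{2}\}$ , obtained by multiplying the entropy value $\frac{(m-x_{0})^{2}}{2\sigma^{2}T}$ by the cost factor $2c\sigma^{2}$ . The function names, grid bounds and test function $\Pi(m)=m$ are hypothetical.

```python
import math


def gaussian_relative_entropy(m, x0, sigma, T, n=4001, width=10.0):
    # Riemann sum of f ln(f / f0) with f = N(m, sigma^2 T) and f0 = N(x0, sigma^2 T).
    s = sigma * math.sqrt(T)
    dy = 2 * width * s / (n - 1)
    total = 0.0
    for i in range(n):
        y = m - width * s + i * dy
        f = math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        # ln(f / f0) computed from the exponents to avoid dividing tiny numbers.
        log_ratio = ((y - x0) ** 2 - (y - m) ** 2) / (2 * s * s)
        total += f * log_ratio * dy
    return total


def centralized_welfare(Pi, x0, c, T, m_lo=-10.0, m_hi=10.0, step=1e-3):
    # Grid approximation of sup_m { Pi(m) - c (m - x0)^2 / T } (assumed form of V_c).
    n = int(round((m_hi - m_lo) / step))
    grid = (m_lo + i * step for i in range(n + 1))
    return max(Pi(m) - c * (m - x0) ** 2 / T for m in grid)


if __name__ == "__main__":
    print(gaussian_relative_entropy(1.5, 1.0, 0.7, 2.0), 0.25 / (2 * 0.49 * 2.0))
    print(centralized_welfare(lambda m: m, x0=1.0, c=1.0, T=1.0))
```

For $\Pi(m)=m$ the supremum is attained at $m=x_{0}+\frac{T}{2c}$ and equals $x_{0}+\frac{T}{4c}$ , which the grid search should reproduce up to the grid spacing.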

When $R(x,r,m)$ is independent of rank $r$ , we have

[TABLE]

Again by the Lagrange method, we find that the inner maximization over $\mu$ has solution

[TABLE]

where $\lambda_{m}$ is determined by

[TABLE]

Plugging $f_{\mu(m)}$ into the formula for $V_{c}$ , we get

[TABLE]

Example 6.1.

Suppose $R(r,m)=m+2\sigma\sqrt{c(1-\alpha)m}N^{-1}(r)$ with $\alpha\in(0,1]$ . Note that $R(r,m)$ takes the same form as the effort-maximizing reward in Remark 5.2 with $K$ replaced by $m$ and $V_{0}$ by $\alpha m$ . In this case, we have $\Pi(m)=m$ and by (6.1),

[TABLE]

To compute the equilibrium welfare, fix $\mu\in\mathcal{P}(\mathbb{R})$ with $\int_{\mathbb{R}}xd\mu(x)=m_{\mu}$ and let $\tilde{R}(r):=R(r,m_{\mu})$ . Then $\mu$ is an equilibrium for $R$ if and only if $\mu$ is optimal for $V(R,\mu)=V(\tilde{R},\mu)$ , which means $\mu$ is also the unique equilibrium for the purely rank-based reward $\tilde{R}$ . This allows us to directly use Remark 5.2 to write down one equilibrium $\mu$ (not necessarily unique), whose mean satisfies

[TABLE]

The unique solution is given by

[TABLE]

with associated game value $\alpha m_{\mu}$ . It follows that in this case,

[TABLE]

When $\alpha\rightarrow 0$ , PoA $\rightarrow\infty$ . When $\alpha=1$ , PoA $\geq 1+\frac{T}{4cx_{0}}$ . When $\alpha m_{\mu}=x_{0}+\frac{T}{4c}$ or $\alpha=\frac{T+4cx_{0}}{2T+4cx_{0}}$ , PoA $\geq 1$ . If we only consider equilibria that satisfy $\beta(\mu)<\infty$ , then all inequalities become equalities.
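The special cases above can be reproduced numerically. The Python sketch below is an illustration under assumptions, since the displayed formulas are not reproduced here: it takes the equilibrium mean to solve $m_{\mu}=x_{0}+\sqrt{(1-\alpha)m_{\mu}T/c}$ (Remark 5.2 with $K=m_{\mu}$ and $V_{0}=\alpha m_{\mu}$ ), takes $V_{c}=x_{0}+\frac{T}{4c}$ (the supremum of $m-\frac{c}{T}(m-x_{0})^{2}$ ), and reports the ratio $V_{c}/(\alpha m_{\mu})$ as the lower bound on PoA. It assumes $x_{0}>0$ . The function names and parameter values are hypothetical.

```python
import math


def equilibrium_mean(alpha, x0, T, c):
    # Positive root of s^2 - b s - x0 = 0 with s = sqrt(m) and b = sqrt((1 - alpha) T / c).
    b = math.sqrt((1 - alpha) * T / c)
    s = 0.5 * (b + math.sqrt(b * b + 4 * x0))
    return s * s


def poa_lower_bound(alpha, x0, T, c):
    v_c = x0 + T / (4 * c)                            # assumed centralized welfare
    game_value = alpha * equilibrium_mean(alpha, x0, T, c)
    return v_c / game_value


if __name__ == "__main__":
    x0, T, c = 1.0, 2.0, 0.5
    for alpha in (0.1, (T + 4 * c * x0) / (2 * T + 4 * c * x0), 1.0):
        print(alpha, poa_lower_bound(alpha, x0, T, c))
```

At $\alpha=1$ the bound equals $1+\frac{T}{4cx_{0}}$ , at $\alpha=\frac{T+4cx_{0}}{2T+4cx_{0}}$ it equals $1$ , and it grows without bound as $\alpha\rightarrow 0$ , matching the three cases listed above.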

Example 6.2.

Suppose $R(x,m)=\alpha x+(1-\alpha)g(m)$ where $g$ is bounded increasing, and $\alpha\in[0,1]$ . In this case, it can be shown that $\mu(m)=\mathcal{N}(m,\sigma^{2}T)$ , $m=x_{0}+\frac{T}{2c}(\alpha-\lambda_{m})$ , and

[TABLE]

By (6.2), we get

[TABLE]

Since $R$ is independent of rank and linear in $x$ , we have that $\beta(\mu)<\infty$ for all $\mu\in\mathcal{P}(\mathbb{R})$ . By Theorem 3.1, all equilibria are characterized by (3.1). We find that there is a unique equilibrium $\mu$ which is normal with mean $m_{\mu}=x_{0}+\frac{T\alpha}{2c}$ and variance $\sigma^{2}T$ . The associated game value is

[TABLE]

It follows that, after rearranging the denominator,

[TABLE]

When $\alpha=0$ , PoA $=\sup_{m}\left\{g(m)-\frac{c}{T}(m-x_{0})^{2}\right\}/g(x_{0})$ . When $\alpha=1$ , the game is non-interactive, and PoA $=1$ .
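The two limiting cases can be checked with a short grid search. The Python sketch below is illustrative and assumes forms for the quantities whose displays are not reproduced above: the centralized welfare is taken to be $\sup_{m}\{\alpha m+(1-\alpha)g(m)-\frac{c}{T}(m-x_{0})^{2}\}$ , and the game value is taken to be $\alpha x_{0}+\frac{\alpha^{2}T}{4c}+(1-\alpha)g(m_{\mu})$ with $m_{\mu}=x_{0}+\frac{T\alpha}{2c}$ , i.e. the expected reward at the equilibrium mean minus the cost of the constant effort $\frac{\alpha}{2c}$ . The function name, the grid and the choice $g(m)=2+\tanh(m)$ are hypothetical; the ratio is only meaningful when the game value is positive.

```python
import math


def poa_linear_example(alpha, g, x0, c, T, m_lo=-20.0, m_hi=20.0, step=1e-3):
    # Equilibrium mean stated in the example.
    m_eq = x0 + alpha * T / (2 * c)
    # Assumed game value: alpha * m_eq + (1 - alpha) g(m_eq) minus effort cost alpha^2 T / (4 c).
    game_value = alpha * x0 + alpha ** 2 * T / (4 * c) + (1 - alpha) * g(m_eq)
    # Assumed centralized welfare, approximated by a grid search over the prescribed mean m.
    n = int(round((m_hi - m_lo) / step))
    grid = (m_lo + i * step for i in range(n + 1))
    v_c = max(alpha * m + (1 - alpha) * g(m) - c * (m - x0) ** 2 / T for m in grid)
    return v_c / game_value


if __name__ == "__main__":
    g = lambda m: 2.0 + math.tanh(m)
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, poa_linear_example(alpha, g, x0=1.0, c=1.0, T=1.0))
```

For $\alpha=1$ the ratio should equal $1$ up to grid error, and for $\alpha=0$ it should exceed $1$ whenever $g$ is strictly increasing near $x_{0}$ , since the centralized planner internalizes the effect of the mean on $g$ .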

Appendix A Schrödinger bridges from space to time

Let $\Omega=C([0,T],\mathbb{R})$ be the canonical space and $\mathbb{W}_{x}$ be the Wiener measure starting at $x$ at time zero. Also let $(\mathcal{F}_{t})_{t\in[0,T]}$ be the filtration generated by the canonical process. Define $\tau(\omega):=\inf\{t\in[0,T]:\omega_{t}=0\}$ with the convention that $\inf\emptyset=\infty$ . Given a reference measure $P\in\mathcal{P}(\Omega)$ , a source distribution $\nu\in\mathcal{P}(\mathbb{R})$ and a target distribution $\mu\in\mathcal{P}(\mathbb{T})$ where $\mathbb{T}:=[0,T]\cup\{\infty\}$ , consider the following variant of the Schrödinger bridge problem:

[TABLE]

For any $Q\in\mathcal{P}(\Omega)$ , define $Q^{x,t}:=Q(\cdot|\omega_{0}=x,\tau(\omega)=t)$ . We have the disintegration:

[TABLE]

Similar to the standard Schrödinger bridge problem, one can show that the optimal transport plan is given by

[TABLE]

where $\pi^{*}$ is the solution to

[TABLE]

assuming the infimum is attained.

Now, consider the hitting time ranking game of Bayraktar et al. [1], where each agent solves

[TABLE]

Here it is assumed that $R_{\tilde{\mu}}(t)=R_{\infty}\in\mathbb{R}$ for all $t>T$ . Take $X=x_{0}+\sigma\omega_{t}$ and $P=\mathbb{W}_{0}\circ X^{-1}$ , and identify $a_{t}$ with the set of laws

[TABLE]

The condition on the Radon-Nikodym derivative means $a_{t}\equiv 0$ for all $t>\tau\wedge T$ . Let $\mu_{0}:=P\circ\tau^{-1}$ be the law of the first passage time of level $x_{0}/\sigma$ of a Brownian motion. We can rewrite the agent’s control problem in weak formulation as

[TABLE]

Note that for each $\mu\sim\mu_{0}$ , the associated optimal $Q=Q_{\mu}=\int_{\mathbb{T}}P^{x_{0},t}(\cdot)\mu(dt)$ is always equivalent to $P$ and satisfies $dQ/dP=\zeta(\tau)\in\mathcal{F}_{\tau\wedge T}$ , where $\zeta(t):=d\mu(t)/d\mu_{0}(t)$ . It follows that $Q\in\mathcal{Q}$ and the inequality is in fact an equality. The resulting static problem can be further split into a constrained calculus of variations problem, followed by a static optimization:

[TABLE]

By elementary calculation, one deduces the same formula as Bayraktar et al. [1, Eq. (2.5)].

Bibliography

[1] Erhan Bayraktar, Jakša Cvitanić, and Yuchong Zhang. Large tournament games. Ann. Appl. Probab., 29(6):3695–3744, 2019.
[2] Erhan Bayraktar and Yuchong Zhang. A rank-based mean field game in the strong formulation. Electron. Commun. Probab., 21:Paper No. 72, 12, 2016.
[3] Pierre Cardaliaguet and Catherine Rainer. On the (in)efficiency of MFG equilibria. SIAM J. Control Optim., 57(4):2292–2314, 2019.
[4] René Carmona and François Delarue. Probabilistic Theory of Mean Field Games with Applications I-II. Springer, 2018.
[5] René Carmona, Christy V. Graves, and Zongjun Tan. Price of anarchy for mean field games. ESAIM: Proc S, 65:349–383, 2019.
[6] René Carmona and Daniel Lacker. A probabilistic weak formulation of mean field games and applications. Ann. Appl. Probab., 25(3):1189–1231, 2015.
[7] Yongxin Chen, Tryphon T. Georgiou, and Michele Pavon. On the relation between optimal transport and Schrödinger bridges: a stochastic control viewpoint. J. Optim. Theory Appl., 169(2):671–691, 2016.
[8] Dawei Fang, Thomas Noe, and Philipp Strack. Turning up the heat: The discouraging effect of competition in contests. Journal of Political Economy, 128(5):1940–1975, 2020.
