Zero-sum Stochastic Games: Limit Optimal Trajectories

Sylvain Sorin (IMJ-PRG); Guillaume Vigeral (CEREMADE)

arXiv:1812.08414·math.OC·December 21, 2018


TL;DR

This paper investigates the behavior of zero-sum stochastic games as the discount factor approaches zero, focusing on the convergence and properties of limit optimal trajectories of payoffs and occupation measures.

Contribution

It introduces the concept of limit optimal trajectories in zero-sum stochastic games and analyzes their existence, uniqueness, and characterization for absorbing games.

Findings

1. Existence of limit optimal trajectories as the discount factor tends to zero

2. Uniqueness conditions for these trajectories in absorbing games

3. Characterization of the structure of limit trajectories

Abstract

We consider zero-sum stochastic games. For every discount factor $\lambda$, a time normalization allows one to represent the game as being played on the interval $[0,1]$. We introduce the trajectories of cumulated expected payoff and of cumulated occupation measure up to time $t\in[0,1]$, under $\varepsilon$-optimal strategies. A limit optimal trajectory is defined as an accumulation point as the discount factor tends to 0. We study existence, uniqueness and characterization of these limit optimal trajectories for absorbing games.

Equations

$$l^{\omega,{\bf x},{\bf y}}_{\lambda}(t_{n})=\lambda\sum_{i=1}^{n}(1-\lambda)^{i-1}c^{\omega,{\bf x},{\bf y}}_{i}$$

$$Q^{\omega,{\bf x},{\bf y}}_{\lambda}(t_{n})=\lambda\sum_{i=1}^{n}(1-\lambda)^{i-1}q^{\omega,{\bf x},{\bf y}}_{i}$$

$$\forall\varepsilon>0,\ \exists\lambda_{0}>0,\ \forall\lambda<\lambda_{0},\ \exists{\bf x}_{\lambda}\in{\bf X}^{\varepsilon}_{\lambda},\ \exists{\bf y}_{\lambda}\in{\bf Y}^{\varepsilon}_{\lambda},\ \forall\omega\in\Omega,\ \forall t\in[0,1],\quad\left|l^{\omega}(t)-l^{\omega,{\bf x}_{\lambda},{\bf y}_{\lambda}}_{\lambda}(t)\right|\leq\varepsilon.$$

$$\forall\varepsilon>0,\ \exists\lambda_{0}>0,\ \forall\lambda<\lambda_{0},\ \exists{\bf x}_{\lambda}\in{\bf X}^{\varepsilon}_{\lambda},\ \exists{\bf y}_{\lambda}\in{\bf Y}^{\varepsilon}_{\lambda},\ \forall\omega\in\Omega,\ \forall t\in[0,1],\quad\left\|Q^{\omega}(t)-Q^{\omega,{\bf x}_{\lambda},{\bf y}_{\lambda}}_{\lambda}(t)\right\|\leq\varepsilon.$$

$$G^{*}(x,y):=p^{*}(x,y)\,\overline{g}^{*}(x,y):=\int_{I\times J}p^{*}(i,j)\,g^{*}(i,j)\,x(di)\,y(dj).$$

$$A(x,x^{\prime},a,y,y^{\prime},b)=\frac{g(x,y)+a\,G^{*}(x^{\prime},y)+b\,G^{*}(x,y^{\prime})}{1+a\,p^{*}(x^{\prime},y)+b\,p^{*}(x,y^{\prime})}.$$

$$v=\max_{x\in X}\ \sup_{(x^{\prime},a)\in X\times\mathbb{R}^{+}}\ \inf_{(y,y^{\prime},b)\in T}A(x,x^{\prime},a,y,y^{\prime},b)=\min_{y\in Y}\ \inf_{(y^{\prime},b)\in Y\times\mathbb{R}^{+}}\ \sup_{(x,x^{\prime},a)\in S}A(x,x^{\prime},a,y,y^{\prime},b).$$

$$w\leq\sup_{(x,x^{\prime},a)\in S}\ \inf_{(y,y^{\prime},b)\in T}\frac{g(x,y)+a\,G^{*}(x^{\prime},y)+b\,G^{*}(x,y^{\prime})}{1+a\,p^{*}(x^{\prime},y)+b\,p^{*}(x,y^{\prime})}$$

$$r_{\lambda}(x,y)=\lambda g(x,y)+(1-\lambda)\left[(1-p^{*}(x,y))\,r_{\lambda}(x,y)+G^{*}(x,y)\right]$$

$$r_{\lambda}(x,y)=\frac{\lambda g(x,y)+(1-\lambda)G^{*}(x,y)}{\lambda+(1-\lambda)p^{*}(x,y)}.$$

$$v_{\lambda}\leq\frac{\lambda g(x_{\lambda},y)+(1-\lambda)G^{*}(x_{\lambda},y)}{\lambda+(1-\lambda)p^{*}(x_{\lambda},y)},\quad\forall y\in Y,$$

$$v_{\lambda}\leq\frac{g(x_{\lambda},y)+\frac{1-\lambda}{\lambda}G^{*}(x_{\lambda},y)}{1+\frac{1-\lambda}{\lambda}p^{*}(x_{\lambda},y)},\quad\forall y\in Y.$$

$$\left|g(\overline{x},y)-g(x_{\overline{\lambda}},y)\right|\leq\varepsilon,\quad\forall y\in Y$$

$$\left|v_{\overline{\lambda}}-w\right|\leq\varepsilon.$$

$$w-\varepsilon\leq\frac{g(\overline{x},y)+\overline{a}\,G^{*}(\overline{x}^{\prime},y)}{1+\overline{a}\,p^{*}(\overline{x}^{\prime},y)}+\varepsilon,\quad\forall y\in Y.$$

$$w\,p^{*}(\overline{x},y^{\prime})\leq G^{*}(\overline{x},y^{\prime}),\quad\forall y^{\prime}\in Y.$$

$$w\leq\frac{g(\overline{x},y)+\overline{a}\,G^{*}(\overline{x}^{\prime},y)+b\,G^{*}(\overline{x},y^{\prime})}{1+\overline{a}\,p^{*}(\overline{x}^{\prime},y)+b\,p^{*}(\overline{x},y^{\prime})}+2\varepsilon,\quad\forall y,y^{\prime}\in Y,\ b\in{\bf R}_{+}$$

$$r_{\lambda}(\hat{x}_{\lambda},y)=\frac{\lambda\left[g(x,y)+\lambda a\,g(x^{\prime},y)\right]+(1-\lambda)\left[G^{*}(x,y)+\lambda a\,G^{*}(x^{\prime},y)\right]}{\lambda(1+\lambda a)+(1-\lambda)\left(p^{*}(x,y)+\lambda a\,p^{*}(x^{\prime},y)\right)}.$$

$$A\left(x,x^{\prime},a,y,y,\tfrac{1-\lambda}{\lambda}\right)=\frac{\lambda g(x,y)+\lambda a\,G^{*}(x^{\prime},y)+(1-\lambda)G^{*}(x,y)}{\lambda+\lambda a\,p^{*}(x^{\prime},y)+(1-\lambda)p^{*}(x,y)}.$$

$$\left|r_{\lambda}(\hat{x}_{\lambda},y)-A\left(x,x^{\prime},a,y,y,\tfrac{1-\lambda}{\lambda}\right)\right|\leq 4C\lambda a$$

$$v-r_{\lambda}(\hat{x}_{\lambda},y)\leq\varepsilon+4C\lambda a\leq 2\varepsilon$$

$$r_{\lambda}(\phi(\lambda,s),y)\geq A(s,\psi(\lambda,y))-o(1),\quad\forall s\in S,\ \forall y\in Y$$

$$v=$$

$$v+\varepsilon\geq A(x,x",+\infty,y,y^{\prime}_{\varepsilon},b_{\varepsilon})=\overline{g}^{*}(x",y)$$

$$v+\varepsilon\geq A(x,x",0,y,y^{\prime}_{\varepsilon},b_{\varepsilon})=\frac{g(x,y)+b_{\varepsilon}\,G^{*}(x,y^{\prime}_{\varepsilon})}{1+b_{\varepsilon}\,p^{*}(x,y^{\prime}_{\varepsilon})}$$

$$v+\varepsilon\geq A(x,x",0,y,y^{\prime}_{\varepsilon},b_{\varepsilon})\geq\min\{g(x,y),h^{-}(x)\}.$$

$$v+\varepsilon\geq\operatorname{med}\left(g(x,y);h^{+}(y);h^{-}(x)\right)$$

$$\frac{g(x,y)+b\,G^{*}(x,y^{\prime})}{1+b\,p^{*}(x,y^{\prime})}\leq v+\varepsilon.$$

$$g(x,y)$$

$$(v-\varepsilon)\left(1+a\,p^{*}(x^{\prime},y)+b\,p^{*}(x,y^{\prime})\right)$$


Taxonomy

Topics: Economic theories and models · Game Theory and Applications · Stochastic processes and financial applications

Full text

Zero-sum stochastic games: limit optimal trajectories

Sylvain Sorin, Sorbonne Université, UPMC Paris 06, Institut de Mathématiques de Jussieu-Paris Rive Gauche, UMR 7586, CNRS, F-75005, Paris, France

https://webusers.imj-prg.fr/sylvain.sorin

and

Guillaume Vigeral (corresponding author), Université Paris-Dauphine, PSL Research University, CNRS, CEREMADE, Place du Maréchal De Lattre de Tassigny. 75775 Paris cedex 16, France

http://www.ceremade.dauphine.fr/ vigeral/indexenglish.html


Some of the results of this paper were presented at “Atelier Franco-Chilien: Dynamiques, optimisation et apprentissage”, Valparaiso, November 2010, and a preliminary version of this paper was given at the Game Theory Conference in Stony Brook, July 2012. This research was supported by grant PGMO 0294-01 (France).

1. Introduction

The analysis of two person zero sum repeated games in discrete time may be performed along two lines:

asymptotic approach: to each probability distribution $\theta$ on the set of stages ($m=1,2,...$) one associates the game $G_{\theta}$ where the evaluation of the stream of stage payoffs $\{g_{m}\}$ is $\sum_{m=1}^{+\infty}\theta_{m}g_{m}$, and one denotes its value by $v_{\theta}$. Given a preordered family $\{\Theta,\succ\}$ of probability distributions, one studies whether $v_{\theta}$ converges as $\theta\in\Theta$ “goes to $\infty$” according to $\succ$. Typical examples correspond to $n$-stage games ($\theta_{m}=\frac{1}{n}I_{m\leq n},n\rightarrow\infty$), $\lambda$-discounted games ($\theta_{m}=\lambda(1-\lambda)^{m-1},\lambda\rightarrow 0$), or more generally decreasing evaluations ($\theta_{m}\geq\theta_{m+1}$) with $\theta_{1}\rightarrow 0$. The game has an asymptotic value $v^{*}$ when these limits exist and coincide.
uniform approach: for each strategy of player 1, one evaluates the amount that can be obtained against any strategy of the opponent in any sufficiently long game for $\{\Theta,\succ\}$. This allows one to define a $minmax$ and a $maxmin$, and the game has a uniform value $v_{\infty}$ when $minmax=maxmin$.

The second approach is stronger than the first one (the existence of $v_{\infty}$ implies the existence of $v^{*}$ and their equality), but there are games with an asymptotic value and no uniform value (incomplete information on both sides, Aumann and Maschler [1], Mertens and Zamir [7]; stochastic games with signals on the moves). The first approach deals only with families of values while the second explicitly considers strategies. The main difference is that in the first case the ($\varepsilon$)-optimal strategies of the players may depend on the evaluation represented by $\theta$.

We focus here on a class of games where this dependence has a smooth representation. Basically in addition to the asymptotic properties of the value one studies the asymptotic behavior along the play induced by ( $\varepsilon$ )-optimal strategies.

The first step is to normalize the duration of the game using the evaluation $\theta$ . We consider each game $G_{\theta}$ as being played on $[0,1]$ , stage $n$ lasting from time $t_{n-1}=\sum_{m<n}\theta_{m}$ to time $t_{n}=t_{n-1}+\theta_{n}$ (with $t_{0}=0$ ).

Note that here time $t$ corresponds to the fraction $t$ of the total duration of the game, as evaluated through $\theta$. In particular, given ($\varepsilon$)-optimal strategies in $G_{\theta}$, the stream of expected stage payoffs generates a bounded measurable trajectory on $[0,1]$ and one will consider its asymptotic behavior.
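As an illustration of this normalization, the following minimal Python sketch computes the stage end times $t_{n}$ for the $\lambda$-discounted and the $n$-stage evaluations. The function names are ours and purely illustrative; only finitely many stages are listed, and the discounted dates are compared with the closed form $t_{n}=1-(1-\lambda)^{n}$.

```python
from itertools import accumulate


def discounted_weights(lam, n_stages):
    """First n_stages weights theta_m = lam * (1 - lam)**(m - 1)."""
    return [lam * (1 - lam) ** (m - 1) for m in range(1, n_stages + 1)]


def uniform_weights(n):
    """Weights of the n-stage game: theta_m = 1/n for m <= n."""
    return [1.0 / n] * n


def stage_end_times(weights):
    """Cumulative dates t_n = theta_1 + ... + theta_n (t_0 = 0 is implicit)."""
    return list(accumulate(weights))


lam = 0.1
times = stage_end_times(discounted_weights(lam, 5))
# For the discounted evaluation, t_n = 1 - (1 - lam)**n.
closed_form = [1 - (1 - lam) ** n for n in range(1, 6)]
uniform_times = stage_end_times(uniform_weights(4))
```

In the $n$-stage game the dates are equally spaced, whereas in the discounted game early stages occupy most of $[0,1]$ only when $\lambda$ is large; as $\lambda$ tends to 0 each single stage has vanishing duration.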

The next section introduces the basic definitions and concepts that allow us to describe our results. The main proofs are in Section 3. Further examples are in Sections 4 and 5.

To end this quick overview let us recall that there are games which do not have an asymptotic value: stochastic games with compact action spaces, Vigeral [15]; finite stochastic games with signals on the state, Ziliotto [16]; or more generally Sorin and Vigeral [13].

2. Limit optimal trajectories

Let $\Gamma$ be a two-person zero-sum stochastic game with state space $\Omega$, action spaces $I$ and $J$, stage payoff $g$ and transition $\rho$ from $\Omega\times I\times J$ to $\mathds{R}$ (resp. $\Delta(\Omega)$). We assume that $\Omega$ is finite, $I$ and $J$ are compact metric, $g$ and $\rho$ continuous. We keep the same notations for the multilinear extensions to $X=\Delta(I)$ and $Y=\Delta(J)$, where as usual $\Delta(A)$ denotes the probabilities on $A$.

For any pair of stationary strategies $({\bf x},{\bf y})\in X^{\Omega}\times Y^{\Omega}={\bf X}\times{\bf Y}$ , any state $\omega\in\Omega$ and any stage $n$ , denote by $c^{\omega,{\bf x},{\bf y}}_{n}$ the expected payoff at stage $n$ under these stationary strategies, given the initial state $\omega$ , and by $q^{\omega,{\bf x},{\bf y}}_{n}\in\Delta(\Omega)$ the corresponding distribution of the state at stage $n$ . Hence $c^{\omega,{\bf x},{\bf y}}_{n}=\langle q^{\omega,{\bf x},{\bf y}}_{n},\bar{g}({\bf x},{\bf y})\rangle$ where $\bar{g}({\bf x},{\bf y})$ stands for the vector payoff with component in state $\zeta\in\Omega$ given by $g({\zeta};{\bf x}(\zeta),{\bf y}(\zeta))$ .

Definition 1.

For any $({\bf x},{\bf y})\in X^{\Omega}\times Y^{\Omega}$, any discount factor $\lambda\in(0,1]$, and any starting state $\omega$, define the function $l^{\omega,{\bf x},{\bf y}}_{\lambda}:[0,1]\rightarrow\mathds{R}$ by

$$l^{\omega,{\bf x},{\bf y}}_{\lambda}(t_{n})=\lambda\sum_{i=1}^{n}(1-\lambda)^{i-1}c^{\omega,{\bf x},{\bf y}}_{i}$$

for $t_{n}=\lambda\sum_{i=1}^{n}(1-\lambda)^{i-1}$ and a linear interpolation between these dates $\{t_{n}\}$.

Thus $l^{\omega,{\bf x},{\bf y}}_{\lambda}(t_{n})$ corresponds to the expectation of the accumulated payoff for the first $n$ stages in the discounted game, or up to time $t_{n}$, and $l^{\omega,{\bf x},{\bf y}}_{\lambda}(t)$ to the same at the fraction $t$ of the game, both under ${\bf x}$ and ${\bf y}$ in the $\lambda$-discounted game starting from $\omega$.

Let $M(\Omega)$ denote the set of positive measures on $\Omega$ . We introduce similarly the expected accumulated occupation measure at time $t$ under ${\bf x}$ and ${\bf y}$ in the $\lambda$ -discounted game starting from $\omega$ as follows:

Definition 2.

For any $({\bf x},{\bf y})\in X^{\Omega}\times Y^{\Omega}$, any discount factor $\lambda$, and any starting state $\omega$, define the function $Q^{\omega,{\bf x},{\bf y}}_{\lambda}:[0,1]\rightarrow M(\Omega)$ by:

$$Q^{\omega,{\bf x},{\bf y}}_{\lambda}(t_{n})=\lambda\sum_{i=1}^{n}(1-\lambda)^{i-1}q^{\omega,{\bf x},{\bf y}}_{i}$$

and by a linear interpolation between these dates $\{t_{n}\}$.

Note that for any $t\in[0,1]$ , ${Q^{\omega,{\bf x},{\bf y}}_{\lambda}\left(t\right)}\in t\,\Delta(\Omega)$ .
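The two definitions can be sketched numerically as follows. This is only an illustration under stated assumptions: the pair of stationary strategies is summarized by a hypothetical transition matrix `P` and a hypothetical payoff vector `g` (both invented for the example), and the code evaluates $l_{\lambda}$ and $Q_{\lambda}$ at the dates $t_{n}$ only, without the linear interpolation.

```python
def trajectories(P, g, omega, lam, n_stages):
    """Return the list of (t_n, l(t_n), Q(t_n)) for n = 1..n_stages.

    P[z][z2]: transition probability induced by the stationary pair (assumed data).
    g[z]:     expected stage payoff induced in state z (assumed data).
    omega:    index of the initial state.
    """
    k = len(P)
    q = [1.0 if z == omega else 0.0 for z in range(k)]  # q_1: law of the state at stage 1
    t, l = 0.0, 0.0
    Q = [0.0] * k
    out = []
    for i in range(1, n_stages + 1):
        w = lam * (1 - lam) ** (i - 1)           # weight of stage i
        c = sum(q[z] * g[z] for z in range(k))   # c_i = <q_i, g>
        t += w
        l += w * c
        Q = [Q[z] + w * q[z] for z in range(k)]
        out.append((t, l, list(Q)))
        # q_{i+1} = q_i P
        q = [sum(q[z] * P[z][z2] for z in range(k)) for z2 in range(k)]
    return out


# Toy data: state 0 moves to the absorbing state 1 with probability 1/2.
P = [[0.5, 0.5], [0.0, 1.0]]
g = [1.0, 0.0]
path = trajectories(P, g, omega=0, lam=0.2, n_stages=30)
```

Since each $q_{i}$ is a probability on $\Omega$, the total mass of $Q_{\lambda}(t_{n})$ equals $t_{n}$, which is the property $Q_{\lambda}(t)\in t\,\Delta(\Omega)$ noted above.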

Denote by $l^{{\bf x},{\bf y}}_{\lambda}$ and $Q^{{\bf x},{\bf y}}_{\lambda}$ the $\Omega$ -vectors of functions $l^{\omega,{\bf x},{\bf y}}_{\lambda}(\cdot)$ and $Q^{\omega,{\bf x},{\bf y}}_{\lambda}(\cdot)$ respectively.

Limit trajectories for the payoff and occupation measures will be defined as accumulation points of $l^{{\bf x}_{\lambda},{\bf y}_{\lambda}}_{\lambda}$ and $Q^{{\bf x}_{\lambda},{\bf y}_{\lambda}}_{\lambda}$ under $\lambda$-dependent $\varepsilon$-optimal strategies ${\bf x}_{\lambda}$ and ${\bf y}_{\lambda}$ as $\lambda$ tends to 0.

More precisely, denote by ${\bf X}^{\varepsilon}_{\lambda}$ (resp. ${\bf Y}^{\varepsilon}_{\lambda}$ ) the set of $\varepsilon$ -optimal stationary strategies in the $\lambda$ -discounted game $\Gamma_{\lambda}$ (with value $v_{\lambda}$ ) for Player 1 (resp. for Player 2).

Then we introduce:

Definition 3.

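$l=(l^{\omega}:[0,1]\rightarrow\mathds{R})_{\omega\in\Omega}$ is a limit optimal trajectory for the expected accumulated payoff ($LOTP$) if:

$$\forall\varepsilon>0,\ \exists\lambda_{0}>0,\ \forall\lambda<\lambda_{0},\ \exists{\bf x}_{\lambda}\in{\bf X}^{\varepsilon}_{\lambda},\ \exists{\bf y}_{\lambda}\in{\bf Y}^{\varepsilon}_{\lambda},\ \forall\omega\in\Omega,\ \forall t\in[0,1],\quad\left|l^{\omega}(t)-l^{\omega,{\bf x}_{\lambda},{\bf y}_{\lambda}}_{\lambda}(t)\right|\leq\varepsilon.$$

$Q=(Q^{\omega}:[0,1]\rightarrow M(\Omega))_{\omega\in\Omega}$ is a limit optimal trajectory for the expected accumulated occupation measure ($LOTM$) if:

$$\forall\varepsilon>0,\ \exists\lambda_{0}>0,\ \forall\lambda<\lambda_{0},\ \exists{\bf x}_{\lambda}\in{\bf X}^{\varepsilon}_{\lambda},\ \exists{\bf y}_{\lambda}\in{\bf Y}^{\varepsilon}_{\lambda},\ \forall\omega\in\Omega,\ \forall t\in[0,1],\quad\left\|Q^{\omega}(t)-Q^{\omega,{\bf x}_{\lambda},{\bf y}_{\lambda}}_{\lambda}(t)\right\|\leq\varepsilon.$$

Alternate weaker and stronger definitions are as follows. In both cases, if “$\forall\lambda<\lambda_{0}$” is replaced by “for some sequence $\lambda_{n}$ going to 0”, we will speak of a weak $LOT$. If “$\exists{\bf x}_{\lambda}\in{\bf X}^{\varepsilon}_{\lambda},\ \exists{\bf y}_{\lambda}\in{\bf Y}^{\varepsilon}_{\lambda}$” is replaced by “$\exists\varepsilon^{\prime}<\varepsilon,\ \forall{\bf x}_{\lambda}\in{\bf X}^{\varepsilon^{\prime}}_{\lambda},\ \forall{\bf y}_{\lambda}\in{\bf Y}^{\varepsilon^{\prime}}_{\lambda}$”, we will speak of a strong $LOT$.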

Remark 4.

A weak $LOT$ always exists by standard arguments of equicontinuity.
If a $LOTP$ $l$ exists, $v_{\lambda}$ converges to $l(1)$.
If a strong $LOT$ exists, it is unique.
No strong $LOTM$ exists in general (just consider a game where the payoff is always 0).
If the game has a uniform value and both players use $\varepsilon$-optimal strategies, the average expected payoff is essentially constant along the play.

A first approach to this topic concerns one-player games (or games where one player controls the transitions), where there is no finiteness assumption on $\Omega$. If $v_{\lambda}$ converges uniformly, then there exists a strong LOTP and it is linear w.r.t. $t$, which means that the expected payoff is constant along the trajectory (Sorin, Venel and Vigeral [14]).

The same article provides an example of a two player game with finite action and countable state spaces, where LOTP is not unique.

The main contributions of the current paper are:

For absorbing games, existence of a linear LOTP, and existence of a “geometric” algebraic LOTM

For finite absorbing games, existence of a strong LOTP.

An example of a finite game where LOTM is not semialgebraic.

An example of compact absorbing game with non uniqueness of LOTP.

Let us mention recent results of Oliu-Barton and Ziliotto [8] establishing the existence of a linear strong LOTP for finite stochastic games and optimal strategies: the class of games is larger and they allow for any kind of optimal strategies. Our results deal with compact action spaces and $\varepsilon$-optimal strategies.

As a final comment, let us underline the fact that the previous concepts and definitions can be extended to any repeated game, for any evaluation and any type of strategies.

3. Absorbing games

An absorbing game $\Gamma$ is defined by two sets of actions $I$ and $J$ , two stage payoff functions $g$ , $g^{*}$ from $I\times J$ to $\left[-1,1\right]$ and a probability of absorption $p^{*}$ from $I\times J$ to $\left[0,1\right].$

$I$ and $J$ are compact metric sets, $g,g^{*}$ and $p^{*}$ are (jointly) continuous.

The repeated game is played in discrete time as follows. At stage $t=1,2,...$ (if absorption has not yet occurred) player 1 chooses $i_{t}\in I$ and, simultaneously, player 2 chooses $j_{t}\in J$ :

(i) the payoff at stage $t$ is $g\left(i_{t},j_{t}\right)$ ;

(ii) with probability $p^{*}\left(i_{t},j_{t}\right)$ absorption is reached and the payoff in all future stages $s>t$ is $g^{*}\left(i_{t},j_{t}\right)$ ;

(iii) with probability $p\left(i_{t},j_{t}\right):=1-p^{*}\left(i_{t},j_{t}\right)$ the situation is repeated at stage $t+1$ .

Recall that the asymptotic analysis for these games is due to Kohlberg [3] in the case where $I$ and $J$ are finite and Rosenberg and Sorin [9] in the current framework. In either case the value $v_{\lambda}$ of the discounted game $\Gamma_{\lambda}$ converges to some $v$ as $\lambda$ goes to 0. This does not require any assumption on the information of the players. In case of full observation of the actions - or of the stage payoff, a uniform value exists, see Mertens and Neyman [5] in the finite case and Mertens, Neyman and Rosenberg [6] for compact actions.

Recall that $X=\Delta(I)$ and $Y=\Delta(J)$ are the sets of probabilities on $I$ and $J$. The functions $g$, $p$ and $p^{*}$ are bilinearly extended to $X\times Y$. Let

$$G^{*}(x,y):=p^{*}(x,y)\,\overline{g}^{*}(x,y):=\int_{I\times J}p^{*}(i,j)\,g^{*}(i,j)\,x(di)\,y(dj).$$

${\overline{g}}^{*}(x,y)$ is thus the expected absorbing payoff conditional on absorption (and is therefore only defined for $p^{*}(x,y)\neq 0$).

3.1. An auxiliary game

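Consider the two-person zero-sum game ${\bf A}$, defined for any $(x,x^{\prime},a)\in S=X^{2}\times\mathbb{R}^{+}$ and $(y,y^{\prime},b)\in T=Y^{2}\times\mathbb{R}^{+}$, by the payoff function

$$A(x,x^{\prime},a,y,y^{\prime},b)=\frac{g(x,y)+a\,G^{*}(x^{\prime},y)+b\,G^{*}(x,y^{\prime})}{1+a\,p^{*}(x^{\prime},y)+b\,p^{*}(x,y^{\prime})}.$$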

3.1.1. General properties

The following proposition extends to the compact case results due to Laraki [4] in the finite case (later simplified by Cardaliaguet, Laraki and Sorin [2]).

Proposition 5.

The game ${\bf A}$ has a value, which is $v=\lim v_{\lambda}$ .

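More precisely

$$v=\max_{x\in X}\ \sup_{(x^{\prime},a)\in X\times\mathbb{R}^{+}}\ \inf_{(y,y^{\prime},b)\in T}A(x,x^{\prime},a,y,y^{\prime},b)=\min_{y\in Y}\ \inf_{(y^{\prime},b)\in Y\times\mathbb{R}^{+}}\ \sup_{(x,x^{\prime},a)\in S}A(x,x^{\prime},a,y,y^{\prime},b).$$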

2) Moreover, if $(x,x^{\prime},a)$ is $\varepsilon$ -optimal in the game ${\bf A}$ then for any $\lambda$ small enough the stationary strategy $\hat{x}_{\lambda}:=\frac{x+\lambda ax^{\prime}}{1+\lambda a}$ is $2\varepsilon$ -optimal in $\Gamma_{\lambda}$ .

Proof.

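Consider an accumulation point $w$ of the family $\{v_{\lambda}\}$ and let $\lambda_{n}\rightarrow 0$ be such that $v_{\lambda_{n}}$ converges to $w$.

We will show that

$$w\leq\sup_{(x,x^{\prime},a)\in S}\ \inf_{(y,y^{\prime},b)\in T}\frac{g(x,y)+a\,G^{*}(x^{\prime},y)+b\,G^{*}(x,y^{\prime})}{1+a\,p^{*}(x^{\prime},y)+b\,p^{*}(x,y^{\prime})}\tag{2}$$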

A dual argument proves at the same time that the family $\{v_{\lambda}\}$ converges and that the auxiliary game ${\bf A}$ has a value.

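Let $r_{\lambda}(x,y)$ be the payoff in the game $\Gamma_{\lambda}$ induced by a pair of stationary strategies $(x,y)\in X\times Y$. It satisfies

$$r_{\lambda}(x,y)=\lambda g(x,y)+(1-\lambda)\left[(1-p^{*}(x,y))\,r_{\lambda}(x,y)+G^{*}(x,y)\right]\tag{3}$$

hence

$$r_{\lambda}(x,y)=\frac{\lambda g(x,y)+(1-\lambda)G^{*}(x,y)}{\lambda+(1-\lambda)p^{*}(x,y)}.\tag{4}$$

In particular for any $x_{\lambda}\in X$ optimal for Player 1 in $\Gamma_{\lambda}$, one obtains

$$v_{\lambda}\leq\frac{\lambda g(x_{\lambda},y)+(1-\lambda)G^{*}(x_{\lambda},y)}{\lambda+(1-\lambda)p^{*}(x_{\lambda},y)},\quad\forall y\in Y,\tag{5}$$

which one can write as

$$v_{\lambda}\leq\frac{g(x_{\lambda},y)+\frac{1-\lambda}{\lambda}G^{*}(x_{\lambda},y)}{1+\frac{1-\lambda}{\lambda}p^{*}(x_{\lambda},y)},\quad\forall y\in Y.\tag{6}$$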

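Let $\overline{x}\in X$ be an accumulation point of $\{x_{\lambda_{n}}\}$ and, given $\varepsilon>0$, let $\overline{\lambda}$ in the sequence $\{{\lambda_{n}}\}$ be such that

$$\left|g(\overline{x},y)-g(x_{\overline{\lambda}},y)\right|\leq\varepsilon,\quad\forall y\in Y$$

(we use the fact that $g$ is uniformly continuous on $X\times Y$) and

$$\left|v_{\overline{\lambda}}-w\right|\leq\varepsilon.$$

Then with $\overline{a}=\frac{(1-\overline{\lambda})}{\overline{\lambda}}$ and $\overline{x}^{\prime}=x_{\overline{\lambda}}$, (6) implies

$$w-\varepsilon\leq\frac{g(\overline{x},y)+\overline{a}\,G^{*}(\overline{x}^{\prime},y)}{1+\overline{a}\,p^{*}(\overline{x}^{\prime},y)}+\varepsilon,\quad\forall y\in Y.\tag{7}$$

On the other hand, going to the limit in (5) leads to

$$w\,p^{*}(\overline{x},y^{\prime})\leq G^{*}(\overline{x},y^{\prime}),\quad\forall y^{\prime}\in Y.\tag{8}$$

We multiply (7) by the denominator $1+\overline{a}p^{*}(\overline{x}^{\prime},y)$ and add (8) multiplied by $b\in{\bf R}_{+}$ to obtain the property:

$\forall\varepsilon>0,\exists\ \overline{x},\overline{x}^{\prime}\in X$ and $\overline{a}\in{\bf R}_{+}$ such that

$$w\leq\frac{g(\overline{x},y)+\overline{a}\,G^{*}(\overline{x}^{\prime},y)+b\,G^{*}(\overline{x},y^{\prime})}{1+\overline{a}\,p^{*}(\overline{x}^{\prime},y)+b\,p^{*}(\overline{x},y^{\prime})}+2\varepsilon,\quad\forall y,y^{\prime}\in Y,\ b\in{\bf R}_{+}$$

which implies (2). Note moreover that $\overline{x}$ is independent of $\varepsilon$, hence the result.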

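Let $(x,x^{\prime},a)$ be $\varepsilon$-optimal in the game ${\bf A}$ and $\hat{x}_{\lambda}:=\displaystyle{\frac{x+\lambda ax^{\prime}}{1+\lambda a}}$. Using (4) one obtains

$$r_{\lambda}(\hat{x}_{\lambda},y)=\frac{\lambda\left[g(x,y)+\lambda a\,g(x^{\prime},y)\right]+(1-\lambda)\left[G^{*}(x,y)+\lambda a\,G^{*}(x^{\prime},y)\right]}{\lambda(1+\lambda a)+(1-\lambda)\left(p^{*}(x,y)+\lambda a\,p^{*}(x^{\prime},y)\right)}.$$

Note that

$$A\left(x,x^{\prime},a,y,y,\tfrac{1-\lambda}{\lambda}\right)=\frac{\lambda g(x,y)+\lambda a\,G^{*}(x^{\prime},y)+(1-\lambda)G^{*}(x,y)}{\lambda+\lambda a\,p^{*}(x^{\prime},y)+(1-\lambda)p^{*}(x,y)}.$$

Thus

$$\left|r_{\lambda}(\hat{x}_{\lambda},y)-A\left(x,x^{\prime},a,y,y,\tfrac{1-\lambda}{\lambda}\right)\right|\leq 4C\lambda a$$

where $C$ is a bound on the payoffs. Hence, for any $y\in Y$

$$v-r_{\lambda}(\hat{x}_{\lambda},y)\leq\varepsilon+4C\lambda a\leq 2\varepsilon$$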

for $\lambda$ small enough. ∎
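The formulas used in this proof can be checked numerically on a finite example. The sketch below is an illustration only: the $2\times 2$ matrices are invented (a Big Match-like game where the top row absorbs), payoffs lie in $[-1,1]$ so that $C=1$, and the helper names are ours. It evaluates the closed form of $r_{\lambda}$, the payoff $A$ of the auxiliary game, and the perturbed strategy $\hat{x}_{\lambda}$.

```python
def bilinear(M, x, y):
    """Bilinear extension: sum over i, j of x[i] * y[j] * M[i][j]."""
    return sum(x[i] * y[j] * M[i][j] for i in range(len(x)) for j in range(len(y)))


# Hypothetical finite absorbing game (rows: Player 1, columns: Player 2).
g = [[1.0, 0.0], [0.0, 1.0]]        # non-absorbing stage payoff
p_star = [[1.0, 1.0], [0.0, 0.0]]   # absorption probability
g_star = [[1.0, 0.0], [0.0, 0.0]]   # absorbing payoff
# Entries p*(i, j) g*(i, j), whose bilinear extension is G*.
pg_star = [[p_star[i][j] * g_star[i][j] for j in range(2)] for i in range(2)]


def r(lam, x, y):
    """Closed form of the discounted payoff of the stationary pair (x, y)."""
    num = lam * bilinear(g, x, y) + (1 - lam) * bilinear(pg_star, x, y)
    return num / (lam + (1 - lam) * bilinear(p_star, x, y))


def A(x, x1, a, y, y1, b):
    """Payoff of the auxiliary game; x1 and y1 stand for x' and y'."""
    num = bilinear(g, x, y) + a * bilinear(pg_star, x1, y) + b * bilinear(pg_star, x, y1)
    return num / (1 + a * bilinear(p_star, x1, y) + b * bilinear(p_star, x, y1))


def x_hat(lam, x, x1, a):
    """Perturbed stationary strategy (x + lam * a * x') / (1 + lam * a)."""
    return [(x[i] + lam * a * x1[i]) / (1 + lam * a) for i in range(len(x))]


lam, a = 0.01, 1.0
x, x1, y = [0.0, 1.0], [1.0, 0.0], [0.3, 0.7]
gap = abs(r(lam, x_hat(lam, x, x1, a), y) - A(x, x1, a, y, y, (1 - lam) / lam))

# The closed form also satisfies the one-step recursion of the proof.
xm = [0.3, 0.7]
rv = r(lam, xm, y)
rhs = lam * bilinear(g, xm, y) + (1 - lam) * (
    (1 - bilinear(p_star, xm, y)) * rv + bilinear(pg_star, xm, y))
```

On this example `gap` stays below the bound $4C\lambda a$ of the proof, and `rv` coincides with `rhs` up to rounding.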

${\bf A}$ is an auxiliary limit game in the sense that:

i) There is a map $\phi$ from $S\times(0,1]$ to $X$ (that associates to a strategy of player 1 in ${\bf A}$ and a discount factor a stationary strategy of player 1 in $\Gamma$ ).

ii) There is a map $\psi$ from $Y\times(0,1]$ to $T$ (that associates to a stationary strategy of player 2 in $\Gamma$ and a discount factor a stationary strategy of player 2 in ${\bf A}$ ).

iii)

$$r_{\lambda}(\phi(\lambda,s),y)\geq A(s,\psi(\lambda,y))-o(1),\quad\forall s\in S,\ \forall y\in Y$$

iv) A dual property holds.

These properties imply: $\lim v_{\lambda}$ exists and equals $v({\bf A})$ .

We then recover Corollary 3.2 in Sorin and Vigeral [12], with a new proof that will be useful in the sequel.

Corollary 6.

[TABLE]

where $\operatorname{med}$ is the median of three numbers, and with the usual convention that $\sup_{x"\in\emptyset}=-\infty$; $\inf_{y"\in\emptyset}=+\infty$. Moreover if $(x,x^{\prime},a)$ (resp. $(y,y^{\prime},b)$) is $\varepsilon$-optimal in ${\bf A}$ then $x$ (resp. $y$) is $\varepsilon$-optimal in (10).

Proof.

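For any $\varepsilon>0$ fix a triplet $(y,y^{\prime}_{\varepsilon},b_{\varepsilon})\in T$ of the second player $\varepsilon$-optimal in ${\bf A}$, where we can assume that $y$ does not depend on $\varepsilon$ by the previous result. Then for any $x"\in X$ such that $p^{*}(x",y)>0$, one has

$$v+\varepsilon\geq A(x,x",+\infty,y,y^{\prime}_{\varepsilon},b_{\varepsilon})=\overline{g}^{*}(x",y)$$

thus $v+\varepsilon\geq h^{+}(y):=\sup_{x"|p^{*}(x",y)>0}\left\{\overline{g}^{*}(x",y)\right\}$. Denote similarly $h^{-}(x)=\inf_{y"|p^{*}(x,y")>0}\left\{\overline{g}^{*}(x,y")\right\}$.

On the other hand, for any $x$

$$v+\varepsilon\geq A(x,x",0,y,y^{\prime}_{\varepsilon},b_{\varepsilon})=\frac{g(x,y)+b_{\varepsilon}\,G^{*}(x,y^{\prime}_{\varepsilon})}{1+b_{\varepsilon}\,p^{*}(x,y^{\prime}_{\varepsilon})}$$

Now if $p^{*}(x,y^{\prime}_{\varepsilon})>0$, $\displaystyle{\frac{g(x,y)+b_{\varepsilon}\,G^{*}(x,y^{\prime}_{\varepsilon})}{1+b_{\varepsilon}\,p^{*}(x,y^{\prime}_{\varepsilon})}}\geq\min\{g(x,y),\overline{g}^{*}(x,y^{\prime}_{\varepsilon})\}$, hence in all cases

$$v+\varepsilon\geq A(x,x",0,y,y^{\prime}_{\varepsilon},b_{\varepsilon})\geq\min\{g(x,y),h^{-}(x)\}.$$

Thus for any $x\in X$

$$v+\varepsilon\geq\operatorname{med}\left(g(x,y);h^{+}(y);h^{-}(x)\right)$$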

Letting $\varepsilon$ go to 0 and using the dual inequality establish the results. ∎

3.1.2. Further properties of optimal strategies

We establish here more precise results concerning the decomposition of the payoff induced by $\varepsilon$ -optimal strategies in the game ${\bf A}$ .

Proposition 7.

Let $(x,x^{\prime},a)$ and $(y,y^{\prime},b)$ be $\varepsilon$ -optimal in the game ${\bf A}$ .

a) If $p^{*}(x,y)>0$ then $|\overline{g}^{*}(x,y)-v|\leq\varepsilon$.

b) $|g(x,y)-v|\leq 2(1+ap^{*}(x^{\prime},y)+bp^{*}(x,y^{\prime}))\varepsilon$.

c) If $ap^{*}(x^{\prime},y)+bp^{*}(x,y^{\prime})>0$ then $\left|\displaystyle{\frac{aG^{*}(x^{\prime},y)+bG^{*}(x,y^{\prime})}{ap^{*}(x^{\prime},y)+bp^{*}(x,y^{\prime})}}-v\right|\leq 3\displaystyle{\frac{1+ap^{*}(x^{\prime},y)+bp^{*}(x,y^{\prime})}{ap^{*}(x^{\prime},y)+bp^{*}(x,y^{\prime})}}\varepsilon$.

Proof.

a) This is exactly equation (12) and its dual.

b) From $v+\varepsilon\geq A(x,x^{\prime},0,y,y^{\prime},b)$ we get

$$\frac{g(x,y)+b\,G^{*}(x,y^{\prime})}{1+b\,p^{*}(x,y^{\prime})}\leq v+\varepsilon.$$

On the other hand, $v-\varepsilon\leq A(x,x^{\prime},a,y,y^{\prime},+\infty)$ hence $G^{*}(x,y^{\prime})\geq(v-\varepsilon)p^{*}(x,y^{\prime})$ . Combining both inequalities yields

[TABLE]

and the dual inequality is similar.

c) Since $A(x,x^{\prime},a,y,y^{\prime},b)\geq v-\varepsilon$ , one has

[TABLE]

hence

[TABLE]

and the dual inequality is similar. ∎

3.2. Asymptotic properties in $\Gamma_{\lambda}$

Since the game is absorbing, we write simply $Q^{x,y}_{\lambda}(t)$ for $Q^{\omega_{0},x,y}_{\lambda}(t)(\omega_{0})$ , where $\omega_{0}$ is the nonabsorbing state.

Lemma 8.

Let $x_{\lambda}$ and $y_{\lambda}$ be two families of (not necessarily optimal) stationary strategies of Player 1 and Player 2 respectively. Assume that $\displaystyle{\frac{p^{*}(x_{\lambda},y_{\lambda})}{\lambda}}$ converges to some $\gamma$ in $[0,+\infty]$ as $\lambda$ goes to 0. Then $Q^{x_{\lambda},y_{\lambda}}_{\lambda}(t)$ converges uniformly in $t$ to $\displaystyle{\frac{1-(1-t)^{1+\gamma}}{1+\gamma}}$ as $\lambda$ goes to 0, with the natural convention that $\displaystyle{\frac{1-(1-t)^{1+\gamma}}{1+\gamma}}=0$ for $\gamma=+\infty$.

Proof.

By definition 2, for any $\lambda$ and $t_{n}=\lambda\sum_{i=1}^{n}(1-\lambda)^{i-1}=1-(1-\lambda)^{n}$ ,

[TABLE]

with linear interpolation between these dates.

Remark first that this implies that $Q^{x_{\lambda},y_{\lambda}}_{\lambda}(t)\leq[{1+\frac{p^{*}(x_{\lambda},y_{\lambda})}{\lambda}}]^{-1}$ for all $t$ and $\lambda$ , which gives at the limit the desired result if $\gamma=+\infty$ .

Assume now that $\gamma\in[0,+\infty[$ , and thus that $p^{*}(x_{\lambda},y_{\lambda})$ tends to 0 as $\lambda$ goes to 0. Fix $t$ and $\lambda$ , and let $n$ be the integer part of $\frac{\ln(1-t)}{\ln(1-\lambda)}$ so that $t_{n}\leq t\leq t_{n+1}$ . Since $Q^{x_{\lambda},y_{\lambda}}_{\lambda}(t_{n})$ is decreasing in $n$ ,

[TABLE]

Similarly,

[TABLE]

Letting $\lambda$ go to 0 in (15) and (16) yields the result. ∎
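The convergence stated in Lemma 8 can be observed numerically. The following Python sketch is a hedged illustration, not the argument of the proof: it assumes that under a stationary pair with per-stage absorption probability $p$ the non-absorbing state is still occupied at stage $i$ with probability $(1-p)^{i-1}$, so that Definition 2 gives a geometric sum at the dates $t_{n}$, and it takes $p=\gamma\lambda$. The function names are ours.

```python
import math


def Q_discounted(t, lam, p):
    """Q_lambda(t) in the non-absorbing state, for 0 <= t < 1 and 0 < lam < 1.

    p is the per-stage absorption probability of the stationary pair.
    """
    rho = (1 - lam) * (1 - p)

    def at_stage(n):
        # Closed form of lam * sum_{i=1}^{n} rho**(i - 1).
        return lam * (1 - rho ** n) / (1 - rho)

    # Largest n with t_n <= t, where t_n = 1 - (1 - lam)**n.
    n = int(math.floor(math.log(1 - t) / math.log(1 - lam)))
    t_n = 1 - (1 - lam) ** n
    t_next = 1 - (1 - lam) ** (n + 1)
    weight = (t - t_n) / (t_next - t_n)  # linear interpolation between dates
    return (1 - weight) * at_stage(n) + weight * at_stage(n + 1)


def Q_limit(t, gamma):
    """Limit trajectory of Lemma 8, with the convention 0 for gamma = +infinity."""
    if math.isinf(gamma):
        return 0.0
    return (1 - (1 - t) ** (1 + gamma)) / (1 + gamma)


lam, gamma = 1e-4, 2.0
grid = [k / 10 for k in range(10)]
errors = [abs(Q_discounted(t, lam, gamma * lam) - Q_limit(t, gamma)) for t in grid]
```

With $\gamma=0$ (no absorption) the sketch returns $Q_{\lambda}(t_{n})=t_{n}$ exactly, in line with the limit $Q(t)=t$.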

For any $(x,x^{\prime},a)\in X^{2}\times\mathbb{R}^{+}$ and $(y,y^{\prime},b)\in Y^{2}\times\mathbb{R}^{+}$ define

[TABLE]

An immediate consequence of the previous lemma is

Corollary 9.

Let $(x,x^{\prime},a)\in X^{2}\times\mathbb{R}^{+}$ and $(y,y^{\prime},b)\in Y^{2}\times\mathbb{R}^{+}$ and denote $\hat{x}_{\lambda}:=\frac{x+\lambda ax^{\prime}}{1+\lambda a}$ and $\hat{y}_{\lambda}:=\frac{y+\lambda by^{\prime}}{1+\lambda b}$ . Then $Q^{\hat{x}_{\lambda},\hat{y}_{\lambda}}_{\lambda}(t)$ converges uniformly in $t$ to $\displaystyle{\frac{1-(1-t)^{1+\gamma(x,x^{\prime},a,y,y^{\prime},b)}}{1+\gamma(x,x^{\prime},a,y,y^{\prime},b)}}$ as $\lambda$ goes to 0.
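To see which quantity drives the limit in Corollary 9, one can evaluate the ratio $p^{*}(\hat{x}_{\lambda},\hat{y}_{\lambda})/\lambda$ of Lemma 8 directly. The sketch below is an assumption-laden illustration: the absorption matrix is invented, the helper names are ours, and the comparison value $a\,p^{*}(x^{\prime},y)+b\,p^{*}(x,y^{\prime})$ is what bilinearity of $p^{*}$ gives when $p^{*}(x,y)=0$; it is not quoted from the displayed definition of $\gamma$.

```python
def bilinear(M, x, y):
    """Bilinear extension: sum over i, j of x[i] * y[j] * M[i][j]."""
    return sum(x[i] * y[j] * M[i][j] for i in range(len(x)) for j in range(len(y)))


def perturb(lam, z, z1, c):
    """Mixed action (z + lam * c * z1) / (1 + lam * c)."""
    return [(z[i] + lam * c * z1[i]) / (1 + lam * c) for i in range(len(z))]


# Hypothetical absorption probabilities of a 2x2 absorbing game.
p_star = [[1.0, 0.5], [0.2, 0.0]]


def absorption_ratio(lam, x, x1, a, y, y1, b):
    """p*(x_hat, y_hat) / lam for the perturbed strategies of Corollary 9."""
    xh = perturb(lam, x, x1, a)
    yh = perturb(lam, y, y1, b)
    return bilinear(p_star, xh, yh) / lam


x, x1, a = [0.0, 1.0], [1.0, 0.0], 2.0
y, y1, b = [0.0, 1.0], [1.0, 0.0], 3.0
# Here p*(x, y) = 0, so the ratio has a finite limit as lam goes to 0.
expected = a * bilinear(p_star, x1, y) + b * bilinear(p_star, x, y1)
ratio = absorption_ratio(1e-6, x, x1, a, y, y1, b)
```

If instead $p^{*}(x,y)>0$, the same ratio grows like $p^{*}(x,y)/\lambda$, which corresponds to the case $\gamma=+\infty$.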

Proposition 10.

Any absorbing game has a LOTM $Q(t)=\frac{1-(1-t)^{1+\gamma}}{1+\gamma}$ for some $\gamma\in[0,+\infty]$ , and a LOTP $l(t)=tv$ .

Proof.

For every $n$ let $(x,x^{\prime}_{n},a_{n})$ and $(y,y^{\prime}_{n},b_{n})$ be $\frac{1}{n}$-optimal strategies for each player in ${\bf A}$ (recall that $x$ and $y$ can be chosen independently of $n$). Up to extraction $\gamma_{n}:=\gamma(x,x^{\prime}_{n},a_{n},y,y^{\prime}_{n},b_{n})$ converges to some $\gamma$ in $[0,+\infty]$.

Fix $\varepsilon>0$ and let $n\geq\frac{2}{\varepsilon}$ be such that

[TABLE]

on [0,1]. By Proposition 5 the strategies $\hat{x}^{n}_{\lambda}:=\frac{x+\lambda a_{n}x^{\prime}_{n}}{1+\lambda a_{n}}$ and $\hat{y}^{n}_{\lambda}:=\frac{y+\lambda b_{n}y^{\prime}_{n}}{1+\lambda b_{n}}$ are $\varepsilon$ -optimal in $\Gamma_{\lambda}$ for $\lambda$ small enough. Corollary 9 and equation (17) imply that

[TABLE]

for all $\lambda$ small enough and $t\in[0,1]$ . This answers the first part of the Proposition.

Clearly

[TABLE]

Recall that the payoff function is assumed bounded by 1. Then equations (18) and (19) imply that for $\lambda$ small enough and every $t$ ,

[TABLE]

Since $x^{n}_{\lambda}$ and $y^{n}_{\lambda}$ converge to $x$ and $y$ , we then have for $\lambda$ small enough

[TABLE]


We now consider four separate cases. Basically either $\gamma=0$ or $+\infty$ and equation (20) implies that $l(t)$ is linear, and hence equals $tv$ since $l(1)=v$ by near optimality of the strategies $\hat{x}^{n}_{\lambda}$ and $\hat{y}^{n}_{\lambda}$ in $\Gamma_{\lambda}$ ; or $\gamma\in]0,+\infty[$ and then both $g(x,y)$ and ${\overline{g}}^{*}(\hat{x}^{n}_{\lambda},\hat{y}^{n}_{\lambda})$ are close to $v$ by Proposition 7, which once again implies $l(t)=tv$ .

Case 1 : $p^{*}(x,y)>0$ . Then ${\overline{g}}^{*}(\hat{x}^{n}_{\lambda},\hat{y}^{n}_{\lambda})$ converges to ${\overline{g}}^{*}(x,y)$ as $\lambda$ go to 0, and by Proposition 7 a) $|{\overline{g}}^{*}(x,y)-v|\leq\varepsilon$ . Since $\gamma=+\infty$ in that case, equation (20) yields $|l^{\hat{x}^{n}_{\lambda},\hat{y}^{n}_{\lambda}}_{\lambda}(t)-tv|\leq 5\varepsilon$ for all $\lambda$ small enough, uniformly in $t$ .

Case 2 : $p^{*}(x,y)=0$ and $\gamma=0$ , hence $\gamma_{n}\leq 1$ (up to chosing a larger $n$ ). Then Proposition 7 b) implies $|{g}(x,y)-v|\leq 4\varepsilon$ , and equation (20) yields $|l^{\hat{x}^{n}_{\lambda},\hat{y}^{n}_{\lambda}}_{\lambda}(t)-tv|\leq 7\varepsilon$ for all $\lambda$ small enough, uniformly in $t$ .

Case 3 : $p^{*}(x,y)=0$ and $\gamma\in]0,+\infty[$ , hence $\frac{\gamma}{2}\leq\gamma_{n}\leq 1+\gamma$ (up to chosing a larger $n$ ). Then Proposition 7 b) implies $|{g}(x,y)-v|\leq 2(2+\gamma)\varepsilon$ . Moreover, $p^{*}(x,y)=0$ implies that ${\overline{g}}^{*}(\hat{x}^{n}_{\lambda},\hat{y}^{n}_{\lambda})$ converges to $\displaystyle{\frac{aG^{*}(x^{\prime}_{n},y)+bG^{*}(x,y^{\prime}_{n})}{ap^{*}(x^{\prime}_{n},y)+bp^{*}(x,y^{\prime}_{n})}}$ as $\lambda$ go to 0, and by Proposition 7 c) $\left|\displaystyle{\frac{aG^{*}(x^{\prime}_{n},y)+bG^{*}(x,y^{\prime}_{n})}{ap^{*}(x^{\prime}_{n},y)+bp^{*}(x,y^{\prime}_{n})}}-v\right|\leq 3\frac{1+\gamma/2}{\gamma/2}\varepsilon$ .

Hence equation (20) yields $|l^{\hat{x}^{n}_{\lambda},\hat{y}^{n}_{\lambda}}_{\lambda}(t)-tv|\leq(7+2\gamma+3\frac{1+\gamma/2}{\gamma/2})\varepsilon$ for all $\lambda$ small enough, uniformly in $t$ .

Case 4 : $p^{*}(x,y)=0$ and $\gamma=+\infty$ , hence $\gamma_{n}\geq 1$ (up to choosing a larger $n$ ). Then ${\overline{g}}^{*}(\hat{x}^{n}_{\lambda},\hat{y}^{n}_{\lambda})$ converges to $\displaystyle{\frac{aG^{*}(x^{\prime}_{n},y)+bG^{*}(x,y^{\prime}_{n})}{ap^{*}(x^{\prime}_{n},y)+bp^{*}(x,y^{\prime}_{n})}}$ as $\lambda$ goes to 0, and by Proposition 7 c) $\left|\displaystyle{\frac{aG^{*}(x^{\prime}_{n},y)+bG^{*}(x,y^{\prime}_{n})}{ap^{*}(x^{\prime}_{n},y)+bp^{*}(x,y^{\prime}_{n})}}-v\right|\leq 6\varepsilon$ . Thus equation (20) yields $|l^{\hat{x}^{n}_{\lambda},\hat{y}^{n}_{\lambda}}_{\lambda}(t)-tv|\leq 9\varepsilon$ for all $\lambda$ small enough, uniformly in $t$ .

As claimed, in every case we see that $l(t)=tv$ . ∎

Remark 11.

Recall that $Q(\cdot)$ and $l(\cdot)$ represent expected cumulated occupation measure and payoff. By deriving these quantities with respect to $t$ we get that the asymptotic probability $q(t)$ of still being in the non absorbing state at time $t$ is $(1-t)^{\gamma}$ , and that the current asymptotic payoff is $v$ at any time.

Remark 12.

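Let us give a simple heuristic behind the form $(1-t)^{\gamma}$ . In this remark, let $q(t)$ denote the asymptotic probability of absorption before time $t$ , so that the probability considered in Remark 11 is $1-q(t)$ . Assuming that this quantity is well defined and smooth, note that at time $t$ the remaining game has a length $1-t$ and weight $1-q(t)$ , hence by renormalization (see figure below)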

[TABLE]

so that

[TABLE]

which leads, with $q(0)=0$ , to $q(t)=1-(1-t)^{\gamma}$ for some $\gamma$ .

[Figure: plot of $q(t)$ against $t$ on $[0,1]$ , with labels $q^{\prime}(0)$ and $q^{\prime}(t)$ .]
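The two remarks above can be checked numerically. The following is a minimal sketch, assuming the limit occupation measure has the form $Q(t)=\frac{1-(1-t)^{1+\gamma}}{1+\gamma}$ (the form quoted in Example 14) and reading the renormalization argument as the relation $\frac{q^{\prime}(t)}{1-q(t)}=\frac{q^{\prime}(0)}{1-t}$ , which is an assumption since the displayed equations are not reproduced here. The function names are illustrative only.

```python
def occupation_Q(t, gamma):
    """Assumed limit occupation measure Q(t) = (1 - (1-t)^(1+gamma)) / (1+gamma)."""
    return (1.0 - (1.0 - t) ** (1.0 + gamma)) / (1.0 + gamma)


def absorbed_q(t, gamma):
    """Probability of absorption before time t in the heuristic: 1 - (1-t)^gamma."""
    return 1.0 - (1.0 - t) ** gamma


def derivative(f, t, h=1e-6):
    """Central finite difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


def check(gamma, t):
    """Return (dQ/dt, (1-t)^gamma, q'(t)/(1-q(t)), gamma/(1-t)) at time t."""
    # Remark 11: differentiating Q gives the probability of not being absorbed yet.
    dQ = derivative(lambda s: occupation_Q(s, gamma), t)
    # Remark 12 (assumed reading): the absorption rate among surviving plays
    # is q'(0) / (1 - t), and q'(0) = gamma for q(t) = 1 - (1-t)^gamma.
    dq = derivative(lambda s: absorbed_q(s, gamma), t)
    rate = dq / (1.0 - absorbed_q(t, gamma))
    return dQ, (1.0 - t) ** gamma, rate, gamma / (1.0 - t)
```

For instance `check(1.0, 0.3)` returns two pairs of numbers that agree up to the finite difference error, which is consistent with $Q^{\prime}(t)=(1-t)^{\gamma}=1-q(t)$ .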

Let us now illustrate the four cases of the preceding proof with examples.

Example 13.

Consider the absorbing game

[TABLE]

with asymptotic value 1. Let $x=1/2T+1/2B$ . Then $(x,x,n)$ is $1/n$ -optimal in ${\bf A}$ for Player 1, while any $(y,y^{\prime},b)$ is optimal for Player 2. Since $p^{*}(x,y)>0$ for all $y$ , case 1 occurs for any choice of $(y,y^{\prime},b)$ , hence the corresponding $\gamma$ is $+\infty$ and $Q(t)=0$ for all $t$ .

Notice that in $\Gamma_{\lambda}$ the only optimal stationary strategy is $(1/2,1/2)$ for each player, leading to the same asymptotic trajectory $Q(.)=0$ .

Example 14.

Consider the absorbing game

[TABLE]

with asymptotic value 1. Then $(B,T,n)$ is $1/n$ -optimal in ${\bf A}$ , while any $(y,y^{\prime},b)$ is optimal. The associated $\gamma$ is $ny(L)$ , hence either $y(L)=0$ and case 2 holds with $\gamma=0$ and $Q(t)=t$ , or $y(L)>0$ and case 4 occurs with $\gamma=+\infty$ and $Q(t)=0$ .

Notice that in $\Gamma_{\lambda}$ the only optimal stationary strategy is $x_{\lambda}=y_{\lambda}=(\frac{\sqrt{\lambda}}{1+\sqrt{\lambda}},\frac{1}{1+\sqrt{\lambda}})$ for each player. Since $p^{*}(x_{\lambda},y_{\lambda})=\frac{\lambda}{(1+\sqrt{\lambda})^{2}}\sim\lambda$ , Lemma 8 implies that the asymptotic trajectory associated to optimal strategies is $Q(t)=t-\frac{t^{2}}{2}$ . Moreover for any $\gamma\geq 0$ the strategy of Player 2 $z_{\lambda}=(\frac{\gamma\sqrt{\lambda}}{1+\sqrt{\lambda}},1-\frac{\gamma\sqrt{\lambda}}{1+\sqrt{\lambda}})$ is $\varepsilon$ -optimal in $\Gamma_{\lambda}$ for $\lambda$ small enough, and $p^{*}(x_{\lambda},z_{\lambda})\sim\gamma\lambda$ hence any $Q(t)$ of the form $\frac{1-(1-t)^{1+\gamma}}{1+\gamma}$ is an asymptotic behavior.
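The convergence of the discounted occupation measure to $\frac{1-(1-t)^{1+\gamma}}{1+\gamma}$ can be observed numerically. The sketch below assumes that under stationary strategies the play is absorbed with a fixed probability `p_abs` at each stage, so that $q_{i}=(1-p^{*})^{i-1}$ , and that the normalized time after $n$ stages is $t_{n}=1-(1-\lambda)^{n}$ . Both are assumptions of this illustration, and all function names are hypothetical.

```python
import math


def time_of_stage(n, lam):
    """Assumed normalized time after n stages: t_n = 1 - (1-lam)^n."""
    return 1.0 - (1.0 - lam) ** n


def discounted_occupation(n, lam, p_abs):
    """Q_lam(t_n) = lam * sum_{i=1..n} (1-lam)^(i-1) * q_i with q_i = (1-p_abs)^(i-1)."""
    total, weight, alive = 0.0, lam, 1.0
    for _ in range(n):
        total += weight * alive   # contribution of the current stage
        weight *= 1.0 - lam       # discount weight of the next stage
        alive *= 1.0 - p_abs      # probability of not being absorbed yet
    return total


def limit_occupation(t, gamma):
    """Candidate limit (1 - (1-t)^(1+gamma)) / (1+gamma)."""
    return (1.0 - (1.0 - t) ** (1.0 + gamma)) / (1.0 + gamma)


def compare(lam, gamma, t):
    """Discounted value and candidate limit, with absorption probability gamma*lam."""
    # Number of stages needed to reach (approximately) normalized time t.
    n = round(math.log(1.0 - t) / math.log(1.0 - lam))
    t_n = time_of_stage(n, lam)
    return discounted_occupation(n, lam, gamma * lam), limit_occupation(t_n, gamma)
```

With $\gamma=1$ the limit is $t-\frac{t^{2}}{2}$ , and with $\gamma=0$ it is $t$ . Calling `discounted_occupation` with `p_abs = lam / (1 + math.sqrt(lam)) ** 2` corresponds to the optimal stationary strategies of this example, for which the convergence is slower because $p^{*}/\lambda$ approaches 1 at rate $\sqrt{\lambda}$ .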

Example 15.

Consider the Big Match

[TABLE]

with asymptotic value 1/2. Let $y=1/2L+1/2R$ . Then $(B,T,1)$ and $(y,y,0)$ are optimal in ${\bf A}$ , with $\gamma=1$ . Hence case 3 holds, and the corresponding $Q(t)$ is $t-\frac{t^{2}}{2}$ . The optimal strategies in $\Gamma_{\lambda}$ are $(\frac{\lambda}{1+\lambda},\frac{1}{1+\lambda})$ and $(1/2,1/2)$ respectively, leading to the same $Q(t)=t-\frac{t^{2}}{2}$ .

3.3. Finite case

We now prove that when the game is finite, the limit payoff trajectory is linear for every couple of near optimal stationary strategies, not only those given by Proposition 5. That is, $l(t)=tv$ is a strong limit behavior for the cumulated payoff.

Proposition 16.

Let $\Gamma$ be a finite absorbing game with asymptotic value $v$ , $x_{\lambda}$ and $y_{\lambda}$ families of $\varepsilon(\lambda)$ -optimal stationary strategies in $\Gamma_{\lambda}$ , with $\varepsilon(\lambda)$ going to 0 as $\lambda$ goes to 0. Then for every $t\in[0,1]$ , $l^{{x}_{\lambda},{y}_{\lambda}}_{\lambda}(t)$ converges to $tv$ as $\lambda$ goes to 0.

We will use in the proof of this proposition the following elementary lemma given without proof.

Lemma 17.

Let $a,b,c,d$ be real numbers with $c$ and $d$ positive. Then $\min(\frac{a}{c},\frac{b}{d})\leq\frac{a+b}{c+d}\leq\max(\frac{a}{c},\frac{b}{d})$ , with equality if and only if $\frac{a}{c}=\frac{b}{d}$ .
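The lemma follows from writing $\frac{a+b}{c+d}=\frac{c}{c+d}\cdot\frac{a}{c}+\frac{d}{c+d}\cdot\frac{b}{d}$ , a convex combination with positive weights. A small exact check, restricted to integer or rational inputs for convenience (the function name is illustrative):

```python
from fractions import Fraction


def mediant_bounds(a, b, c, d):
    """Return (min(a/c, b/d), (a+b)/(c+d), max(a/c, b/d)) as exact fractions.

    a, b, c, d must be integers or Fractions, with c and d positive.
    """
    if c <= 0 or d <= 0:
        raise ValueError("c and d must be positive")
    left, right = Fraction(a, c), Fraction(b, d)
    return min(left, right), Fraction(a + b, c + d), max(left, right)
```

For example `mediant_bounds(1, 5, 2, 3)` gives the three values $\frac{1}{2}$ , $\frac{6}{5}$ and $\frac{5}{3}$ , and `mediant_bounds(1, 2, 2, 4)` gives three times $\frac{1}{2}$ , the equality case.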

Proof of Proposition 16.

The result is clear for $t=0$ or $1$ ; assume by contradiction that it is false for some $t\in]0,1[$ . Hence there is a sequence $\lambda_{n}$ going to 0 and optimal strategies $x_{\lambda_{n}}$ and $y_{\lambda_{n}}$ such that $l^{{x}_{\lambda_{n}},{y}_{\lambda_{n}}}_{\lambda_{n}}(t)$ converges to $tw$ with $w\neq v$ . Up to extraction of subsequences, $x_{\lambda_{n}}$ and $y_{\lambda_{n}}$ converge to $x$ and $y$ respectively. Also up to extraction, all the following limits exist in $[0,+\infty]$ : $\alpha(i)=\lim_{n\rightarrow\infty}\frac{x_{\lambda_{n}}(i)}{\lambda_{n}}$ , $\beta(j)=\lim_{n\rightarrow\infty}\frac{y_{\lambda_{n}}(j)}{\lambda_{n}}$ , and $\gamma=\lim_{n\rightarrow\infty}\frac{p^{*}(x_{\lambda_{n}},y_{\lambda_{n}})}{\lambda_{n}}$ . If $\gamma\neq 0$ (and hence $p^{*}(x_{\lambda_{n}},y_{\lambda_{n}})>0$ for $n$ large enough), denote $\overline{g}^{*}(x,y):=\lim\frac{G^{*}(x_{\lambda_{n}},y_{\lambda_{n}})}{p^{*}(x_{\lambda_{n}},y_{\lambda_{n}})}$ , which also exists up to extraction.

Recall formula (4):

[TABLE]

and since $x_{\lambda}$ and $y_{\lambda}$ are families of $\varepsilon(\lambda)$ -optimal strategies in $\Gamma_{\lambda}$ , $r_{\lambda_{n}}(x_{\lambda_{n}},y_{\lambda_{n}})$ converges to $v$ as $n$ tends to infinity.

Recall that by Lemma 8, at the limit

[TABLE]

We first claim that $\gamma\in]0,+\infty[$ .

If $\gamma=0$ , $l(t)=tg(x,y)$ , and by near optimality of $x_{\lambda}$ and $y_{\lambda}$ , $v=l(1)=g(x,y)$ , hence $l(t)=tv$ , a contradiction.

If $\gamma=+\infty$ , $l(t)=t\overline{g}^{*}(x,y)$ , and by near optimality of $x_{\lambda}$ and $y_{\lambda}$ , $v=l(1)=\overline{g}^{*}(x,y)$ , hence $l(t)=tv$ , a contradiction.

Hence $\gamma\in]0,+\infty[$ , and $w$ is a nontrivial convex combination of $g(x,y)$ and $\overline{g}^{*}(x,y)$ . Since $v=l(1)$ is also a convex combination of $g(x,y)$ and $\overline{g}^{*}(x,y)$ , the assumption that $v\neq w$ implies $g(x,y)\neq\overline{g}^{*}(x,y)$ . Assume without loss of generality $g(x,y)<v<\overline{g}^{*}(x,y)$ .

We classify the actions $i$ of Player 1 into four categories $I_{1}$ to $I_{4}$ :

• $i\in I_{1}$ if $x(i)>0$ ,

• $i\in I_{2}$ if $x(i)=0$ and $\alpha(i)=+\infty$ ,

• $i\in I_{3}$ if $x(i)=0$ and $\alpha(i)\in]0,+\infty[$ ,

• $i\in I_{4}$ if $x(i)=0$ and $\alpha(i)=0$ .

Hence actions of category 1 are of order 1, actions of category 3 are of order $\lambda$ , actions of category 4 are of order $o(\lambda)$ , and actions of category 2 are played with probability going to 0 but large with respect to $\lambda$ . Define categories $J_{1}$ to $J_{4}$ of Player 2 in a similar way. By definition,

[TABLE]

Recall that the left hand side converges to $\gamma\in]0,+\infty[$ , hence up to extraction $\frac{p^{*}(i,j)x_{\lambda_{n}}(i){y_{\lambda_{n}}}(j)}{\lambda_{n}}$ converges in $[0,+\infty[$ for any $i$ and $j$ ; denote the limit by $\delta_{ij}$ . If $i$ and $j$ are of category $k$ and $l$ with $k+l>4$ then $\delta_{ij}=0$ . If $k+l<4$ then $\frac{x_{\lambda_{n}}(i){y_{\lambda_{n}}}(j)}{\lambda_{n}}$ diverges to $+\infty$ which implies that $p^{*}(i,j)=0=\delta_{ij}$ .

Hence going to the limit in (22) we get that $\gamma=\gamma_{1,3}+\gamma_{2,2}+\gamma_{3,1}$ where $\gamma_{1,3}:=\sum_{I_{1}\times J_{3}}p^{*}(i,j)x(i)\beta(j)$ , $\gamma_{2,2}:=\sum_{I_{2}\times J_{2}}\delta_{ij}$ , and $\gamma_{3,1}:=\sum_{I_{3}\times J_{1}}p^{*}(i,j)\alpha(i)y(j)$ . Recall that $\gamma>0$ , thus at least one of $\gamma_{1,3}$ , $\gamma_{2,2}$ or $\gamma_{3,1}$ is positive as well.
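The bookkeeping behind this decomposition can be illustrated on a toy scaling. The sketch below assumes, purely for illustration, that an action of each category is played with probability exactly $\lambda^{r}$ for a representative exponent $r$ (0 for category 1, 1/2 for category 2, 1 for category 3, 2 for category 4). In general category 2 covers any rate between order 1 and order $\lambda$ , so this is one hypothetical instance, not the general case.

```python
import math

# Hypothetical representative exponents: probability of the action ~ lam ** ORDER[k].
ORDER = {1: 0.0, 2: 0.5, 3: 1.0, 4: 2.0}


def limit_ratio(order_x, order_y):
    """Limit of lam**order_x * lam**order_y / lam as lam goes to 0."""
    total = order_x + order_y
    if total > 1.0:
        return 0.0        # the product is negligible with respect to lam
    if total < 1.0:
        return math.inf   # the product is large with respect to lam
    return 1.0            # the product is exactly of order lam


def contributing_pairs():
    """Pairs of categories (k, l) whose product is of order lam."""
    return [(k, l) for k in ORDER for l in ORDER
            if limit_ratio(ORDER[k], ORDER[l]) == 1.0]
```

In this toy scaling `contributing_pairs()` returns `[(1, 3), (2, 2), (3, 1)]` , matching the three terms $\gamma_{1,3}$ , $\gamma_{2,2}$ and $\gamma_{3,1}$ . Pairs with $k+l<4$ have an infinite ratio, which is why $p^{*}(i,j)$ must vanish there.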

Similarly, passing to the limit in the definition of $G^{*}(x_{\lambda_{n}},y_{\lambda_{n}}):=\sum_{I\times J}G^{*}(i,j)x_{\lambda_{n}}(i)y_{\lambda_{n}}(j)$ yields $\mu:=\lim\frac{G^{*}(x_{\lambda_{n}},y_{\lambda_{n}})}{\lambda_{n}}=\mu_{1,3}+\mu_{2,2}+\mu_{3,1}$ where $\mu_{1,3}:=\sum_{I_{1}\times J_{3}}x(i)\beta(j)G^{*}(i,j)$ , $\mu_{2,2}:=\sum_{I_{2}\times J_{2}}\delta_{ij}\overline{g}^{*}(i,j)$ , and $\mu_{3,1}:=\sum_{I_{3}\times J_{1}}\alpha(i)y(j)G^{*}(i,j)$ . Note that if $\gamma_{k,4-k}=0$ for some $k$ then $\mu_{k,4-k}=0$ as well.

Finally, going to the limit in equation (21) yields

[TABLE]

and similarly going to the limit in the definition of $\overline{g}^{*}(x,y):=\lim\frac{G^{*}(x_{\lambda_{n}},y_{\lambda_{n}})}{p^{*}(x_{\lambda_{n}},y_{\lambda_{n}})}$ yields

[TABLE]

Recall that we assumed $\overline{g}^{*}(x,y)>v$ . By Lemma 17, there exists $k$ such that $\gamma_{k,4-k}>0$ and $\frac{\mu_{k,4-k}}{\gamma_{k,4-k}}\geq\overline{g}^{*}(x,y)>v$ .

Assume first that $k=1$ . Consider now the following strategy $y^{\prime}_{\lambda_{n}}$ : $y^{\prime}_{\lambda_{n}}(j)=0$ for $j\in J_{3}$ , and $y^{\prime}_{\lambda_{n}}(j)=y_{\lambda_{n}}(j)$ for all other $j$ except for an arbitrary $j_{0}\in J_{1}$ for which $y^{\prime}_{\lambda_{n}}(j_{0})=y_{\lambda_{n}}(j_{0})+\sum_{j\in J_{3}}y_{\lambda_{n}}(j)$ . The only effect of this deviation is that now $\gamma^{\prime}_{1,3}=\mu^{\prime}_{1,3}=0$ . Hence

[TABLE]

which is strictly less than $v$ by Lemma 17 since $v<\frac{\mu_{1,3}}{\gamma_{1,3}}$ ; this contradicts the $\varepsilon(\lambda_{n})$ -optimality of $x_{\lambda_{n}}$ .

Assume next that $k=2$ . Consider now the following strategy $y^{\prime}_{\lambda_{n}}$ : $y^{\prime}_{\lambda_{n}}(j)=0$ for $j\in J_{2}$ , and $y^{\prime}_{\lambda_{n}}(j)=y_{\lambda_{n}}(j)$ for all other $j$ except for an arbitrary $j_{0}\in J_{1}$ for which $y^{\prime}_{\lambda_{n}}(j_{0})=y_{\lambda_{n}}(j_{0})+\sum_{j\in J_{2}}y_{\lambda_{n}}(j)$ . The only effect of this deviation is that now $\gamma^{\prime}_{2,2}=\mu^{\prime}_{2,2}=0$ . Hence

[TABLE]

which is strictly less than $v$ by Lemma 17 since $v<\frac{\mu_{2,2}}{\gamma_{2,2}}$ ; this again contradicts the $\varepsilon(\lambda_{n})$ -optimality of $x_{\lambda_{n}}$ .

Finally assume that $k=3$ . Consider now the following strategy $x^{\prime}_{\lambda_{n}}$ : $x^{\prime}_{\lambda_{n}}(i)=2x_{\lambda_{n}}(i)$ for $i\in I_{3}$ and $x^{\prime}_{\lambda_{n}}(i)=x_{\lambda_{n}}(i)$ for all other $i$ except for an arbitrary $i_{0}\in I_{1}$ for which $x^{\prime}_{\lambda_{n}}(i_{0})=x_{\lambda_{n}}(i_{0})-\sum_{i\in I_{3}}x_{\lambda_{n}}(i)$ (which is nonnegative for $n$ large enough). The only effect of this deviation is that now $\gamma^{\prime}_{3,1}=2\gamma_{3,1}$ and $\mu^{\prime}_{3,1}=2\mu_{3,1}$ . Hence

[TABLE]

which is strictly more than $v$ by Lemma 17 since $v<\frac{\mu_{3,1}}{\gamma_{3,1}}$ ; this contradicts the $\varepsilon(\lambda_{n})$ -optimality of $y_{\lambda_{n}}$ . ∎

4. Stochastic finite games

4.1. Non algebraic limit trajectories

Consider the following zero-sum stochastic game with two non absorbing states and two actions for each player. In the first state $s_{1}$ (which is the starting state) the payoff and transitions are as follows:

[TABLE]

where $*$ denotes absorption and $+$ that there is a deterministic transition to state 2. Starting from the second state $s_{2}$ the game is a linear variation of the Big Match:

[TABLE]

Since $v_{\lambda}(s_{2})=0$ for all $\lambda$ and since there is no return once the play has entered state $s_{2}$ , the optimal play in state $s_{1}$ is the same as in the Big Match, in which $\gamma=1$ . So in both states the optimal strategies in $\Gamma_{\lambda}$ are $D+\lambda U$ for Player 1 and $1/2L+1/2R$ for Player 2. By a scaling of time, the preceding section tells us that in the limit game, the probability of being in state $s_{2}$ at time $t$ , given that there was a transition from $s_{1}$ to $s_{2}$ at time $z$ , is $\frac{1-t}{1-z}$ . Since (also from the preceding section) the time of transition from $s_{1}$ to another state (which is $s_{2}$ with probability $1/2$ ) has a uniform law on $[0,1]$ , the probability of being in $s_{2}$ at time $t$ is

[TABLE]

Notice that this is not an algebraic function of $t$ , whereas the limit trajectories were always algebraic in the preceding section. Similarly the probability of absorption before time $t$ is $1-(1-t)(1-\frac{\ln(1-t)}{2})$ and is also non algebraic.
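The computation can be checked numerically. The sketch below integrates the conditional probability $\frac{1-t}{1-z}$ against the transition density $\frac{1}{2}dz$ on $[0,t]$ , as described in the text, and compares it with the closed form $-\frac{1}{2}(1-t)\ln(1-t)$ , which is our own evaluation of that integral. It also checks that the three probabilities (still in $s_{1}$ , in $s_{2}$ , absorbed) sum to 1, using $1-t$ for the probability of still being in $s_{1}$ since the exit time is uniform. Function names are illustrative.

```python
import math


def prob_s2_numeric(t, steps=100_000):
    """Midpoint rule for the integral over z in [0, t] of (1/2) * (1-t)/(1-z) dz."""
    h = t / steps
    total = sum(0.5 * (1.0 - t) / (1.0 - (k + 0.5) * h) for k in range(steps))
    return total * h


def prob_s2_closed(t):
    """Closed form of the same integral: -(1/2) * (1-t) * ln(1-t)."""
    return -0.5 * (1.0 - t) * math.log(1.0 - t)


def prob_absorbed(t):
    """Probability of absorption before time t, as quoted in the text."""
    return 1.0 - (1.0 - t) * (1.0 - math.log(1.0 - t) / 2.0)


def total_mass(t):
    """P(in s1) + P(in s2) + P(absorbed) at time t; should equal 1."""
    return (1.0 - t) + prob_s2_closed(t) + prob_absorbed(t)
```

The logarithm in both expressions is the source of the non algebraic behavior.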

4.2. No $\varepsilon$ -optimal strategies of the form $x+a\lambda x^{\prime}$

In the following game with two non absorbing states the payoff is always 1 in state $a$ and -1 in state $b$ with the following deterministic transitions

[TABLE]

It is easy to see that the asymptotic value is 0 and that optimal strategies in the $\lambda$ -discounted game put a weight $\sim\sqrt{\lambda}$ on $D$ and $R$ in both states, hence the absorbing probability is of the order of $\lambda$ per stage in each state.

We show that strategies of the form $(x_{a}+C_{a}\lambda x^{\prime}_{a}$ , $x_{b}+C_{b}\lambda x^{\prime}_{b})$ cannot guarantee more than -1 to Player 1, as $\lambda$ goes to 0.

If $x_{a}(D)x_{b}(D)>0$ , Player 2 plays $L$ in $a$ and $R$ in $b$ , inducing an absorbing payoff of -1.

From $a$ we eventually reach $b$ , where $-1$ has a positive probability.

If $x_{a}(D)=0,x_{b}(D)>0$ , Player 2 plays $R$ in both games, inducing an absorbing payoff of -1.

The payoff will be absorbing with high probability in finite time and the relative probability of $1^{*}$ vanishes with $\lambda$ .

If $x_{a}(D)>0,x_{b}(D)=0$ , Player 2 plays $L$ in both games, inducing a non absorbing payoff of -1.

For $\lambda$ small, most of the time the state is $b$ .

If $x_{a}(D)=0,x_{b}(D)=0$ , Player 2 plays $R$ in game $a$ and $L$ in game $b$ . The event “absorbing payoff of 1” occurs at stage $n$ if $\omega_{n}=a$ and $i_{n}=D$ . Hence $\omega_{n-1}=b$ and $i_{n-1}=D$ . Now this event “ $i_{n}=D$ and $i_{n-1}=D$ ” has probability of order $\lambda^{2}$ . Then the absorbing component of the $\lambda$ -discounted payoff converges to 0 with $\lambda$ . Moreover the non absorbing payoff is mainly $-1$ .

5. An absorbing game with compact action sets and non linear LOTP

We consider the following absorbing game with compact action sets. There are three states, two absorbing $0^{*}$ and $-1^{*}$ , and the non absorbing state $\omega$ , in which the payoff is 1 whatever the actions taken. The action sets are $X=Y=\{0\}\cup\{1/n,\ n\in\mathbf{N}^{*}\}$ with the usual distance. The probabilities of absorption are given by:

[TABLE]

and

[TABLE]

It is easily checked that both functions $\rho(0^{*}|\cdot\cdot)$ and $\rho(-1^{*}|\cdot\cdot)$ are (jointly) continuous.

Proposition 18.

For any discount factor $\lambda\in]0,1]$ , 0 (resp. 1) is optimal for Player 1 (resp. Player 2) in the $\lambda$ -discounted game, and $v_{\lambda}=\lambda$ .

The corresponding payoff trajectory is: $l(t)=0$ on $[0,1]$ .

Proof.

Action 0 of Player 1 ensures that there will never be absorption to state $-1^{*}$ , and thus that the stage payoff from stage 2 on is nonnegative. Action 1 of Player 2 ensures that there will be absorption with probability 1 at the end of stage 1, and thus that the stage payoff from stage 2 on is nonpositive. Since the payoff in stage 1 is 1 irrespective of the players’ actions, the proposition is established.

Notice that the play under this couple of optimal strategies is simple: there is immediate absorption to $0^{*}$ , and in particular the limit payoff trajectory is linear and equals 0 for every time $t$ . ∎

We now prove that there are other $\varepsilon$ -optimal strategies, with a different limit payoff trajectory. Denote $\{\lambda\}:=\frac{1}{[1/\lambda]}$ where $[\cdot]$ is the integer part; hence $\lambda\leq\{\lambda\}<\frac{\lambda}{1-\lambda}$ and $1/\{\lambda\}\in\mathbb{N}^{*}$ for all $\lambda\in]0,1]$ .
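The properties of $\{\lambda\}$ can be verified with exact rational arithmetic. This is a minimal sketch with an illustrative function name; for $\lambda=1$ the upper bound $\frac{\lambda}{1-\lambda}$ is read as $+\infty$ .

```python
import math
from fractions import Fraction


def rounded_discount(lam):
    """Return {lam} := 1 / [1/lam] for lam in ]0, 1], where [.] is the integer part."""
    lam = Fraction(lam)
    if not 0 < lam <= 1:
        raise ValueError("lam must lie in ]0, 1]")
    return Fraction(1, math.floor(1 / lam))


def satisfies_bounds(lam):
    """Check lam <= {lam} < lam / (1 - lam) and that 1/{lam} is a positive integer."""
    lam = Fraction(lam)
    r = rounded_discount(lam)
    upper_ok = lam == 1 or r < lam / (1 - lam)
    return lam <= r and upper_ok and (1 / r).denominator == 1
```

For instance `rounded_discount(Fraction(3, 10))` is $\frac{1}{3}$ , which lies between $\frac{3}{10}$ and $\frac{3}{7}$ . The value $\{\lambda\}$ always belongs to the action set $\{1/n\}$ , which is why it can be played in the game.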

Proposition 19.

For any discount factor $\lambda\in]0,1]$ , $\{\lambda\}$ is $\lambda$ -optimal for Player 1 and $\sqrt{\lambda}$ -optimal for Player 2 in the $\lambda$ -discounted game.

The corresponding payoff trajectory is: $l(t)=t-t^{2}$ .

Proof.

If both players play $\{\lambda\}$ , the payoff in the $\lambda$ -discounted game is, according to formula (4),

[TABLE]

which is nonnegative since $\{\lambda\}<\frac{\lambda}{1-\lambda}$ . On the other hand, since $\lambda\leq\{\lambda\}$ , one gets $r_{\lambda}(\{\lambda\},\{\lambda\})\leq\lambda$ .

If Player 1 plays $\{\lambda\}$ while Player 2 plays $y\neq\{\lambda\}$ , there is no absorption to $-1^{*}$ hence $r_{\lambda}(\{\lambda\},y)\geq 0$ .

Thus $\{\lambda\}$ is $\lambda$ -optimal for Player 1.

If Player 2 plays $\{\lambda\}$ while Player 1 plays $x\neq\{\lambda\}$ , then, according once again to formula (4),

[TABLE]

Thus $\{\lambda\}$ is $\sqrt{\lambda}$ -optimal for Player 2.

Notice that while the limit value is 0 and $(\{\lambda\},\{\lambda\})$ is a couple of near optimal strategies, along the induced play the nonabsorbing payoff is 1 and the absorbing payoff is -1. One can compute that the associated $\gamma$ is 1, hence under these strategies $Q(t)=t-\frac{t^{2}}{2}$ . Hence the cumulated limit payoff up to time $t$ is $Q(t)-(t-Q(t))=t-t^{2}$ , which is non linear and positive for every $t\in]0,1[$ . ∎
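The trajectory $t-t^{2}$ can also be recovered from the discounted sums. The sketch below assumes a per-stage absorption probability `p_abs` equal to $\lambda$ (consistent with $\gamma=1$ , but an assumption since the exact probabilities are not reproduced here), a payoff 1 before absorption and -1 after, and the normalized time $t_{n}=1-(1-\lambda)^{n}$ . It uses the closed form of the geometric sum; the function name is hypothetical.

```python
import math


def payoff_trajectory(lam, t, p_abs, g_live=1.0, g_abs=-1.0):
    """Return (t_n, l_lam(t_n)) for the number of stages n closest to time t.

    The stage-i expected payoff is q_i * g_live + (1 - q_i) * g_abs,
    where q_i = (1 - p_abs)^(i-1) is the probability of no absorption so far.
    """
    n = round(math.log(1.0 - t) / math.log(1.0 - lam))
    t_n = 1.0 - (1.0 - lam) ** n
    # Q_lam(t_n) = lam * sum_{i<=n} ((1-lam)(1-p_abs))^(i-1), a geometric sum.
    ratio = (1.0 - lam) * (1.0 - p_abs)
    occupation = lam * (1.0 - ratio ** n) / (1.0 - ratio)
    # Time spent unabsorbed earns g_live, the remaining time t_n - Q earns g_abs.
    return t_n, g_live * occupation + g_abs * (t_n - occupation)
```

With `lam = 1e-5` and `p_abs = lam` , the returned payoff at $t=1/2$ is close to $1/4$ , the maximum of $t-t^{2}$ , even though the value of the game tends to 0.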

Basically the players use a jointly controlled procedure either to follow $(\{\lambda\},\{\lambda\})$ or to get at most (resp. at least) 0.

6. Concluding comments

A first series of interesting open questions is directly related to the results presented here, for instance:

extension of Proposition 12 to general (not stationary) strategies,

or more generally analysis in the framework of arbitrary (not discounted) evaluations and general stochastic games.

It is also natural to consider other families of repeated games: a first class of interest is games with incomplete information. The natural equivalent of $LOTM$ in this framework is the speed at which the information is transmitted during the game.

