Gambling and Rényi Divergence

Cédric Bleuler; Amos Lapidoth; Christoph Pfister

arXiv:1901.06278·cs.IT·April 29, 2019

TL;DR

This paper introduces a new family of utility functions for horse gambling, connects the optimal betting strategies to the Rényi divergence, and extends the analysis to scenarios with side information, which leads to a novel conditional Rényi divergence.

Contribution

It proposes a one-parameter utility family that contains Kelly's logarithmic criterion and the expected-return criterion as special cases, links the optimal strategies to the Rényi divergence, and introduces a new conditional Rényi divergence for betting with side information.

Findings

01

Derived strategies that maximize the new utility functions.

02

Established the connection between optimal strategies and Rényi divergence.

03

Introduced a novel conditional Rényi divergence for side information scenarios.

Abstract

For gambling on horses, a one-parameter family of utility functions is proposed, which contains Kelly's logarithmic criterion and the expected-return criterion as special cases. The strategies that maximize the utility function are derived, and the connection to the Rényi divergence is shown. Optimal strategies are also derived when the gambler has some side information; this setting leads to a novel conditional Rényi divergence.

Equations

$$S\triangleq b_{X}\,o_{X},$$

$$\lim_{n\to\infty}\frac{1}{n}\log\frac{\gamma_{n}}{\gamma_{0}}=\mathrm{E}[\log S],$$

$$U_{\beta}\triangleq\frac{1}{\beta}\log\mathrm{E}[S^{\beta}],$$

$$U_{\beta}=\log\biggl[\sum_{i=1}^{m}p_{i}\,(b_{i}\,o_{i})^{\beta}\biggr]^{\frac{1}{\beta}}.$$

$$D_{\alpha}(p_{X}\|q_{X})\triangleq\frac{1}{\alpha-1}\log\sum_{x}p(x)^{\alpha}\,q(x)^{1-\alpha}.$$

$$D_{\alpha}(p_{X|Y}\|q_{X|Y}|p_{Y})\triangleq\ldots$$

$$c\triangleq\biggl[\sum_{i=1}^{m}\frac{1}{o_{i}}\biggr]^{-1},$$

$$r_{i}\triangleq\frac{c}{o_{i}}.$$

$$\frac{1}{\beta}\log\mathrm{E}[S^{\beta}]=\log c+D_{\frac{1}{1-\beta}}(p\|r)-D_{1-\beta}(g\|b),$$

$$g_{i}\triangleq\frac{p_{i}^{\frac{1}{1-\beta}}\,o_{i}^{\frac{\beta}{1-\beta}}}{\sum_{j=1}^{m}p_{j}^{\frac{1}{1-\beta}}\,o_{j}^{\frac{\beta}{1-\beta}}}.$$

$$\lim_{\beta\to 0}\frac{1}{\beta}\log\mathrm{E}[S^{\beta}]=\ldots$$

$$\frac{1}{\beta}\log\mathrm{E}[S^{\beta}]\leq\log\max_{i\in\{1,\ldots,m\}}\bigl(p_{i}^{1/\beta}\,o_{i}\bigr).$$

$$b_{i}=\begin{cases}1&\text{if }i=i^{*},\\0&\text{otherwise,}\end{cases}$$

$$p_{i^{*}}^{1/\beta}\,o_{i^{*}}=\max_{i\in\{1,\ldots,m\}}\bigl(p_{i}^{1/\beta}\,o_{i}\bigr).$$

$$\lim_{\beta\to+\infty}\frac{1}{\beta}\log\mathrm{E}[S^{\beta}]\leq\ldots$$

$$b_{i}=\begin{cases}1&\text{if }i=i^{*},\\0&\text{otherwise,}\end{cases}$$

$$\lim_{\beta\to-\infty}\frac{1}{\beta}\log\mathrm{E}[S^{\beta}]\leq\ldots$$

$$c\triangleq\biggl[\sum_{x}\frac{1}{o(x)}\biggr]^{-1},$$

$$r_{X}(x)\triangleq\frac{c}{o(x)}.$$

$$\tilde{S}\triangleq b_{X|Y}(X|Y)\,o(X).$$

$$D_{\alpha}(p_{X}\|r_{X})\leq D_{\alpha}(p_{X|Y}\|r_{X}|p_{Y}).$$

$$S_{0}\triangleq b_{0}+b_{X}\,o_{X}.$$

$$\frac{1}{\beta}\log\mathrm{E}[S_{0}^{\prime\beta}]\geq\frac{1}{\beta}\log\mathrm{E}[S_{0}^{\beta}].$$


Gambling and Rényi Divergence

Cédric Bleuler, Amos Lapidoth, and Christoph Pfister

Signal and Information Processing Laboratory

ETH Zurich, 8092 Zurich, Switzerland

Email: [email protected]; {lapidoth,pfister}@isi.ee.ethz.ch

Abstract

For gambling on horses, a one-parameter family of utility functions is proposed, which contains Kelly’s logarithmic criterion and the expected-return criterion as special cases. The strategies that maximize the utility function are derived, and the connection to the Rényi divergence is shown. Optimal strategies are also derived when the gambler has some side information; this setting leads to a novel conditional Rényi divergence.

I Introduction

Consider a horse race with $m\geq 1$ horses $1,\ldots,m$ , where the $i$ -th horse wins with probability $p_{i}>0$ , and on which a bookie offers odds $o_{i}>0$ for $1$ . A gambler spends all her wealth $\gamma_{0}>0$ to place bets on the horses. Let $b_{i}\geq 0$ denote the fraction of $\gamma_{0}$ that the gambler bets on the $i$ -th horse. Let the random variable $X$ denote the winning horse, and define the wealth relative $S$ as

$$S\triangleq b_{X}\,o_{X}, \tag{1}$$

so the gambler’s wealth after one race is $\gamma_{1}=\gamma_{0}\,S$.

Kelly [1] observed that in the setting where the odds and winning probabilities remain constant over many independent races and the gambler keeps investing all her wealth with the same relative allocation $b_{1},\ldots,b_{m}$ , the exponential rate of growth of the gambler’s wealth tends to $\mathrm{E}[\log S]$ with probability one, i.e.,

$$\lim_{n\to\infty}\frac{1}{n}\log\frac{\gamma_{n}}{\gamma_{0}}=\mathrm{E}[\log S], \tag{2}$$

where $\gamma_{n}$ denotes the gambler’s wealth after $n$ horse races, and $\log(\cdot)$ denotes the base-2 logarithm. The right-hand side (RHS) of (2) is known as the doubling rate [2, Section 6.1].

In this paper, we seek betting strategies that maximize

$$U_{\beta}\triangleq\frac{1}{\beta}\log\mathrm{E}[S^{\beta}], \tag{3}$$

where $\beta\in\mathbb{R}\setminus\{0\}$ is a parameter. This family of utility functions generalizes several important cases:

a) In the limit as $\beta$ tends to zero, $U_{\beta}$ tends to the doubling rate $\mathrm{E}[\log S]$, and we recover Kelly’s result: irrespective of the odds, the optimal strategy is proportional betting, i.e., choosing $b_{i}=p_{i}$ for $i\in\{1,\ldots,m\}$; see Proposition 2.

b) If $\beta=1$, then maximizing $U_{\beta}$ is equivalent to maximizing $\mathrm{E}[S]$, the expected return, and it is optimal to put all the money on a horse that maximizes $p_{i}\,o_{i}$; see Proposition 3.

c) In general, if $\beta\geq 1$, then it is optimal to put all the money on one horse; see Proposition 3. This is risky: if that horse loses, the gambler will be broke.

d) In the limit as $\beta$ tends to $+\infty$, it is optimal to put all the money on a horse that maximizes $o_{i}$, ignoring the winning probabilities. This strategy maximizes the best-case payoff; see Proposition 4.

e) In the limit as $\beta$ tends to $-\infty$, it is optimal to choose $b_{i}=c/o_{i}$ for $i\in\{1,\ldots,m\}$, where $c$ is the normalizing constant defined in (7) ahead. This strategy maximizes the worst-case payoff and is completely risk-free: irrespective of which horse wins, $S=c$; see Proposition 5.

Our utility function has the following underlying structure: it is the logarithm of a (weighted) power mean [3, 4]:

$$U_{\beta}=\log\biggl[\sum_{i=1}^{m}p_{i}\,(b_{i}\,o_{i})^{\beta}\biggr]^{\frac{1}{\beta}}. \tag{4}$$

For $\beta\in\{-\infty,0,1,\infty\}$, the power mean is equal to the minimum, the geometric mean, the arithmetic mean, and the maximum of the set $\{b_{i}\,o_{i}\}_{i=1}^{m}$, respectively. Campbell [5, 6] used a cost function with a structure similar to (4) to provide an operational meaning to the Rényi entropy in source coding. Other information-theoretic examples of exponential moments were studied in [7]. The utility function $U_{\beta}$ can be motivated by risk aversion models in finance theory [8, (8)].
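The special cases above can be checked numerically. The following Python sketch evaluates $U_{\beta}$ directly from its definition, $\frac{1}{\beta}\log\mathrm{E}[S^{\beta}]$ with $S=b_{X}\,o_{X}$, for a small made-up race. The function names `utility` and `doubling_rate` and all numbers are illustrative assumptions, not taken from the paper; the bets are kept strictly positive so that negative $\beta$ is well-defined.

```python
import math

def utility(p, o, b, beta):
    """U_beta = (1/beta) * log2 E[S^beta] with S = b_X * o_X, for beta != 0."""
    moment = sum(pi * (bi * oi) ** beta for pi, bi, oi in zip(p, b, o))
    return math.log2(moment) / beta

def doubling_rate(p, o, b):
    """E[log2 S], the beta -> 0 limit of U_beta (Kelly's criterion)."""
    return sum(pi * math.log2(bi * oi) for pi, bi, oi in zip(p, b, o))

# Hypothetical example: three horses, odds o_i-for-1, proportional bets b = p.
p = [0.5, 0.3, 0.2]
o = [2.0, 3.0, 6.0]
b = [0.5, 0.3, 0.2]
payoffs = [bi * oi for bi, oi in zip(b, o)]

near_zero = utility(p, o, b, 1e-6)    # close to the doubling rate
at_one = utility(p, o, b, 1.0)        # log2 of the expected return E[S]
large_pos = utility(p, o, b, 200.0)   # approaches log2 of the largest payoff
large_neg = utility(p, o, b, -200.0)  # approaches log2 of the smallest payoff

print(near_zero, doubling_rate(p, o, b))
print(at_one, math.log2(sum(pi * s for pi, s in zip(p, payoffs))))
print(large_pos, math.log2(max(payoffs)))
print(large_neg, math.log2(min(payoffs)))
```

Each printed pair should agree approximately; for $\beta=\pm 200$ the gap shrinks further as $|\beta|$ grows, since the limits $\pm\infty$ are only approached.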

Our main result is Theorem 1, which shows that for $\beta<1$ , $U_{\beta}$ can be written as the sum of three terms; the central role is played by the Rényi divergence. After dealing with the other values of $\beta$ , we treat in Theorem 6 the situation where the gambler, prior to placing her bets, observes some side information. This analysis features a novel conditional Rényi divergence, whose properties are studied in Propositions 7 and 8. In Proposition 9 and Theorem 10, we study the situation where the gambler invests only part of her money.

The rest of this paper is structured as follows: In Section II, we recall the Rényi divergence and define a conditional Rényi divergence, and in Section III, we present our results; all proofs are deferred to Section IV.

II Preliminaries

The following definitions are for probability mass functions (PMFs); the definitions for probability vectors are analogous. When clear from the context, we often omit sets and subscripts: for example, we write $\sum_{x}$ for $\sum_{x\in\mathcal{X}}$ and $p(x)$ for $p_{X}(x)$ . The Rényi divergence of order $\alpha$ between two PMFs $p_{X}$ and $q_{X}$ [9] is defined for positive $\alpha$ other than one as

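$$D_{\alpha}(p_{X}\|q_{X})\triangleq\frac{1}{\alpha-1}\log\sum_{x}p(x)^{\alpha}\,q(x)^{1-\alpha}. \tag{5}$$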

Its properties are studied in [10].

Let $p_{Y}$ be a PMF, and let $p_{X|Y}$ and $q_{X|Y}$ be conditional PMFs. We define the conditional Rényi divergence of order $\alpha$ for positive $\alpha$ other than one as

[TABLE]

This definition differs from other definitions of the conditional Rényi divergence [11, (6) and (8)]. Some of its properties are presented in Propositions 7 and 8 ahead.

III Results

We first analyze the situation where the gambler invests all her money, i.e., where $b\triangleq(b_{1},\ldots,b_{m})$ is a probability vector. (A probability vector is a vector with nonnegative components that add up to one.) As in [12, Section 10.3], define

$$c\triangleq\biggl[\sum_{i=1}^{m}\frac{1}{o_{i}}\biggr]^{-1}, \tag{7}$$

the probability vector $p\triangleq(p_{1},\ldots,p_{m})$ , and the probability vector $r\triangleq(r_{1},\ldots,r_{m})$ , where for $i\in\{1,\ldots,m\}$ ,

$$r_{i}\triangleq\frac{c}{o_{i}}. \tag{8}$$

Theorem 1.

Let $\beta\in(-\infty,0)\cup(0,1)$ , and let $b$ be a probability vector. Then,

$$\frac{1}{\beta}\log\mathrm{E}[S^{\beta}]=\log c+D_{\frac{1}{1-\beta}}(p\|r)-D_{1-\beta}(g\|b), \tag{9}$$

where for $i\in\{1,\ldots,m\}$ ,

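$$g_{i}\triangleq\frac{p_{i}^{\frac{1}{1-\beta}}\,o_{i}^{\frac{\beta}{1-\beta}}}{\sum_{j=1}^{m}p_{j}^{\frac{1}{1-\beta}}\,o_{j}^{\frac{\beta}{1-\beta}}}. \tag{10}$$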

Thus, the choice $b=g$ uniquely maximizes $\frac{1}{\beta}\log\mathrm{E}[S^{\beta}]$ among all probability vectors $b$ .

We see from Theorem 1 that if $\beta\in(-\infty,0)\cup(0,1)$ , then our utility function can be written as the sum of three terms:

1. The first term, $\log c$, depends only on the odds and is related to the fairness of the odds. The odds are called subfair if $c<1$, fair if $c=1$, and superfair if $c>1$.

2. The second term, $D_{\!\frac{1}{1-\beta}}(p\|r)$, is related to the bookie’s estimate of the winning probabilities. It is zero if and only if the odds are inversely proportional to the winning probabilities.

3. The third term, $-D_{1-\beta}(g\|b)$, is related to the gambler’s estimate of the winning probabilities. It is zero if and only if $b$ is equal to $g$.
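The three-term decomposition of Theorem 1 can be verified numerically. The sketch below computes $\log c$, $D_{\frac{1}{1-\beta}}(p\|r)$, and $-D_{1-\beta}(g\|b)$ and compares their sum with $\frac{1}{\beta}\log\mathrm{E}[S^{\beta}]$ evaluated directly. The helper names (`renyi_divergence`, `optimal_bets`, `three_terms`) and the example numbers are assumptions made for illustration; all probabilities, odds, and bets are taken to be strictly positive.

```python
import math

def renyi_divergence(p, q, alpha):
    """D_alpha(p||q) in bits for alpha > 0, alpha != 1 (entries assumed positive)."""
    total = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return math.log2(total) / (alpha - 1.0)

def utility(p, o, b, beta):
    """(1/beta) * log2 E[S^beta] with S = b_X * o_X, for beta != 0."""
    moment = sum(pi * (bi * oi) ** beta for pi, bi, oi in zip(p, b, o))
    return math.log2(moment) / beta

def optimal_bets(p, o, beta):
    """The probability vector g of Theorem 1 (beta < 1, beta != 0)."""
    w = [pi ** (1.0 / (1.0 - beta)) * oi ** (beta / (1.0 - beta))
         for pi, oi in zip(p, o)]
    z = sum(w)
    return [wi / z for wi in w]

def three_terms(p, o, b, beta):
    """Return (log c, D_{1/(1-beta)}(p||r), -D_{1-beta}(g||b))."""
    c = 1.0 / sum(1.0 / oi for oi in o)
    r = [c / oi for oi in o]
    g = optimal_bets(p, o, beta)
    return (math.log2(c),
            renyi_divergence(p, r, 1.0 / (1.0 - beta)),
            -renyi_divergence(g, b, 1.0 - beta))

# Hypothetical race and an arbitrary (suboptimal) betting vector b.
p = [0.5, 0.3, 0.2]
o = [2.0, 4.0, 5.0]
b = [0.2, 0.5, 0.3]
for beta in (0.5, -2.0):
    g = optimal_bets(p, o, beta)
    print(beta, utility(p, o, b, beta), sum(three_terms(p, o, b, beta)),
          utility(p, o, g, beta))
```

For each $\beta$, the first two printed values should coincide up to rounding, and the last value (betting $b=g$) should be at least as large as both.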

Proposition 2.

Let $b$ be a probability vector. Then,

[TABLE]

We see from Proposition 2 that in the limit as $\beta$ tends to zero, the doubling rate $\mathrm{E}[\log S]$ is recovered from our utility function. Here, the analog of (9) is (12); note that (12) implies that $\mathrm{E}[\log S]$ is maximized if and only if $b$ is equal to $p$ .

Proposition 3.

Let $\beta\geq 1$ , and let $b$ be a probability vector. Then,

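$$\frac{1}{\beta}\log\mathrm{E}[S^{\beta}]\leq\log\max_{i\in\{1,\ldots,m\}}\bigl(p_{i}^{1/\beta}\,o_{i}\bigr). \tag{13}$$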

Equality in (13) can be achieved by choosing

$$b_{i}=\begin{cases}1&\text{if }i=i^{*},\\0&\text{otherwise,}\end{cases} \tag{14}$$

where $i^{*}\in\{1,\ldots,m\}$ is such that

$$p_{i^{*}}^{1/\beta}\,o_{i^{*}}=\max_{i\in\{1,\ldots,m\}}\bigl(p_{i}^{1/\beta}\,o_{i}\bigr). \tag{15}$$

We see from Proposition 3 that if $\beta\geq 1$ , then it is optimal to bet on a single horse. Unless $m=1$ , this is not the case when $\beta<1$ : When $\beta<1$ , an optimal betting strategy requires placing a bet on every horse. This follows from Theorem 1 and our assumption that $p_{i}$ and $o_{i}$ are all positive.
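As a small illustration of Proposition 3, the following sketch picks a horse $i^{*}$ that maximizes $p_{i}^{1/\beta}\,o_{i}$, bets everything on it, and compares the resulting utility with the bound $\log\max_{i}(p_{i}^{1/\beta}\,o_{i})$. The function name `all_in_bet` and the example race are hypothetical.

```python
import math

def utility(p, o, b, beta):
    """(1/beta) * log2 E[S^beta] with S = b_X * o_X; used here with beta >= 1."""
    moment = sum(pi * (bi * oi) ** beta for pi, bi, oi in zip(p, b, o))
    return math.log2(moment) / beta

def all_in_bet(p, o, beta):
    """Put everything on a horse i* that maximizes p_i^(1/beta) * o_i."""
    scores = [pi ** (1.0 / beta) * oi for pi, oi in zip(p, o)]
    i_star = max(range(len(scores)), key=scores.__getitem__)
    bets = [1.0 if i == i_star else 0.0 for i in range(len(scores))]
    return i_star, bets, math.log2(scores[i_star])

# Hypothetical race: favourite at short odds, outsider at long odds.
p = [0.5, 0.3, 0.2]
o = [2.0, 4.0, 5.0]
for beta in (1.0, 3.0):
    i_star, bets, bound = all_in_bet(p, o, beta)
    print(beta, i_star, utility(p, o, bets, beta), bound)
```

With these numbers the selected horse changes from index 1 at $\beta=1$ to index 2 at $\beta=3$: as $\beta$ grows, the odds weigh more heavily than the winning probabilities.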

Proposition 4.

Let $b$ be a probability vector. Then,

[TABLE]

Equality in (17) can be achieved by choosing

$$b_{i}=\begin{cases}1&\text{if }i=i^{*},\\0&\text{otherwise,}\end{cases} \tag{18}$$

where $i^{*}\in\{1,\ldots,m\}$ is such that $o_{i^{*}}=\max_{i\in\{1,\ldots,m\}}o_{i}$ .

Proposition 5.

Let $b$ be a probability vector. Then,

[TABLE]

Equality in (20) is achieved if and only if $b_{i}=c/o_{i}$ for all $i\in\{1,\ldots,m\}$ .
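The risk-free strategy $b_{i}=c/o_{i}$ is easy to check by hand or in code. The sketch below, with a hypothetical helper `risk_free_bets` and made-up odds, computes $c$ and the bets and confirms that the payoff $b_{i}\,o_{i}$ is the same for every horse.

```python
def risk_free_bets(o):
    """Return (c, b) with c = 1 / sum_i (1/o_i) and b_i = c / o_i."""
    c = 1.0 / sum(1.0 / oi for oi in o)
    return c, [c / oi for oi in o]

# Hypothetical odds; here sum_i 1/o_i = 0.95, so c > 1 (superfair odds).
o = [2.0, 4.0, 5.0]
c, b = risk_free_bets(o)
payoffs = [bi * oi for bi, oi in zip(b, o)]

print("c =", c)
print("bets =", b, "sum =", sum(b))
print("payoffs =", payoffs)  # every entry equals c up to rounding

# Any other allocation has a strictly smaller worst-case payoff.
other = [0.5, 0.3, 0.2]
print("worst case of other bets =", min(bi * oi for bi, oi in zip(other, o)))
```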

Our next result concerns the situation where the gambler observes some side information $Y$ before placing her bets. To that end, we adapt our notation as follows: Let $p_{XY}$ be the joint PMF of $X$ and $Y$ . (Recall that $X$ denotes the winning horse.) Denote the range of $X$ and $Y$ by $\mathcal{X}$ and $\mathcal{Y}$ , respectively. We assume that $p(y)>0$ for all $y\in\mathcal{Y}$ . (Here, we do not assume that the winning probabilities $p(x)$ are positive.) We view the odds as a function $o\colon\mathcal{X}\to\mathbb{R}_{>0}$ . Define

$$c\triangleq\biggl[\sum_{x}\frac{1}{o(x)}\biggr]^{-1}, \tag{21}$$

and the PMF $r_{X}$ for $x\in\mathcal{X}$ as

$$r_{X}(x)\triangleq\frac{c}{o(x)}. \tag{22}$$

(These definitions are equivalent to (7) and (8), respectively.) We continue to assume that the gambler invests all her wealth, so a betting strategy is now a conditional PMF $b_{X|Y}$ . The wealth relative $\tilde{S}$ is defined as

$$\tilde{S}\triangleq b_{X|Y}(X|Y)\,o(X). \tag{23}$$

The following theorem parallels Theorem 1:

Theorem 6.

Let $\beta\in(-\infty,0)\cup(0,1)$ . Then,

[TABLE]

where for $x\in\mathcal{X}$ and $y\in\mathcal{Y}$ ,

[TABLE]

Thus, choosing $b_{X|Y}=g_{X|Y}$ uniquely maximizes $\frac{1}{\beta}\log\mathrm{E}[\tilde{S}^{\beta}]$ among all conditional PMFs $b_{X|Y}$ .

The conditional Rényi divergence $D_{\alpha}({\cdot\|\cdot}|\cdot)$ appearing in Theorem 6 was defined in Section II and seems to be novel. It is easy to see that $D_{\alpha}(p_{X}\|q_{X}|p_{Y})=D_{\alpha}(p_{X}\|q_{X})$ if $p_{X}$ , $q_{X}$ , and $p_{Y}$ are PMFs. We now present some more properties:

Proposition 7.

Let $\alpha\in(0,1)\cup(1,\infty)$ , let $p_{Y}$ be a PMF, and let $p_{X|Y}$ and $q_{X|Y}$ be conditional PMFs. Then,

[TABLE]

Because everything that can be achieved without side information can also be achieved with side information, comparing Theorem 1 and Theorem 6 suggests that $D_{\alpha}(p_{X}\|r_{X})\leq D_{\alpha}(p_{X|Y}\|r_{X}|p_{Y})$ , which is indeed the case:

Proposition 8.

Let $\alpha\in(0,1)\cup(1,\infty)$ , let $p_{XY}$ be a joint PMF, and let $r_{X}$ be a PMF. Then,

$$D_{\alpha}(p_{X}\|r_{X})\leq D_{\alpha}(p_{X|Y}\|r_{X}|p_{Y}). \tag{29}$$

Our last results treat the possibility that the gambler does not invest all her wealth. (We only treat the setting without side information.) Denote by $b_{0}$ the fraction of her wealth that the gambler does not use for betting. Then, $b\triangleq(b_{0},b_{1},\ldots,b_{m})$ is a probability vector, and the wealth relative $S_{0}$ is given by

$$S_{0}\triangleq b_{0}+b_{X}\,o_{X}. \tag{30}$$

If $c\geq 1$ , then it is optimal to invest all the money:

Proposition 9.

Assume $c\geq 1$ , let $\beta\in\mathbb{R}\setminus\{0\}$ , and let $b$ be a probability vector with wealth relative $S_{0}$ . Then, there exists a probability vector $b^{\prime}$ with wealth relative $S_{0}^{\prime}$ satisfying $b_{0}^{\prime}=0$ and

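$$\frac{1}{\beta}\log\mathrm{E}[S_{0}^{\prime\beta}]\geq\frac{1}{\beta}\log\mathrm{E}[S_{0}^{\beta}]. \tag{31}$$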

On the other hand, if the odds are subfair, i.e., if $c<1$ , then investing all the money is not optimal in the case $\beta<1$ , as Claim 3 of the following theorem shows:

Theorem 10.

Assume $c<1$ , let $\beta\in(-\infty,0)\cup(0,1)$ , and let $b^{*}$ be a probability vector that maximizes $\frac{1}{\beta}\log\mathrm{E}[S_{0}^{\beta}]$ among all probability vectors $b$ . Define

[TABLE]

and for $i\in\{1,\ldots,m\}$ ,

[TABLE]

Then, the following claims hold:

1. The quantity $\Gamma$ is well-defined and satisfies $\Gamma>0$.

2. For all $i\in\{1,\ldots,m\}$,

[TABLE]

3. The quantity $b_{0}^{*}$ satisfies

[TABLE]

In particular, $b_{0}^{*}>0$.

Claim 2 implies that for all $i\in\{1,\ldots,m\}$, $b_{i}^{*}>0$ if and only if $p_{i}\,o_{i}>\Gamma$. Assuming without loss of generality that $p_{1}\,o_{1}\geq p_{2}\,o_{2}\geq\ldots\geq p_{m}\,o_{m}$, the set $\mathcal{J}$ thus has a special structure: it is either empty or equal to $\{1,2,\ldots,k\}$ for some integer $k$. To maximize $\frac{1}{\beta}\log\mathrm{E}[S_{0}^{\beta}]$, the following procedure can be used: for every $\mathcal{J}$ with the above structure, compute the corresponding $b$ according to (33)–(36); and from these $b$’s, take one that maximizes $\frac{1}{\beta}\log\mathrm{E}[S_{0}^{\beta}]$. This procedure leads to an optimal solution: an optimal solution $b^{*}$ exists because we are optimizing a continuous function over a compact set, and $b^{*}$ corresponds to a set $\mathcal{J}$ that will be considered by the procedure.

IV Proofs

Proof of Theorem 1.

We first show the maximization claim. The only term on the RHS of (9) that depends on $b$ is $-D_{1-\beta}(g\|b)$ . Because $1-\beta>0$ , this term is maximized if and only if $b=g$ [10, Theorem 8].

We now show (9). By the definition of $S$ ,

[TABLE]

For every $i\in\{1,\ldots,m\}$ ,

[TABLE]

where (39) follows from (10). From (37) and (39) we obtain

[TABLE]

where (41) follows from identifying the Rényi divergence ( $g$ and $b$ are probability vectors); (42) follows from (7) and (8); and (43) follows from identifying the Rényi divergence ( $p$ and $r$ are probability vectors). This proves (9). ∎

Proof of Proposition 2.

Equation (11) holds because

[TABLE]

where (44) follows from the definition of $S$ , and (45) holds because in the limit as $\beta$ tends to zero, the power mean tends to the geometric mean since $p$ is a probability vector [3, Problem 8.1]. Equation (12) is proved in [12, Section 10.3]. ∎

Proof of Proposition 3.

Inequality (13) holds because

[TABLE]

where (48) follows from the definition of $S$ ; (49) holds because $b_{i}\in[0,1]$ and $\beta\geq 1$ ; and (51) holds because $b$ is a probability vector. It is easy to see that (13) holds with equality if $b$ is chosen according to (14). ∎

Proof of Proposition 4.

Equation (16) holds because

[TABLE]

where (53) follows from the definition of $S$ , and (54) holds because in the limit as $\beta$ tends to $+\infty$ , the power mean tends to the maximum since $p$ is a probability vector [3, Chapter 8]. Inequality (17) holds because $b_{i}\leq 1$ for $i\in\{1,\ldots,m\}$ . It is easy to see that (17) holds with equality if $b$ is chosen according to (18). ∎

Proof of Proposition 5.

Equation (19) holds because

[TABLE]

where (55) follows from the definition of $S$ , and (56) holds because in the limit as $\beta$ tends to $-\infty$ , the power mean tends to the minimum since $p$ is a probability vector [3, Chapter 8].

We show (20) by contradiction. Assume that there exists a probability vector $b$ such that $\min_{i\in\{1,\ldots,m\}}b_{i}\,o_{i}>c$, i.e.,

[TABLE]

for all $i\in\{1,\ldots,m\}$ . Then,

[TABLE]

where (58) holds because $b$ is a probability vector; (59) follows from (57); and (60) follows from the definition of $c$ . Because $1>1$ is impossible, such a $b$ cannot exist, which proves (20).

It is easy to see that (20) holds with equality if $b_{i}=c/o_{i}$ for all $i\in\{1,\ldots,m\}$ . Conversely, if (20) holds with equality, then for all $i\in\{1,\ldots,m\}$ ,

[TABLE]

We claim that (61) holds with equality for all $i\in\{1,\ldots,m\}$. Indeed, if this were not the case, then there would exist a $j\in\{1,\ldots,m\}$ for which $b_{j}\,o_{j}>c$, so (58)–(60) would hold, which would lead to a contradiction. Hence, if (20) holds with equality, then $b_{i}=c/o_{i}$ for all $i\in\{1,\ldots,m\}$. ∎

Proof of Theorem 6.

We first show the maximization claim. The only term on the RHS of (24) that depends on $b_{X|Y}$ is $-D_{1-\beta}(g_{X|Y}\,g_{Y}\|b_{X|Y}\,g_{Y})$. Because $1-\beta>0$, this term is maximized if and only if $b_{X|Y}\,g_{Y}=g_{X|Y}\,g_{Y}$ [10, Theorem 8]. By our assumptions that $p(y)>0$ for all $y\in\mathcal{Y}$ and $o(x)>0$ for all $x\in\mathcal{X}$, we have $g(y)>0$ for all $y\in\mathcal{Y}$. Consequently, $b_{X|Y}\,g_{Y}=g_{X|Y}\,g_{Y}$ if and only if $b_{X|Y}=g_{X|Y}$.

We now show (24). By the definition of $\tilde{S}$ ,

[TABLE]

From (25) and (26) we obtain that for every $(x,y)\in\mathcal{X}\times\mathcal{Y}$ ,

[TABLE]

Now, (24) holds because

[TABLE]

where (64) follows from plugging (63) into (62) and using the fact that $g(y)=g(y)^{1-\beta}\,g(y)^{\beta}$; (65) follows from (22); and (66) follows from identifying the conditional Rényi divergence and the (unconditional) Rényi divergence. ∎

Proof of Proposition 7.

We first show (27). If $\alpha\in(0,1)$ , then Hölder’s inequality implies that for all $y\in\mathcal{Y}$ ,

[TABLE]

The RHS of (67) equals one, so

[TABLE]

which implies (27) because $\frac{\alpha}{\alpha-1}<0$ . If $\alpha>1$ , then the inequalities in (67) and (68) are reversed; since now $\frac{\alpha}{\alpha-1}>0$ , (27) holds also in this case.

We now show (28). If $\alpha>1$ , then (28) holds because

[TABLE]

where (69) follows from Jensen’s inequality because $z\mapsto z^{\frac{1}{\alpha}}$ is a concave function on $\mathbb{R}_{\geq 0}$, and (70) holds because $p(y)=p(y)^{\alpha}\,p(y)^{1-\alpha}$. If $\alpha\in(0,1)$, then $z\mapsto z^{\frac{1}{\alpha}}$ is convex, so Jensen’s inequality is reversed; because $\frac{\alpha}{\alpha-1}<0$, (69) and thus (28) hold also in this case. ∎

Proof of Proposition 8.

If $\alpha>1$ , then (29) holds because

[TABLE]

where (73) follows from the Minkowski inequality [4, III 2.4 Theorem 9]. If $\alpha\in(0,1)$ , then the Minkowski inequality is reversed; since now $\frac{\alpha}{\alpha-1}<0$ , (73) and thus (29) hold also in this case. ∎

Proof of Proposition 9.

Set $b_{0}^{\prime}=0$ and $b_{i}^{\prime}=r_{i}\,b_{0}+b_{i}$ for all $i\in\{1,\ldots,m\}$. Then, $\sum_{i=0}^{m}b_{i}^{\prime}=1$, and for $i\in\{1,\ldots,m\}$,

[TABLE]

where (76) holds because $c\geq 1$ . It is not difficult to see that (76) implies (31). ∎
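The construction in this proof can be tried out numerically. Since $r_{i}\,o_{i}=c$, the new payoff on horse $i$ is $b_{i}^{\prime}\,o_{i}=c\,b_{0}+b_{i}\,o_{i}$, which is at least $b_{0}+b_{i}\,o_{i}$ when $c\geq 1$. The sketch below uses hypothetical helper names (`utility_with_cash`, `invest_everything`) and made-up numbers; it redistributes the uninvested fraction $b_{0}$ and checks that the utility does not decrease for several values of $\beta$.

```python
import math

def utility_with_cash(p, o, b, beta):
    """(1/beta) * log2 E[S_0^beta], S_0 = b_0 + b_X * o_X, b = (b_0, b_1, ..., b_m)."""
    b0, bets = b[0], b[1:]
    moment = sum(pi * (b0 + bi * oi) ** beta for pi, bi, oi in zip(p, bets, o))
    return math.log2(moment) / beta

def invest_everything(o, b):
    """Map b to b' with b'_0 = 0 and b'_i = r_i * b_0 + b_i, where r_i = c / o_i."""
    c = 1.0 / sum(1.0 / oi for oi in o)
    if c < 1.0:
        raise ValueError("this construction assumes c >= 1")
    b0, bets = b[0], b[1:]
    return [0.0] + [(c / oi) * b0 + bi for bi, oi in zip(bets, o)]

# Hypothetical race with c = 1/0.95 >= 1; the gambler holds back 40% of her wealth.
p = [0.5, 0.3, 0.2]
o = [2.0, 4.0, 5.0]
b = [0.4, 0.3, 0.2, 0.1]
b_prime = invest_everything(o, b)

print("b' =", b_prime, "sum =", sum(b_prime))
for beta in (-2.0, 0.5, 1.0, 3.0):
    print(beta, utility_with_cash(p, o, b, beta),
          utility_with_cash(p, o, b_prime, beta))
```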

Proof of Theorem 10.

In the Appendix. ∎

Bibliography

[1] J. L. Kelly, “A new interpretation of information rate,” Bell Syst. Tech. J., vol. 35, no. 4, pp. 917–926, Jul. 1956.
[2] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ, USA: John Wiley & Sons, 2006.
[3] J. M. Steele, The Cauchy–Schwarz Master Class. Cambridge: Cambridge Univ. Press, 2004.
[4] P. S. Bullen, Handbook of Means and Their Inequalities. Dordrecht, The Netherlands: Kluwer Academic Publishers, 2003.
[5] L. L. Campbell, “A coding theorem and Rényi’s entropy,” Inf. Control, vol. 8, no. 4, pp. 423–429, Aug. 1965.
[6] L. L. Campbell, “Definition of entropy by means of a coding problem,” Z. Wahrscheinlichkeitstheorie verw. Geb., vol. 6, no. 2, pp. 113–118, Jun. 1966.
[7] N. Merhav, “On optimum strategies for minimizing the exponential moments of a loss function,” in Proc. 2012 IEEE Int. Symp. Inf. Theory, 2012, pp. 140–144.
[8] A. N. Soklakov, “Economics of disagreement – financial intuition for the Rényi divergence,” 2018, arXiv:1811.08308v4.
