Gambling and Rényi Divergence
Cédric Bleuler, Amos Lapidoth, Christoph Pfister

TL;DR
This paper introduces a new family of utility functions for horse gambling, connecting optimal betting strategies to Rényi divergence, and extends the analysis to scenarios with side information, leading to a novel conditional divergence.
Contribution
It proposes a one-parameter utility family encompassing Kelly and expected-return criteria, linking them to Rényi divergence, and introduces a new conditional divergence for informed betting strategies.
Findings
Derived strategies that maximize the new utility functions.
Established the connection between optimal strategies and Rényi divergence.
Introduced a novel conditional Rényi divergence for side information scenarios.
Abstract
For gambling on horses, a one-parameter family of utility functions is proposed, which contains Kelly's logarithmic criterion and the expected-return criterion as special cases. The strategies that maximize the utility function are derived, and the connection to the Rényi divergence is shown. Optimal strategies are also derived when the gambler has some side information; this setting leads to a novel conditional Rényi divergence.
Signal and Information Processing Laboratory
ETH Zurich, 8092 Zurich, Switzerland
Email: [email protected]; {lapidoth,pfister}@isi.ee.ethz.ch
I Introduction
Consider a horse race with $m$ horses $\{1,\ldots,m\}$, where the $i$-th horse wins with probability $p_i > 0$, and on which a bookie offers odds $o_i > 0$ for $i \in \{1,\ldots,m\}$. A gambler spends all her wealth $\gamma_0$ to place bets on the horses. Let $b_i$ denote the fraction of $\gamma_0$ that the gambler bets on the $i$-th horse. Let the random variable $X$ denote the winning horse, and define the wealth relative as
$$S \triangleq b_X \, o_X, \tag{1}$$
so the gambler's wealth after one race is $\gamma_0 S$.
Kelly [1] observed that in the setting where the odds and winning probabilities remain constant over many independent races and the gambler keeps investing all her wealth with the same relative allocation $b = (b_1,\ldots,b_m)$, the exponential rate of growth of the gambler's wealth tends to $\mathrm{E}[\log S]$ with probability one, i.e.,
$$\lim_{n \to \infty} \frac{1}{n} \log \frac{\gamma_n}{\gamma_0} = \mathrm{E}[\log S] \quad \text{with probability one}, \tag{2}$$
where $\gamma_n$ denotes the gambler's wealth after $n$ horse races, and $\log$ denotes the base-2 logarithm. The RHS of (2) is known as the doubling rate [2, Section 6.1].
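In this paper, we seek betting strategies that maximize
$$U_\beta \triangleq \frac{1}{\beta} \log \mathrm{E}\bigl[S^\beta\bigr], \tag{3}$$
where $\beta \in \mathbb{R} \setminus \{0\}$ is a parameter. This family of utility functions generalizes several important cases: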
a) In the limit as $\beta$ tends to zero, $U_\beta$ tends to the doubling rate $\mathrm{E}[\log S]$, and we recover Kelly's result: irrespective of the odds, the optimal strategy is proportional betting, i.e., choosing $b_i = p_i$ for $i \in \{1,\ldots,m\}$; see Proposition 2.
b) If $\beta = 1$, then maximizing $U_\beta$ is equivalent to maximizing $\mathrm{E}[S]$, the expected return, and it is optimal to put all the money on a horse that maximizes $p_i o_i$; see Proposition 3.
c) In general, if $\beta \ge 1$, then it is optimal to put all the money on one horse; see Proposition 3. This is risky: if that horse loses, the gambler will be broke.
d) In the limit as $\beta$ tends to $+\infty$, it is optimal to put all the money on a horse that maximizes $o_i$, ignoring the winning probabilities. This strategy maximizes the best-case payoff; see Proposition 4.
e) In the limit as $\beta$ tends to $-\infty$, it is optimal to choose $b_i = c/o_i$ for $i \in \{1,\ldots,m\}$, where $c$ is the normalizing constant defined in (7) ahead. This strategy maximizes the worst-case payoff and is completely risk-free: irrespective of which horse wins, $S = c$; see Proposition 5.
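The special cases listed above can be checked numerically. The following Python sketch assumes that the utility in (3) has the form $\frac{1}{\beta}\log_2 \mathrm{E}[S^\beta]$ with wealth relative $S = b_X o_X$; the function names and the three-horse example are hypothetical and serve only as an illustration.

```python
import math

def utility(beta, p, o, b):
    """Assumed form of (3): (1/beta) * log2(sum_i p_i * (b_i * o_i)**beta), beta != 0."""
    total = 0.0
    for p_i, o_i, b_i in zip(p, o, b):
        s = b_i * o_i
        if s == 0.0:
            if beta < 0:
                return -math.inf  # a zero payoff makes E[S^beta] infinite
            continue              # for beta > 0, a zero payoff contributes nothing
        total += p_i * s ** beta
    return math.log2(total) / beta

def doubling_rate(p, o, b):
    """Kelly's doubling rate E[log2 S]; assumes every bet b_i is positive."""
    return sum(p_i * math.log2(b_i * o_i) for p_i, o_i, b_i in zip(p, o, b))

# Hypothetical example: three horses.
p = [0.5, 0.3, 0.2]   # winning probabilities
o = [2.0, 4.0, 6.0]   # odds

proportional = p[:]                                      # case a): b_i = p_i
best = max(range(len(p)), key=lambda i: p[i] * o[i])     # case b): maximize p_i * o_i
all_in = [1.0 if i == best else 0.0 for i in range(len(p))]

near_zero = utility(1e-6, p, o, proportional)  # beta close to zero
kelly = doubling_rate(p, o, proportional)
```

In this example, `near_zero` and `kelly` agree to several decimal places, betting everything on one horse beats proportional betting for $\beta = 1$, and the same all-in strategy has utility $-\infty$ for any negative $\beta$.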
Our utility function has the following underlying structure: it is the logarithm of a (weighted) power mean [3, 4]:
$$\frac{1}{\beta} \log \mathrm{E}\bigl[S^\beta\bigr] = \log \Bigl( \sum_{i=1}^m p_i \, (b_i o_i)^\beta \Bigr)^{\frac{1}{\beta}}. \tag{4}$$
For $\beta \in \{-\infty, 0, 1, +\infty\}$, the power mean is equal to the minimum, the geometric mean, the arithmetic mean, and the maximum of the set $\{b_i o_i\}_{i=1}^m$, respectively. Campbell [5, 6] used a cost function with a structure similar to (4) to provide an operational meaning to the Rényi entropy in source coding. Other information-theoretic examples of exponential moments were studied in [7]. The utility function $U_\beta$ can be motivated by risk aversion models in finance theory [8, (8)].
Our main result is Theorem 1, which shows that for $\beta \in (-\infty,0) \cup (0,1)$, $U_\beta$ can be written as the sum of three terms; the central role is played by the Rényi divergence. After dealing with the other values of $\beta$, we treat in Theorem 6 the situation where the gambler, prior to placing her bets, observes some side information. This analysis features a novel conditional Rényi divergence, whose properties are studied in Propositions 7 and 8. In Proposition 9 and Theorem 10, we study the situation where the gambler invests only part of her money.
The rest of this paper is structured as follows: In Section II, we recall the Rényi divergence and define a conditional Rényi divergence, and in Section III, we present our results; all proofs are deferred to Section IV.
II Preliminaries
The following definitions are for probability mass functions (PMFs); the definitions for probability vectors are analogous. When clear from the context, we often omit sets and subscripts: for example, we write $\sum_x$ for $\sum_{x \in \mathcal{X}}$ and $P(x)$ for $P_X(x)$. The Rényi divergence of order $\alpha$ between two PMFs $P$ and $Q$ [9] is defined for positive $\alpha$ other than one as
$$D_\alpha(P \| Q) \triangleq \frac{1}{\alpha - 1} \log \sum_x P(x)^\alpha \, Q(x)^{1-\alpha}. \tag{5}$$
Its properties are studied in [10].
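Let $P_X$ be a PMF, and let $P_{Y|X}$ and $Q_{Y|X}$ be conditional PMFs. We define the conditional Rényi divergence of order $\alpha$ for positive $\alpha$ other than one as
$$D_\alpha(P_{Y|X} \| Q_{Y|X} | P_X) \triangleq \frac{\alpha}{\alpha - 1} \log \sum_x P(x) \Bigl[ \sum_y P(y|x)^\alpha \, Q(y|x)^{1-\alpha} \Bigr]^{\frac{1}{\alpha}}. \tag{6}$$
This definition differs from other definitions of the conditional Rényi divergence [11, (6) and (8)]. Some of its properties are presented in Propositions 7 and 8 ahead.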
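To make the two definitions concrete, here is a minimal Python sketch. It assumes that (5) reads $\frac{1}{\alpha-1}\log\sum_x P(x)^\alpha Q(x)^{1-\alpha}$ and that (6) reads $\frac{\alpha}{\alpha-1}\log\sum_x P(x)\bigl[\sum_y P(y|x)^\alpha Q(y|x)^{1-\alpha}\bigr]^{1/\alpha}$, with base-2 logarithms and strictly positive second arguments. The function names and the list-of-rows representation of conditional PMFs are illustrative choices.

```python
import math

def renyi_divergence(alpha, P, Q):
    """Assumed form of (5); alpha > 0, alpha != 1, and all Q(x) > 0."""
    total = sum(p ** alpha * q ** (1 - alpha) for p, q in zip(P, Q) if p > 0)
    return math.log2(total) / (alpha - 1)

def conditional_renyi_divergence(alpha, P_X, P_Y_given_X, Q_Y_given_X):
    """Assumed form of (6); row x of each conditional PMF is the PMF of Y given X = x."""
    outer = 0.0
    for p_x, P_row, Q_row in zip(P_X, P_Y_given_X, Q_Y_given_X):
        inner = sum(p ** alpha * q ** (1 - alpha) for p, q in zip(P_row, Q_row) if p > 0)
        outer += p_x * inner ** (1 / alpha)
    return alpha / (alpha - 1) * math.log2(outer)

# Hypothetical example.
P_X = [0.4, 0.6]
P = [0.2, 0.8]
Q = [0.5, 0.5]

# If neither conditional PMF depends on x, the conditional divergence
# reduces to the unconditional one.
same = conditional_renyi_divergence(2.0, P_X, [P, P], [Q, Q])
plain = renyi_divergence(2.0, P, Q)
```

When the rows do differ, the outer sum averages the quantities $\bigl[\sum_y P(y|x)^\alpha Q(y|x)^{1-\alpha}\bigr]^{1/\alpha}$ rather than the per-$x$ divergences themselves, which is what distinguishes this form from a plain average of Rényi divergences.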
III Results
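We first analyze the situation where the gambler invests all her money, i.e., where $b = (b_1,\ldots,b_m)$ is a probability vector. (A probability vector is a vector with nonnegative components that add up to one.) As in [12, Section 10.3], define
$$c \triangleq \frac{1}{\sum_{i=1}^m \frac{1}{o_i}}, \tag{7}$$
the probability vector $p \triangleq (p_1,\ldots,p_m)$, and the probability vector $r \triangleq (r_1,\ldots,r_m)$, where for $i \in \{1,\ldots,m\}$,
$$r_i \triangleq \frac{c}{o_i}. \tag{8}$$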
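Theorem 1.
Let $\beta \in (-\infty,0) \cup (0,1)$, and let $b$ be a probability vector. Then,
$$U_\beta = \log c + D_{\frac{1}{1-\beta}}(p \| r) - D_{1-\beta}\bigl(g^{(\beta)} \| b\bigr), \tag{9}$$
where for $i \in \{1,\ldots,m\}$,
$$g_i^{(\beta)} \triangleq \frac{p_i^{\frac{1}{1-\beta}} \, o_i^{\frac{\beta}{1-\beta}}}{\sum_{j=1}^m p_j^{\frac{1}{1-\beta}} \, o_j^{\frac{\beta}{1-\beta}}}. \tag{10}$$
Thus, the choice $b = g^{(\beta)}$ uniquely maximizes $U_\beta$ among all probability vectors $b$.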
We see from Theorem 1 that if $\beta \in (-\infty,0) \cup (0,1)$, then our utility function can be written as the sum of three terms:
1. The first term, $\log c$, depends only on the odds and is related to the fairness of the odds. The odds are called subfair if $c < 1$, fair if $c = 1$, and superfair if $c > 1$.
2. The second term, $D_{\frac{1}{1-\beta}}(p \| r)$, is related to the bookie's estimate of the winning probabilities. It is zero if and only if the odds are inversely proportional to the winning probabilities.
3. The third term, $-D_{1-\beta}(g^{(\beta)} \| b)$, is related to the gambler's estimate of the winning probabilities. It is zero if and only if $b$ is equal to $g^{(\beta)}$.
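The three-term decomposition can be verified numerically. The sketch below assumes that the utility is $\frac{1}{\beta}\log_2\mathrm{E}[S^\beta]$ with $S = b_X o_X$, that the three terms are $\log c$, $D_{1/(1-\beta)}(p\|r)$, and $-D_{1-\beta}(g\|b)$, and that the maximizer is $g_i \propto p_i^{1/(1-\beta)} o_i^{\beta/(1-\beta)}$. The helper names and the example numbers are hypothetical.

```python
import math

def renyi(alpha, P, Q):
    """Renyi divergence of order alpha in bits; assumes all entries are positive."""
    return math.log2(sum(p ** alpha * q ** (1 - alpha) for p, q in zip(P, Q))) / (alpha - 1)

def utility(beta, p, o, b):
    """(1/beta) * log2 E[S^beta] with S = b_X * o_X; assumes all bets are positive."""
    return math.log2(sum(p_i * (b_i * o_i) ** beta for p_i, o_i, b_i in zip(p, o, b))) / beta

def three_terms(beta, p, o, b):
    """Return the assumed maximizer g and the three terms of the decomposition."""
    c = 1.0 / sum(1.0 / o_i for o_i in o)
    r = [c / o_i for o_i in o]
    w = [p_i ** (1 / (1 - beta)) * o_i ** (beta / (1 - beta)) for p_i, o_i in zip(p, o)]
    g = [w_i / sum(w) for w_i in w]
    return g, (math.log2(c), renyi(1 / (1 - beta), p, r), -renyi(1 - beta, g, b))

# Hypothetical example.
p = [0.5, 0.3, 0.2]
o = [2.0, 4.0, 6.0]
b = [0.6, 0.3, 0.1]

checks = []
for beta in (-2.0, 0.5):
    g, terms = three_terms(beta, p, o, b)
    checks.append((utility(beta, p, o, b), sum(terms), utility(beta, p, o, g)))
```

Each entry of `checks` holds the utility of `b`, the sum of the three terms, and the utility of `g`; the first two coincide up to rounding, and the third is at least as large.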
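Proposition 2.
Let $b$ be a probability vector. Then,
\begin{align}
\lim_{\beta \to 0} U_\beta &= \mathrm{E}[\log S] \tag{11} \\
&= \log c + D(p \| r) - D(p \| b). \tag{12}
\end{align}
We see from Proposition 2 that in the limit as $\beta$ tends to zero, the doubling rate is recovered from our utility function. Here, the analog of (9) is (12); note that (12) implies that $\mathrm{E}[\log S]$ is maximized if and only if $b$ is equal to $p$.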
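Proposition 3.
Let $\beta \ge 1$, and let $b$ be a probability vector. Then,
$$U_\beta \le \frac{1}{\beta} \log \max_{i \in \{1,\ldots,m\}} p_i \, o_i^\beta. \tag{13}$$
Equality in (13) can be achieved by choosing
$$b_i = \begin{cases} 1 & \text{if } i = i', \\ 0 & \text{otherwise}, \end{cases} \tag{14}$$
where $i' \in \{1,\ldots,m\}$ is such that
$$p_{i'} \, o_{i'}^\beta = \max_{i \in \{1,\ldots,m\}} p_i \, o_i^\beta. \tag{15}$$
We see from Proposition 3 that if $\beta \ge 1$, then it is optimal to bet on a single horse. Unless $m = 1$, this is not the case when $\beta < 1$: When $\beta < 1$, an optimal betting strategy requires placing a bet on every horse. This follows from Theorem 1 and our assumption that $p_1,\ldots,p_m$ and $o_1,\ldots,o_m$ are all positive.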
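Proposition 4.
Let $b$ be a probability vector. Then,
\begin{align}
\lim_{\beta \to +\infty} U_\beta &= \log \max_{i \in \{1,\ldots,m\}} b_i o_i \tag{16} \\
&\le \log \max_{i \in \{1,\ldots,m\}} o_i. \tag{17}
\end{align}
Equality in (17) can be achieved by choosing
$$b_i = \begin{cases} 1 & \text{if } i = i', \\ 0 & \text{otherwise}, \end{cases} \tag{18}$$
where $i' \in \{1,\ldots,m\}$ is such that $o_{i'} = \max_{i} o_i$.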
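Proposition 5.
Let $b$ be a probability vector. Then,
\begin{align}
\lim_{\beta \to -\infty} U_\beta &= \log \min_{i \in \{1,\ldots,m\}} b_i o_i \tag{19} \\
&\le \log c. \tag{20}
\end{align}
Equality in (20) is achieved if and only if $b_i = c/o_i$ for all $i \in \{1,\ldots,m\}$.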
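The risk-free strategy of Proposition 5 is easy to illustrate. The sketch below assumes the normalizing constant is $c = 1/\sum_i (1/o_i)$ and the strategy is $b_i = c/o_i$, so that the payoff $b_i o_i$ equals $c$ whichever horse wins; the function names and the odds are hypothetical.

```python
def risk_free_strategy(o):
    """Return (c, b) with c = 1 / sum_i (1 / o_i) and b_i = c / o_i (assumed forms)."""
    c = 1.0 / sum(1.0 / o_i for o_i in o)
    return c, [c / o_i for o_i in o]

def worst_case_payoff(o, b):
    """Smallest wealth relative b_i * o_i over all possible winners."""
    return min(b_i * o_i for b_i, o_i in zip(b, o))

# Hypothetical odds for three horses.
o = [2.0, 4.0, 6.0]
c, b = risk_free_strategy(o)

payoffs = [b_i * o_i for b_i, o_i in zip(b, o)]  # identical for every winner
uniform = [1.0 / len(o)] * len(o)                # a competing strategy
```

With these odds, every entry of `payoffs` equals `c`, whereas the uniform strategy has a strictly smaller worst-case payoff, in line with the claim that no probability vector can guarantee more than $c$.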
Our next result concerns the situation where the gambler observes some side information $Y$ before placing her bets. To that end, we adapt our notation as follows: Let $P_{XY}$ be the joint PMF of $X$ and $Y$. (Recall that $X$ denotes the winning horse.) Denote the range of $X$ and $Y$ by $\mathcal{X}$ and $\mathcal{Y}$, respectively. We assume that $P(y) > 0$ for all $y \in \mathcal{Y}$. (Here, we do not assume that the winning probabilities $P(x)$ are positive.) We view the odds as a function $o \colon \mathcal{X} \to (0,\infty)$. Define
$$c \triangleq \frac{1}{\sum_x \frac{1}{o(x)}}, \tag{21}$$
and the PMF $r_X$ for $x \in \mathcal{X}$ as
$$r_X(x) \triangleq \frac{c}{o(x)}. \tag{22}$$
(These definitions are equivalent to (7) and (8), respectively.) We continue to assume that the gambler invests all her wealth, so a betting strategy is now a conditional PMF $b_{X|Y}$. The wealth relative is defined as
$$S \triangleq b(X|Y) \, o(X). \tag{23}$$
The following theorem parallels Theorem 1:
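Theorem 6.
Let $\beta \in (-\infty,0) \cup (0,1)$. Then,
$$U_\beta = \log c + D_{\frac{1}{1-\beta}}(P_{X|Y} \| r_X | P_Y) - D_{1-\beta}\bigl(P_Y^{(\beta)} b^*_{X|Y} \,\big\|\, P_Y^{(\beta)} b_{X|Y}\bigr), \tag{24}$$
where for $x \in \mathcal{X}$ and $y \in \mathcal{Y}$,
\begin{align}
b^*(x|y) &\triangleq \frac{P(x|y)^{\frac{1}{1-\beta}} \, o(x)^{\frac{\beta}{1-\beta}}}{\sum_{x'} P(x'|y)^{\frac{1}{1-\beta}} \, o(x')^{\frac{\beta}{1-\beta}}}, \tag{25} \\
P_Y^{(\beta)}(y) &\triangleq \frac{P(y) \bigl[ \sum_{x'} P(x'|y)^{\frac{1}{1-\beta}} \, o(x')^{\frac{\beta}{1-\beta}} \bigr]^{1-\beta}}{\sum_{y'} P(y') \bigl[ \sum_{x'} P(x'|y')^{\frac{1}{1-\beta}} \, o(x')^{\frac{\beta}{1-\beta}} \bigr]^{1-\beta}}. \tag{26}
\end{align}
Thus, choosing $b_{X|Y} = b^*_{X|Y}$ uniquely maximizes $U_\beta$ among all conditional PMFs $b_{X|Y}$.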
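The side-information result can be illustrated numerically as well. The sketch below assumes that, for each observed $y$, the optimal bets are proportional to $P(x|y)^{1/(1-\beta)}\,o(x)^{\beta/(1-\beta)}$, and that the resulting utility equals $\log c$ plus the conditional Rényi divergence of order $1/(1-\beta)$ between $P_{X|Y}$ and $r_X$ given $P_Y$. All names and numbers are hypothetical, and the example uses strictly positive conditional probabilities to avoid zero bets.

```python
import math

def optimal_bets(beta, P_X_given_Y, o):
    """Row y holds the assumed optimal bets given Y = y."""
    e_p, e_o = 1 / (1 - beta), beta / (1 - beta)
    bets = []
    for row in P_X_given_Y:
        w = [p ** e_p * o_x ** e_o for p, o_x in zip(row, o)]
        bets.append([w_x / sum(w) for w_x in w])
    return bets

def utility(beta, P_Y, P_X_given_Y, o, bets):
    """(1/beta) * log2 E[S^beta] with S = b(X|Y) * o(X)."""
    total = sum(P_y * sum(p * (b * o_x) ** beta for p, b, o_x in zip(row, b_row, o))
                for P_y, row, b_row in zip(P_Y, P_X_given_Y, bets))
    return math.log2(total) / beta

def optimal_value(beta, P_Y, P_X_given_Y, o):
    """log2(c) plus the assumed conditional Renyi divergence of order 1/(1-beta)."""
    c = 1.0 / sum(1.0 / o_x for o_x in o)
    r = [c / o_x for o_x in o]
    alpha = 1 / (1 - beta)
    outer = sum(P_y * sum(p ** alpha * r_x ** (1 - alpha) for p, r_x in zip(row, r)) ** (1 / alpha)
                for P_y, row in zip(P_Y, P_X_given_Y))
    return math.log2(c) + alpha / (alpha - 1) * math.log2(outer)

# Hypothetical example: two side-information values, three horses.
P_Y = [0.4, 0.6]
P_X_given_Y = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
o = [2.0, 4.0, 6.0]
uniform = [[1 / 3] * 3, [1 / 3] * 3]

results = []
for beta in (-1.0, 0.5):
    best = optimal_bets(beta, P_X_given_Y, o)
    results.append((utility(beta, P_Y, P_X_given_Y, o, best),
                    optimal_value(beta, P_Y, P_X_given_Y, o),
                    utility(beta, P_Y, P_X_given_Y, o, uniform)))
```

Each entry of `results` lists the utility achieved by the assumed optimal bets, the closed-form value, and the utility of uniform betting; the first two agree up to rounding, and the third is smaller.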
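The conditional Rényi divergence appearing in Theorem 6 was defined in Section II and seems to be novel. It is easy to see that $D_\alpha(P_Y \| Q_Y | P_X) = D_\alpha(P_Y \| Q_Y)$ if $P_X$, $P_Y$, and $Q_Y$ are PMFs. We now present some more properties:
Proposition 7.
Let $\alpha \in (0,1) \cup (1,\infty)$, let $P_X$ be a PMF, and let $P_{Y|X}$ and $Q_{Y|X}$ be conditional PMFs. Then,
\begin{align}
D_\alpha(P_{Y|X} \| Q_{Y|X} | P_X) &\ge 0, \tag{27} \\
D_\alpha(P_{Y|X} \| Q_{Y|X} | P_X) &\le D_\alpha(P_X P_{Y|X} \| P_X Q_{Y|X}). \tag{28}
\end{align}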
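Because everything that can be achieved without side information can also be achieved with side information, comparing Theorem 1 and Theorem 6 suggests that $D_{\frac{1}{1-\beta}}(P_{X|Y} \| r_X | P_Y) \ge D_{\frac{1}{1-\beta}}(P_X \| r_X)$, which is indeed the case:
Proposition 8.
Let $\alpha \in (0,1) \cup (1,\infty)$, let $P_{XY}$ be a joint PMF, and let $Q_Y$ be a PMF. Then,
$$D_\alpha(P_{Y|X} \| Q_Y | P_X) \ge D_\alpha(P_Y \| Q_Y). \tag{29}$$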
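The inequality suggested by the comparison of Theorem 1 and Theorem 6 (conditioning cannot decrease the divergence) can be spot-checked numerically. The sketch below assumes the conditional divergence has the form $\frac{\alpha}{\alpha-1}\log_2\sum_x P(x)\bigl[\sum_y P(y|x)^\alpha Q(y)^{1-\alpha}\bigr]^{1/\alpha}$; the names and the joint PMF are hypothetical.

```python
import math

def renyi(alpha, P, Q):
    """Renyi divergence of order alpha in bits; assumes all entries are positive."""
    return math.log2(sum(p ** alpha * q ** (1 - alpha) for p, q in zip(P, Q))) / (alpha - 1)

def conditional_renyi(alpha, P_X, P_Y_given_X, Q_Y):
    """Assumed conditional Renyi divergence with a second argument that ignores x."""
    outer = sum(p_x * sum(p ** alpha * q ** (1 - alpha) for p, q in zip(row, Q_Y)) ** (1 / alpha)
                for p_x, row in zip(P_X, P_Y_given_X))
    return alpha / (alpha - 1) * math.log2(outer)

# Hypothetical joint PMF: rows are indexed by x, columns by y.
P_XY = [[0.10, 0.30],
        [0.35, 0.25]]
Q_Y = [0.5, 0.5]

P_X = [sum(row) for row in P_XY]
P_Y = [sum(col) for col in zip(*P_XY)]
P_Y_given_X = [[p / p_x for p in row] for row, p_x in zip(P_XY, P_X)]

gaps = [conditional_renyi(a, P_X, P_Y_given_X, Q_Y) - renyi(a, P_Y, Q_Y)
        for a in (0.5, 2.0, 3.0)]
```

Every entry of `gaps` is nonnegative in this example, for orders both below and above one.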
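Our last results treat the possibility that the gambler does not invest all her wealth. (We only treat the setting without side information.) Denote by $b_0$ the fraction of her wealth that the gambler does not use for betting. Then, $b \triangleq (b_0, b_1, \ldots, b_m)$ is a probability vector, and the wealth relative is given by
$$S \triangleq b_0 + b_X \, o_X. \tag{30}$$
If $c \ge 1$, then it is optimal to invest all the money:
Proposition 9.
Assume $c \ge 1$, let $\beta \in \mathbb{R} \setminus \{0\}$, and let $b$ be a probability vector with wealth relative $S$. Then, there exists a probability vector $b'$ with wealth relative $S'$ satisfying $b'_0 = 0$ and
$$\frac{1}{\beta} \log \mathrm{E}\bigl[S'^{\beta}\bigr] \ge \frac{1}{\beta} \log \mathrm{E}\bigl[S^\beta\bigr]. \tag{31}$$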
On the other hand, if the odds are subfair, i.e., if , then investing all the money is not optimal in the case , as Claim 3 of the following theorem shows:
Theorem 10.
Assume , let , and let be a probability vector that maximizes among all probability vectors . Define
[TABLE]
and for ,
[TABLE]
Then, the following claims hold:
1. The quantity is well-defined and satisfies .
2. For all ,
[TABLE]
3. The quantity satisfies
[TABLE]
In particular, .
Claim 2 implies that for all , if and only if . Assuming without loss of generality that , the set thus has a special structure: it is either empty or equal to for some integer . To maximize , the following procedure can be used: for every with the above structure, compute the corresponding according to (33)–(36); and from these ’s, take one that maximizes . This procedure leads to an optimal solution: an optimal solution exists because we are optimizing a continuous function over a compact set, and corresponds to a set that will be considered by the procedure.
IV Proofs
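Proof of Theorem 1.
We first show the maximization claim. The only term on the RHS of (9) that depends on $b$ is $-D_{1-\beta}(g^{(\beta)} \| b)$. Because $1 - \beta > 0$, this term is maximized if and only if $b = g^{(\beta)}$ [10, Theorem 8].
We now show (9). By the definition of $U_\beta$,
$$U_\beta = \frac{1}{\beta} \log \sum_{i=1}^m p_i \, b_i^\beta \, o_i^\beta. \tag{37}$$
For every $i \in \{1,\ldots,m\}$,
\begin{align}
p_i \, o_i^\beta &= \Bigl[ p_i^{\frac{1}{1-\beta}} \, o_i^{\frac{\beta}{1-\beta}} \Bigr]^{1-\beta} \tag{38} \\
&= \Bigl[ \sum_{j=1}^m p_j^{\frac{1}{1-\beta}} \, o_j^{\frac{\beta}{1-\beta}} \Bigr]^{1-\beta} \bigl[ g_i^{(\beta)} \bigr]^{1-\beta}, \tag{39}
\end{align}
where (39) follows from (10). From (37) and (39) we obtain
\begin{align}
U_\beta &= \frac{1}{\beta} \log \sum_{i=1}^m \bigl[ g_i^{(\beta)} \bigr]^{1-\beta} b_i^\beta + \frac{1-\beta}{\beta} \log \sum_{j=1}^m p_j^{\frac{1}{1-\beta}} \, o_j^{\frac{\beta}{1-\beta}} \tag{40} \\
&= -D_{1-\beta}\bigl(g^{(\beta)} \| b\bigr) + \frac{1-\beta}{\beta} \log \sum_{j=1}^m p_j^{\frac{1}{1-\beta}} \, o_j^{\frac{\beta}{1-\beta}} \tag{41} \\
&= -D_{1-\beta}\bigl(g^{(\beta)} \| b\bigr) + \log c + \frac{1-\beta}{\beta} \log \sum_{j=1}^m p_j^{\frac{1}{1-\beta}} \, r_j^{-\frac{\beta}{1-\beta}} \tag{42} \\
&= -D_{1-\beta}\bigl(g^{(\beta)} \| b\bigr) + \log c + D_{\frac{1}{1-\beta}}(p \| r), \tag{43}
\end{align}
where (41) follows from identifying the Rényi divergence ($g^{(\beta)}$ and $b$ are probability vectors); (42) follows from (7) and (8); and (43) follows from identifying the Rényi divergence ($p$ and $r$ are probability vectors). This proves (9). ∎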
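Proof of Proposition 2.
Equation (11) holds because
\begin{align}
\lim_{\beta \to 0} U_\beta &= \lim_{\beta \to 0} \log \Bigl( \sum_{i=1}^m p_i \, (b_i o_i)^\beta \Bigr)^{\frac{1}{\beta}} \tag{44} \\
&= \log \prod_{i=1}^m (b_i o_i)^{p_i} \tag{45} \\
&= \sum_{i=1}^m p_i \log (b_i o_i) \tag{46} \\
&= \mathrm{E}[\log S], \tag{47}
\end{align}
where (44) follows from the definition of $U_\beta$, and (45) holds because in the limit as $\beta$ tends to zero, the power mean tends to the geometric mean since $p$ is a probability vector [3, Problem 8.1]. Equation (12) is proved in [12, Section 10.3]. ∎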
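Proof of Proposition 3.
Inequality (13) holds because
\begin{align}
U_\beta &= \frac{1}{\beta} \log \sum_{i=1}^m p_i \, b_i^\beta \, o_i^\beta \tag{48} \\
&\le \frac{1}{\beta} \log \sum_{i=1}^m p_i \, b_i \, o_i^\beta \tag{49} \\
&\le \frac{1}{\beta} \log \sum_{i=1}^m b_i \max_{j \in \{1,\ldots,m\}} p_j \, o_j^\beta \tag{50} \\
&= \frac{1}{\beta} \log \max_{j \in \{1,\ldots,m\}} p_j \, o_j^\beta, \tag{51}
\end{align}
where (48) follows from the definition of $U_\beta$; (49) holds because $b_i \in [0,1]$ and $\beta \ge 1$; and (51) holds because $b$ is a probability vector. It is easy to see that (13) holds with equality if $b$ is chosen according to (14). ∎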
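Proof of Proposition 4.
Equation (16) holds because
\begin{align}
\lim_{\beta \to +\infty} U_\beta &= \lim_{\beta \to +\infty} \log \Bigl( \sum_{i=1}^m p_i \, (b_i o_i)^\beta \Bigr)^{\frac{1}{\beta}} \tag{53} \\
&= \log \max_{i \in \{1,\ldots,m\}} b_i o_i, \tag{54}
\end{align}
where (53) follows from the definition of $U_\beta$, and (54) holds because in the limit as $\beta$ tends to $+\infty$, the power mean tends to the maximum since $p$ is a probability vector [3, Chapter 8]. Inequality (17) holds because $b_i \le 1$ for $i \in \{1,\ldots,m\}$. It is easy to see that (17) holds with equality if $b$ is chosen according to (18). ∎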
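Proof of Proposition 5.
Equation (19) holds because
\begin{align}
\lim_{\beta \to -\infty} U_\beta &= \lim_{\beta \to -\infty} \log \Bigl( \sum_{i=1}^m p_i \, (b_i o_i)^\beta \Bigr)^{\frac{1}{\beta}} \tag{55} \\
&= \log \min_{i \in \{1,\ldots,m\}} b_i o_i, \tag{56}
\end{align}
where (55) follows from the definition of $U_\beta$, and (56) holds because in the limit as $\beta$ tends to $-\infty$, the power mean tends to the minimum since $p$ is a probability vector [3, Chapter 8].
We show (20) by contradiction. Assume that there exists a probability vector $b$ such that $\min_i b_i o_i > c$, i.e.,
$$b_i > \frac{c}{o_i} \tag{57}$$
for all $i \in \{1,\ldots,m\}$. Then,
\begin{align}
1 &= \sum_{i=1}^m b_i \tag{58} \\
&> \sum_{i=1}^m \frac{c}{o_i} \tag{59} \\
&= 1, \tag{60}
\end{align}
where (58) holds because $b$ is a probability vector; (59) follows from (57); and (60) follows from the definition of $c$. Because $1 > 1$ is impossible, such a $b$ cannot exist, which proves (20).
It is easy to see that (20) holds with equality if $b_i = c/o_i$ for all $i \in \{1,\ldots,m\}$. Conversely, if (20) holds with equality, then for all $i \in \{1,\ldots,m\}$,
$$b_i \ge \frac{c}{o_i}. \tag{61}$$
We claim that (61) holds with equality for all $i \in \{1,\ldots,m\}$. Indeed, if this were not the case, then there would exist an $i$ for which $b_i > c/o_i$, so (58)–(60) would hold, which would lead to a contradiction. Hence, if (20) holds with equality, then $b_i = c/o_i$ for all $i \in \{1,\ldots,m\}$. ∎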
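Proof of Theorem 6.
We first show the maximization claim. The only term on the RHS of (24) that depends on $b_{X|Y}$ is $-D_{1-\beta}(P_Y^{(\beta)} b^*_{X|Y} \| P_Y^{(\beta)} b_{X|Y})$. Because $1 - \beta > 0$, this term is maximized if and only if $P_Y^{(\beta)} b_{X|Y} = P_Y^{(\beta)} b^*_{X|Y}$ [10, Theorem 8]. By our assumptions that $P(y) > 0$ for all $y \in \mathcal{Y}$ and $o(x) > 0$ for all $x \in \mathcal{X}$, we have $P_Y^{(\beta)}(y) > 0$ for all $y \in \mathcal{Y}$. Consequently, $P_Y^{(\beta)} b_{X|Y} = P_Y^{(\beta)} b^*_{X|Y}$ if and only if $b_{X|Y} = b^*_{X|Y}$.
We now show (24). By the definition of $U_\beta$,
$$U_\beta = \frac{1}{\beta} \log \sum_{x,y} P(x,y) \, b(x|y)^\beta \, o(x)^\beta. \tag{62}$$
From (25) and (26) we obtain that for every $x \in \mathcal{X}$ and $y \in \mathcal{Y}$,
$$P(x,y) \, o(x)^\beta = \sum_{y'} P(y') \Bigl[ \sum_{x'} P(x'|y')^{\frac{1}{1-\beta}} \, o(x')^{\frac{\beta}{1-\beta}} \Bigr]^{1-\beta} \cdot P_Y^{(\beta)}(y) \, b^*(x|y)^{1-\beta}. \tag{63}$$
Now, (24) holds because
\begin{align}
U_\beta &= \frac{1}{\beta} \log \sum_{y} P(y) \Bigl[ \sum_{x} P(x|y)^{\frac{1}{1-\beta}} \, o(x)^{\frac{\beta}{1-\beta}} \Bigr]^{1-\beta} + \frac{1}{\beta} \log \sum_{x,y} \bigl[ P_Y^{(\beta)}(y) \, b^*(x|y) \bigr]^{1-\beta} \bigl[ P_Y^{(\beta)}(y) \, b(x|y) \bigr]^{\beta} \tag{64} \\
&= \log c + \frac{1}{\beta} \log \sum_{y} P(y) \Bigl[ \sum_{x} P(x|y)^{\frac{1}{1-\beta}} \, r_X(x)^{-\frac{\beta}{1-\beta}} \Bigr]^{1-\beta} + \frac{1}{\beta} \log \sum_{x,y} \bigl[ P_Y^{(\beta)}(y) \, b^*(x|y) \bigr]^{1-\beta} \bigl[ P_Y^{(\beta)}(y) \, b(x|y) \bigr]^{\beta} \tag{65} \\
&= \log c + D_{\frac{1}{1-\beta}}(P_{X|Y} \| r_X | P_Y) - D_{1-\beta}\bigl(P_Y^{(\beta)} b^*_{X|Y} \,\big\|\, P_Y^{(\beta)} b_{X|Y}\bigr), \tag{66}
\end{align}
where (64) follows from plugging (63) into (62) and using the fact that $P_Y^{(\beta)}(y) = P_Y^{(\beta)}(y)^{1-\beta} \, P_Y^{(\beta)}(y)^{\beta}$; (65) follows from (22); and (66) follows from identifying the conditional Rényi divergence and the (unconditional) Rényi divergence. ∎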
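Proof of Proposition 7.
We first show (27). If $\alpha > 1$, then Hölder's inequality implies that for all $x \in \mathcal{X}$,
$$\Bigl[ \sum_y P(y|x)^\alpha \, Q(y|x)^{1-\alpha} \Bigr]^{\frac{1}{\alpha}} \Bigl[ \sum_y Q(y|x) \Bigr]^{\frac{\alpha-1}{\alpha}} \ge \sum_y P(y|x). \tag{67}$$
The RHS of (67) equals one, so
$$\sum_x P(x) \Bigl[ \sum_y P(y|x)^\alpha \, Q(y|x)^{1-\alpha} \Bigr]^{\frac{1}{\alpha}} \ge 1, \tag{68}$$
which implies (27) because $\frac{\alpha}{\alpha-1} > 0$. If $\alpha \in (0,1)$, then the inequalities in (67) and (68) are reversed; since now $\frac{\alpha}{\alpha-1} < 0$, (27) holds also in this case.
We now show (28). If $\alpha > 1$, then (28) holds because
\begin{align}
D_\alpha(P_{Y|X} \| Q_{Y|X} | P_X) &\le \frac{\alpha}{\alpha-1} \log \Bigl[ \sum_x P(x) \sum_y P(y|x)^\alpha \, Q(y|x)^{1-\alpha} \Bigr]^{\frac{1}{\alpha}} \tag{69} \\
&= D_\alpha(P_X P_{Y|X} \| P_X Q_{Y|X}), \tag{70}
\end{align}
where (69) follows from Jensen's inequality because $z \mapsto z^{\frac{1}{\alpha}}$ is a concave function on $[0,\infty)$, and (70) holds because $P(x) = P(x)^\alpha \, P(x)^{1-\alpha}$. If $\alpha \in (0,1)$, then $z \mapsto z^{\frac{1}{\alpha}}$ is convex, so Jensen's inequality is reversed; because $\frac{\alpha}{\alpha-1} < 0$, (69) and thus (28) hold also in this case. ∎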
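Proof of Proposition 8.
If $\alpha > 1$, then (29) holds because
\begin{align}
D_\alpha(P_{Y|X} \| Q_Y | P_X) &= \frac{\alpha}{\alpha-1} \log \sum_x P(x) \Bigl[ \sum_y P(y|x)^\alpha \, Q(y)^{1-\alpha} \Bigr]^{\frac{1}{\alpha}} \tag{71} \\
&= \frac{\alpha}{\alpha-1} \log \sum_x \Bigl[ \sum_y \bigl( P(x,y) \, Q(y)^{\frac{1-\alpha}{\alpha}} \bigr)^\alpha \Bigr]^{\frac{1}{\alpha}} \tag{72} \\
&\ge \frac{\alpha}{\alpha-1} \log \Bigl[ \sum_y \Bigl( \sum_x P(x,y) \, Q(y)^{\frac{1-\alpha}{\alpha}} \Bigr)^\alpha \Bigr]^{\frac{1}{\alpha}} \tag{73} \\
&= D_\alpha(P_Y \| Q_Y), \tag{74}
\end{align}
where (73) follows from the Minkowski inequality [4, III 2.4 Theorem 9]. If $\alpha \in (0,1)$, then the Minkowski inequality is reversed; since now $\frac{\alpha}{\alpha-1} < 0$, (73) and thus (29) hold also in this case. ∎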
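Proof of Proposition 9.
Set $b'_0 = 0$ and $b'_i = b_i + b_0 \, c / o_i$ for all $i \in \{1,\ldots,m\}$. Then, $\sum_{i=0}^m b'_i = 1$, and for $i \in \{1,\ldots,m\}$,
\begin{align}
b'_0 + b'_i o_i &= b_i o_i + b_0 \, c \tag{75} \\
&\ge b_i o_i + b_0, \tag{76}
\end{align}
where (76) holds because $c \ge 1$. It is not difficult to see that (76) implies (31). ∎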
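The idea behind Proposition 9 can be illustrated with a short Python sketch. It assumes one construction consistent with the argument, namely spreading the withheld fraction $b_0$ over the horses in proportion to $1/o_i$, i.e., $b'_i = b_i + b_0\,c/o_i$; this construction, the function name, and the numbers are assumptions made for illustration.

```python
def reinvest(b0, b, o):
    """Spread the withheld fraction b0 over all horses in proportion to 1 / o_i.

    Assumed construction: b'_i = b_i + b0 * c / o_i with c = 1 / sum_i (1 / o_i).
    """
    c = 1.0 / sum(1.0 / o_i for o_i in o)
    return [b_i + b0 * c / o_i for b_i, o_i in zip(b, o)]

# Hypothetical superfair odds: c = 4/3 >= 1.
o = [4.0, 4.0, 4.0]
b0, b = 0.4, [0.3, 0.2, 0.1]   # 40% of the wealth is withheld

b_new = reinvest(b0, b, o)

old_payoffs = [b0 + b_i * o_i for b_i, o_i in zip(b, o)]   # S if horse i wins
new_payoffs = [b_i * o_i for b_i, o_i in zip(b_new, o)]    # S' if horse i wins
```

Because the new payoff is at least the old payoff whichever horse wins, the utility cannot decrease for any nonzero $\beta$ under the assumed form $\frac{1}{\beta}\log\mathrm{E}[S^\beta]$.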
Proof of Theorem 10.
In the Appendix. ∎
References
[1] J. L. Kelly, “A new interpretation of information rate,” Bell Syst. Tech. J., vol. 35, no. 4, pp. 917–926, Jul. 1956.
[2] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ, USA: John Wiley & Sons, 2006.
[3] J. M. Steele, The Cauchy–Schwarz Master Class. Cambridge: Cambridge Univ. Press, 2004.
[4] P. S. Bullen, Handbook of Means and Their Inequalities. Dordrecht, The Netherlands: Kluwer Academic Publishers, 2003.
[5] L. L. Campbell, “A coding theorem and Rényi’s entropy,” Inf. Control, vol. 8, no. 4, pp. 423–429, Aug. 1965.
[6] L. L. Campbell, “Definition of entropy by means of a coding problem,” Z. Wahrscheinlichkeitstheorie verw. Geb., vol. 6, no. 2, pp. 113–118, Jun. 1966.
[7] N. Merhav, “On optimum strategies for minimizing the exponential moments of a loss function,” in Proc. 2012 IEEE Int. Symp. Inf. Theory, 2012, pp. 140–144.
[8] A. N. Soklakov, “Economics of disagreement – financial intuition for the Rényi divergence,” 2018, arXiv:1811.08308v4.
