Prediction with Expert Advice: a PDE Perspective
Nadejda Drenska, Robert V. Kohn

TL;DR
This paper models online prediction with expert advice as a zero-sum game and characterizes its value through a nonlinear PDE, providing a continuum perspective and revealing optimal strategies for both predictor and adversary.
Contribution
It introduces a PDE-based framework for analyzing online prediction with expert advice, connecting game theory with optimal control and continuum limits.
Findings
Game value characterized as viscosity solution of a nonlinear PDE
Optimal strategies for predictor and adversary derived from PDE analysis
Provides a continuum perspective linking discrete prediction games to PDEs
Abstract
This work addresses a classic problem of online prediction with expert advice. We assume an adversarial opponent, and we consider both the finite-horizon and random-stopping versions of this zero-sum, two-person game. Focusing on an appropriate continuum limit and using methods from optimal control, we characterize the value of the game as the viscosity solution of a certain nonlinear partial differential equation. The analysis also reveals the predictor's and the opponent's minimax optimal strategies. Our work provides, in particular, a continuum perspective on recent work of Gravin, Peres, and Sivan (Proc SODA 2016). Our techniques are similar to those of Kohn and Serfaty (Comm Pure Appl Math 2010), where scaling limits of some two-person games led to elliptic or parabolic PDEs.
This research was partially supported by NSF grant DMS-1311833.
Nadejda Drenska: Department of Mathematics, University of Minnesota. This work is a refinement of the first author’s PhD thesis, A PDE Approach to a Prediction Problem Involving Randomized Strategies, NYU, 2017.
Robert V. Kohn: Courant Institute of Mathematical Sciences, New York University.
1 Introduction
Our work addresses a problem involving ‘prediction with expert advice.’ This is a well-established framework in which a player tries to use ‘expert advice’ to invest optimally (for the worst case scenario) against an adversarial market. The effectiveness of the player’s strategy is measured by ‘regret’: the gap between the player’s performance and that of the (retrospectively) best-performing ‘expert’. We use linear regret, in other words the difference between a player’s loss and an expert’s loss. Here, ‘prediction’ is not about modelling a time series probabilistically; instead, the player tries to synthesise the advice of the experts in a way that guarantees good performance in a worst case setting.
We consider the following setup. There are two entities – a ‘player’ and a ‘market’ – and a fixed number of ‘experts’. The market chooses which experts win or lose at every time step. The player chooses which expert to listen to at each time step. The two entities’ optimal strategies are mixed, i.e. the strategies involve probability distributions over the space of available outcomes. The player’s goal is to accumulate overall winnings as close as possible to those of the best performing expert at the ‘end’ of the game (assuming that the market works against the player). There are two variants: one with a fixed stopping time (‘the finite horizon problem’) and one where the stopping time is random with a constant probability of stopping at every time step (‘the geometric stopping problem’). The goal in each variant is to identify the optimal strategies of the player and the market, as well as the associated value function.
The general approach is ‘numerical analysis in reverse’ – interpreting each discrete formulation as a numerical scheme for an appropriate nonlinear PDE. We prove that the solution to the discrete problem is asymptotically close to the unique viscosity solution of the PDE; as a result, knowledge of the PDE solution provides an indication about the optimal strategy for the discrete game. The ‘finite horizon problem’ leads to a parabolic PDE, whereas the ‘geometric stopping problem’ is associated to an elliptic PDE.
The overall outline of our analysis is as follows. Firstly, for each variant we define a discrete approximation scheme associated with a dynamic programming principle for the game. For the geometric stopping problem the existence of a solution to the scheme is nontrivial. Its construction relies on a time dependent problem which is run to equilibrium (or equivalently, a contraction mapping argument). For the finite horizon problem, existence of a solution to the scheme is easily established by induction. Convergence of each scheme is obtained through standard viscosity technology: the scheme is stable, monotone, and consistent, hence its solution converges to the unique viscosity solution of the PDE. (Our proof uses the framework of Barles and Souganidis [1], adjusted to accommodate the special features of our problem.) Finally, we give an explicit solution for the elliptic PDE associated to the geometric stopping problem with three experts (it is the continuous analogue of the solution obtained using discrete methods in Gravin, Peres, and Sivan’s paper [2]).
Our work shows that although online machine learning is not in any conventional sense a stochastic control problem, continuous methods are useful for its analysis (in much the same way that PDEs are useful for studying stochastic control). It should be noted that we are not the first to apply PDE methods to an online machine learning problem. Indeed, Kangping Zhu’s thesis [3] used PDE methods to achieve a similar goal in a somewhat different setting.
To put this work in context, we briefly review some of the machine learning literature on prediction with expert advice. Most of this work focuses on regret bounds (e.g. using specific strategies to prove upper bounds on the predictor’s regret). A prediction problem appears in Cover’s article [4] as far back as 1965, where he establishes an regret bound, where is the number of rounds played; Cover also solves the problem for . A classical treatment is available in Cesa-Bianchi and Lugosi’s book [5]; it outlines the theoretical foundation of the area and provides a self-contained treatment of many results, including an upper bound on the regret of order , proved using a well-chosen multiplicative weight algorithm. Some earlier, foundational works include Vovk’s [6] and Littlestone and Warmuth’s [7]; they introduced the weighted majority algorithm as a method the predictor can use to weight the experts’ bids. Haussler et al. [8] achieve a regret bound in the case of absolute loss. Abernethy et al. [9] consider a game played until a fixed number of losses is incurred by an expert. Luo and Schapire [10] investigate a version of the game with a randomly chosen final time. In [11] Rakhlin et al. present algorithms using “random play out”. A recent paper by Gravin, Peres, and Sivan [2] analyzes the same problems that we consider here. That work uses discrete methods and connections to random walks; ours can be viewed as providing its continuous-time analogue. For more detail on the relationship between our work and [2], see Subsection 3.5. Our PDE characterization of the value function has already seen an interesting application: in [12], Bayraktar et al. use it to obtain an explicit solution for the geometric stopping version of the game with experts.
There are other instances in the literature where scaling limits of multistep decision processes lead to parabolic or elliptic PDEs. For example, the work of Kohn and Serfaty on two-person game interpretations of motion by curvature [13] and many other PDE problems [14] has this character. So does the work of Peres, Sheffield, Schramm, and Wilson connecting the ‘tug-of-war’ game to the infinity-Laplacian [15] and the p-Laplacian [16] (this work has seen many extensions, e.g. [17], [18], [19], [20]).
A particular advantage of our treatment is that it is not limited to the classical payoff function in the online machine learning literature, namely regret with respect to the best expert , where is regret with respect to expert . In fact, it works for a more general class of payoff functions, namely functions that are globally Lipschitz continuous, non-decreasing, symmetric in their dependent variables , have linear growth at , and satisfy . Different choices of represent generalizations of the classic linear regret performance measure. We prove results for the general class of payoff functions described above; we restrict to only to find the explicit solution of the elliptic case.
The outline of this paper is as follows. In section 2 we introduce notation and the discrete formulation of the problem we wish to solve, as well as the dynamic programming principle (DPP) for each case. In section 3 we derive heuristically the associated PDEs. In section 4 we prove that both in the finite horizon and in the geometric stopping cases the discrete dynamic programming principle introduced in section 2 has a unique solution with at most linear growth. In section 5 we cite results showing that each of our PDEs has a unique solution among functions with at most linear growth. In section 6 we relate the discrete solutions to the solutions of the PDEs by proving that the solutions of the appropriately scaled DPP solve the appropriate PDE in the limit . In section 7 we investigate the particular case of experts in the geometric stopping problem, and provide an explicit formula for the solution of the PDE.
2 Notation and Formulation
In this section we introduce our notation and formulate the two variants of our problem. We start in 2.1 with the basic setup; subsections 2.2 and 2.3 present the two classical variants of the game (described in detail, for example, in [2]). Lastly, in 2.4 we present the scaled variants of the game.
2.1 Notation
We will be considering a game with randomized strategies but let us focus on a non-probabilistic set up first. There are two entities - a ’market’ and a ’player’ – as well as experts denoted by . The game is played for rounds (in the ’finite horizon’ problem), or else with a random stopping time (using a fixed probability of stopping at each time step – we call this the ’geometric stopping’ problem). At each round , every expert makes a prediction (say, whether stock will go up or down), and the player chooses to follow a particular expert, say the th one. The market determines the gains of each expert ( if expert made an accurate prediction at round and otherwise). Then the outcomes of the player and the market are revealed. We denote by the player’s ’regret with respect to expert ’; this is, by definition, expert ’s cumulative gains minus the player’s cumulative gains. Thus the increment of at time is
[TABLE]
if the player follows expert .
The game we study is similar to the one just described, except that the player and the market choose randomized strategies:
- At each step , without knowing the player’s move, the market chooses a probability distribution , over all the possible outcomes for the experts, which we represent by vectors . (An outcome is thus a choice of the subset of experts making correct predictions; for example, if all the experts are correct then is the vector of ones.)
- Simultaneously, at every turn without knowing the market’s move, the player chooses a probability distribution over the experts, i.e. a vector , where and . Its meaning is that the player follows expert at time with probability (obtaining the same outcome as expert , namely ).
- The player seeks to minimize (and the market seeks to maximize) the expected final-time regret (the expectation being taken with respect to probabilities associated with the randomized strategies).
The state variables for this game are the player’s regrets with respect to each expert, meaning that expert’s gain minus the player’s gain. At risk of redundancy, we emphasize that the market and the player know they are playing against each other, and this influences their optimal strategies. The player chooses the probability distribution on so as to minimize her expected regret at the end of the game; meanwhile the market chooses the probability distribution which maximizes expected regret at the end of the game. These distributions are not fixed throughout the game and will depend on various unknowns, and on which version of the game is being considered (the ‘finite horizon’ version or the ‘geometric stopping’ one).
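One round of the randomized game described above can be sketched in code. This is only an illustration under stated assumptions: the name `play_round`, the dictionary representation of the market's distribution, and the sample inputs are choices made here, not notation from the paper. The update applies the rule stated in the text, namely that each regret changes by that expert's gain minus the gain of the expert the player followed.

```python
import random


def play_round(x, market_dist, player_dist, rng):
    """Sample one round and return the updated regret vector.

    x           -- current regrets, one entry per expert
    market_dist -- dict mapping outcome tuples g in {0,1}^N to probabilities
                   (hypothetical representation of the market's mixed strategy)
    player_dist -- list of probabilities over the experts
    rng         -- a random.Random instance
    """
    outcomes = list(market_dist)
    weights = [market_dist[g] for g in outcomes]
    g = rng.choices(outcomes, weights=weights)[0]            # market's draw
    followed = rng.choices(range(len(x)), weights=player_dist)[0]  # player's draw
    # Regret w.r.t. expert i grows by (gain of expert i) - (gain of followed expert).
    return [xi + gi - g[followed] for xi, gi in zip(x, g)]


# Example: the market surely advances experts 0 and 2, the player surely follows expert 1.
new_x = play_round([0, 0, 0], {(1, 0, 1): 1.0}, [0.0, 1.0, 0.0], random.Random(0))
```

In this example the followed expert gains nothing, so the player falls one unit behind experts 0 and 2 and the regret with respect to expert 1 is unchanged.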
For notational convenience, whenever we look at the player’s optimization subject to and , we will write this choice as . Similarly, whenever the market chooses an optimal probability distribution on the set of all possible choices , we denote the market’s maximization with . We write for the expected value over the mixed strategies. Lastly, whenever the market chooses a probability distribution on the set of all possible choices , subject to the condition of ’balance’, i.e.
[TABLE]
we denote this by .
As the final time measure of regret, we consider an arbitrary function that satisfies the following properties:
[TABLE]
One such function is .
2.2 The Finite Horizon Problem
The finite horizon problem is to determine the player’s expected regret (the value function of the game) and the associated optimal strategies for both the player and the market, provided that the game ends at an a priori fixed time and starts at time such that with initial regret vector . One can write the value function through a dynamic programming principle (DPP): it is the expected payoff at final time, provided the player and the market play optimally against each other, in particular doing the best that could be done after one time step. Through the dynamic programming principle, the discrete finite horizon formulation becomes:
[TABLE]
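For two experts, the backward induction behind the finite horizon problem can be sketched directly. The sketch rests on assumptions that are not confirmed by the text above: it takes the DPP to have the form "value now = minimum over the player's distribution of the maximum over market outcomes of the expected value one round later", and it uses regret with respect to the best expert as the final-time payoff. The function names and the restriction to two experts are illustrative choices.

```python
from functools import lru_cache
from itertools import product

OUTCOMES = list(product((0, 1), repeat=2))  # market outcomes g in {0,1}^2


def payoff(x):
    """Final-time payoff: regret with respect to the best expert."""
    return max(x)


def step(x, g, followed):
    """Regret vector after outcome g when the player follows expert `followed`."""
    return tuple(xi + gi - g[followed] for xi, gi in zip(x, g))


@lru_cache(maxsize=None)
def value(x, rounds_left):
    """Value of the two-expert game from regret vector x (a tuple of ints)."""
    if rounds_left == 0:
        return float(payoff(x))
    # For each outcome g, the expected continuation value is linear in q,
    # the probability of following expert 0: q * a + (1 - q) * b.
    lines = [(value(step(x, g, 0), rounds_left - 1),
              value(step(x, g, 1), rounds_left - 1)) for g in OUTCOMES]

    def worst_case(q):
        # For fixed q the market's best reply is a pure outcome.
        return max(q * a + (1 - q) * b for a, b in lines)

    # A convex piecewise-linear function on [0, 1] is minimized at an
    # endpoint or where two of its lines cross.
    candidates = [0.0, 1.0]
    for (a1, b1), (a2, b2) in product(lines, repeat=2):
        denom = (a1 - b1) - (a2 - b2)
        if denom != 0:
            q = (b2 - b1) / denom
            if 0.0 <= q <= 1.0:
                candidates.append(q)
    return min(worst_case(q) for q in candidates)
```

Under these assumptions, starting from zero regret with one round left, the player's best choice is to follow each expert with probability 1/2, and the computed value is 1/2.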
2.3 The Geometric Stopping Problem
The geometric stopping problem is to determine the player’s expected regret (the value function) provided the game starts at regret vector . The game either stops with probability , , in which case the payoff is ; or else it continues, with probability , for at least one more round, with player and market playing against each other optimally. One can thus express the value function through a DPP:
[TABLE]
Observe that there is no time-dependence in this case. (The probability of stopping, , is assumed constant, i.e. independent of time).
2.4 The Scaled Games
Since we are interested in the behavior of the games over long periods of time, we consider scaled versions of them. For the finite horizon problem we scale spatial steps to be [math] and (instead of [math] and ) and time steps to be (instead of ), so the game is played for steps. The reason for this scaling is that we expect to obtain a parabolic PDE in the limit. Then, the analogue of equation (2.7) is:
[TABLE]
For the geometric stopping case (2.8) we observe that the expected number of rounds until stopping is , since the probability of stopping after any step is . We choose, just as in the previous case, to have spatial steps , and a typical number of steps of order , hence we choose . The analogue of (2.8) is thus:
[TABLE]
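The arithmetic behind this scaling can be checked numerically. The sketch below assumes that the game stops after each round with a fixed probability `delta`, so the stopping round is geometrically distributed with mean 1/delta, and that the scaled game uses `delta = eps**2` for spatial step `eps`; both symbols and the function name are assumptions introduced here, since the formulas are not legible in the text above.

```python
def expected_rounds(delta, tol=1e-12):
    """Mean stopping round when the game stops after each round with probability delta.

    Sums k * P(stop exactly at round k) until the survival probability is negligible.
    """
    total = 0.0
    k = 1
    survive = 1.0  # probability that the game reaches round k
    while survive > tol:
        total += k * survive * delta
        survive *= 1.0 - delta
        k += 1
    return total


eps = 0.1                      # hypothetical spatial step
rounds = expected_rounds(eps ** 2)  # close to 1 / eps**2 = 100
```

With steps of size `eps` and about `1 / eps**2` rounds, the accumulated regret of a random-walk-like path stays of order one, which is the usual parabolic scaling.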
The goal of this work is to investigate the limiting behavior of the solutions of (2.9) and (2.10). A key observation is that the statements of the DPP are semi-discrete numerical schemes for the corresponding PDEs. We prove that the solution of (2.9) converges to that of the parabolic problem
[TABLE]
as goes to [math], whereas the solution of (2.10) converges to that of
[TABLE]
as goes to
A central question is whether the scaled games are equivalent to the unscaled ones. Whenever satisfies , the answer is yes. In particular it is true for the classical choice of regret . For the finite horizon case, let the discrete-in-time, continuous-in-space function solve (2.7) and define
[TABLE]
It satisfies
[TABLE]
with at the final time. So if , is the solution of (2.9). The situation with the geometric stopping case is similar. We scale
[TABLE]
and take in (2.8). Then solves
[TABLE]
Here, too, if , then solves the scaled DPP (2.10).
2.5 Balanced Strategies
The goal of this subsection is to prove that for finite, positive an optimal strategy of the market can be achieved using ’balanced strategies’ (to be explained in the lemma below). The argument for the following lemma generalizes an argument in [2].
Lemma 1.
Let be a function satisfying the following properties:
1. is monotone nondecreasing in each ;
2. for all .
Then, the market has at least one optimal strategy for
[TABLE]
that is balanced in the sense that
[TABLE]
for all and .
Proof.
Firstly, we examine (2.13), calling it ‘W’. Then,
[TABLE]
Here is the probability that the player follows expert , is the market’s probability distribution on the expert’s outcomes, and The equalities above follow by the definition of expected value, using translation invariance (i.e. property 2 above) and the fact that .
Suppose there exists an optimal strategy for the market which is not balanced. We will construct an optimal strategy which is balanced. Since this strategy is unbalanced, there exists an expert with a largest expected value, say it is expert , i.e. . The expression (2.17) is a linear programming problem in and , so , i.e. the optimal strategies are unchanged if the player minimizes first. The player wants to minimize the second sum, because she has no influence over the first sum, so she may choose to follow expert , i.e. she may choose . Pick an expert such that ; to simplify notation, suppose and we shall write instead of . Then, consider the pair of market outcomes where the only difference is ’s value - 0 or 1. Observe that if the market increases the probability of a term where at the expense of a term where , he increases , since, by monotonicity
[TABLE]
By changing the probabilities of these two outcomes appropriately, the market obtains a strategy satisfying that is at least as good as the original one; note that the other expectations remain unchanged. Performing this operation for every such that , we obtain a balanced strategy for the market which performs at least as well as the original optimal one. ∎
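The mass-shifting step in this proof can be illustrated with a small sketch. It assumes that "balanced" means every expert has the same expected gain under the market's distribution; the helper names `expected_gains` and `shift_mass` and the sample distribution are hypothetical.

```python
def expected_gains(dist):
    """Expected gain of each expert under a distribution over outcome tuples."""
    n = len(next(iter(dist)))
    return [sum(p * g[i] for g, p in dist.items()) for i in range(n)]


def shift_mass(dist, outcome, k, amount):
    """Move `amount` of probability from `outcome` (which has g_k = 0)
    to the outcome that differs from it only in g_k = 1."""
    assert outcome in dist and outcome[k] == 0
    assert 0.0 <= amount <= dist[outcome]
    raised = outcome[:k] + (1,) + outcome[k + 1:]
    new = dict(dist)
    new[outcome] -= amount
    new[raised] = new.get(raised, 0.0) + amount
    return new


# Expert 1 lags behind: expected gains are [0.5, 0.0, 0.5].
unbalanced = {(1, 0, 0): 0.5, (0, 0, 1): 0.5}
step1 = shift_mass(unbalanced, (1, 0, 0), 1, 0.25)
balanced = shift_mass(step1, (0, 0, 1), 1, 0.25)
```

Each shift raises the expected gain of expert 1 by the amount moved and leaves the expected gains of the other experts unchanged, which is the property the proof relies on.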
3 Heuristic PDE Derivations
In this section we use the DPP formulation to derive, at least heuristically, the associated PDEs. First we consider the geometric stopping case, then the finite horizon case.
3.1 The PDE for Geometric Stopping Case
We ‘derive’ formally a limiting elliptic PDE. This derivation makes assumptions on the behavior of , for example sufficient smoothness. For now the derivation is heuristic, but later on it will be justified, in the sense that we will prove that this game is a convergent numerical scheme for the PDE. Substituting the Taylor expansion of into the DPP (2.10) gives
[TABLE]
As , the dominating term in the is , so we focus on it:
[TABLE]
The equality follows by linearity, inner product definition, rearranging, change of summation, and the fact that is a probability distribution. We focus on the expression on the last line:
[TABLE]
This expression is a pair of dual linear programs in min max form, with variables and , which represent the player’s and the market’s probability distributions, respectively. As such,
[TABLE]
We prove in Subsection 4.4 that satisfies the following properties: monotonicity in each variable and the translation property, i.e.
[TABLE]
Later on, we will prove that and thus inherits those properties. Moreover, we are assuming for this heuristic discussion that is differentiable, so monotonicity turns into , whereas differentiating translation invariance, we obtain
[TABLE]
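These two consequences can be checked numerically on a stand-in function. The sketch assumes the translation property has the form f(x + c·1) = f(x) + c, as it does for the classical max-regret payoff, and uses a log-sum-exp smoothing of the maximum purely as an example of a smooth, monotone function with that property. It is not the solution of the PDE, and all names are illustrative.

```python
import math


def smooth_max(x, beta=5.0):
    """Log-sum-exp smoothing of max(x); monotone and translation invariant."""
    m = max(x)
    return m + math.log(sum(math.exp(beta * (xi - m)) for xi in x)) / beta


def gradient(f, x, h=1e-6):
    """Central finite-difference gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


x = [0.3, -0.1, 0.2]
grad = gradient(smooth_max, x)
# Monotonicity gives nonnegative partial derivatives; translation invariance
# makes them sum to one, so the gradient is a probability vector.
```

Under the stated assumption, the gradient is entrywise nonnegative and sums to one, so it can be read as a probability distribution over the experts.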
We claim that
1. the player’s optimal strategy is ;
2. the market’s optimal strategy is any probability distribution satisfying for every such that ; and
3. the value of the minmax in (3.3) is [math].
To prove 1, we observe that if , then (3.3) is [math] for every choice of the market’s strategy . Suppose . Since , there would exist a pair of indices so that and . The market can take advantage of this and put all the weight into , obtaining a positive contribution, which is a worse outcome for the player. So the choice of is superior to the player’s other options.
To prove 2, we note that attains the minimum when at summands where . Using and , we obtain
[TABLE]
The maximal value the market can obtain is [math], achieved when
[TABLE]
for all indices such that . If the market doesn’t follow this strategy, the resulting value will be less than [math]. The proof of the claims is now complete.
Reviewing the preceding results, and assuming (as seems natural) that for all , we see that the strategy of the player is fully determined:
[TABLE]
whereas the player influences (but doesn’t fully determine) the market’s choices:
[TABLE]
The optimal value of the is [math], so the order term in the Taylor expansion vanishes. In order to obtain a PDE, we need to go to the second order of the Taylor expansion. We incorporate the knowledge of strategies of the player and the market by writing to indicate that is determined by (3.6) and is restricted to (3.7). Thus, we obtain:
[TABLE]
In the limit we obtain the equation
[TABLE]
where
[TABLE]
3.2 The PDE for the Finite Horizon Problem
Returning to the time dependent problem, we observe a lot of similarities. Again we start by substituting the Taylor expansion of into the DPP (2.9); this gives
[TABLE]
Again, as , the dominating term is . The analysis of this term done in subsection 3.1 applies here too. In particular, the ‘market indifference’ and the ‘balance’ conditions are the same. This leaves the same restrictions over the as in the previous case, hence the -order term has the same ‘balance’ condition as in the previous case. This yields the limiting equation
[TABLE]
for the operator defined by (3.9), with a final time condition
[TABLE]
3.3 The Operator
We need to understand the operator . Firstly, we investigate the expectation part. Let be the probability of a particular vector , and let . Then,
[TABLE]
where is the indicator function.
Substituting in , we obtain
[TABLE]
since takes the same value for and for (for any triplet ). In view of (3.12) we can treat as a probability distribution on pairs of complementary strategies. The restriction of ‘balance’ can be ignored, since if we choose and to have the same probability for every , then
[TABLE]
Recall that equation (3.5) holds:
[TABLE]
For any fixed we write this as
[TABLE]
and differentiate again to get
[TABLE]
Thus we obtain the equality
[TABLE]
which we will use in the following calculation of the sum on the right hand side of (3.12). For any fixed , let
[TABLE]
Then
[TABLE]
(by rearrangement of derivatives, combining terms, and observing that the sum of equals .) Returning now to (3.12), we have
[TABLE]
For the second line above we used that the probabilities sum up to 1, so the maximum linear combination, weighted by those probabilities, is achieved by assigning all the weight on the largest term.
In conclusion, the elliptic PDE (3.8) is
[TABLE]
and the parabolic PDE (3.11) is
[TABLE]
as announced earlier in (2.12) and (2.11).
The justification of our heuristic calculation, to be presented in Section 6, relies on the fact that our operator is degenerate elliptic. We check this now. Recall that, by definition, an operator is degenerate elliptic if
[TABLE]
when is non-negative, that is as matrices.
Lemma 2.
The operator
[TABLE]
is degenerate elliptic.
Proof.
Let . Then, for any we have . We take the maximum over the set of vectors such that : first on the left side, then on the right side, obtaining
[TABLE]
Finally, we multiply by to obtain the desired inequality
[TABLE]
∎
3.4 Optimal strategies
A remaining question is what the PDEs tell us about the optimal strategies for the player and the market. The answer lies (formally, at least) in the preceding calculation. Consider the elliptic PDE and suppose its solution is known and . Suppose the vector of regrets so far is . Then the best move of the player is to follow expert with probability
[TABLE]
In turn, the market looks for a (and its complement ) that saturates the maximum in
[TABLE]
Observe that by (3.12), saturates the maximum precisely when saturates the maximum. Having found , the market’s optimal strategy is this: with probability advance the experts such that , and with probability advance the rest of the experts, i.e. those for which . If is achieved for more than one pair of vectors and its complement , then the market’s strategy is not unique.
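The search for a maximizing outcome and its complement can be sketched as follows. The sketch assumes that the quantity being maximized is the quadratic form of the Hessian evaluated on outcome vectors with entries 0 or 1, and it uses the Hessian of a log-sum-exp function as a stand-in, chosen because its rows sum to zero (the second-derivative consequence of translation invariance). The matrix, the weights `p`, and the function names are illustrative, not taken from the paper.

```python
from itertools import product


def quad_form(H, g):
    """Quadratic form sum_ij H[i][j] * g[i] * g[j]."""
    n = len(g)
    return sum(H[i][j] * g[i] * g[j] for i in range(n) for j in range(n))


def best_outcomes(H, tol=1e-12):
    """Return the maximal quadratic form over g in {0,1}^N and all maximizers."""
    n = len(H)
    values = {g: quad_form(H, g) for g in product((0, 1), repeat=n)}
    top = max(values.values())
    return top, [g for g, v in values.items() if v >= top - tol]


def complement(g):
    return tuple(1 - gi for gi in g)


# Stand-in Hessian diag(p) - p p^T; each row sums to zero.
p = [0.5, 0.3, 0.2]
H = [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(3)] for i in range(3)]
top, maximizers = best_outcomes(H)
```

For this stand-in the maximizers are the outcome (1, 0, 0) and its complement (0, 1, 1). Because the rows of `H` sum to zero, every outcome gives the same quadratic form as its complement, so maximizers always come in complementary pairs.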
3.5 Comparison with paper [2] by Gravin, Peres, Sivan
Our work is closely related to paper [2] by Gravin, Peres, and Sivan. Briefly: this paper and [2] look at the same problem through different lenses. The fundamental difference is that we study a natural continuum limit, while they focus on the problem in its original discrete-time form. This leads to differences with respect to [2] in both the character of our results and the methods used to demonstrate them. Our rigorous results are mainly concerned with the value function, which we characterize as the unique viscosity solution of an appropriate PDE problem; in deriving these results, we also obtain heuristic guidance about how the optimal strategies are related to the solution of the PDE. In [2], by contrast, no PDE is studied; instead, the value of the game is studied using methods from random walks, combined with what an optimal control theorist would call “verification arguments.” Of course [2] also studies the form of the optimal strategies, and its conclusions are similar to ours. However our continuum viewpoint offers a different perspective, in which the main features of the optimal strategies are understood by considering a linear programming problem.
Another distinction from [2] is the choice of how to measure “regret.” Our methods permit treatment of the continuum problem with a relatively broad class of measures of regret: if is the player’s regret with respect to the th expert, we require mainly that be increasing in each variable, satisfy , and have linear growth at infinity. The paper [2], by contrast, focuses exclusively on the classic measure (i.e. the player’s shortfall compared to the best-performing expert).
There are, of course, many similarities and parallels between our work and [2]. In fact, our work began when we read [2] and realized that a continuum perspective might be of interest. A particular parallel is worth noting: our exact solution of the geometric stopping problem with 3 experts and objective is the continuum analogue of a result proved in the discrete setting in [2]. (We found it by looking at the optimal strategies identified in [2] and considering their continuum analogues.)
4 The Games as Numerical Schemes for the PDEs
This section discusses the discrete solutions and . Concerning the former: even the existence of is not immediately obvious. We prove it (and obtain an estimate that is uniform in ) by representing the time-independent dynamic programming principle as a numerical scheme for the elliptic PDE (2.12), similar to those discussed in, e.g., Oberman’s paper [21].
Throughout this section we follow the setup of Oberman’s paper [21] in discussing the scheme and showing that the DPP has a unique solution. In particular, all the definitions in this section are from [21], as well as adapted theorem statements and proofs. Our treatment differs from [21] in that we work with a scheme which is continuous, not discrete, in space.
This section also discusses the solution of the finite horizon problem. There the existence and uniqueness of are easily established, but we need to prove uniform estimates as .
4.1 Definitions of , , and Basic Properties
In writing the DPP, one considers a point and all its ‘neighbors’, which are of the form ; we write for the collection of all such neighbors as ranges over . We order the neighbors in some order, say increasing if were written in binary as a -letter word, to obtain neighbors , where ; altogether there are neighbors, where is the number of experts. From now on, we write . In particular, we use the convention that .
We consider the solution to the geometric stopping problem, which we rearrange by subtracting , combining all terms on one side, and dividing by :
[TABLE]
so
[TABLE]
Inspired by this rearrangement of the geometric DPP, we define the time-independent approximation scheme as , where
[TABLE]
Evidently, for any fixed the value of
[TABLE]
depends only on the values at , and its neighbors. In the first argument refers to the function before , and the subsequent arguments , refer to the finite differences in the expected value terms.
We will prove that the scheme has a number of properties, whose analogues can be found in [21]:
Definition 1.
The scheme is proper if there exists such that for all and ,
[TABLE]
Definition 2.
The scheme is degenerate elliptic if the map
[TABLE]
is non-decreasing in each variable for all .
Definition 3.
The finite difference scheme is Lipschitz continuous if there exists a constant such that for all ,
[TABLE]
Lemma 3.
The scheme is proper and degenerate elliptic.
Proof.
The scheme is proper as .
The operator is degenerate elliptic as a of a positive linear combination of its -differences. Therefore, the scheme is degenerate elliptic: it is a sum of the function , the function , and a degenerate elliptic operator. ∎
Lemma 4.
The scheme is Lipschitz continuous with
Proof.
Firstly, observe that the sum of two Lipschitz continuous schemes is Lipschitz continuous. Since is Lipschitz continuous with a constant 1, we only need to find a Lipschitz constant C for the part of the scheme; then .
Define . Observe that is a linear combination of its independent variables with weights that are non-negative and sum up to , as the non-negative weights come from an expectation. Then, is Lipschitz continuous with constant . For any admissible vectors , we get the following sequence of inequalities:
[TABLE]
The same inequality holds, of course, with and switched. Hence, is Lipschitz continuous with constant . This means that the overall Lipschitz constant is . ∎
We introduce some notation for the next lemma. Given , define , , . The following lemma is found in [21].
Lemma 5.
(ordered Lipschitz continuity property) Let be a Lipschitz continuous, degenerate elliptic scheme with Lipschitz constant . Then for any we have
[TABLE]
4.2 The Euler Map
We define the Euler map associated to our scheme .
Definition 4.
For , define the explicit Euler map by
[TABLE]
Intuitively: the scheme is a numerical approximation of an elliptic PDE, and the map is the time step map for an explicit discretization of the associated parabolic equation. The following theorem and its proof are found in [21].
Theorem 1.
Fix such that Then, the Euler map is monotone.
Proof.
Suppose . Then,
[TABLE]
The first inequality follows from the ordered Lipschitz continuity property. The second inequality follows from , and the last one from the assumption of the theorem. This establishes monotonicity. ∎
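To illustrate the role of the step-size condition in Theorem 1, here is a minimal Python sketch on a toy problem. The scheme `toy_scheme`, the periodic one-dimensional grid, the source term `g`, and all function names are hypothetical stand-ins chosen for illustration; they are not the scheme analyzed in this paper. For this toy scheme the Lipschitz constant is 2, so one expects the explicit Euler map to preserve order when the step is at most 1/2, and possibly to lose it for larger steps.

```python
import numpy as np

def toy_scheme(u, g):
    """Hypothetical scheme on a periodic 1-D grid (illustration only):
    F[u]_i = (u_i - g_i) + max(u_i - u_{i-1}, u_i - u_{i+1}).
    It is non-decreasing in u_i and in each difference u_i - u_j,
    and Lipschitz with constant K = 2 in the sup norm."""
    left, right = np.roll(u, 1), np.roll(u, -1)
    return (u - g) + np.maximum(u - left, u - right)

def euler_map(u, g, rho):
    """Explicit Euler step S_rho(u) = u - rho * F[u]."""
    return u - rho * toy_scheme(u, g)

def preserves_order(u, v, g, rho, tol=1e-12):
    """Check S_rho(u) <= S_rho(v) pointwise; the caller supplies u <= v."""
    return bool(np.all(euler_map(u, g, rho) <= euler_map(v, g, rho) + tol))

rng = np.random.default_rng(0)
g = rng.normal(size=50)
u = rng.normal(size=50)
v = u + rng.uniform(0.0, 1.0, size=50)   # v >= u pointwise

# Step size at the threshold rho * K = 1 for this toy scheme.
ok_small_step = preserves_order(u, v, g, rho=0.5)

# A single bump shows how order is lost once rho * K > 1.
u0 = np.zeros(50)
v0 = u0.copy()
v0[10] = 1.0
ok_large_step = preserves_order(u0, v0, g, rho=0.9)
```

Expanding the toy Euler map gives `(1 - 2*rho) * u_i + rho * g_i + rho * min(u_{i-1}, u_{i+1})`, so the coefficient of `u_i` is nonnegative exactly when `rho <= 1/2`. In this sketch `ok_small_step` is `True` and `ok_large_step` is `False`.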
4.3 Properties of
We work with , a measure of regret that is Lipschitz continuous and also satisfies properties (1.2)-(1.6). One example of such a function is the classical
[TABLE]
which has discontinuous first derivatives, so we don’t want to assume that is smooth. We will need a smoothed version of . We define it using a mollifier , defined as:
[TABLE]
where the constant is chosen so that integrates to 1. Our smoothed version of is
[TABLE]
The following specific properties of are easily verified:
[TABLE]
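The effect of the smoothing can be sketched numerically. The following Python snippet is a hedged illustration, not the paper's construction: it assumes that the classical example is the maximum of the coordinates, works in two dimensions only, uses the standard bump function as the mollifier, and introduces a hypothetical width parameter `eps` and a simple grid quadrature. The names `bump` and `mollified_max` are invented for this sketch.

```python
import numpy as np

def bump(z):
    """Unnormalised standard bump: exp(-1 / (1 - |z|^2)) inside the unit
    ball and 0 outside (assumed form of the mollifier)."""
    r2 = np.sum(z * z, axis=-1)
    out = np.zeros_like(r2)
    inside = r2 < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - r2[inside]))
    return out

def mollified_max(x, eps=0.5, m=81):
    """Grid-quadrature approximation of the convolution of max(x1, x2)
    with a bump of width eps, evaluated at the point x."""
    x = np.asarray(x, dtype=float)
    s = np.linspace(-1.0, 1.0, m)
    z = np.stack(np.meshgrid(s, s, indexing="ij"), axis=-1).reshape(-1, 2)
    w = bump(z)
    w = w / w.sum()            # discrete normalisation: weights sum to 1
    return float(np.sum(w * np.max(x - eps * z, axis=1)))

x = np.array([0.3, -0.2])
val = mollified_max(x)                 # near the kink x1 = x2
val_shifted = mollified_max(x + 2.0)   # add the same constant to both coordinates
val_far = mollified_max([3.0, 0.0])    # far from the kink
```

Under these assumptions the smoothed value stays within `eps` of the unsmoothed maximum, lies above it (the maximum is convex and the kernel is symmetric), commutes with adding a constant to every coordinate, and coincides with the maximum away from the kink.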
Now, we estimate the expectation term, when replaces . In order to do so, we use its Taylor expansion:
[TABLE]
Let us focus on the -order factor. Because of Lemma 1 it is sufficient to consider balanced strategies for the market. For such strategies we have
[TABLE]
So the order term is [math]. Then, we can bound the term with (using the uniform bound on ), obtaining
[TABLE]
We use this result in the following lemma.
Lemma 6.
The function is an almost-solution to the scheme, i.e. for some constant , independent of the small parameter .
Proof.
Let us bound the absolute value of the scheme at . We use the preceding estimate for
[TABLE]
This has the form we want:
[TABLE]
∎
4.4 Existence and Uniqueness of a Solution of
Theorem 2.
Fix so that Then, for some (independent of ) the Euler map is a strict contraction in the sup norm on a ball of size , centered at .
The proof of Theorem 2 is parallel to the proof of Theorem 7 from [21].
We now present the main result of this subsection:
Theorem 3.
The scheme has a unique solution in the class of functions such that is uniformly bounded on . Moreover the solution has the following properties:
1. There is a constant such that .
2. The function is monotone nondecreasing in each variable .
3. The function has the translation property, i.e.
[TABLE]
Proof.
Observe that is bounded if and only if is bounded. By Theorem 2, is a strict contraction (with the maximum norm) on the set of functions for some . Here is a constant independent of . Note that the assertion holds for all sufficiently large , independent of . By the contraction mapping theorem, has a unique fixed point in the set above. The solution is obtained by iterating (with sufficiently small) starting from arbitrary initial data in the ball about with radius . Being a fixed point, i.e. satisfying , is equivalent to satisfying , which is equivalent to satisfying the geometric dynamic programming principle. Therefore the fixed point of , namely , is the desired solution of the scheme.
We already addressed the growth of our solution. As for monotonicity and translation invariance, we present the proofs in Lemmas 8 and 9 below. ∎
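The fixed-point argument can be sketched in Python on a toy problem. Everything here is an assumption made for illustration: `toy_scheme` is a hypothetical degenerate elliptic scheme on a periodic one-dimensional grid with bounded data `g`, not the scheme of this paper, and the plain sup norm is used where the paper works on a ball of functions with controlled growth. For this toy scheme the Euler map with step `rho <= 1/2` is a contraction with factor `1 - rho`.

```python
import numpy as np

def toy_scheme(u, g):
    """Hypothetical scheme (illustration only):
    F[u]_i = (u_i - g_i) + max(u_i - u_{i-1}, u_i - u_{i+1})."""
    left, right = np.roll(u, 1), np.roll(u, -1)
    return (u - g) + np.maximum(u - left, u - right)

def solve_by_euler_iteration(g, u0=None, rho=0.5, tol=1e-12, max_iter=10_000):
    """Iterate u <- u - rho * F[u] until successive iterates differ by
    less than tol in the sup norm; return the iterate and the step count."""
    u = np.zeros_like(g) if u0 is None else np.array(u0, dtype=float)
    for k in range(max_iter):
        u_next = u - rho * toy_scheme(u, g)
        if np.max(np.abs(u_next - u)) < tol:
            return u_next, k + 1
        u = u_next
    return u, max_iter

rng = np.random.default_rng(1)
g = rng.normal(size=40)

u_star, n_iter = solve_by_euler_iteration(g)
u_alt, _ = solve_by_euler_iteration(g, u0=10.0 * np.ones_like(g))

residual = float(np.max(np.abs(toy_scheme(u_star, g))))   # size of F[u*]
gap = float(np.max(np.abs(u_star - u_alt)))               # dependence on the start
```

In this sketch the residual of the scheme at the computed fixed point is at the level of the tolerance, and two different starting points lead to the same limit, mirroring the equivalence between fixed points of the Euler map and solutions of the scheme.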
Lemma 7.
The solution is symmetric, i.e. we can switch the values of every pair of spatial coordinates without changing the function’s value:
[TABLE]
Proof.
This is a consequence of uniqueness but for clarity we prove it using induction.
For simplicity of notation we prove the above claim for and . The proof goes by induction on the iterates of the Euler map . Consider any , small. Firstly, is symmetric, i.e. Next, suppose is symmetric, i.e. Then we observe that the function
[TABLE]
is also symmetric since experts and have symmetric roles in the game. Observe that the function above is simply equal to :
[TABLE]
Thus if is symmetric, then is symmetric. We therefore iterate the Euler map , starting from the symmetric . By Theorem 3, the iterates of the Euler map converge to the unique solution to . Passing the symmetry property to the limit, we conclude that is symmetric. ∎
Lemma 8.
The solution is monotone, i.e. if , then
[TABLE]
This property follows for every coordinate, as the proof for all other coordinates is identical.
Proof.
The argument here is similar to the one in Lemma 7.
∎
Lemma 9.
The solution has the following property: for any , and any
[TABLE]
Proof.
The argument here is similar to the one in Lemma 7. ∎
4.5 Growth and Qualitative Behavior of the Solutions to the Finite Horizon Problem
In the previous subsection, we showed that the solution to the discrete geometric stopping problem has at most linear growth as . We now show that the discrete solution of the finite horizon problem also has at most linear growth in . This is achieved by the following theorem:
Theorem 4.
A solution to the time-dependent dynamic programming principle (2.9) exists and is unique. In addition, it satisfies
[TABLE]
with a constant that is independent of . Moreover,
1. grows at most linearly as (with a bound that is uniform as ).
2. The function is monotone nondecreasing in each variable .
3. The function satisfies translation invariance, i.e.
[TABLE]
Proof.
Existence and uniqueness follow directly from the dynamic programming principle: solutions are built one time step at a time: at levels . The proof of the estimate is by induction on the number of time steps. For , by definition and the bound is an immediate consequence of our choice of (a smoothed out version of , see 4.8). For the inductive step, suppose the bound holds at , i.e.
[TABLE]
Then, let us consider what happens at . The argument used to prove (4.11) shows that for the optimal choices of strategy by the market and the player, the following holds:
[TABLE]
We use this in the second line of the estimate:
[TABLE]
This concludes the inductive step. ∎
The symmetry, monotonicity, and translation invariance properties are easily established inductively, using arguments parallel to the one used for Lemma 7.
5 Review of Known Results about Viscosity Solutions of our PDEs
In Section 4 we showed that the discrete solutions to the finite horizon and geometric stopping problems have at most linear growth as . We will prove in Section 6 that the solutions converge as to the viscosity solution of the appropriate PDE. Since the discrete solutions have linear growth as (with a bound that is independent of ), we need only consider solutions of the PDEs with at most linear growth.
The existence and uniqueness of viscosity solutions of our PDEs (with at most linear growth at ) are well known. This short section provides the relevant definitions and results.
5.1 The Time Dependent Case
The following definitions are standard.
Definition 5.
A real-valued, lower-semicontinuous function defined for and is a viscosity supersolution of the final-value problem (2.11) if for any with and any smooth such that has a local minimum at we have
[TABLE]
and at the final time .
Definition 6.
A real-valued, upper-semicontinuous function defined for and is a viscosity subsolution of the final-value problem (2.11) if for any with and any smooth such that has a local maximum at we have
[TABLE]
and at the final time .
Definition 7.
A viscosity solution of the final-value problem (2.11) is a continuous function that is both a subsolution and a supersolution.
Theorem 5.
The final-value problem (2.11) - informally written as
[TABLE]
subject to - has a unique viscosity solution that grows at most linearly and is uniformly continuous. Moreover, if is a subsolution, and is a supersolution, then necessarily .
Proof.
The statement is a special case of Theorem 2.1 in [22], applied backwards in time. ∎
5.2 The Stationary Case
Now we focus on viscosity solutions for the stationary equation. As before, the following definitions are well-known.
Definition 8.
A real-valued, lower-semicontinuous function defined for is a viscosity supersolution of the stationary problem (2.12) if for any and any smooth such that has a local minimum at we have
[TABLE]
Definition 9.
A real-valued, upper-semicontinuous function defined for is a viscosity subsolution of the stationary problem (2.12) if for any and any smooth such that has a local maximum at we have
[TABLE]
Definition 10.
A viscosity solution of (2.12) is a continuous function that is both a subsolution and a supersolution.
Theorem 6.
The stationary equation (2.12), informally written as
[TABLE]
has a unique viscosity solution that is uniformly continuous and grows at most linearly at infinity.
Proof.
We check that the conditions of Theorem 5.1 in [23] hold: is of at most linear growth. Moreover,
[TABLE]
is degenerate elliptic by Lemma 2. This establishes the conditions of Theorem 5.1; we now conclude from [23] that the elliptic equation has a unique viscosity solution that grows at most linearly as . ∎
6 Convergence to the Viscosity Solution
In this section we show that the solutions of our discrete problems converge to the viscosity solution of our PDEs as . In order to do so, we follow the setup of Barles and Souganidis [1]. The essence of the Barles-Souganidis convergence result is that if an approximation scheme is monotone, stable, and consistent, then solutions converge as to the viscosity solution of the associated PDE. This section provides the argument in a self-contained form as it applies to our setting. Following standard notation, in the geometric stopping case we write
[TABLE]
The induction defining the finite-horizon problem solution can also be viewed as solving a ‘scheme’ and this viewpoint will be useful for analyzing the limit as . We define the finite horizon approximation scheme as:
[TABLE]
Following standard notation here too, we write this as
[TABLE]
6.1 Monotonicity
Definition 11.
A time-independent scheme is monotone if
[TABLE]
whenever for all , , , and .
A time-dependent scheme is monotone if
[TABLE]
whenever for all , , , , and .
Lemma 10.
Our schemes and are monotone.
Proof.
Firstly, let us prove the statement for the time-dependent scheme:
[TABLE]
The inequality follows from applying an expected value to and reversing signs.
Next, we prove the statement for the stationary scheme:
[TABLE]
The inequality follows from applying the expected value to . ∎
6.2 Main Result
As already mentioned, [1] shows that if a numerical scheme is stable, monotone, and consistent then its solutions converge to those of the associated PDE. In this paper stability is provided by Theorems 3 and 4, which prove uniform bounds on and (independent of ). The heuristic argument in Section 2 captures the essence of consistency (taking into account that and are increasing in each and satisfy the "translation property"). A rigorous proof of consistency is part of the proof of the following convergence theorem.
Theorem 7.
The unique solutions and of and converge to the unique solutions of (2.12) and (2.11), respectively.
Proof.
The first part of the proof follows [1] and [23]. We do the proof in the time dependent case (the stationary case is identical). We define , by
[TABLE]
and
[TABLE]
The functions and have the translation property and are monotone in each variable because the sequences have those properties. We prove that is a subsolution (the proof that is a supersolution is completely parallel).
Consider , which touches at - a local maximum of ; we also assume (the other case is presented towards the end of this proof). To make notation simpler we can modify , (without loss of generality) so that (i) has a maximum at and (ii) .
We change coordinates so that is the projection of orthogonal to whereas , is the projection of onto . Since has the translation property, there is a unique function defined for such that .
We fix a and employ Theorem 3.2 from [23]. We obtain a sequence of functions with the following properties:
1. touches at near , so has a strict local maximum at and without loss of generality .
2. The first derivatives of at converge (as ) to the first derivatives of at .
3. The second derivative matrix of at (with respect to spatial variables ) converges to the matrix , which satisfies
[TABLE]
where is the Hessian of at in the variables and is a constant depending on only.
We extend to , and observe that has the same second derivatives as with respect to , as well as . We observe that
[TABLE]
by construction, regardless of location. Therefore, we can differentiate the above expression, obtaining, for every ,
[TABLE]
We will use this relation in the argument below.
We argue similarly to [1]. Consider
[TABLE]
which implies that touches whenever touches . In particular, touches at for any . Since has a local max at there exists a ball with radius , so that on the ball. Moreover, because we want the local maximum to be a global maximum, we can change so that
[TABLE]
outside the ball. The second inequality is a consequence of Theorem 4. The function is the smoothed version of , introduced in subsection 4.3. After the adjustment of , we obtain that is a global maximum of .
Since and has the translation property (i.e. ), we can obtain sequences and such that and
1. achieves its global max at ;
2. ;
3. .
Denote Since we have global maxima, we obtain
[TABLE]
or equivalently
[TABLE]
We are prepared to use the properties of the scheme:
[TABLE]
The equalities follow from being a solution to the scheme, while the inequality follows from monotonicity with respect to the larger function . Now we take limits in order to apply consistency of the scheme:
[TABLE]
We begin with two observations. First, divided by the denominator is insignificant as it vanishes in the limit; thus the can (and will) be ignored in what follows. Our second observation is that the term
[TABLE]
can be simplified using translation invariance; in fact it can be rewritten into its PDE form in an entirely parallel fashion to the one used in the heuristic derivation found in subsection 2.3. In particular, its value depends only on the market’s choices (not the player’s choices).
Observe that the over all the player’s choices is less than or equal to the expression with a particular choice of the player. Thus
[TABLE]
is less than or equal to the value of
[TABLE]
when the player chooses the particular strategy
[TABLE]
Note that we use that and for . The equality comes from having the translation property; the inequalities follow by a standard argument from the facts that is nondecreasing in , and that has a local maximum at the point around which we perform the Taylor expansion.
Expression (6.6) appears to have a term proportional to , i.e.
[TABLE]
However, for the particular choice of values for this term vanishes as shown in subsection 3.1:
[TABLE]
Thus (6.6) is actually equal to :
[TABLE]
which, using the arguments in subsection 3.3, equals
[TABLE]
We conclude that:
[TABLE]
The equalities above essentially follow the heuristic argument in Section 2: applying the definition, canceling terms, and Taylor expansion. The last inequality follows because and have matching time derivatives by construction, and because the matrix comparison in (6.4) holds. In the expression above we may choose as small as we like; sending it to 0 completes the proof that is a subsolution for .
Finally, let us consider the final time for the time-dependent case. We need to show that . In fact, we will prove that . Because of the translation property, we can examine points such that , and a barrier function , such that
[TABLE]
for . Just as before, we extend and so that
[TABLE]
and
[TABLE]
Since
[TABLE]
we can focus on maximizing (and not ). We consider the half-space and let be the point where attains its maximum. We see that
[TABLE]
Moreover,
[TABLE]
Because of the above and , we see that
[TABLE]
Consider the maximum point . If , then we repeat the argument presented above for the interior case to get
[TABLE]
We restrict our attention to choices of so that . Then,
[TABLE]
which is a contradiction. Therefore, if with , then when and are sufficiently small. It remains to prove that ; by (6.8), it is enough to show that when is sufficiently small. The proof is parallel to that of the interior case. We use that
[TABLE]
and that has the translation property to obtain sequences and , for which and
- is maximized on at ;
- ;
- .
If for infinitely many , we obtain equation (6.10), a contradiction. Hence, for all large we obtain , which implies . Combining with the fact that is continuous, we deduce that . This concludes the proof that is a subsolution.
As already mentioned, the proof that is a supersolution is parallel. The main difference is working with the optimal choice for the market instead of the player.
We would like to show that is the unique viscosity solution of the PDE (2.11). One inequality comes from the comparison principle: since is an upper semicontinuous subsolution, as we just proved, and is a lower semicontinuous supersolution, the comparison principle (Theorem 5) gives the desired inequality . The other inequality follows from the definitions of and . Therefore , which is what we wanted to show. ∎
6.3 Consequences of the main result
We proved that and . As a result, many properties of the solutions of the discrete problems are inherited by their limits.
Lemma 11.
The solution of the time-dependent problem (2.11) is symmetric, monotone, and translation invariant, i.e., if and is any constant, then
[TABLE]
Proof.
We observe that as by Theorem 7, so we pass the equality through the limit, obtaining in the end the desired identities. ∎
Lemma 12.
The solution of the elliptic PDE (2.12) is symmetric, monotone, and translation invariant, i.e., if and is any constant, then
[TABLE]
Proof.
We observe that as by Theorem 7, so we pass the equality through the limit, obtaining in the end the desired identities. ∎
7 Exact Solution
It is natural to ask how the PDE might be used. We offer two simple applications in this section: an exact solution of the geometric stopping case for experts and a demonstration that the associated argument does not generalise straightforwardly to experts. (There is now an explicit solution for the geometric stopping case with experts [12]. Its derivation makes use of our PDE.)
7.1 The Geometric Stopping Case with
The following result is a continuous analogue of one in [2].
Theorem 8.
The solution of our PDE (2.12) in the geometric stopping case for experts and is symmetric with respect to , and in the quadrant where , its formula is
[TABLE]
Proof.
Since, by Theorem 6, the PDE (2.12) has a unique solution with at most linear growth, all we need to do is verify that , which has linear growth, is a solution.
First, let us establish that the expression is a solution within a quadrant. One can differentiate the formula to find the first derivatives:
[TABLE]
We see that indeed as expected. The interesting are , and , i.e.
[TABLE]
and we find second derivatives
[TABLE]
Hence, in this quadrant
[TABLE]
Plugging into the PDE, we establish that
[TABLE]
hence is a solution to the PDE in this quadrant, and by symmetry in all quadrants.
All we have left to show is that the expression stays across the surfaces bounding the quadrants, and at the origin. Observe that the expression is , with bounded third derivatives away from the surfaces ( and ). The expression is symmetric across the surfaces, and even, as
[TABLE]
and
[TABLE]
Because the expression is even, it is across these surfaces. It remains to show that is at the origin. Let us consider the Taylor expansion of the function in the quadrant . It is
[TABLE]
which is a symmetric function up to second order, with bounded third derivatives. By symmetry, the second order part of the Taylor expansion is the same in other sectors. Thus the formula is at the origin. Thus the function is at the origin, as well as everywhere else. We established that is a solution of the PDE. ∎
Now that we have presented the solution to the geometric stopping problem, we analyze which strategy the solution corresponds to (see subsection 3.4). On the quadrant , the solution has:
[TABLE]
Since when and when we see (via the discussion in Section 3.4) that the market has two optimal strategies:
- choose and with probability each, or
- choose and with probability each.
How could one find the explicit solution (7.1)? Well, suppose we know the optimal strategy on a region . Then,
[TABLE]
is the corresponding PDE (or ODE). We know that the solutions of involve exponentials, so we expect a solution of the form
[TABLE]
The boundary condition of at most linear growth at infinity helps rule out the exponentials that grow at infinity, whereas the boundary conditions on the walls , help determine the explicit solution formula.
7.2 The Geometric Stopping Case with is different
It is natural to ask whether the geometric stopping case with experts (and ) can be solved explicitly by making an educated guess based on what we just did for . We show in this section that the answer is no. (In fact an exact solution for the geometric stopping case with experts is now known; it was found by [12], using our PDE characterization and arguments much more involved than those in this section.)
Recall that for , one of the market’s two optimal strategies in the sector was to advance the leading expert (i.e. take ) with probability , and to advance everyone else (i.e. take ) with probability . With this in mind, we ask whether for it would be optimal in the sector for the market to advance the leading expert with probability and advance everyone else with probability . If so, then in this sector the value function would satisfy
[TABLE]
It must also have linear growth at infinity, and at the sector’s boundaries symmetry demands that when , when , and when These conditions fully determine the function; after some calculation, one obtains
[TABLE]
We shall show that the proposed strategy is not optimal (and is not the value function in the sector ) by showing that in part of this sector. It suffices to show that
[TABLE]
in part of the sector. Explicit calculation gives
[TABLE]
Therefore
[TABLE]
Evidently, when , , which is strictly smaller than . So (by continuity)
[TABLE]
in part of the sector near . Thus the proposed strategy is not optimal, and is not the value function in this sector.
References
- [1] G. Barles and P. E. Souganidis, “Convergence of approximation schemes for fully nonlinear second order equations,” Asymptotic Analysis, vol. 4, no. 3, pp. 271–283, 1991.
- [2] N. Gravin, Y. Peres, and B. Sivan, “Towards optimal algorithms for prediction with expert advice,” in Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’16, (Philadelphia, PA, USA), pp. 528–547, Society for Industrial and Applied Mathematics, 2016.
- [3] K. Zhu, “Two problems in applications of PDE,” http://pqdtopen.proquest.com/pubnum/3635320.html, 2014.
- [4] T. M. Cover, “Behavior of sequential predictors of binary sequences,” tech. rep., DTIC Document, 1966.
- [5] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games. New York, NY, USA: Cambridge University Press, 2006.
- [6] V. G. Vovk, “Aggregating strategies,” in Proceedings of the Third Annual Workshop on Computational Learning Theory, COLT ’90, (San Francisco, CA, USA), pp. 371–386, Morgan Kaufmann Publishers Inc., 1990.
- [7] N. Littlestone and M. K. Warmuth, “The weighted majority algorithm,” Inf. Comput., vol. 108, pp. 212–261, Feb. 1994.
- [8] D. Haussler, J. Kivinen, and M. Warmuth, “Tight worst-case loss bounds for predicting with expert advice,” tech. rep., Santa Cruz, CA, USA, 1994.
