PDE approach to the problem of online prediction with expert advice: a construction of potential-based strategies
Dmitry B. Rokhlin

TL;DR
This paper introduces a PDE-based framework for online prediction with expert advice, linking supersolutions of a nonlinear PDE to potential functions that guide regret-minimizing strategies.
Contribution
It develops a PDE approach to constructing potential-based strategies for online prediction: the strategies are derived formally from the limiting equation and then verified directly.
Findings
Supersolutions of a nonlinear PDE relate to potential functions in prediction.
Potential-based strategies satisfy the Blackwell condition.
A conventional upper bound for the worst-case regret is justified by a simple verification argument.
Abstract
We consider a sequence of repeated prediction games and formally pass to the limit. The supersolutions of the resulting non-linear parabolic partial differential equation are closely related to the potential functions in the sense of N. Cesa-Bianchi and G. Lugosi (2003). Any such supersolution gives an upper bound for the forecaster's regret and suggests a potential-based prediction strategy satisfying the Blackwell condition. A conventional upper bound for the worst-case regret is justified by a simple verification argument.
Institute of Mathematics, Mechanics and Computer Sciences, Southern Federal University, Mil’chakova str., 8a, 344090, Rostov-on-Don, Russia
Key words and phrases:
regret, online learning, potentials, non-linear parabolic PDE, weighted average forecaster
2010 Mathematics Subject Classification:
68T05, 68W27, 35K55
1. Introduction
Let $\mathcal Y$ be any set. In the problem of online prediction with expert advice a forecaster predicts a sequence $y_t\in\mathcal Y$, $t=1,\dots,n$, on the basis of expert opinions $f_t^i\in\mathcal D$, $i=1,\dots,N$, where $\mathcal D$ is a convex subset of a vector space. More precisely, at round $t$ the forecaster's guess $p_t\in\mathcal D$ is a convex combination of the expert advice:
\[ p_t=\sum_{i=1}^N \gamma_t^i f_t^i,\qquad \gamma_t^i\ge 0,\quad \sum_{i=1}^N \gamma_t^i=1, \]
based on the available history and the current advice: $y_1,\dots,y_{t-1}$, $f_1,\dots,f_t$.
Let $\ell:\mathcal D\times\mathcal Y\to[0,1]$ be a loss function. The forecaster's aim is to keep the regret
\[ R_n=\sum_{t=1}^n \ell(p_t,y_t)-\min_{1\le i\le N}\sum_{t=1}^n \ell(f_t^i,y_t) \]
small. This regret measures the quality of predictions by comparing the cumulative loss of the forecaster with that of a best expert, chosen in hindsight.
We refer to [4] for more information on this problem. The basic result (see, e.g., [4, Theorem 2.2]) guarantees the existence of a prediction strategy achieving the uniform bound
\[ R_n\le\sqrt{\frac{n}{2}\ln N} \tag{1.1} \]
for any $y_1,\dots,y_n$ and $f_1,\dots,f_n$, under the assumption that $\ell$ is convex in its first argument. Moreover, this bound cannot be improved without further assumptions: [4, Theorem 3.7]. The inequality (1.1) implies that in the long run, on average, the forecaster predicts as well as a best expert: $R_n/n\to 0$, $n\to\infty$.
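As a rough illustration of the quantities just introduced, the following Python sketch computes the regret from per-round losses and evaluates the right-hand side of the bound in [4, Theorem 2.2], $\sqrt{(n/2)\ln N}$, which assumes losses with values in $[0,1]$. The function names and the list-of-lists data layout are assumptions made for this example only.

```python
import math

def regret(forecaster_losses, expert_losses):
    """Cumulative forecaster loss minus the cumulative loss of a best expert.

    forecaster_losses: list of n per-round losses of the forecaster.
    expert_losses: list of n rows; row t holds the N expert losses of round t.
    """
    n_experts = len(expert_losses[0])
    best = min(sum(row[i] for row in expert_losses) for i in range(n_experts))
    return sum(forecaster_losses) - best

def uniform_bound(n_rounds, n_experts):
    """Right-hand side sqrt((n/2) ln N) of the uniform regret bound."""
    return math.sqrt(0.5 * n_rounds * math.log(n_experts))

# The per-round bound shrinks as the number of rounds grows.
for n in (10, 1000, 100000):
    print(n, uniform_bound(n, 10) / n)
```

The loop only prints the bound divided by $n$ for a few horizons, which makes the vanishing average regret visible numerically.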
There are plenty of strategies achieving the bound (1.1). In [3] it was shown that for a rather general class of online learning problems the construction of such strategies can be based on the notion of a potential function. More recently, [7] proposed a systematic way of constructing potentials in the case of randomized prediction, mentioning that “The origin/recipe for “good” potential functions has always been a mystery (at least to the authors).” The authors of [7] considered a recurrence relation for the value function of a repeated game determining the optimal regret, and showed that potential functions are related to relaxations of this function that are consistent with this recurrence relation. To obtain such relaxations they used upper bounds developed in the theory of online learning that capture the complexity of the problem.
In this paper we show that for the problem of prediction with expert advice there is another “natural” way for potential-based algorithms to appear. As in [7], we consider a repeated game determining the optimal regret, and the corresponding recurrence relation for the value functions. Further, in contrast to [7], we simply pass to the limit as the number of rounds tends to infinity and get a non-linear parabolic Bellman-Isaacs type partial differential equation. A rigorous justification of this procedure can be performed within the theory of viscosity solutions. However, being interested only in the construction of prediction strategies, we need not do it! As usual, a Bellman-type equation at least formally produces optimal strategies. More precisely, we consider the strategies generated by appropriate smooth supersolutions, and then directly check the inequality (1.1), using an argument similar to the verification method from the theory of optimal control.
The described approach is mainly inspired by the paper [6], which studied a link between fully non-linear second-order (parabolic and elliptic) PDEs and repeated games. Its application to problems of online learning theory was initiated in [10], where the asymptotics of the sequential Rademacher complexity (a notion introduced in [8]) of a finite function class was related to the viscosity solution of a $G$-heat equation. In turn, the result of [10] is based on the central limit theorem under model uncertainty, studied within the same approach in [9].
2. Prediction game and the limiting PDE
The worst-case regret
[TABLE]
is a result of the repeated game between the predictor, an adversary and experts. In this game the adversary has an informational advantage over the predictor and experts, since the outcome of each round is chosen after the prediction and the expert advice for that round are revealed. Furthermore, the predictor has an informational advantage over the experts, since the prediction can be based on the current expert advice as well as on the past outcomes. Finally, experts can use only the information contained in the past predictions and outcomes. The adversary and experts play against the predictor, trying to maximize the predictor's regret.
To get a recurrence formula for the worst-case regret, let us introduce the family of state processes
[TABLE]
Summing up the increments , we obtain
[TABLE]
Let us introduce the value functions
[TABLE]
[TABLE]
where , . From the dynamic programming theory it is known that satisfies the recurrence relations
[TABLE]
, . We stress that we need not rigorously justify this and subsequent claims, since our goal is to formally construct prediction strategies. Their verification is delayed to the last step.
For a moment imagine that is a smooth function, satisfying (2.3) on . Then, by Taylor’s formula we get
[TABLE]
where , are the gradient vector and the Hessian matrix.
We will say that the loss function satisfies the Blackwell condition if
[TABLE]
for all . Clearly, . The Blackwell condition (2.5) is satisfied if is convex in its first argument. In this case for , since
[TABLE]
by Jensen’s inequality.
By the nature of these functions are non-decreasing in each . Indeed, is the optimal worst-case regret if the initial regret with respect to -th expert at time moment equals to . From (2) we get
[TABLE]
So, we expect that the limiting function satisfies the inequality
[TABLE]
and the boundary condition . Note that , if the symmetric matrix is non-negative definite. Hence,
[TABLE]
is a fully non-linear parabolic equation (see [5]). Along with (2.7) we consider the boundary condition
[TABLE]
The functions are defined on To describe their limiting behavior in a rigorous way, one can consider the Barles-Perthame half-relaxed (weak) upper limit:
[TABLE]
From the results of [1, 2, 6] and the above calculations we expect that is a viscosity subsolution of (2.7), (2.8). Note, that by the definition,
[TABLE]
3. Smooth supersolutions and induced weighted average forecasting strategies
Take a smooth supersolution of (2.7), (2.8):
[TABLE]
which is non-decreasing in each variable . Assuming a comparison result: , we conclude that the inequality (2.9) holds true for instead of . We also expect that a strategy will produce the regret, satisfying this bound.
Let us look for supersolutions of the form , where is a constant,
[TABLE]
and is non-decreasing in each variable. The differential inequality (3.10) implies the condition . This condition is satisfied if
[TABLE]
Then by the Blackwell condition (2.5) there exists a vector-function
[TABLE]
If is convex in its first argument and is strictly increasing in each variable, then, according to the remark after the formula (2.5), one can take
[TABLE]
Consider the discrete-time state process (2.2), generated by the prediction strategy, related to :
[TABLE]
Note, that automatically satisfies the inequality
[TABLE]
which is also called the Blackwell condition: see [3, 4]. For a convex function from (3.14) we get a weighted average forecaster:
[TABLE]
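A minimal Python sketch of a weighted average forecaster in the sense of [3, 4], assuming scalar advice and weights proportional to the gradient of a potential evaluated at the current regret vector. The polynomial gradient is only an illustrative choice taken from [4, Chapter 2]; whether it meets the conditions used in this paper is not checked here, and all function names are hypothetical.

```python
def weighted_average_prediction(gradient, regrets, advice):
    """Convex combination of expert advice with weights proportional to
    the gradient of a potential at the current regret vector."""
    weights = gradient(regrets)
    total = sum(weights)
    if total <= 0:
        # Assumption for this sketch: fall back to uniform weights
        # when the gradient vanishes.
        weights = [1.0] * len(advice)
        total = float(len(advice))
    return sum(w * f for w, f in zip(weights, advice)) / total

def polynomial_gradient(regrets, p=2.0):
    """Gradient of the polynomial potential up to a positive factor:
    component i is (x_i)_+ ** (p - 1)."""
    return [max(x, 0.0) ** (p - 1.0) for x in regrets]

# Experts with larger positive regret receive larger weight.
prediction = weighted_average_prediction(
    polynomial_gradient, [1.0, 3.0, -2.0], [0.0, 1.0, 0.5]
)
print(prediction)
```

Passing the gradient as a callable keeps the forecaster independent of the particular potential, so other non-decreasing potentials can be swapped in without changing the averaging step.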
Theorem 1.
Let the Blackwell condition (2.5) be satisfied, and let be a twice continuously differentiable function, which is non-decreasing in each variable and meets the conditions (3.11), (3.12). Then a prediction strategy
[TABLE]
where satisfies (3.13) and is defined by (3.15), produces the regret, satisfying the inequality (1.1) with
Proof.
For by Taylor’s formula we get
[TABLE]
for some , where the last inequality is implied by (3.16) and (3.12). Now the assertion of the theorem follows from the condition (3.11):
[TABLE]
Following [3, 4] we call a potential function. The most natural smooth upper bound for , and hence a candidate for a potential, is a soft-maximum function
[TABLE]
This function is included in a more general class considered in [3, 4], where and are assumed to be concave and convex respectively. The following inequality is also taken from [3, 4]:
[TABLE]
For (3.18) we have , ,
[TABLE]
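The bound on the second-order term can be checked numerically. The sketch below assumes the standard log-sum-exp form of the soft-maximum, $\Phi_\eta(x)=\eta^{-1}\ln\sum_{i} e^{\eta x_i}$, with a hypothetical parameter `eta`; the normalization used in (3.18) may differ. For this form, $\langle \Phi_{xx}(x) r, r\rangle$ equals $\eta$ times the variance of $r$ under the softmax weights, so it is at most $\eta$ whenever $|r_i|\le 1$.

```python
import math
import random

def soft_max(x, eta):
    """(1/eta) * ln(sum_i exp(eta * x_i)), computed in a stable way."""
    m = max(x)
    return m + math.log(sum(math.exp(eta * (xi - m)) for xi in x)) / eta

def hessian_form(x, r, eta):
    """Quadratic form r^T H r for the Hessian H of soft_max at x.

    It equals eta * (sum_i w_i r_i^2 - (sum_i w_i r_i)^2) with softmax weights w.
    """
    m = max(x)
    e = [math.exp(eta * (xi - m)) for xi in x]
    s = sum(e)
    w = [ei / s for ei in e]
    mean = sum(wi * ri for wi, ri in zip(w, r))
    second = sum(wi * ri * ri for wi, ri in zip(w, r))
    return eta * (second - mean * mean)

random.seed(0)
eta = 0.5
worst = 0.0
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(4)]
    r = [random.uniform(-1, 1) for _ in range(4)]
    assert soft_max(x, eta) >= max(x)  # smooth upper bound for the maximum
    worst = max(worst, hessian_form(x, r, eta))
print(worst, "<=", eta)
```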
For generated by (3.18), in accordance with Theorem 1 we have
[TABLE]
for an “optimal” choice (cf. [4, Corollary 2.2]). The formula (3.17) reduces to
[TABLE]
where is the cumulative loss of -th expert. This is a basic version of the exponentially weighted average forecaster: see [4, Chapter 2].
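The exponentially weighted average forecaster of [4, Chapter 2] can be sketched in Python as follows. The sketch assumes scalar advice, the absolute loss (convex in its first argument, with values in $[0,1]$ for predictions and outcomes in $[0,1]$) and the learning rate $\eta=\sqrt{8\ln N/n}$ from [4]; the scaling of the parameter in the formula above may differ. The function names and the toy data are hypothetical.

```python
import math

def exponentially_weighted_forecaster(advice, outcomes, loss, eta):
    """Run the forecaster and return (predictions, regret).

    advice[t][i]: prediction of expert i at round t; outcomes[t]: outcome y_t.
    Weights are proportional to exp(-eta * cumulative loss of the expert).
    """
    n_experts = len(advice[0])
    cum_expert = [0.0] * n_experts
    cum_forecaster = 0.0
    predictions = []
    for f_t, y_t in zip(advice, outcomes):
        low = min(cum_expert)  # shift exponents for numerical stability
        w = [math.exp(-eta * (c - low)) for c in cum_expert]
        p_t = sum(wi * fi for wi, fi in zip(w, f_t)) / sum(w)
        predictions.append(p_t)
        cum_forecaster += loss(p_t, y_t)
        for i, fi in enumerate(f_t):
            cum_expert[i] += loss(fi, y_t)
    return predictions, cum_forecaster - min(cum_expert)

def absolute_loss(p, y):
    return abs(p - y)

n, N = 200, 3
advice = [[0.0, 1.0, 0.5] for _ in range(n)]       # three constant experts
outcomes = [float(t % 3 == 0) for t in range(n)]   # toy outcome sequence
eta = math.sqrt(8.0 * math.log(N) / n)
_, reg = exponentially_weighted_forecaster(advice, outcomes, absolute_loss, eta)
print(reg, math.sqrt(0.5 * n * math.log(N)))
```

The two printed numbers are the realized regret and the bound $\sqrt{(n/2)\ln N}$, so the run gives a quick sanity check of (1.1) on one sequence.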
4. Randomized prediction
Assume that the forecaster randomly chooses a prediction by taking a sample from a probability distribution over . His cumulative loss is compared with the cumulative loss of a best fixed prediction:
[TABLE]
and the regret is defined as the expectation of this quantity with respect to the induced artificial probability measure:
[TABLE]
[TABLE]
The game in which the forecaster knows the previous moves and the adversary knows the prediction algorithm, but not the predictions themselves, corresponds to the case of an oblivious adversary: [4, Chapter 2]. However, the case of a non-oblivious adversary is not interesting for a problem of this form: see [4, Lemma 4.1].
The described game is simpler than that considered above, since the “experts”, corresponding to fixed predictions, do not play against the forecaster. Moreover, the condition (2.5) is satisfied regardless of the convexity of . Repeating the reasoning of Section 2, we get the inequality (2.6) with
[TABLE]
[TABLE]
So, a prediction strategy satisfying
[TABLE]
where meets the conditions of Theorem 1, and is defined by the recursion of the form (3.15), produces the regret . In particular, for the exponentially weighted average forecaster, discussed after Theorem 1.
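A hedged Python sketch of randomized prediction with exponential weights, assuming a finite set of $N$ candidate predictions and a loss table fixed in advance (an oblivious adversary). Since the sampling distributions depend only on the past losses, the expected cumulative loss is the sum of the per-round expectations, and the expected regret can be compared with $\sqrt{(n/2)\ln N}$. All names and the toy loss table are hypothetical.

```python
import math
import random

def hedge_distribution(cum_losses, eta):
    """Distribution with probabilities proportional to exp(-eta * cumulative loss)."""
    low = min(cum_losses)
    w = [math.exp(-eta * (c - low)) for c in cum_losses]
    s = sum(w)
    return [wi / s for wi in w]

def play_randomized(loss_rows, eta, rng):
    """loss_rows[t][i]: loss in [0, 1] of prediction i at round t.

    Returns (realized regret, expected regret) against a best fixed prediction.
    """
    n_actions = len(loss_rows[0])
    cum = [0.0] * n_actions
    realized = 0.0
    expected = 0.0
    for row in loss_rows:
        q = hedge_distribution(cum, eta)
        action = rng.choices(range(n_actions), weights=q)[0]  # sampled prediction
        realized += row[action]
        expected += sum(qi * li for qi, li in zip(q, row))
        for i in range(n_actions):
            cum[i] += row[i]
    best = min(cum)
    return realized - best, expected - best

n, N = 300, 4
loss_rows = [[((t * (i + 1)) % 5) / 4.0 for i in range(N)] for t in range(n)]
eta = math.sqrt(8.0 * math.log(N) / n)
realized, expected = play_randomized(loss_rows, eta, random.Random(0))
print(realized, expected, math.sqrt(0.5 * n * math.log(N)))
```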
Finally, we note that the case of internal regret (see [4, Section 4.4]) can be considered in the same way.
5. Acknowledgments
The research is supported by the Russian Science Foundation, project 17-19-01038.
References
[1] Barles, G., Perthame, B.: Exit time problems in optimal control and vanishing viscosity method. SIAM J. Control Optim. 26(5), 1133–1148 (1988)
[2] Barles, G., Souganidis, P.E.: Convergence of approximation schemes for fully nonlinear second order equations. Asymptot. Anal. 4, 271–283 (1991)
[3] Cesa-Bianchi, N., Lugosi, G.: Potential-based algorithms in on-line prediction and game theory. Mach. Learn. 51(3), 239–261 (2003)
[4] Cesa-Bianchi, N., Lugosi, G.: Prediction, learning, and games. Cambridge University Press, New York (2006)
[5] Crandall, M., Ishii, H., Lions, P.L.: User’s guide to viscosity solutions of second order partial differential equations. Bull. Amer. Math. Soc. 27(1), 1–67 (1992)
[6] Kohn, R., Serfaty, S.: A deterministic-control-based approach to fully nonlinear parabolic and elliptic equations. Commun. Pur. Appl. Math. 63(10), 1298–1350 (2010)
[7] Rakhlin, A., Shamir, O., Sridharan, K.: Relax and randomize: from value to algorithms. In: F. Pereira, C.J.C. Burges, L. Bottou, K.Q. Weinberger (eds.) Advances in Neural Information Processing Systems 25, pp. 2141–2149. Curran Associates, Inc. (2012)
[8] Rakhlin, A., Sridharan, K., Tewari, A.: Online learning: random averages, combinatorial parameters, and learnability. In: J.D. Lafferty, C.K.I. Williams, J. Shawe-Taylor, R.S. Zemel, A. Culotta (eds.) Advances in Neural Information Processing Systems 23, pp. 1984–1992. Curran Associates, Inc. (2010)
