PDE approach to the problem of online prediction with expert advice: a construction of potential-based strategies

Dmitry B. Rokhlin

arXiv:1705.01091·cs.LG·May 3, 2017


TL;DR

This paper introduces a PDE-based framework for online prediction with expert advice, linking supersolutions of a nonlinear PDE to potential functions that guide regret-minimizing strategies.

Contribution

It develops a novel PDE approach to construct potential-based strategies for online prediction, extending classical methods with a rigorous mathematical foundation.

Findings

01

Supersolutions of a nonlinear PDE relate to potential functions in prediction.

02

Potential-based strategies satisfy the Blackwell condition.

03

A new upper bound for worst-case regret is established.

Abstract

We consider a sequence of repeated prediction games and formally pass to the limit. The supersolutions of the resulting non-linear parabolic partial differential equation are closely related to the potential functions in the sense of N. Cesa-Bianchi and G. Lugosi (2003). Any such supersolution gives an upper bound for the forecaster's regret and suggests a potential-based prediction strategy satisfying the Blackwell condition. A conventional upper bound for the worst-case regret is justified by a simple verification argument.

Equations

$$a_{t}=\langle p_{t},f_{t}\rangle:=\sum_{i=1}^{N}p_{t}^{i}f_{t}^{i},\qquad p_{t}\in\Delta:=\{z\geq 0:\langle z,1\rangle=1\},$$

$$R_{n}=\sum_{t=0}^{n-1}l(\langle p_{t},f_{t}\rangle,b_{t})-\min_{1\leq i\leq N}\sum_{t=0}^{n-1}l(f_{t}^{i},b_{t})$$

$$\frac{R_{n}((p_{t}^{*})_{t=0}^{n-1},(f_{t})_{t=0}^{n-1},(b_{t})_{t=0}^{n-1})}{\sqrt{n}}\leq C$$

$$\mathscr{R}_{n}=\sup_{f_{0}\in A^{N}}\inf_{p_{0}\in\Delta}\sup_{b_{0}\in B}\dots\sup_{f_{n-1}\in A^{N}}\inf_{p_{n-1}\in\Delta}\sup_{b_{n-1}\in B}R_{n}((p_{t})_{t=0}^{n-1},(f_{t})_{t=0}^{n-1},(b_{t})_{t=0}^{n-1})$$

$$X_{s+1}^{i,t,x,p,f,b}$$

$$r^{i}$$

$$\frac{1}{\sqrt{n}}R_{n}((p_{t})_{t=0}^{n-1},(f_{t})_{t=0}^{n-1},(b_{t})_{t=0}^{n-1})=\max\{X_{n}^{1,0,0,p,f,b},\dots,X_{n}^{N,0,0,p,f,b}\}.$$

$$v^{n}(t/n,x)=\sup_{f_{t}\in A^{N}}\inf_{p_{t}\in\Delta}\sup_{b_{t}\in B}\dots\sup_{f_{n-1}\in A^{N}}\inf_{p_{n-1}\in\Delta}\sup_{b_{n-1}\in B}g(X_{n}^{t,x,p,f,b}),$$

$$v^{n}(1,x)=g(x),$$

$$v^{n}(t/n,x)=\sup_{f\in A^{N}}\inf_{p\in\Delta}\sup_{b\in B}v^{n}((t+1)/n,\,x+r(p,f,b)/\sqrt{n}),$$

$$\dots+\frac{1}{2}\langle v^{n}_{xx}(t,x)r(p,f,b),r(p,f,b)\rangle+o(1)\Bigr\},$$

$$\Gamma(\gamma,f):=\{p\in\Delta:\langle\gamma,r(p,f,b)\rangle\leq 0,\ b\in B\}\neq\emptyset$$

$$\sum_{i=1}^{N}\gamma^{i}r^{i}\Bigl(\frac{\gamma}{\langle\gamma,1\rangle},f,b\Bigr)=\langle\gamma,1\rangle\,l\Bigl(\frac{\langle\gamma,f\rangle}{\langle\gamma,1\rangle},b\Bigr)-\sum_{i=1}^{N}\gamma^{i}l(f^{i},b)\leq 0$$

$$0\leq\sup_{f\in A^{N}}\inf_{p\in\Gamma(v_{x}^{n},f)}\sup_{b\in B}\Bigl\{v_{t}^{n}(t,x)+\frac{1}{2}\langle v_{xx}^{n}(t,x)r(p,f,b),r(p,f,b)\rangle+o(1)\Bigr\}.$$

$$-v_{t}(t,x)-G(v_{x}(t,x),v_{xx}(t,x))\leq 0,$$

$$G(\gamma,S)=\frac{1}{2}\sup_{f\in A^{N}}\inf_{p\in\Gamma(\gamma,f)}\sup_{b\in B}\langle Sr(p,f,b),r(p,f,b)\rangle,$$

$$-u_{t}(t,x)-G(u_{x}(t,x),u_{xx}(t,x))=0$$

$$u(1,x)=g(x)=\max\{x_{1},\dots,x_{N}\}.$$

$$\overline{v}(t,x)=\sup\{\dots\ \text{and}\ v^{n_{k}}(t_{k},x_{k})\ \text{converges}\}.$$

$$\limsup_{n\to\infty}\frac{1}{\sqrt{n}}\mathscr{R}_{n}\leq\overline{v}(0,0).$$

$$-w_{t}(t,x)-G(w_{x}(t,x),w_{xx}(t,x))\geq 0,\qquad w(1,x)\geq g(x),$$

$$\Phi(x)\geq g(x),$$

$$\frac{1}{2}\sup_{x\in\mathbb{R}^{N}}\langle\Phi_{xx}(x)h,h\rangle\leq c,\qquad|h^{i}|\leq 1,\quad i=1,\dots,N.$$

$$p^{*}(x,f)\in\Gamma(\Phi_{x}(x),f).$$

$$p^{*}(x,f)=\frac{\Phi_{x}(x)}{\langle\Phi_{x}(x),1\rangle}\in\Gamma(\Phi_{x}(x),f).$$

$$X_{s+1}^{*,i}=X_{s}^{*,i}+\frac{1}{\sqrt{n}}r^{i}(p_{s}^{*},f_{s},b_{s}),\qquad p_{s}^{*}=p^{*}(X_{s}^{*},f_{s}),\qquad X_{0}^{*}=0.$$

$$\langle\Phi_{x}(X_{t}^{*}),r(p_{t}^{*},f_{t},b_{t})\rangle\leq 0$$

$$a_{t}^{*}=\langle p_{t}^{*},f_{t}\rangle=\frac{\langle\Phi_{x}(X_{t}^{*}),f_{t}\rangle}{\langle\Phi_{x}(X_{t}^{*}),1\rangle}.$$

$$a_{t}^{*}=\langle p_{t}^{*},f_{t}\rangle,\qquad p_{t}^{*}=p^{*}(X_{t}^{*},f_{t}),\qquad t=0,\dots,n-1,$$

$$w((t+1)/n,X_{t+1}^{*})$$


Full text

PDE approach to the problem of online prediction with expert advice: a construction of potential-based strategies

Dmitry B. Rokhlin

Institute of Mathematics, Mechanics and Computer Sciences, Southern Federal University, Mil’chakova str., 8a, 344090, Rostov-on-Don, Russia

[email protected]

Abstract.

We consider a sequence of repeated prediction games and formally pass to the limit. The supersolutions of the resulting non-linear parabolic partial differential equation are closely related to the potential functions in the sense of N. Cesa-Bianchi and G. Lugosi (2003). Any such supersolution gives an upper bound for the forecaster’s regret and suggests a potential-based prediction strategy satisfying the Blackwell condition. A conventional upper bound for the worst-case regret is justified by a simple verification argument.

Key words and phrases:

regret, online learning, potentials, non-linear parabolic PDE, weighted average forecaster

2010 Mathematics Subject Classification:

68T05, 68W27, 35K55

1. Introduction

Let $B$ be any set. In the problem of online prediction with expert advice, a forecaster predicts a sequence $(b_{t})_{t=0}^{n-1}$, $b_{t}\in B$, on the basis of expert opinions $f_{t}^{i}\in A$, $i=1,\dots,N$, where $A$ is a convex subset of a vector space. More precisely, at round $t\in\{0,\dots,n-1\}$ the forecaster’s guess $a_{t}$ is a convex combination of the experts’ advice:

$$a_{t}=\langle p_{t},f_{t}\rangle:=\sum_{i=1}^{N}p_{t}^{i}f_{t}^{i},\qquad p_{t}\in\Delta:=\{z\geq 0:\langle z,1\rangle=1\},$$

based on the available history and the current advice: $p_{t}=p_{t}((b_{s})_{s=0}^{t-1},(f_{s})_{s=0}^{t})$.

Let $l:A\times B\mapsto[0,1]$ be a loss function. The forecaster’s aim is to keep the regret

$$R_{n}=\sum_{t=0}^{n-1}l(\langle p_{t},f_{t}\rangle,b_{t})-\min_{1\leq i\leq N}\sum_{t=0}^{n-1}l(f_{t}^{i},b_{t})$$

small. The regret $R_{n}$ measures the quality of the predictions by comparing the cumulative loss of the forecaster with that of a best expert, chosen in hindsight.
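To make the definition concrete, the following minimal sketch evaluates the regret for a fixed weight sequence. The function names, the absolute loss, and the toy data are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def regret(weights, advice, outcomes, loss):
    """Cumulative regret R_n of a weighted-average forecaster.

    weights:  (n, N) array, row t is p_t in the simplex
    advice:   (n, N) array, entry (t, i) is the advice f_t^i
    outcomes: length-n array, entry t is b_t
    loss:     vectorized loss l(a, b) with values in [0, 1]
    """
    weights = np.asarray(weights, dtype=float)
    advice = np.asarray(advice, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    guesses = np.sum(weights * advice, axis=1)            # a_t = <p_t, f_t>
    forecaster_loss = loss(guesses, outcomes).sum()
    expert_losses = loss(advice, outcomes[:, None]).sum(axis=0)
    return forecaster_loss - expert_losses.min()          # compare with best expert

def absolute_loss(a, b):
    # Hypothetical choice of loss; any l with values in [0, 1] would do.
    return np.abs(a - b)

# Two constant experts (always 0, always 1) and uniform weights.
advice = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
outcomes = np.array([1.0, 1.0, 0.0])
uniform = np.full((3, 2), 0.5)
example_regret = regret(uniform, advice, outcomes, absolute_loss)
```

In this toy run the forecaster loses 1.5 in total and the better expert loses 1, so the regret is 0.5.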

We refer to [4] for more information on this problem. The basic result (see, e.g., [4, Theorem 2.2]) guarantees the existence of a prediction strategy $p^{*}$ achieving the uniform bound

$$\frac{R_{n}((p_{t}^{*})_{t=0}^{n-1},(f_{t})_{t=0}^{n-1},(b_{t})_{t=0}^{n-1})}{\sqrt{n}}\leq C\tag{1.1}$$

for any $(b_{t})_{t=0}^{n-1}$, $(f_{t})_{t=0}^{n-1}$ under the assumption that $l$ is convex in its first argument. Moreover, this bound cannot be improved without further assumptions [4, Theorem 3.7]. The inequality (1.1) implies that, in the long run, the forecaster predicts on average as well as a best expert: $R_{n}/n\to 0$ as $n\to\infty$.

There are plenty of strategies achieving the bound (1.1). In [3] it was shown that, for a rather general class of online learning problems, the construction of such strategies can be based on the notion of a potential function. More recently, [7] proposed a systematic way of constructing potentials in the case of randomized prediction, noting that “The origin/recipe for “good” potential functions has always been a mystery (at least to the authors).” The authors of [7] considered a recurrence relation for the value function of a repeated game determining the optimal regret, and showed that potential functions are related to relaxations of this function that are consistent with this recurrence relation. To obtain such relaxations they used upper bounds developed in the theory of online learning, which capture the complexity of the problem.

In this paper we show that, for the problem of prediction with expert advice, there is another “natural” way in which potential-based algorithms arise. As in [7], we consider a repeated game determining the optimal regret, and the corresponding recurrence relation for the value functions $v^{n}$. Further, in contrast to [7], we simply pass to the limit as $n\to\infty$ and get a non-linear parabolic Bellman-Isaacs type partial differential equation in $[0,1]\times\mathbb{R}^{N}$. A rigorous justification of this procedure can be performed within the theory of viscosity solutions. However, being interested only in the construction of prediction strategies, we need not do it! As usual, a Bellman-type equation at least formally produces optimal strategies. More precisely, we consider the strategies generated by appropriate smooth supersolutions, and then directly check the inequality (1.1), using an argument similar to the verification method from the theory of optimal control.

The described approach is mainly inspired by the paper [6], which studied a link between fully non-linear second order (parabolic and elliptic) PDEs and repeated games. Its application to problems of online learning theory was initiated in [10], where the asymptotics of the sequential Rademacher complexity (a notion introduced in [8]) of a finite function class was related to the viscosity solution of a $G$-heat equation. In turn, the result of [10] is based on the central limit theorem under model uncertainty, studied within the same approach in [9].

2. Prediction game and the limiting PDE

The worst-case regret

$$\mathscr{R}_{n}=\sup_{f_{0}\in A^{N}}\inf_{p_{0}\in\Delta}\sup_{b_{0}\in B}\dots\sup_{f_{n-1}\in A^{N}}\inf_{p_{n-1}\in\Delta}\sup_{b_{n-1}\in B}R_{n}((p_{t})_{t=0}^{n-1},(f_{t})_{t=0}^{n-1},(b_{t})_{t=0}^{n-1})$$

is the result of a repeated game between the predictor, an adversary and the experts. In this game the adversary has an informational advantage over the predictor and the experts, since $b_{t}$ is chosen after the sequences $(p_{j})_{j=0}^{t}$, $(f_{j})_{j=0}^{t}$ are revealed. Furthermore, the predictor has an informational advantage over the experts, since the choice of $p_{t}$ can be based on $(f_{j})_{j=0}^{t}$, $(b_{j})_{j=0}^{t-1}$. Finally, the experts can use only the information contained in $(p_{j})_{j=0}^{t-1}$, $(b_{j})_{j=0}^{t-1}$. The adversary and the experts play against the predictor, trying to maximize his regret.

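To get a recurrence formula for $\mathscr{R}_{n}$, let us introduce the family of state processes

$$X_{s+1}^{i,t,x,p,f,b}=X_{s}^{i,t,x,p,f,b}+\frac{1}{\sqrt{n}}r^{i}(p_{s},f_{s},b_{s}),\quad s=t,\dots,n-1,\qquad X_{t}^{i,t,x,p,f,b}=x_{i},\tag{2.2}$$

where $r^{i}(p,f,b)=l(\langle p,f\rangle,b)-l(f^{i},b)$, $i=1,\dots,N$.

Summing up the increments $X_{s+1}^{i,0,0,p,f,b}-X_{s}^{i,0,0,p,f,b}$, we obtain

$$\frac{1}{\sqrt{n}}R_{n}((p_{t})_{t=0}^{n-1},(f_{t})_{t=0}^{n-1},(b_{t})_{t=0}^{n-1})=\max\{X_{n}^{1,0,0,p,f,b},\dots,X_{n}^{N,0,0,p,f,b}\}.$$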

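Let us introduce the value functions

$$v^{n}(t/n,x)=\sup_{f_{t}\in A^{N}}\inf_{p_{t}\in\Delta}\sup_{b_{t}\in B}\dots\sup_{f_{n-1}\in A^{N}}\inf_{p_{n-1}\in\Delta}\sup_{b_{n-1}\in B}g(X_{n}^{t,x,p,f,b}),$$

where $t=0,\dots,n-1$, $g(x)=\max\{x_{1},\dots,x_{N}\}$. From dynamic programming theory it is known that $v^{n}$ satisfies the recurrence relations

$$v^{n}(1,x)=g(x),\qquad v^{n}(t/n,x)=\sup_{f\in A^{N}}\inf_{p\in\Delta}\sup_{b\in B}v^{n}((t+1)/n,\,x+r(p,f,b)/\sqrt{n}),\tag{2.3}$$

$t\leq n-1$, $r=(r^{1},\dots,r^{N})$. We stress that we need not rigorously justify this and subsequent claims, since our goal is to formally construct prediction strategies. Their verification is deferred to the last step.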
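The backward recursion can be illustrated by brute force on a tiny instance. The sketch below is only an illustration under loudly stated assumptions: two experts, absolute loss, $B=\{0,1\}$, advice restricted to $\{0,1\}^{2}$, and a coarse grid for $p$. Replacing the suprema and infima by finite searches changes the value, so the number it produces is not the true $v^{n}(0,0)$. The increment is scaled by $1/\sqrt{n}$, consistent with the normalization of the bound (1.1).

```python
import itertools
import math

def value(t, x, n, advice_grid, p_grid, outcomes, loss):
    """Brute-force v^n(t/n, x) with finite grids standing in for f, p and b."""
    if t == n:
        return max(x)                                     # terminal condition g(x)
    best_f = -math.inf
    for f in advice_grid:                                 # sup over expert advice
        best_p = math.inf
        for p in p_grid:                                  # inf over forecaster weights
            a = sum(pi * fi for pi, fi in zip(p, f))      # guess a = <p, f>
            worst_b = -math.inf
            for b in outcomes:                            # sup over outcomes
                step = [(loss(a, b) - loss(fi, b)) / math.sqrt(n) for fi in f]
                nxt = tuple(xi + si for xi, si in zip(x, step))
                worst_b = max(worst_b, value(t + 1, nxt, n, advice_grid,
                                             p_grid, outcomes, loss))
            best_p = min(best_p, worst_b)
        best_f = max(best_f, best_p)
    return best_f

n = 3
advice_grid = list(itertools.product([0.0, 1.0], repeat=2))   # f in {0, 1}^2 (assumption)
p_grid = [(k / 5, 1 - k / 5) for k in range(6)]               # coarse grid on the simplex
v0 = value(0, (0.0, 0.0), n, advice_grid, p_grid, [0.0, 1.0],
           lambda a, b: abs(a - b))
```

The cost grows exponentially in $n$, which is one practical reason to look at the limiting equation instead of the exact recursion.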

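For a moment, imagine that $v^{n}$ is a smooth function satisfying (2.3) on $[0,1-1/n]\times\mathbb{R}^{N}$. Then by Taylor’s formula we get

$$0=\sup_{f\in A^{N}}\inf_{p\in\Delta}\sup_{b\in B}\Bigl\{v^{n}_{t}(t,x)+\sqrt{n}\,\langle v^{n}_{x}(t,x),r(p,f,b)\rangle+\frac{1}{2}\langle v^{n}_{xx}(t,x)r(p,f,b),r(p,f,b)\rangle+o(1)\Bigr\},\tag{2.4}$$

where $v^{n}_{x}$, $v^{n}_{xx}$ are the gradient vector and the Hessian matrix.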

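We will say that the loss function $l$ satisfies the Blackwell condition if

$$\Gamma(\gamma,f):=\{p\in\Delta:\langle\gamma,r(p,f,b)\rangle\leq 0,\ b\in B\}\neq\emptyset\tag{2.5}$$

for all $(\gamma,f)\in\mathbb{R}^{N}_{+}\times A^{N}$. Clearly, $\Gamma(0,f)=\Delta$. The Blackwell condition (2.5) is satisfied if $l$ is convex in its first argument. In this case $p=\gamma/\langle\gamma,1\rangle\in\Gamma(\gamma,f)$ for $\gamma\in\mathbb{R}^{N}_{+}\backslash\{0\}$, since

$$\sum_{i=1}^{N}\gamma^{i}r^{i}\Bigl(\frac{\gamma}{\langle\gamma,1\rangle},f,b\Bigr)=\langle\gamma,1\rangle\,l\Bigl(\frac{\langle\gamma,f\rangle}{\langle\gamma,1\rangle},b\Bigr)-\sum_{i=1}^{N}\gamma^{i}l(f^{i},b)\leq 0$$

by Jensen’s inequality.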
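The Jensen argument can be checked numerically. The sketch below assumes a squared loss on $A=B=[0,1]$ (a hypothetical choice, convex in its first argument) and takes $r^{i}(p,f,b)=l(\langle p,f\rangle,b)-l(f^{i},b)$; it samples random $\gamma$, $f$, $b$ and records the largest value of $\langle\gamma,r(p,f,b)\rangle$ for $p=\gamma/\langle\gamma,1\rangle$.

```python
import numpy as np

rng = np.random.default_rng(0)

def instantaneous_regret(p, f, b, loss):
    """Vector r(p, f, b) with components l(<p, f>, b) - l(f^i, b)."""
    return loss(np.dot(p, f), b) - loss(f, b)

def squared_loss(a, b):
    # Assumed loss: convex in a, with values in [0, 1] on [0, 1] x [0, 1].
    return (a - b) ** 2

N = 4
max_inner = -np.inf
for _ in range(1000):
    gamma = rng.random(N) + 1e-3          # nonnegative and nonzero
    f = rng.random(N)                     # expert advice in [0, 1]
    b = rng.random()                      # outcome in [0, 1]
    p = gamma / gamma.sum()               # candidate element of Gamma(gamma, f)
    inner = np.dot(gamma, instantaneous_regret(p, f, b, squared_loss))
    max_inner = max(max_inner, inner)
```

Up to rounding, `max_inner` should never be positive; a random search of this kind illustrates the inequality but of course does not prove it.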

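By the nature of $v^{n}$, these functions are non-decreasing in each $x_{i}$. Indeed, $v^{n}(t/n,x)$ is the optimal worst-case regret if the initial regret with respect to the $i$-th expert at time $t$ equals $x_{i}$. Hence $v^{n}_{x}\geq 0$, and by the Blackwell condition we may restrict $p$ to $\Gamma(v^{n}_{x},f)$, where the first-order term of the expansion is non-positive. From (2.4) we then get

$$0\leq\sup_{f\in A^{N}}\inf_{p\in\Gamma(v_{x}^{n},f)}\sup_{b\in B}\Bigl\{v_{t}^{n}(t,x)+\frac{1}{2}\langle v_{xx}^{n}(t,x)r(p,f,b),r(p,f,b)\rangle+o(1)\Bigr\}.$$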

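So we expect that the limiting function $v$ satisfies the inequality

$$-v_{t}(t,x)-G(v_{x}(t,x),v_{xx}(t,x))\leq 0,\tag{2.6}$$

$$G(\gamma,S)=\frac{1}{2}\sup_{f\in A^{N}}\inf_{p\in\Gamma(\gamma,f)}\sup_{b\in B}\langle Sr(p,f,b),r(p,f,b)\rangle,$$

and the boundary condition $v(1,x)=g(x)$. Note that $G(\gamma,S)\geq G(\gamma,S^{\prime})$ if the symmetric $N\times N$ matrix $S-S^{\prime}$ is non-negative definite. Hence,

$$-u_{t}(t,x)-G(u_{x}(t,x),u_{xx}(t,x))=0\tag{2.7}$$

is a fully non-linear parabolic equation (see [5]). Along with (2.7) we consider the boundary condition

$$u(1,x)=g(x)=\max\{x_{1},\dots,x_{N}\}.\tag{2.8}$$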

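The functions $v^{n}$ are defined on $Q^{n}=\{0,1/n,\dots,(n-1)/n,1\}\times\mathbb{R}^{N}$. To describe their limiting behavior rigorously, one can consider the Barles-Perthame half-relaxed (weak) upper limit:

$$\overline{v}(t,x)=\sup\Bigl\{\lim_{k\to\infty}v^{n_{k}}(t_{k},x_{k}):\ n_{k}\to\infty,\ (t_{k},x_{k})\in Q^{n_{k}},\ (t_{k},x_{k})\to(t,x),\ \text{and}\ v^{n_{k}}(t_{k},x_{k})\ \text{converges}\Bigr\}.$$

From the results of [1, 2, 6] and the above calculations we expect that $\overline{v}$ is a viscosity subsolution of (2.7), (2.8). Note that, by definition,

$$\limsup_{n\to\infty}\frac{1}{\sqrt{n}}\mathscr{R}_{n}\leq\overline{v}(0,0).\tag{2.9}$$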

3. Smooth supersolutions and induced weighted average forecasting strategies

Take a smooth supersolution $w$ of (2.7), (2.8):

$$-w_{t}(t,x)-G(w_{x}(t,x),w_{xx}(t,x))\geq 0,\qquad w(1,x)\geq g(x),\tag{3.10}$$

which is non-decreasing in each variable $x_{i}$. Assuming a comparison result, $\overline{v}\leq w$, we conclude that the inequality (2.9) holds true with $w(0,0)$ in place of $\overline{v}(0,0)$. We also expect that a strategy $p_{t}(x)\in\Gamma(w_{x}(t,x),f)$ will produce a regret satisfying this bound.

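Let us look for supersolutions of the form $w(t,x)=c(1-t)+\Phi(x)$, where $c$ is a constant,

$$\Phi(x)\geq g(x),\tag{3.11}$$

and $\Phi$ is non-decreasing in each variable. Since $-w_{t}=c$, the differential inequality (3.10) reduces to the condition $G(\Phi_{x}(x),\Phi_{xx}(x))\leq c$. Because $l$ takes values in $[0,1]$, each component of $r(p,f,b)$ lies in $[-1,1]$, so this condition is satisfied if

$$\frac{1}{2}\sup_{x\in\mathbb{R}^{N}}\langle\Phi_{xx}(x)h,h\rangle\leq c,\qquad|h^{i}|\leq 1,\quad i=1,\dots,N.\tag{3.12}$$

Then by the Blackwell condition (2.5) there exists a vector-function

$$p^{*}(x,f)\in\Gamma(\Phi_{x}(x),f).\tag{3.13}$$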

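If $l$ is convex in its first argument and $\Phi$ is strictly increasing in each variable, then, according to the remark after formula (2.5), one can take

$$p^{*}(x,f)=\frac{\Phi_{x}(x)}{\langle\Phi_{x}(x),1\rangle}\in\Gamma(\Phi_{x}(x),f).\tag{3.14}$$

Consider the discrete-time state process (2.2) generated by the prediction strategy related to $p^{*}$:

$$X_{s+1}^{*,i}=X_{s}^{*,i}+\frac{1}{\sqrt{n}}r^{i}(p_{s}^{*},f_{s},b_{s}),\qquad p_{s}^{*}=p^{*}(X_{s}^{*},f_{s}),\qquad X_{0}^{*}=0.\tag{3.15}$$

Note that $p_{t}^{*}$ automatically satisfies the inequality

$$\langle\Phi_{x}(X_{t}^{*}),r(p_{t}^{*},f_{t},b_{t})\rangle\leq 0,\tag{3.16}$$

which is also called the Blackwell condition: see [3, 4]. For a convex function $a\mapsto l(a,b)$, from (3.14) we get a weighted average forecaster:

$$a_{t}^{*}=\langle p_{t}^{*},f_{t}\rangle=\frac{\langle\Phi_{x}(X_{t}^{*}),f_{t}\rangle}{\langle\Phi_{x}(X_{t}^{*}),1\rangle}.\tag{3.17}$$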

Theorem 1.

Let the Blackwell condition (2.5) be satisfied, and let $\Phi:\mathbb{R}^{N}\mapsto\mathbb{R}$ be a twice continuously differentiable function which is non-decreasing in each variable and meets the conditions (3.11), (3.12). Then a prediction strategy

$$a_{t}^{*}=\langle p_{t}^{*},f_{t}\rangle,\qquad p_{t}^{*}=p^{*}(X_{t}^{*},f_{t}),\qquad t=0,\dots,n-1,$$

where $p^{*}$ satisfies (3.13) and $X^{*}$ is defined by (3.15), produces a regret satisfying the inequality (1.1) with $C=c+\Phi(0)$.

Proof.

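For $w(t,x)=c(1-t)+\Phi(x)$, by Taylor’s formula we get

$$w((t+1)/n,X_{t+1}^{*})=w(t/n,X_{t}^{*})-\frac{c}{n}+\frac{1}{\sqrt{n}}\langle\Phi_{x}(X_{t}^{*}),r(p_{t}^{*},f_{t},b_{t})\rangle+\frac{1}{2n}\langle\Phi_{xx}(\xi_{t})r(p_{t}^{*},f_{t},b_{t}),r(p_{t}^{*},f_{t},b_{t})\rangle\leq w(t/n,X_{t}^{*})$$

for some $\xi_{t}$, where the last inequality is implied by (3.16) and (3.12). Now the assertion of the theorem follows from the condition (3.11):

$$\frac{1}{\sqrt{n}}R_{n}((p_{t}^{*})_{t=0}^{n-1},(f_{t})_{t=0}^{n-1},(b_{t})_{t=0}^{n-1})=g(X_{n}^{*})\leq\Phi(X_{n}^{*})=w(1,X_{n}^{*})\leq w(0,X_{0}^{*})=c+\Phi(0).$$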

Following [3, 4], we call $\Phi$ a potential function. The most natural smooth upper bound for $\max\{x_{1},\dots,x_{N}\}$, and hence a candidate for a potential, is the soft-maximum function

$$\Phi(x)=\frac{1}{\eta}\ln\Bigl(\sum_{i=1}^{N}e^{\eta x_{i}}\Bigr),\qquad\eta>0.\tag{3.18}$$

This function belongs to the more general class $\Phi(x)=\psi\left(\sum_{i=1}^{N}\phi(x_{i})\right)$ considered in [3, 4], where $\psi$ and $\phi$ are assumed to be concave and convex respectively. The following inequality is also taken from [3, 4]:

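$$\langle\Phi_{xx}(x)h,h\rangle\leq\psi^{\prime}\Bigl(\sum_{i=1}^{N}\phi(x_{i})\Bigr)\sum_{i=1}^{N}\phi^{\prime\prime}(x_{i})(h^{i})^{2}.$$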

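For (3.18) we have $\psi(x)=\eta^{-1}\ln x$, $\phi(x)=e^{\eta x}$,

$$\frac{1}{2}\langle\Phi_{xx}(x)h,h\rangle\leq\frac{\eta}{2}\,\frac{\sum_{i=1}^{N}e^{\eta x_{i}}(h^{i})^{2}}{\sum_{i=1}^{N}e^{\eta x_{i}}}\leq\frac{\eta}{2},\qquad|h^{i}|\leq 1,\qquad\Phi(0)=\frac{\ln N}{\eta}.$$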

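For $p_{t}^{*}$ generated by (3.18), in accordance with Theorem 1 we have

$$\frac{1}{\sqrt{n}}R_{n}((p_{t}^{*})_{t=0}^{n-1},(f_{t})_{t=0}^{n-1},(b_{t})_{t=0}^{n-1})\leq\frac{\eta}{2}+\frac{\ln N}{\eta}=\sqrt{2\ln N}$$

for an “optimal” choice $\eta=\sqrt{2\ln N}$ (cf. [4, Corollary 2.2]). The formula (3.17) reduces to

$$a_{t}^{*}=\frac{\sum_{i=1}^{N}\exp(-\eta L_{t}^{i}/\sqrt{n})\,f_{t}^{i}}{\sum_{j=1}^{N}\exp(-\eta L_{t}^{j}/\sqrt{n})},$$

where $L_{t}^{i}=\sum_{s=0}^{t-1}l(f^{i}_{s},b_{s})$ is the cumulative loss of the $i$-th expert. This is a basic version of the exponentially weighted average forecaster: see [4, Chapter 2].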
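As an illustration of Theorem 1 for the soft-maximum potential, the following sketch runs the induced forecaster on synthetic data and compares the regret with $\sqrt{2n\ln N}$. The data generator, the absolute loss, and all function names are assumptions made for this example. The state is updated with increments scaled by $1/\sqrt{n}$, matching the normalization in (1.1).

```python
import numpy as np

def exp_weights_regret(advice, outcomes, loss, eta):
    """Run the forecaster induced by the soft-max potential and return R_n."""
    n, N = advice.shape
    x = np.zeros(N)                          # state X_t^*: scaled regrets per expert
    forecaster_loss = 0.0
    expert_loss = np.zeros(N)
    for t in range(n):
        # Gradient of the potential up to a positive factor; subtracting the
        # maximum only improves numerical stability and cancels on normalization.
        w = np.exp(eta * (x - x.max()))
        p = w / w.sum()                      # p_t^* = Phi_x / <Phi_x, 1>
        a = np.dot(p, advice[t])             # weighted average forecast a_t^*
        la = loss(a, outcomes[t])
        lf = loss(advice[t], outcomes[t])
        x += (la - lf) / np.sqrt(n)          # state update with 1/sqrt(n) scaling
        forecaster_loss += la
        expert_loss += lf
    return forecaster_loss - expert_loss.min()

rng = np.random.default_rng(1)
n, N = 2000, 10
advice = rng.random((n, N))                              # synthetic advice in [0, 1]
outcomes = rng.integers(0, 2, size=n).astype(float)      # synthetic binary outcomes
eta = np.sqrt(2 * np.log(N))
R = exp_weights_regret(advice, outcomes, lambda a, b: np.abs(a - b), eta)
bound = np.sqrt(2 * n * np.log(N))
```

On random data the regret is typically far below `bound`; the guarantee is a worst-case statement over all advice and outcome sequences.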

4. Randomized prediction

Assume that the forecaster randomly chooses a prediction by taking a sample $I_{t}$ from a probability distribution $p_{t}=(p_{t}^{1},\dots,p_{t}^{N})$ over $\{y_{1},\dots,y_{N}\}$. The forecaster’s cumulative loss is compared with the cumulative loss of a best fixed prediction:

[TABLE]

and the regret is defined as the expectation of this quantity with respect to the induced artificial probability measure:

[TABLE]

The game in which the forecaster knows the previous moves, $p_{t}=p_{t}(b_{0},\dots,b_{t-1})$, and the adversary knows the prediction algorithm, $b_{t}=b_{t}(p_{0},\dots,p_{t})$, but not the predictions $I_{t}$ themselves, corresponds to the case of an oblivious adversary [4, Chapter 2]. However, the case of a non-oblivious adversary is not interesting for a problem of this form: see [4, Lemma 4.1].

The described game is simpler than that considered above, since the “experts”, corresponding to fixed predictions, do not play against the forecaster. Moreover, the condition (2.5) is satisfied regardless of the convexity of $l$. Repeating the reasoning of Section 2, we get the inequality (2.6) with

[TABLE]

So, a prediction strategy satisfying

[TABLE]

where $\Phi$ meets the conditions of Theorem 1, and $X_{t}^{*}$ is defined by a recursion of the form (3.15), produces the regret $R_{n}\leq C\sqrt{n}$. In particular, $C=\sqrt{2\ln N}$ for the exponentially weighted average forecaster discussed after Theorem 1.

Finally, we note that the case of internal regret (see [4, Section 4.4]) can be considered in the same way.

5. Acknowledgments

The research is supported by the Russian Science Foundation, project 17-19-01038.

Bibliography

[1] Barles, G., Perthame, B.: Exit time problems in optimal control and vanishing viscosity method. SIAM J. Control Optim. 26(5), 1133–1148 (1988)
[2] Barles, G., Souganidis, P.E.: Convergence of approximation schemes for fully nonlinear second order equations. Asymptot. Anal. 4, 271–283 (1991)
[3] Cesa-Bianchi, N., Lugosi, G.: Potential-based algorithms in on-line prediction and game theory. Mach. Learn. 51(3), 239–261 (2003)
[4] Cesa-Bianchi, N., Lugosi, G.: Prediction, learning, and games. Cambridge University Press, New York (2006)
[5] Crandall, M., Ishii, H., Lions, P.L.: User’s guide to viscosity solutions of second order partial differential equations. Bull. Amer. Math. Soc. 27(1), 1–67 (1992)
[6] Kohn, R., Serfaty, S.: A deterministic-control-based approach to fully nonlinear parabolic and elliptic equations. Commun. Pur. Appl. Math. 63(10), 1298–1350 (2010)
[7] Rakhlin, A., Shamir, O., Sridharan, K.: Relax and randomize: from value to algorithms. In: F. Pereira, C.J.C. Burges, L. Bottou, K.Q. Weinberger (eds.) Advances in Neural Information Processing Systems 25, pp. 2141–2149. Curran Associates, Inc. (2012)
[8] Rakhlin, A., Sridharan, K., Tewari, A.: Online learning: random averages, combinatorial parameters, and learnability. In: J.D. Lafferty, C.K.I. Williams, J. Shawe-Taylor, R.S. Zemel, A. Culotta (eds.) Advances in Neural Information Processing Systems 23, pp. 1984–1992. Curran Associates, Inc. (2010)
