Online Prediction of Stochastic Sequences with High Probability Regret Bounds

Matthias Frey; Jonathan H. Manton; Jingge Zhu

arXiv:2602.16236·cs.LG·February 19, 2026

Open Access · 3 Reviews

TL;DR

This paper establishes high-probability regret bounds for universal prediction of stochastic sequences, showing convergence rates similar to expectation bounds and proving limitations on improving these bounds without extra assumptions.

Contribution

It introduces high-probability regret bounds for stochastic sequence prediction and demonstrates their near-optimality through an impossibility result.

Findings

01

The high-probability bounds closely resemble the prior in-expectation bounds in form.

02

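For universal prediction of a stochastic process over a countable alphabet, the bound gives a convergence rate of $O(T^{-1/2} δ^{-1/2})$ with probability at least $1 - δ$.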

03

An impossibility result shows that the exponent of $δ$ cannot be improved in a bound of the same form without additional assumptions.

Abstract

We revisit the classical problem of universal prediction of stochastic sequences with a finite time horizon $T$ known to the learner. The question we investigate is whether it is possible to derive vanishing regret bounds that hold with high probability, complementing existing bounds from the literature that hold in expectation. We propose such high-probability bounds, which have a form very similar to that of the prior expectation bounds. For the case of universal prediction of a stochastic process over a countable alphabet, our bound states a convergence rate of $O(T^{-1/2} δ^{-1/2})$ with probability at least $1 - δ$, compared to prior known in-expectation bounds of the order $O(T^{-1/2})$. We also propose an impossibility result which proves that it is not possible to improve the exponent of $δ$ in a bound of the same form without making additional assumptions.
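To make the scaling in the abstract concrete, the following minimal sketch evaluates the rate factor $T^{-1/2} δ^{-1/2}$ next to the in-expectation factor $T^{-1/2}$. It is an illustration only: the constant `c` hidden by the $O(\cdot)$ notation is not given here and is set to 1 as a placeholder, and the function names are hypothetical rather than taken from the paper.

```python
import math


def expectation_rate(T: int, c: float = 1.0) -> float:
    """In-expectation rate factor c * T^{-1/2}; c is a placeholder constant."""
    if T <= 0:
        raise ValueError("T must be a positive horizon")
    return c / math.sqrt(T)


def high_prob_rate(T: int, delta: float, c: float = 1.0) -> float:
    """High-probability rate factor c * T^{-1/2} * delta^{-1/2}.

    Interpreted as a bound that holds with probability at least 1 - delta.
    The constant c is an assumption; the O(.) notation does not specify it.
    """
    if T <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need T > 0 and 0 < delta < 1")
    return c / math.sqrt(T * delta)


def horizon_for_target(eps: float, delta: float, c: float = 1.0) -> int:
    """Horizon T that solves c / sqrt(T * delta) <= eps, i.e. T >= c^2 / (delta * eps^2)."""
    if eps <= 0.0 or not 0.0 < delta < 1.0:
        raise ValueError("need eps > 0 and 0 < delta < 1")
    return math.ceil(c**2 / (delta * eps**2))


if __name__ == "__main__":
    for T in (100, 10_000, 1_000_000):
        for delta in (0.1, 0.01):
            print(T, delta, expectation_rate(T), high_prob_rate(T, delta))
```

Under these assumptions, the high-probability factor exceeds the in-expectation factor by exactly $δ^{-1/2}$, so tightening the confidence level by a factor of 100 costs a factor of 10 in the bound, or equivalently a factor of 100 in the horizon needed to reach the same target.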

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 3

Strengths

- The paper studies a fundamental problem.
- The paper is well-written and has a clear related-work section.
- The paper presents an impossibility theorem clarifying the optimality of the proposed dependence on $\delta$

Weaknesses

- The paper heavily focusses on theoretical analysis while leaving numerical experiments as future research directions.
- The related work section could be tightened. Citations to multi-armed bandit and MDP literature feel tangential since those problems involve decision-making and exploration-exploitation tradeoffs whereas the present paper studies passive sequence prediction. Such citations distract from the main focus on universal prediction although there might be possible methodological ove

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. Prediction of stochastic sequences is a fundamental problem, and a high-probability guarantee is often more relevant for practitioners than expected error. This work fills a gap in previous work in that only an expected regret was known.
2. The authors show that in the high-probability setting, the same convergence rate of $T^{-1/2}$ w.r.t. the time horizon can be achieved. The algorithmic framework has some generality, in that similar bounds are also obtained for other settings with different

Weaknesses

1. The main weakness is that the dependence on $\delta$ is not ideal, and it seems that the lower bound result does not hold generally for all algorithms. Specifically, Theorem 5 appears to be specific to some class of policies satisfying equation (3) in the paper. Thus, it only shows that the *error analysis* for the proposed algorithm is in a way optimal, but does not seem to be a fundamental limit that generally applies to all algorithms.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

- Clear modular proof strategy (martingale concentration around an empirical TV term, plus information-distance control) that feels broadly reusable
- First high-probability regret guarantees in this stochastic-sequence setup with general bounded loss and non-i.i.d. data
- Useful universal-prediction corollaries via mixtures; the rate summary is easy to parse.

Weaknesses

- I find the dependence on the confidence parameter heavy: once the random pathwise term is de-randomized, the bounds pick up $1/\delta$ or $1/\sqrt{\delta}$ factors. I get that the impossibility result shows this is unavoidable in full generality, but for practical confidence targets the guarantees feel conservative. Readers would be curious to see whether under mild added assumptions (mixing, exp-concavity/log-loss) one can recover something closer to $\log(1/\delta)$.
- From a practicality an
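As a rough numerical illustration of the confidence-parameter dependence discussed in this review, the sketch below tabulates the three multipliers named there, $1/\delta$, $1/\sqrt{\delta}$ and $\log(1/\delta)$, for a few confidence targets. It ignores all constants and the dependence on $T$, and the function name is hypothetical; it only shows how far apart the three factors are, not which bound in the paper carries which factor.

```python
import math


def confidence_factors(delta: float) -> dict[str, float]:
    """Return the three delta-dependent multipliers, with all constants ignored."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return {
        "1/delta": 1.0 / delta,
        "1/sqrt(delta)": 1.0 / math.sqrt(delta),
        "log(1/delta)": math.log(1.0 / delta),
    }


if __name__ == "__main__":
    # Confidence targets chosen purely for illustration.
    for delta in (0.1, 0.01, 0.001):
        factors = confidence_factors(delta)
        row = ", ".join(f"{name} = {value:.2f}" for name, value in factors.items())
        print(f"delta = {delta}: {row}")
```

For the targets tried here, the logarithmic factor stays in the single digits while the polynomial factors grow by one or two orders of magnitude, which is the gap the reviewer describes as conservative.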

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques