Online Prediction of Stochastic Sequences with High Probability Regret Bounds
Matthias Frey, Jonathan H. Manton, Jingge Zhu

TL;DR
This paper establishes high-probability regret bounds for universal prediction of stochastic sequences, showing convergence rates similar to expectation bounds and proving limitations on improving these bounds without extra assumptions.
Contribution
It introduces high-probability regret bounds for stochastic sequence prediction and demonstrates their near-optimality through an impossibility result.
Findings
High-probability bounds match expectation bounds in form.
Convergence rate of O(T^{-1/2} δ^{-1/2}) with probability at least 1-δ.
Impossibility result limits improvement of δ-exponent without extra assumptions.
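To get a feel for how the confidence parameter enters the stated rate, the following minimal sketch evaluates a bound of the form c · T^{-1/2} · δ^{-1/2}. The constant `c` and the helper names `rate_bound`, `horizon_for_target`, and `delta_factors` are hypothetical; the summary above gives only the order of the bound, not its constants.

```python
import math


def rate_bound(T, delta, c=1.0):
    """Evaluate c * T^{-1/2} * delta^{-1/2}; c is an assumed placeholder constant."""
    return c * T ** -0.5 * delta ** -0.5


def horizon_for_target(eps, delta, c=1.0):
    """Smallest integer horizon T at which the assumed bound is at most eps."""
    return math.ceil(c ** 2 / (eps ** 2 * delta))


def delta_factors(delta):
    """Return the polynomial factor delta^{-1/2} next to log(1/delta) for comparison."""
    return delta ** -0.5, math.log(1.0 / delta)


if __name__ == "__main__":
    for delta in (0.1, 0.01, 0.001):
        T = horizon_for_target(eps=0.1, delta=delta)
        poly, log_factor = delta_factors(delta)
        print(f"delta={delta}: T={T}, delta^-1/2={poly:.2f}, log(1/delta)={log_factor:.2f}")
```

Under this form, the horizon needed for a fixed target grows like 1/δ, which is why a logarithmic dependence on 1/δ would be attractive at small δ.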
Abstract
We revisit the classical problem of universal prediction of stochastic sequences with a finite time horizon $T$ known to the learner. The question we investigate is whether it is possible to derive vanishing regret bounds that hold with high probability, complementing existing bounds from the literature that hold in expectation. We propose such high-probability bounds, which have a very similar form to the prior expectation bounds. For the case of universal prediction of a stochastic process over a countable alphabet, our bound states a convergence rate of $O(T^{-1/2} \delta^{-1/2})$ with probability at least $1-\delta$, compared to prior known in-expectation bounds of the order $O(T^{-1/2})$. We also propose an impossibility result which proves that it is not possible to improve the exponent of $\delta$ in a bound of the same form without making additional assumptions.
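As background on what a universal predictor can look like, here is a minimal sketch of a classical Bayesian mixture predictor under log loss. It is an illustration only and not necessarily the construction used in the paper: the finite class of i.i.d. models, the uniform prior, and the function names `mixture_log_loss` and `model_log_loss` are all assumptions made for this example.

```python
import math


def mixture_log_loss(sequence, models, prior=None):
    """Cumulative log loss of a Bayesian mixture over a finite model class.

    models: list of dicts mapping symbol -> probability (assumed i.i.d. models).
    prior:  optional list of prior weights; defaults to uniform (an assumption).
    """
    k = len(models)
    weights = list(prior) if prior is not None else [1.0 / k] * k
    total = 0.0
    for x in sequence:
        # Predictive probability of the next symbol under the current mixture.
        p_mix = sum(w * m[x] for w, m in zip(weights, models))
        total += -math.log(p_mix)
        # Bayes update: reweight each model by how well it predicted x.
        weights = [w * m[x] / p_mix for w, m in zip(weights, models)]
    return total


def model_log_loss(sequence, model):
    """Cumulative log loss of a single fixed i.i.d. model."""
    return sum(-math.log(model[x]) for x in sequence)


if __name__ == "__main__":
    seq = "aabaaabaaaba"
    models = [{"a": 0.7, "b": 0.3}, {"a": 0.3, "b": 0.7}]
    best = min(model_log_loss(seq, m) for m in models)
    print("mixture:", mixture_log_loss(seq, models), "best model:", best)
```

With a uniform prior over K models, the cumulative log loss of the mixture exceeds that of the best model in the class by at most ln K on every sequence. This is a deterministic guarantee for this toy setting, separate from the high-probability regret bounds discussed above.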
Peer Reviews
Decision: ICLR 2026 Poster
- The paper studies a fundamental problem.
- The paper is well-written and has a clear related-work section.
- The paper presents an impossibility theorem clarifying the optimality of the proposed dependence on $\delta$.
- The paper focuses heavily on theoretical analysis while leaving numerical experiments as future research directions.
- The related work section could be tightened. Citations to the multi-armed bandit and MDP literature feel tangential, since those problems involve decision-making and exploration-exploitation tradeoffs whereas the present paper studies passive sequence prediction. Such citations distract from the main focus on universal prediction although there might be possible methodological ove
1. Prediction of stochastic sequences is a fundamental problem, and a high-probability guarantee is often more relevant for practitioners than expected error. This work fills a gap in previous work in that only an expected regret was known.
2. The authors show that in the high-probability setting, the same convergence rate of $T^{-1/2}$ w.r.t. the time horizon can be achieved. The algorithmic framework has some generality, in that similar bounds are also obtained for other settings with different
1. The main weakness is that the dependence on $\delta$ is not ideal, and it seems that the lower bound result does not hold generally for all algorithms. Specifically, Theorem 5 appears to be specific to some class of policies satisfying equation (3) in the paper. Thus, it only shows that the *error analysis* for the proposed algorithm is in a way optimal, but does not seem to be a fundamental limit that generally applies to all algorithms.
- Clear modular proof strategy (martingale concentration around an empirical TV term, plus information-distance control) that feels broadly reusable.
- First high-probability regret guarantees in this stochastic-sequence setup with general bounded loss and non-i.i.d. data.
- Useful universal-prediction corollaries via mixtures; the rate summary is easy to parse.
- I find the dependence on the confidence parameter heavy: once the random pathwise term is de-randomized, the bounds pick up $1/\delta$ or $1/\sqrt{\delta}$ factors. I get that the impossibility result shows this is unavoidable in full generality, but for practical confidence targets the guarantees feel conservative. Readers would be curious to see whether under mild added assumptions (mixing, exp-concavity/log-loss) one can recover something closer to $\log(1/\delta)$.
- From a practicality an
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques
