Constant regret for sequence prediction with limited advice
El Mehdi Saad (LMO), G. Blanchard (LMO, DATASHAPE)

TL;DR
This paper introduces a sequence prediction strategy whose regret is constant, that is, independent of the horizon length, when the learner may combine and observe two experts per round. With a single expert per round, as in the bandit setting, regret grows with the square root of the horizon.
Contribution
It proposes a strategy in which the learner predicts with a convex combination of a few experts and observes the losses of a few experts per round, leading to near-optimal constant regret bounds in the limited advice setting.
Findings
Allowing two experts to be observed per round yields constant regret.
The proposed strategy is optimal up to a logarithmic factor in the number of experts.
Playing and observing a single expert per round (the bandit setting) gives slow regret of order O(\sqrt{KT}).
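For reference, the quantities behind these findings can be written out as follows. This is a sketch using standard definitions; the symbols \hat{y}_t (the learner's prediction), F_{i,t} (expert i's prediction), y_t (the outcome), \ell (the loss), and \eta are notation assumed here, not taken from the paper.

```latex
% Cumulative regret against the best of K experts after T rounds
R_T = \sum_{t=1}^{T} \ell(\hat{y}_t, y_t) - \min_{1 \le i \le K} \sum_{t=1}^{T} \ell(F_{i,t}, y_t)

% Exp-concavity: for some \eta > 0, the map below is concave in x for every y
x \mapsto \exp\bigl(-\eta\, \ell(x, y)\bigr)

% "Constant regret" means R_T is bounded by a quantity that does not grow with T,
% in contrast with the bandit rate R_T = O(\sqrt{KT}).
```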
Abstract
We investigate the problem of cumulative regret minimization for individual sequence prediction with respect to the best expert in a finite family of size K under limited access to information. We assume that in each round, the learner can predict using a convex combination of at most p experts, and can then observe a posteriori the losses of at most m experts. We assume that the loss function is range-bounded and exp-concave. In the standard multi-armed bandits setting, when the learner is allowed to play only one expert per round and observe only its feedback, known optimal regret bounds are of the order O(\sqrt{KT}). We show that allowing the learner to play one additional expert per round and observe one additional feedback substantially improves the guarantees on regret. We provide a strategy combining only p = 2 experts per round for prediction and observing m…
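The interaction protocol described in the abstract can be sketched in a few lines of Python. This is only an illustration of the limited-advice setting, not the paper's strategy: the function name `run_protocol`, the uniform-random choice of experts, and the equal weights are placeholder assumptions. The squared loss on [0, 1] is used as one example of a range-bounded, exp-concave loss.

```python
import random


def squared_loss(prediction, outcome):
    """Squared loss; bounded and exp-concave when both arguments lie in [0, 1]."""
    return (prediction - outcome) ** 2


def run_protocol(expert_preds, outcomes, p=2, m=2, seed=0):
    """Simulate the limited-advice protocol with a placeholder learner.

    expert_preds[t][i] is expert i's prediction in round t.
    Returns (regret against the best expert, per-expert observation counts).
    """
    rng = random.Random(seed)
    num_experts = len(expert_preds[0])
    learner_loss = 0.0
    expert_loss = [0.0] * num_experts    # full losses, used only to score regret
    observed_counts = [0] * num_experts  # how often each expert's loss was revealed

    for preds, y in zip(expert_preds, outcomes):
        # Placeholder rule (assumption, not the paper's method):
        # play p experts chosen uniformly at random with equal weights.
        played = rng.sample(range(num_experts), p)
        prediction = sum(preds[i] for i in played) / p  # convex combination
        learner_loss += squared_loss(prediction, y)

        # Limited feedback: only m experts' losses are revealed after the round.
        # A real strategy would use these observations to update its choices.
        for i in rng.sample(range(num_experts), m):
            observed_counts[i] += 1

        # Bookkeeping for the evaluator, not available to the learner.
        for i in range(num_experts):
            expert_loss[i] += squared_loss(preds[i], y)

    regret = learner_loss - min(expert_loss)
    return regret, observed_counts


if __name__ == "__main__":
    rounds = [[0.2, 0.9, 0.5]] * 100  # three experts with fixed predictions
    regret, counts = run_protocol(rounds, [1.0] * 100, p=2, m=2)
    print(f"regret: {regret:.3f}, observations per expert: {counts}")
```

Because this placeholder never uses the observed losses, its regret grows with the number of rounds; the point of the sketch is only to make the roles of p (experts played) and m (losses observed) concrete.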
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Distributed Sensor Networks and Detection Algorithms
