Online Learning with Bounded Recall
Jon Schneider, Kiran Vodrahalli

TL;DR
This paper studies bounded-recall online learning algorithms. It shows that naive constructions fail, proposes a new stationary algorithm whose per-round regret decreases as 1/√M (which is optimal), and establishes that low-regret algorithms must be aware of the order of past losses.
Contribution
It introduces a stationary bounded-recall algorithm with a per-round regret of Θ(1/√M), matched by a tight lower bound. It also shows that bounded-recall algorithms must be aware of the order of past losses to achieve low regret.
Findings
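Bounded-recall algorithms built naively from mean-based no-regret methods (e.g., running Hedge over the last M rounds) incur constant regret per round.
A stationary bounded-recall algorithm achieves a per-round regret of Θ(1/√M), and a matching lower bound shows that this rate is tight.
Unlike in the perfect-recall setting, any low-regret bounded-recall algorithm must be aware of the order of its past losses.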
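The findings above refer to per-round regret and to algorithms that see only the last M losses. The block below writes these notions out under standard full-information assumptions: n actions, loss vectors ℓ_t ∈ [0,1]^n, and a played distribution p_t. This notation is ours and may differ from the paper's. The first line defines per-round regret over T rounds. The second line defines a stationary M-bounded-recall rule, meaning a single fixed function f of the last M losses. The third line defines an order-blind (symmetric) rule, which is the kind the last finding rules out.

```latex
\begin{align*}
\mathrm{Reg}_T &= \frac{1}{T}\left( \sum_{t=1}^{T} \langle p_t, \ell_t \rangle - \min_{i \in [n]} \sum_{t=1}^{T} \ell_{t,i} \right), \\
p_t &= f(\ell_{t-M}, \ell_{t-M+1}, \ldots, \ell_{t-1}) \quad \text{for a fixed } f \text{ and all } t > M, \\
f(x_{\sigma(1)}, \ldots, x_{\sigma(M)}) &= f(x_1, \ldots, x_M) \quad \text{for every permutation } \sigma \text{ of } \{1, \ldots, M\}.
\end{align*}
```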
Abstract
We study the problem of full-information online learning in the "bounded recall" setting popular in the study of repeated games. An online learning algorithm is M-bounded-recall if its output at time t can be written as a function of the M previous rewards (and not, e.g., of any other internal state of the algorithm). We first demonstrate that a natural approach to constructing bounded-recall algorithms from mean-based no-regret learning algorithms (e.g., running Hedge over the last M rounds) fails, and that any such algorithm incurs constant regret per round. We then construct a stationary bounded-recall algorithm that achieves a per-round regret of Θ(1/√M), which we complement with a tight lower bound. Finally, we show that unlike the perfect recall setting, any low-regret bounded-recall algorithm must be aware of the ordering of the past…
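The abstract's naive construction, running Hedge over only the last M rounds, can be sketched as follows. This is a minimal illustration under our own assumptions. We assume n actions and losses in [0, 1], and we use a hypothetical step size η = 1/√M. The helper names `windowed_hedge`, `play_bounded_recall`, and `per_round_regret` are also hypothetical. This is not the paper's stationary algorithm, which the abstract does not specify. The final check confirms that the windowed rule is a symmetric function of its window. That makes it the order-blind kind of algorithm that, per the findings, cannot achieve low regret.

```python
import math
import random
from typing import List, Sequence

Loss = Sequence[float]  # one loss in [0, 1] per action (assumed setting)


def windowed_hedge(window: Sequence[Loss], n: int, eta: float) -> List[float]:
    """Hedge weights computed from `window` alone (hypothetical helper).

    The output is a fixed function of the loss vectors in `window` and of
    nothing else, so with the last M losses as input this rule is a
    stationary M-bounded-recall algorithm.
    """
    cumulative = [sum(loss[i] for loss in window) for i in range(n)]
    best = min(cumulative)
    # Shift by the minimum before exponentiating to avoid overflow/underflow.
    weights = [math.exp(-eta * (c - best)) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def play_bounded_recall(losses: Sequence[Loss], M: int, eta: float) -> List[List[float]]:
    """Distribution played at each round t, computed from losses[t-M:t] only."""
    n = len(losses[0])
    return [windowed_hedge(losses[max(0, t - M):t], n, eta) for t in range(len(losses))]


def per_round_regret(plays: Sequence[Sequence[float]], losses: Sequence[Loss]) -> float:
    """(expected loss of the plays - loss of the best fixed action) / T."""
    T, n = len(losses), len(losses[0])
    alg = sum(sum(p[i] * l[i] for i in range(n)) for p, l in zip(plays, losses))
    best_fixed = min(sum(l[i] for l in losses) for i in range(n))
    return (alg - best_fixed) / T


if __name__ == "__main__":
    random.seed(0)
    M, n, T = 16, 3, 200
    eta = 1.0 / math.sqrt(M)  # hypothetical step size, not taken from the paper
    losses = [[random.random() for _ in range(n)] for _ in range(T)]

    plays = play_bounded_recall(losses, M, eta)
    print(f"per-round regret: {per_round_regret(plays, losses):.4f}")

    # Hedge sums each action's losses, so permuting the window leaves the
    # output unchanged (up to rounding): the rule ignores loss ordering.
    window = losses[:M]
    shuffled = random.sample(window, len(window))
    p1, p2 = windowed_hedge(window, n, eta), windowed_hedge(shuffled, n, eta)
    print(max(abs(a - b) for a, b in zip(p1, p2)) < 1e-12)  # True
```

The random losses here only exercise the code. The constant-regret result in the abstract concerns worst-case loss sequences, and this example does not attempt to construct one.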
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Data Stream Mining Techniques
