Partition Tree Weighting for Non-Stationary Stochastic Bandits
Joel Veness, Marcus Hutter, Andras Gyorgy, Jordi Grau-Moya

TL;DR
This paper introduces a novel algorithm for non-stationary stochastic Bernoulli bandits that extends universal source coding techniques to control settings, effectively handling interleaved actions and observations.
Contribution
It generalizes the Partition Tree Weighting method from passive prediction to active control in non-stationary bandit problems, addressing the challenge of action-observation interleaving.
Findings
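The algorithm is efficient and performs well on the non-stationary stochastic Bernoulli bandit problem.
Distinguishing actions from observations avoids the self-delusion problem that naive universal coding approaches run into.
The resulting coding distribution is both universal and usable as a control policy.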
Abstract
This paper considers a generalisation of universal source coding for interaction data, namely data streams that have actions interleaved with observations. Our goal will be to construct a coding distribution that is both universal and can be used as a control policy. Allowing for action generation needs careful treatment, as naive approaches which do not distinguish between actions and observations run into the self-delusion problem in universal settings. We showcase our perspective in the context of the challenging non-stationary stochastic Bernoulli bandit problem. Our main contribution is an efficient and high performing algorithm for this problem that generalises the Partition Tree Weighting universal source coding technique for passive prediction to the control setting.
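To make the passive-prediction side of the abstract concrete, here is a minimal sketch of why partitioning helps when coding a non-stationary binary source. It is not the paper's algorithm: it assumes the Krichevsky-Trofimov (KT) estimator as the base coding distribution, uses a made-up toy sequence with a single change point, and compares only one fixed partition rather than weighting over many partitions. The function name `kt_code_length` is hypothetical.

```python
import math


def kt_code_length(bits):
    """Code length (in bits) of a binary sequence under the KT estimator.

    Assumption for illustration: the KT estimator predicts the next symbol
    is 1 with probability (ones + 1/2) / (ones + zeros + 1).
    """
    ones = 0
    zeros = 0
    total = 0.0
    for x in bits:
        p_one = (ones + 0.5) / (ones + zeros + 1.0)
        p = p_one if x == 1 else 1.0 - p_one
        total -= math.log2(p)  # ideal code length of this symbol
        if x == 1:
            ones += 1
        else:
            zeros += 1
    return total


# Toy piecewise-stationary source (hypothetical data): the bias flips halfway.
sequence = [0] * 50 + [1] * 50

# One estimator over the whole stream versus a fresh estimator per segment.
single = kt_code_length(sequence)
split = kt_code_length(sequence[:50]) + kt_code_length(sequence[50:])

print(f"single model: {single:.2f} bits")
print(f"split at change point: {split:.2f} bits")
```

On this toy sequence the split coding is much shorter than the single-model coding, because each segment is stationary on its own. A partition-weighting scheme does not know the change point in advance, so it mixes over candidate partitions instead of fixing one as this sketch does.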
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Data Stream Mining Techniques
