Partition Tree Weighting for Non-Stationary Stochastic Bandits

Joel Veness; Marcus Hutter; Andras Gyorgy; Jordi Grau-Moya

arXiv:2502.19325 · cs.LG · February 27, 2025



TL;DR

This paper introduces a novel algorithm for non-stationary stochastic Bernoulli bandits that extends universal source coding techniques to control settings, effectively handling interleaved actions and observations.

Contribution

The paper generalizes the Partition Tree Weighting method from passive prediction to active control in non-stationary bandit problems, where actions are interleaved with observations in the data stream.

Findings

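01 The proposed algorithm is described as efficient and high-performing on the non-stationary stochastic Bernoulli bandit problem.

02 Distinguishing actions from observations avoids the self-delusion problem that naive universal coding approaches run into.

03 Partition Tree Weighting, a universal source coding technique for passive prediction, is generalized to serve as a control policy for stochastic Bernoulli bandits.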

Abstract

This paper considers a generalisation of universal source coding for interaction data, namely data streams that have actions interleaved with observations. Our goal will be to construct a coding distribution that is both universal and can be used as a control policy. Allowing for action generation needs careful treatment, as naive approaches which do not distinguish between actions and observations run into the self-delusion problem in universal settings. We showcase our perspective in the context of the challenging non-stationary stochastic Bernoulli bandit problem. Our main contribution is an efficient and high-performing algorithm for this problem that generalises the Partition Tree Weighting universal source coding technique for passive prediction to the control setting.


Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Data Stream Mining Techniques