Stationary Mixing Bandits
Julien Audiffren (CMLA), Liva Ralaivola (LIF)

TL;DR
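This paper studies bandits whose arms are stationary phi-mixing processes, so that successive rewards are dependent. It proposes a UCB-style algorithm that trades off exploration, exploitation, and independence, gives a regret analysis, shows how to estimate the independence block size from data, and extends the analysis to the restless case.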
Contribution
It introduces a UCB strategy for phi-mixing bandits, analyzes regret with fixed and adaptive independence blocks, and extends to restless processes.
Findings
Proposes a UCB algorithm with regret bounds when the size of the independence blocks is fixed.
Develops an adaptive method that estimates the block size from data.
Extends the analysis to restless phi-mixing bandit processes.
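For readers unfamiliar with the term, the following is the standard textbook definition of phi-mixing for a stationary process $(X_t)_{t \ge 1}$. It is given here as background; the exact form and normalization used in the paper may differ.

```latex
\varphi(n) \;=\; \sup_{t \ge 1}\;
  \sup_{\substack{A \in \sigma(X_1,\dots,X_t),\ \mathbb{P}(A) > 0 \\ B \in \sigma(X_{t+n}, X_{t+n+1}, \dots)}}
  \bigl|\, \mathbb{P}(B \mid A) - \mathbb{P}(B) \,\bigr| ,
\qquad
(X_t) \text{ is } \varphi\text{-mixing if } \varphi(n) \xrightarrow[n \to \infty]{} 0 .
```

Intuitively, rewards separated by a gap of $n$ steps become nearly independent as $n$ grows, which is what makes ignoring a block of intermediate rewards useful.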
Abstract
We study the bandit problem where arms are associated with stationary phi-mixing processes and where rewards are therefore dependent. The question that arises from this setting is that of recovering some independence by ignoring the value of some rewards. As we shall see, the bandit problem we tackle requires us to address the exploration/exploitation/independence trade-off. To do so, we provide a UCB strategy together with a general regret analysis for the case where the size of the independence blocks (the ignored rewards) is fixed, and we go a step further by providing an algorithm that is able to compute the size of the independence blocks from the data. Finally, we give an analysis of our bandit problem in the restless case, i.e., the situation where the time counters of all mixing processes evolve simultaneously.
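The idea of ignoring some rewards to recover independence can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a plain UCB1-style index applied only to the rewards that are kept, with a fixed, user-chosen `block_size` of ignored pulls before each kept reward. The class `TwoStateMarkovArm`, the function `blocked_ucb`, and all parameter values are hypothetical names introduced for illustration; the two-state Markov chain is merely one simple example of a dependent reward process. Only the pulled arm advances, so the sketch corresponds to the non-restless setting.

```python
import math
import random


class TwoStateMarkovArm:
    """Hypothetical dependent arm: a stationary two-state Markov chain with rewards in {0, 1}."""

    def __init__(self, p_up, p_down, rng):
        self.p_up = p_up        # probability of moving from state 0 to state 1
        self.p_down = p_down    # probability of moving from state 1 to state 0
        self.rng = rng
        self.mean = p_up / (p_up + p_down)  # stationary mean reward
        # Draw the initial state from the stationary law so the process is stationary.
        self.state = 1 if rng.random() < self.mean else 0

    def pull(self):
        """Advance the chain by one step and return the new state as the reward."""
        u = self.rng.random()
        if self.state == 0 and u < self.p_up:
            self.state = 1
        elif self.state == 1 and u < self.p_down:
            self.state = 0
        return float(self.state)


def blocked_ucb(arms, horizon, block_size):
    """UCB1-style index on kept rewards only (illustrative, not the paper's method).

    Each round selects an arm, pulls it `block_size` times while ignoring the
    rewards, then pulls it once more and keeps that reward for the estimate.
    Returns (total pulls per arm, kept rewards per arm).
    """
    n_arms = len(arms)
    pulls = [0] * n_arms   # every pull, including ignored ones
    kept = [0] * n_arms    # rewards actually used in the estimates
    sums = [0.0] * n_arms  # sum of kept rewards
    t = 0
    rounds = 0
    while t < horizon:
        rounds += 1
        if rounds <= n_arms:
            k = rounds - 1  # initialization: one kept reward per arm
        else:
            k = max(
                range(n_arms),
                key=lambda i: sums[i] / kept[i]
                + math.sqrt(2.0 * math.log(rounds) / kept[i]),
            )
        # Ignored pulls: they cost time but do not enter the estimates.
        ignored = min(block_size, horizon - t)
        for _ in range(ignored):
            arms[k].pull()
        pulls[k] += ignored
        t += ignored
        if t >= horizon:
            break
        # Kept pull: the reward used to update the empirical mean.
        reward = arms[k].pull()
        pulls[k] += 1
        t += 1
        sums[k] += reward
        kept[k] += 1
    return pulls, kept


if __name__ == "__main__":
    rng = random.Random(0)
    arms = [TwoStateMarkovArm(0.4, 0.1, rng), TwoStateMarkovArm(0.15, 0.35, rng)]
    print(blocked_ucb(arms, horizon=5000, block_size=4))
```

A larger `block_size` makes the kept rewards closer to independent but spends more pulls on rewards that never inform the estimates, which is the exploration/exploitation/independence trade-off the abstract refers to.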
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Receptor Mechanisms and Signaling
