An Information-Theoretic Analysis of Nonstationary Bandit Learning
Seungki Min, Daniel Russo

TL;DR
This paper takes an information-theoretic approach to nonstationary bandit problems, bounding limiting per-period regret in terms of the entropy rate of the optimal action process and thereby linking the problem's information structure to learning performance.
Contribution
It introduces an entropy-rate-based analysis of nonstationary bandit learning, giving a general regret bound that reflects the problem's information structure through its information ratio.
Findings
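Limiting per-period regret is bounded in terms of the entropy rate of the optimal action process
The bound applies to a wide array of nonstationary bandit problems studied in the literature
The bound reflects the problem's information structure through its information ratio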
Abstract
In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves. In each time period, some latent optimal action maximizes expected reward under the environment state. We view the optimal action sequence as a stochastic process, and take an information-theoretic approach to analyze attainable performance. We bound limiting per-period regret in terms of the entropy rate of the optimal action process. The bound applies to a wide array of problems studied in the literature and reflects the problem's information structure through its information-ratio.
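To make the "entropy rate of the optimal action process" concrete, here is a minimal sketch under an assumed model that is not taken from the paper: the optimal action follows a stationary Markov chain over k actions that keeps its current value with probability 1 - p and otherwise switches uniformly to one of the other actions. The helper names (switching_chain, stationary_distribution, entropy_rate) are hypothetical. For a stationary Markov chain with transition matrix P and stationary distribution pi, the entropy rate is -sum_i pi_i sum_j P_ij log P_ij, measured here in nats.

```python
import math


def switching_chain(k, p):
    """Transition matrix for an assumed toy model (not from the paper):
    stay on the current optimal action with prob. 1 - p, otherwise move
    uniformly to one of the other k - 1 actions. Requires k >= 2."""
    return [[1 - p if i == j else p / (k - 1) for j in range(k)]
            for i in range(k)]


def stationary_distribution(P, iters=1000):
    """Approximate the stationary distribution by power iteration,
    starting from the uniform distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def entropy_rate(P):
    """Entropy rate (in nats) of a stationary Markov chain:
    -sum_i pi_i * sum_j P_ij * log(P_ij), skipping zero entries."""
    n = len(P)
    pi = stationary_distribution(P)
    return -sum(pi[i] * P[i][j] * math.log(P[i][j])
                for i in range(n) for j in range(n) if P[i][j] > 0)


if __name__ == "__main__":
    for p in (0.0, 0.01, 0.1, 0.5):
        print(f"p = {p:<5} entropy rate = {entropy_rate(switching_chain(3, p)):.4f} nats")
```

In this toy model the entropy rate is zero when the optimal action never changes (p = 0) and grows as switches become more frequent, which is the quantity the abstract's bound is stated in terms of. The sketch does not compute the regret bound itself or the information ratio.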
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Data Stream Mining Techniques
