An Information-Theoretic Analysis of Nonstationary Bandit Learning

Seungki Min; Daniel Russo

arXiv:2302.04452 · cs.LG · December 27, 2023 · 1 citation

TL;DR

This paper takes an information-theoretic approach to nonstationary bandit problems, bounding limiting per-period regret in terms of the entropy rate of the optimal action process and thereby linking the problem's information structure to attainable learning performance.

Contribution

It introduces an entropy-based analysis of nonstationary bandit learning, providing general regret bounds that incorporate the environment's information complexity.

Findings

01. Bound on per-period regret in terms of entropy rate
02. Applicable to various nonstationary bandit problems
03. Highlights the role of information structure in learning efficiency

Abstract

In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves. In each time period, some latent optimal action maximizes expected reward under the environment state. We view the optimal action sequence as a stochastic process, and take an information-theoretic approach to analyze attainable performance. We bound limiting per-period regret in terms of the entropy rate of the optimal action process. The bound applies to a wide array of problems studied in the literature and reflects the problem's information structure through its information-ratio.
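
To make the entropy rate of an optimal action process concrete, the sketch below computes it for a toy model. The model is an assumption made purely for illustration and is not taken from the abstract: the optimal action is assumed to follow a stationary Markov chain over `k` actions that keeps its current value with probability `1 - p` and otherwise jumps uniformly to one of the other actions. The helper names (`switching_chain`, `stationary_distribution`, `entropy_rate`) are hypothetical. The code uses the standard formula for the entropy rate of a stationary Markov chain, H = -Σ_i π_i Σ_j P_ij log P_ij, in nats.

```python
import math


def switching_chain(k, p):
    """Hypothetical k-action chain: stay with prob. 1 - p, else jump uniformly elsewhere."""
    return [[1.0 - p if i == j else p / (k - 1) for j in range(k)] for i in range(k)]


def stationary_distribution(P, iters=1000):
    """Approximate the stationary distribution by power iteration from the uniform start."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    return pi


def entropy_rate(P):
    """Entropy rate (nats per period) of a stationary Markov chain with transition matrix P."""
    pi = stationary_distribution(P)
    k = len(P)
    return -sum(
        pi[i] * P[i][j] * math.log(P[i][j])
        for i in range(k)
        for j in range(k)
        if P[i][j] > 0.0  # treat 0 * log 0 as 0
    )


if __name__ == "__main__":
    # The optimal action changes more often as p grows, so the entropy rate rises.
    for p in (0.01, 0.1, 0.5):
        print(p, entropy_rate(switching_chain(4, p)))
```

In this toy model a slowly changing optimal action (small `p`) has a low entropy rate, which is the quantity the abstract says the regret bound depends on. The sketch does not reproduce the bound itself or the information ratio.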

Videos

An Information-Theoretic Analysis of Nonstationary Bandit Learning · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Data Stream Mining Techniques