Offline Planning and Online Learning under Recovering Rewards

David Simchi-Levi; Zeyu Zheng; Feng Zhu

arXiv:2106.14813 · stat.ML · December 23, 2021 · 1 citation

TL;DR

This paper introduces a novel class of non-stationary multi-armed bandit problems with recovering rewards and proposes periodic policies for both the offline and online settings, with near-optimal performance guarantees.

Contribution

The paper develops a unified framework for offline planning and online learning in non-stationary bandits with recovering rewards, including new policies with proven performance bounds.

Findings

01. The offline policy achieves a near-optimal approximation ratio of 1 − O(1/√K).

02. The online policy attains regret of approximately Õ(N√T).

03. The framework extends to broader applications with non-stationary, recovering rewards.
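As a rough guide to what the offline guarantee concerns, the following is our own illustration, not a formula quoted from the paper. Here $R_i(d)$ is assumed notation for the expected reward of arm $i$ when pulled after $d$ idle periods, and a purely periodic schedule pulls arm $i$ every $p_i$ periods. Its long-run reward rate and a frequency condition on the periods are:

$$
\text{rate}(p_1,\dots,p_N) \;=\; \sum_{i=1}^{N} \frac{R_i(p_i)}{p_i},
\qquad\text{feasible only if}\qquad
\sum_{i=1}^{N} \frac{1}{p_i} \;\le\; K .
$$

In steady state every pull of arm $i$ follows $p_i$ idle periods, and arm $i$ occupies a $1/p_i$ fraction of all periods, which gives both expressions. The frequency condition is necessary but not sufficient. With $N = 3$ and $K = 1$, the periods $(2, 4, 4)$ satisfy $\tfrac12 + \tfrac14 + \tfrac14 = 1$ and can be interleaved with offsets $(0, 1, 3)$. The periods $(2, 3, 6)$ also sum to $1$, yet by the Chinese remainder theorem the first two arms collide once every 6 periods whatever their offsets. Turning ideal frequencies into a collision-free schedule is therefore one place where a periodic policy can lose reward relative to the frequency bound.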

Abstract

Motivated by emerging applications such as live-streaming e-commerce, promotions and recommendations, we introduce and solve a general class of non-stationary multi-armed bandit problems that have the following two features: (i) the decision maker can pull and collect rewards from up to $K (\geq 1)$ out of $N$ different arms in each time period; (ii) the expected reward of an arm immediately drops after it is pulled, and then non-parametrically recovers as the arm's idle time increases. With the objective of maximizing the expected cumulative reward over $T$ time periods, we design a class of "Purely Periodic Policies" that jointly set a period to pull each arm. For the proposed policies, we prove performance guarantees for both the offline problem and the online problem. For the offline problem when all model parameters are known, the proposed periodic policy obtains an…
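To make the model in the abstract concrete, here is a minimal simulation sketch. It is not the authors' code. The saturating recovery curve in `make_recovery` is an assumption chosen only for the demo, since the paper treats recovery non-parametrically. All function names (`make_recovery`, `simulate_periodic`, `steady_state_rate`) and the offset-based schedule representation are hypothetical. The sketch enforces the at-most-$K$-pulls constraint, lets each pull's expected reward depend on the arm's idle time, and resets that idle time after the pull.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical type: R(d) = expected reward of an arm pulled after d idle periods.
Recovery = Callable[[float], float]


def make_recovery(a: float, tau: float, d_max: float) -> Recovery:
    """Illustrative (assumed) recovery curve a * (1 - exp(-d / tau)), with
    idle time capped at d_max. The paper makes no parametric assumption."""
    def r(d: float) -> float:
        return a * (1.0 - math.exp(-min(d, d_max) / tau))
    return r


def simulate_periodic(recovery: Sequence[Recovery], periods: Sequence[int],
                      offsets: Sequence[int], K: int, T: int) -> float:
    """Total expected reward over T periods when arm i is pulled at every t
    with t % periods[i] == offsets[i]. An arm never pulled before is given
    infinite idle time, so its first pull sees the fully recovered reward."""
    n = len(recovery)
    last_pull: List[float] = [-math.inf] * n
    total = 0.0
    for t in range(T):
        pulled = [i for i in range(n) if t % periods[i] == offsets[i]]
        if len(pulled) > K:  # at most K arms may be pulled per period
            raise ValueError(f"{len(pulled)} pulls at t={t} exceed K={K}")
        for i in pulled:
            total += recovery[i](t - last_pull[i])  # reward depends on idle time
            last_pull[i] = t  # reward drops: idle time restarts from this pull
    return total


def steady_state_rate(recovery: Sequence[Recovery],
                      periods: Sequence[int]) -> float:
    """Long-run average reward per period: after warm-up, every pull of
    arm i sees idle time exactly periods[i]."""
    return sum(r(p) / p for r, p in zip(recovery, periods))


if __name__ == "__main__":
    arms = [make_recovery(a, tau=2.0, d_max=10) for a in (1.0, 0.8, 0.6)]
    # K = 1: arm 0 every 2 periods, arms 1 and 2 every 4, interleaved.
    periods, offsets = [2, 4, 4], [0, 1, 3]
    T = 1000
    print(simulate_periodic(arms, periods, offsets, K=1, T=T) / T)
    print(steady_state_rate(arms, periods))
```

The two printed numbers should be close for large `T`, because only the first pull of each arm deviates from the steady-state idle time. Representing a schedule by a (period, offset) pair per arm is just one convenient encoding of a purely periodic policy for this illustration.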

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Age of Information Optimization