Bandits with Stochastic Experts: Constant Regret, Empirical Experts and Episodes

Nihal Sharma; Rajat Sen; Soumya Basu; Karthikeyan Shanmugam; Sanjay Shakkottai

arXiv:2107.03263 · cs.LG · October 29, 2024

TL;DR

This paper introduces algorithms for a variant of the contextual bandit problem in which the agent acts through stochastic experts, achieving horizon-independent constant regret and proven regret bounds in episodic environments whose context and reward distributions change across episodes.

Contribution

The paper proposes two algorithms: D-UCB, which uses importance sampling to share information across experts, and ED-UCB, which needs only approximate knowledge of the expert distributions. Both come with proven regret bounds, covering the fixed and the episodic settings.
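To make the importance-sampling idea concrete, the sketch below estimates one expert's mean reward using only samples gathered while a different expert was played. It is a minimal illustration under stated assumptions, not the estimator used in D-UCB: it assumes a single fixed context, three discrete actions, Bernoulli rewards, and a played expert that puts positive probability on every action. `importance_estimate`, the `clip` parameter, and all toy distributions are hypothetical.

```python
import random

# Hypothetical toy sketch: not the paper's D-UCB estimator.
def importance_estimate(samples, p_behavior, p_target, clip=None):
    """Estimate the target expert's mean reward from (action, reward) pairs
    collected while playing the behavior expert, under one fixed context."""
    total = 0.0
    for action, reward in samples:
        weight = p_target[action] / p_behavior[action]  # importance ratio
        if clip is not None:
            weight = min(weight, clip)  # hypothetical cap: lower variance, some bias
        total += weight * reward
    return total / len(samples)

rng = random.Random(0)
mu = [0.2, 0.5, 0.8]   # hypothetical mean reward of each action
p_j = [0.5, 0.3, 0.2]  # expert j: the only expert actually played
p_k = [0.1, 0.2, 0.7]  # expert k: never played

n = 200_000
actions = rng.choices(range(3), weights=p_j, k=n)
samples = [(a, 1.0 if rng.random() < mu[a] else 0.0) for a in actions]

true_k = sum(p * m for p, m in zip(p_k, mu))    # exact value: 0.68
est_k = importance_estimate(samples, p_j, p_k)  # unbiased, close to true_k
est_clipped = importance_estimate(samples, p_j, p_k, clip=1.0)  # biased downward
print(f"true={true_k:.3f} unclipped={est_k:.3f} clipped={est_clipped:.3f}")
```

Capping the weight can only lower it, so with nonnegative rewards the clipped estimate is biased downward in exchange for lower variance. The unclipped estimate is unbiased, but its variance grows with how different the two experts are, since the second moment of the weight under expert j equals 1 + χ²(p_k ‖ p_j).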

Findings

01

D-UCB achieves horizon-independent constant regret that scales only linearly in the number of experts.

02

ED-UCB works with only approximate knowledge of the expert distributions.

03

In the episodic setting, ED-UCB's regret scales as O(E(N+1) + N√E/T²) over N episodes, each of length T.
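A short reading of the episodic bound, assuming N episodes of length T each and taking E to be the number of experts (the notation R(N, T) for cumulative regret is introduced here for illustration): dividing by N shows that the per-episode regret is bounded by a term that does not grow with the episode length T.

```latex
R(N,T) = \mathcal{O}\!\left(E(N+1) + \frac{N\sqrt{E}}{T^{2}}\right)
\;\Longrightarrow\;
\frac{R(N,T)}{N} = \mathcal{O}\!\left(E\,\frac{N+1}{N} + \frac{\sqrt{E}}{T^{2}}\right)
= \mathcal{O}(E) \quad \text{for all } N, T \ge 1
```

The last step uses (N+1)/N ≤ 2 and √E/T² ≤ √E ≤ E whenever E ≥ 1.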

Abstract

We study a variant of the contextual bandit problem where an agent can intervene through a set of stochastic expert policies. Given a fixed context, each expert samples actions from a fixed conditional distribution. The agent seeks to remain competitive with the 'best' among the given set of experts. We propose the Divergence-based Upper Confidence Bound (D-UCB) algorithm that uses importance sampling to share information across experts and provide horizon-independent constant regret bounds that only scale linearly in the number of experts. We also provide the Empirical D-UCB (ED-UCB) algorithm that can function with only approximate knowledge of expert distributions. Further, we investigate the episodic setting where the agent interacts with an environment that changes over episodes. Each episode can have different context and reward distributions resulting in the best expert changing…
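The abstract's central mechanism, reusing every observation to update every expert's estimate inside an upper-confidence-bound rule, can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not D-UCB itself: it assumes one fixed context, Bernoulli rewards, and experts that put positive probability on every action, and it swaps in a self-normalized importance-sampling estimate with a Kish effective-sample-size bonus in place of the paper's estimator and confidence widths. `shared_ucb` and all toy numbers are hypothetical.

```python
import math
import random

# Hypothetical toy sketch: not the paper's D-UCB estimator or confidence width.
def shared_ucb(experts, mu, horizon, seed=0):
    """Toy UCB over experts in which every sample updates every expert's
    estimate via importance weights. Returns how often each expert was played."""
    rng = random.Random(seed)
    num_experts, num_actions = len(experts), len(mu)
    sum_w = [0.0] * num_experts   # sum of importance weights per expert
    sum_w2 = [0.0] * num_experts  # sum of squared weights, for the ESS
    sum_wr = [0.0] * num_experts  # sum of weighted rewards per expert
    plays = [0] * num_experts

    def index(k, t):
        mean = sum_wr[k] / sum_w[k]      # self-normalized IS estimate
        ess = sum_w[k] ** 2 / sum_w2[k]  # Kish effective sample size
        return mean + math.sqrt(2.0 * math.log(t) / ess)

    for t in range(1, horizon + 1):
        if t <= num_experts:
            chosen = t - 1  # play each expert once first
        else:
            chosen = max(range(num_experts), key=lambda k: index(k, t))
        plays[chosen] += 1
        behavior = experts[chosen]
        action = rng.choices(range(num_actions), weights=behavior)[0]
        reward = 1.0 if rng.random() < mu[action] else 0.0
        # Share the sample: reweight it for every expert, played or not.
        for k in range(num_experts):
            w = experts[k][action] / behavior[action]
            sum_w[k] += w
            sum_w2[k] += w * w
            sum_wr[k] += w * reward
    return plays

mu = [0.2, 0.5, 0.8]         # hypothetical per-action mean rewards
experts = [[0.6, 0.3, 0.1],  # mean reward 0.35
           [0.3, 0.4, 0.3],  # mean reward 0.50
           [0.1, 0.3, 0.6]]  # mean reward 0.65 (best)
plays = shared_ucb(experts, mu, horizon=3000)
print(plays)
```

Because every sample is reweighted for every expert, an expert's confidence width shrinks even in rounds where it is not played, though more slowly when its distribution differs a lot from the experts being played. In this toy the loop should settle on the best expert after a short exploration phase, which gives some intuition for how sharing can keep the number of suboptimal plays bounded.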

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Mobile Crowdsensing and Crowdsourcing