Distributed Stochastic Bandit Learning with Delayed Context Observation

Jiabin Lin; Shana Moothedath

arXiv:2207.14391 · cs.LG · November 16, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a distributed UCB-based algorithm for stochastic contextual bandits with delayed context observation, addressing real-world scenarios where context is only available after rewards are observed, and provides theoretical regret bounds and empirical validation.

Contribution

The paper proposes a novel distributed algorithm for delayed-context stochastic bandits with theoretical regret and communication bounds, validated on synthetic and real data.

Findings

01

Regret and communication bounds are established for the proposed algorithm under linearly parametrized reward functions.

02

The algorithm performs well on synthetic and MovieLens data.

03

The algorithm handles delayed context observation effectively in distributed settings.

Abstract

We consider the problem where M agents collaboratively interact with an instance of a stochastic K-armed contextual bandit, where K ≫ M. The goal of the agents is to minimize the cumulative regret summed over all agents over a time horizon T. We consider a setting where the exact context is observed only after a delay: at the time of choosing an action, the agents do not know the context, and only a distribution over the set of contexts is available. Such a situation arises in applications where the context must be predicted at decision time (e.g., weather forecasting or stock market prediction) and can be estimated once the reward is obtained. We propose an Upper Confidence Bound (UCB)-based distributed algorithm and prove regret and communication bounds for linearly parametrized reward functions. We validated the performance of our…
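
The setting described in the abstract can be illustrated with a minimal single-agent sketch. This is not the authors' distributed algorithm: it omits the M agents and their communication protocol entirely, and the class name DelayedContextLinUCB, the helper run_demo, the parameters alpha and reg, and the choice to score each arm by its expected feature vector under the context distribution are all assumptions made for illustration. The sketch selects an arm using only the context distribution, then updates a ridge-regression estimate with the exact feature vector once the context is revealed.

```python
import numpy as np


class DelayedContextLinUCB:
    """Hypothetical single-agent LinUCB-style learner (illustration only)."""

    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha            # exploration weight (assumed, not from the paper)
        self.V = reg * np.eye(dim)    # regularized Gram matrix
        self.b = np.zeros(dim)        # sum of reward-weighted features

    def select(self, context_probs, features):
        """Pick an arm knowing only a distribution over contexts.

        context_probs: shape (n_contexts,)
        features:      shape (n_contexts, n_arms, dim)
        """
        # Expected feature vector of each arm under the context distribution.
        expected = np.tensordot(context_probs, features, axes=1)  # (n_arms, dim)
        V_inv = np.linalg.inv(self.V)
        theta_hat = V_inv @ self.b
        mean = expected @ theta_hat
        # Confidence width sqrt(x^T V^{-1} x) for each arm.
        width = np.sqrt(np.einsum("ad,de,ae->a", expected, V_inv, expected))
        return int(np.argmax(mean + self.alpha * width))

    def update(self, x, reward):
        """Update with the exact feature vector, available after the delay."""
        self.V += np.outer(x, x)
        self.b += reward * x


def run_demo(rounds=200, seed=0):
    """Simulate a synthetic linear bandit; all sizes here are arbitrary."""
    rng = np.random.default_rng(seed)
    n_contexts, n_arms, dim = 3, 5, 4
    features = rng.normal(size=(n_contexts, n_arms, dim))
    theta_star = rng.normal(size=dim)  # unknown reward parameter
    agent = DelayedContextLinUCB(dim)
    for _ in range(rounds):
        probs = rng.dirichlet(np.ones(n_contexts))   # forecast over contexts
        arm = agent.select(probs, features)          # decide before seeing the context
        context = rng.choice(n_contexts, p=probs)    # context revealed afterwards
        x = features[context, arm]
        reward = x @ theta_star + 0.1 * rng.normal()
        agent.update(x, reward)
    return agent
```

Calling run_demo() returns the trained agent. In a distributed variant, the agents would additionally exchange their V and b statistics; how and when they do so is specified in the paper and is not modeled here.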

Taxonomy

Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Data Stream Mining Techniques