An LP-based Sampling Policy for Multi-Armed Bandits with Side-Observations and Stochastic Availability
Ashutosh Soni, Peizhong Ju, Atilla Eryilmaz, Ness B. Shroff

TL;DR
This paper introduces UCB-LP-A, an LP-based sampling policy for stochastic multi-armed bandits with side-observations, designed for settings where the set of available actions changes randomly from round to round.
Contribution
The paper develops an LP-based sampling policy that accounts for both the network structure (a bipartite graph linking actions to unknowns) and the stochastic availability of actions.
Findings
UCB-LP-A outperforms existing heuristics in simulations.
The theoretical regret bounds incorporate both network and availability factors.
The policy manages exploration when the set of feasible actions changes from round to round.
Abstract
We study the stochastic multi-armed bandit (MAB) problem where an underlying network structure enables side-observations across related actions. We use a bipartite graph to link actions to a set of unknowns, such that selecting an action reveals observations for all the unknowns it is connected to. While previous works rely on the assumption that all actions are permanently accessible, we investigate the more practical setting of stochastic availability, where the set of feasible actions (the "activation set") varies dynamically in each round. This framework models real-world systems with both structural dependencies and volatility, such as social networks where users provide side-information about their peers' preferences, yet are not always online to be queried. To address this challenge, we propose UCB-LP-A, a novel policy that leverages a Linear Programming (LP) approach to optimize…
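The setting described in the abstract can be sketched as a small simulation. This is only an illustration of the environment, not of UCB-LP-A itself: the graph, the Bernoulli observation model, the independent per-action availability probabilities, and all names (`graph`, `means`, `avail_prob`, `run`) are assumptions made for the example, and the action choice is a uniform random placeholder.

```python
import random

# Hypothetical toy instance: 3 actions, 4 unknowns.
# graph[a] lists the unknowns that action a is connected to (bipartite graph).
graph = {0: [0, 1], 1: [1, 2], 2: [2, 3]}
means = [0.2, 0.5, 0.7, 0.4]           # assumed Bernoulli mean of each unknown
avail_prob = {0: 0.9, 1: 0.5, 2: 0.7}  # assumed independent availability per action


def sample_activation_set(rng):
    """Draw the set of actions that are feasible in this round."""
    return {a for a, p in avail_prob.items() if rng.random() < p}


def play(action, rng):
    """Selecting an action reveals one observation per connected unknown."""
    return {u: int(rng.random() < means[u]) for u in graph[action]}


def run(rounds, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(means)  # number of observations per unknown
    sums = [0] * len(means)    # sum of observed values per unknown
    for _ in range(rounds):
        active = sample_activation_set(rng)
        if not active:
            continue  # no action is available this round
        # Placeholder choice (uniform over the activation set), NOT UCB-LP-A.
        action = rng.choice(sorted(active))
        for u, x in play(action, rng).items():
            counts[u] += 1
            sums[u] += x
    return counts, sums


if __name__ == "__main__":
    counts, sums = run(1000)
    print("observations per unknown:", counts)
```

Because each action in this toy graph is connected to two unknowns, every round with a non-empty activation set yields two observations, which is the side-observation effect the abstract describes.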
