
TL;DR
This paper introduces a new multi-armed bandit framework using credal sets to model uncertainty, extending traditional reward-based models and providing algorithms with regret bounds for this more general setting.
Contribution
It proposes a novel credal set-based bandit framework, defines a new regret measure, and develops algorithms with theoretical regret bounds for this setting.
Findings
Proposed a credal set-based multi-armed bandit model.
Developed algorithms with proven upper regret bounds.
Established lower bounds for specific cases.
Abstract
We introduce a novel multi-armed bandit framework, where each arm is associated with a fixed unknown credal set over the space of outcomes (which can be richer than just the reward). The arm-to-credal-set correspondence comes from a known class of hypotheses. We then define a notion of regret corresponding to the lower prevision defined by these credal sets. Equivalently, the setting can be regarded as a two-player zero-sum game, where, on each round, the agent chooses an arm and the adversary chooses the distribution over outcomes from a set of options associated with this arm. The regret is defined with respect to the value of the game. For certain natural hypothesis classes, loosely analogous to stochastic linear bandits (which are a special case of the resulting setting), we propose an algorithm and prove a corresponding upper bound on regret. We also prove lower bounds on regret for…
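The lower prevision and game value mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm or its exact regret definition. It assumes a finite outcome space, a known reward for each outcome, and credal sets given as convex hulls of finitely many distributions, so that the worst-case expectation is attained at an extreme point. The function names, the example arms, and the specific regret formula (T times the game value minus the realized reward) are all hypothetical choices made for illustration.

```python
import numpy as np


def lower_prevision(credal_vertices, reward):
    """Worst-case expected reward over a credal set.

    Assumption: the credal set is the convex hull of the rows of
    `credal_vertices`, each row a distribution over outcomes. A linear
    expectation over a polytope is minimized at one of its vertices.
    """
    vertices = np.asarray(credal_vertices, dtype=float)
    return float(np.min(vertices @ np.asarray(reward, dtype=float)))


def game_value(credal_sets, reward):
    """Value of the game when the adversary picks the distribution after
    seeing the arm: the best arm's lower prevision (illustrative reading)."""
    return max(lower_prevision(v, reward) for v in credal_sets)


def cumulative_regret(credal_sets, reward, realized_rewards):
    """Hypothetical regret: T * (game value) minus the total realized reward."""
    horizon = len(realized_rewards)
    return horizon * game_value(credal_sets, reward) - float(np.sum(realized_rewards))


# Illustrative data (not from the paper): three outcomes, two arms.
reward = [0.0, 0.5, 1.0]
arm0 = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]  # wide credal set
arm1 = [[0.2, 0.6, 0.2], [0.4, 0.2, 0.4]]  # narrower in expected reward
credal_sets = [arm0, arm1]

print(lower_prevision(arm0, reward), lower_prevision(arm1, reward))
print(game_value(credal_sets, reward))
```

In this toy instance the second arm is preferable under the worst case even though the first arm can achieve a higher expected reward under a favorable distribution, which is the kind of distinction a lower-prevision-based regret is meant to capture.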
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Smart Grid Energy Management
Methods: Sparse Evolutionary Training
