Combinatorial Multi-Armed Bandits with Filtered Feedback
James A. Grant, David S. Leslie, Kevin Glazebrook, Roberto Szechtman

TL;DR
This paper introduces a new algorithm for combinatorial multi-armed bandits with filtered semibandit feedback, handling heavy-tailed rewards and providing theoretical regret bounds for search and detection applications.
Contribution
It proposes Robust-F-CUCB, an upper confidence bound algorithm tailored for filtered feedback and heavy-tailed rewards in CMAB problems, with proven logarithmic regret bounds.
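This summary does not say which robust estimator Robust-F-CUCB uses. For illustration only, the sketch below shows median-of-means, one standard way to estimate a mean from heavy-tailed samples. The function name `median_of_means` and the parameter `num_blocks` are hypothetical and are not taken from the paper.

```python
import statistics

def median_of_means(samples, num_blocks):
    """Estimate a mean robustly: split into blocks, average each, take the median."""
    if not samples:
        raise ValueError("samples must be non-empty")
    # Never use more blocks than samples, and always use at least one block.
    k = max(1, min(num_blocks, len(samples)))
    block_size = len(samples) // k
    # Samples left over after the last full block are ignored in this sketch.
    block_means = [
        statistics.fmean(samples[i * block_size:(i + 1) * block_size])
        for i in range(k)
    ]
    return statistics.median(block_means)

# One extreme observation dominates the plain mean but not the block median.
data = [1, 1, 1, 1, 1, 1000]
robust = median_of_means(data, num_blocks=3)   # 1.0
plain = statistics.fmean(data)                 # 167.5
```

A single outlier can only corrupt the block it falls in, so the median across blocks stays near the bulk of the data. An upper confidence bound index would add an exploration bonus to such an estimate; the form of that bonus is not given in this summary.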
Findings
Achieves near-optimal regret bounds.
Handles heavy-tailed reward distributions effectively.
Applies to search and detection scenarios with filtered feedback.
Abstract
Motivated by problems in search and detection we present a solution to a Combinatorial Multi-Armed Bandit (CMAB) problem with both heavy-tailed reward distributions and a new class of feedback, filtered semibandit feedback. In a CMAB problem an agent pulls a combination of arms from a set in each round, generating random outcomes from probability distributions associated with these arms and receiving an overall reward. Under semibandit feedback it is assumed that the random outcomes generated are all observed. Filtered semibandit feedback allows the outcomes that are observed to be sampled from a second distribution conditioned on the initial random outcomes. This feedback mechanism is valuable as it allows CMAB methods to be applied to sequential search and detection problems where combinatorial actions are made, but the true rewards (number of objects of interest…
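The feedback model described in the abstract can be sketched as a two-stage draw. This is a minimal simulation under assumed distributions, not the paper's model: the hidden outcome for each pulled arm is a uniform random count, and the observed outcome keeps each object independently with a per-arm detection probability. The names `filtered_round`, `max_counts`, and `detect_probs` are hypothetical.

```python
import random

def filtered_round(action, max_counts, detect_probs, rng):
    """Simulate one round; return (hidden outcomes, observed outcomes) per pulled arm."""
    true_counts, observed_counts = {}, {}
    for arm in action:
        # Stage 1: initial random outcome, never shown directly to the agent.
        # Assumption: a uniform count in [0, max_counts[arm]].
        x = rng.randint(0, max_counts[arm])
        # Stage 2: the filter. The observation is sampled conditionally on x;
        # here each of the x objects is detected with probability detect_probs[arm].
        y = sum(1 for _ in range(x) if rng.random() < detect_probs[arm])
        true_counts[arm] = x
        observed_counts[arm] = y
    return true_counts, observed_counts

rng = random.Random(0)
max_counts = {0: 5, 1: 3, 2: 8}
detect_probs = {0: 0.9, 1: 0.5, 2: 0.4}
# The agent pulls the combination of arms {0, 2} and only sees `observed`.
true_counts, observed = filtered_round([0, 2], max_counts, detect_probs, rng)
```

Under ordinary semibandit feedback the agent would see `true_counts` directly. Under filtered feedback it sees only `observed`, so any estimate of an arm's mean has to account for the filter.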
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
