Adversarial Combinatorial Semi-bandits with Graph Feedback
Yuxiao Wen

TL;DR
This paper extends combinatorial semi-bandits to include graph feedback, deriving optimal regret bounds that interpolate between full information and semi-bandit feedback, with new technical insights on action realization and regret analysis.
Contribution
It introduces a framework for graph feedback in combinatorial semi-bandits, establishes tight regret bounds, and proposes novel action realization techniques.
Findings
Optimal regret scales as $\tilde{\Theta}(S\sqrt{T} + \sqrt{\alpha S T})$
Convexified action realization with negative correlations improves performance
OSMD (online stochastic mirror descent) that realizes convexified actions only in expectation is suboptimal
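As a quick sanity check on the first finding, the rate can be evaluated at the two extreme feedback graphs. The sketch below assumes that $\alpha$ is the independence number of the feedback graph, $K$ is the total number of arms, and $1 \le S \le K$; it uses only the facts that a complete graph has $\alpha = 1$ and a graph with only self-loops has $\alpha = K$.

```latex
\begin{align*}
\alpha = 1 \ (\text{complete graph}): &\quad
  S\sqrt{T} \;\le\; S\sqrt{T} + \sqrt{ST} \;\le\; 2S\sqrt{T}
  && \text{since } \sqrt{S} \le S, \\
\alpha = K \ (\text{self-loops only}): &\quad
  \sqrt{KST} \;\le\; S\sqrt{T} + \sqrt{KST} \;\le\; 2\sqrt{KST}
  && \text{since } S \le K \Rightarrow S\sqrt{T} \le \sqrt{KST}.
\end{align*}
```

Up to constants, the rate therefore reduces to $S\sqrt{T}$ in the first case and to $\sqrt{KST}$ in the second.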
Abstract
In combinatorial semi-bandits, a learner repeatedly selects from a combinatorial decision set of arms, receives the realized sum of rewards, and observes the rewards of the individual selected arms as feedback. In this paper, we extend this framework to include \emph{graph feedback}, where the learner observes the rewards of all neighboring arms of the selected arms in a feedback graph $G$. We establish that the optimal regret over a time horizon $T$ scales as $\tilde{\Theta}(S\sqrt{T}+\sqrt{\alpha ST})$, where $S$ is the size of the combinatorial decisions and $\alpha$ is the independence number of $G$. This result interpolates between the known regrets $\tilde{\Theta}(S\sqrt{T})$ under full information (i.e., $G$ is complete) and $\tilde{\Theta}(\sqrt{KST})$ under the semi-bandit feedback (i.e., $G$ has only self-loops), where $K$ is the total number of arms. A key technical…
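To make the feedback model concrete, here is a minimal Python sketch of which rewards a learner would observe under graph feedback, together with a brute-force independence number for tiny graphs. The names `observed_arms` and `independence_number`, the dictionary-of-sets graph encoding, and the choice to treat an edge in either direction as adjacency are all assumptions made for illustration; none of them come from the paper.

```python
from itertools import combinations


def observed_arms(graph, decision):
    """Return the arms whose rewards are revealed when `decision` is played.

    `graph[a]` is assumed to be the set of arms observed when arm `a` is
    selected (including `a` itself when the graph has a self-loop at `a`).
    """
    seen = set()
    for arm in decision:
        seen |= graph[arm]
    return seen


def independence_number(graph):
    """Brute-force independence number; only sensible for very small graphs.

    Two distinct arms count as adjacent if either one observes the other
    (an illustrative assumption for directed graphs).
    """
    arms = list(graph)
    for size in range(len(arms), 0, -1):
        for subset in combinations(arms, size):
            if all(b not in graph[a] and a not in graph[b]
                   for a, b in combinations(subset, 2)):
                return size
    return 0


K = 4
complete = {i: set(range(K)) for i in range(K)}   # full information
self_loops = {i: {i} for i in range(K)}           # semi-bandit feedback
decision = {0, 2}                                 # a decision of size S = 2

full_info_view = observed_arms(complete, decision)
semi_bandit_view = observed_arms(self_loops, decision)
```

With the complete graph every arm is observed regardless of the decision, while with self-loops only the selected arms are observed, which matches the two regimes the abstract interpolates between.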
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques
Methods: Sparse Evolutionary Training
