Adversarial Combinatorial Semi-bandits with Graph Feedback

Yuxiao Wen

arXiv:2502.18826 · cs.LG · September 17, 2025

TL;DR

This paper extends combinatorial semi-bandits to include graph feedback, deriving optimal regret bounds that interpolate between full information and semi-bandit feedback, with new technical insights on action realization and regret analysis.

Contribution

It introduces a framework for graph feedback in combinatorial semi-bandits, establishes tight regret bounds, and proposes novel action realization techniques.

Findings

01

Optimal regret scales as $\tilde{\Theta}(S\sqrt{T} + \sqrt{\alpha S T})$

02

Convexified action realization with negative correlations improves performance

03

OSMD with convexified actions in expectation is suboptimal
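
As a sanity check on finding 01, the rate can be evaluated at the two extreme feedback graphs. The sketch below assumes the standard facts that a complete graph has independence number $\alpha = 1$, that a graph with only self-loops has $\alpha = K$ (with $K$ the number of arms), and that $1 \le S \le K$; constants and logarithmic factors are ignored.

```latex
\begin{align*}
\alpha = 1 &: \quad S\sqrt{T} + \sqrt{ST} \;\le\; 2S\sqrt{T}
  && (\text{since } \sqrt{S} \le S), \\
\alpha = K &: \quad S\sqrt{T} + \sqrt{KST} \;\le\; 2\sqrt{KST}
  && (\text{since } S\sqrt{T} = \sqrt{S \cdot ST} \le \sqrt{KST}).
\end{align*}
```

Under these assumptions the rate is of order $S\sqrt{T}$ at the full-information end and of order $\sqrt{KST}$ at the semi-bandit end.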

Abstract

In combinatorial semi-bandits, a learner repeatedly selects from a combinatorial decision set of arms, receives the realized sum of rewards, and observes the rewards of the individual selected arms as feedback. In this paper, we extend this framework to include graph feedback, where the learner observes the rewards of all neighboring arms of the selected arms in a feedback graph $G$. We establish that the optimal regret over a time horizon $T$ scales as $\tilde{\Theta}(S\sqrt{T} + \sqrt{\alpha S T})$, where $S$ is the size of the combinatorial decisions and $\alpha$ is the independence number of $G$. This result interpolates between the known regrets $\tilde{\Theta}(S\sqrt{T})$ under full information (i.e., $G$ is complete) and $\tilde{\Theta}(\sqrt{K S T})$ under the semi-bandit feedback (i.e., $G$ has only self-loops), where $K$ is the total number of arms. A key technical…
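
The feedback model described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names `observed_arms` and `regret_rate`, the adjacency-dictionary representation, and the convention that self-loops are listed explicitly are all assumptions made for illustration, and `regret_rate` drops constants and logarithmic factors.

```python
import math


def observed_arms(selected, neighbors):
    """Return the arms whose rewards are revealed when `selected` is played.

    `neighbors[a]` is the set of arms observed when arm `a` is selected
    (hypothetical representation; self-loops must be listed explicitly).
    """
    observed = set()
    for arm in selected:
        observed |= neighbors[arm]
    return observed


def regret_rate(S, T, alpha):
    """Evaluate S*sqrt(T) + sqrt(alpha*S*T), ignoring constants and log factors."""
    return S * math.sqrt(T) + math.sqrt(alpha * S * T)


K = 4                                             # total number of arms
self_loops = {a: {a} for a in range(K)}           # semi-bandit feedback
complete = {a: set(range(K)) for a in range(K)}   # full information

selected = {0, 2}                                 # a decision of size S = 2
semi_bandit_view = observed_arms(selected, self_loops)   # {0, 2}
full_info_view = observed_arms(selected, complete)       # {0, 1, 2, 3}
```

With only self-loops the learner sees exactly the selected arms, while a complete graph reveals every arm; graphs in between reveal the union of the selected arms' neighborhoods.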

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques

Methods: Sparse Evolutionary Training