Nearly Tight Bounds for Cross-Learning Contextual Bandits with Graphical Feedback
Ruiyuan Huang, Zengfeng Huang

TL;DR
This paper introduces an algorithm for cross-learning contextual bandits with graphical feedback whose regret bound is near-optimal and independent of the number of contexts. It covers stochastic settings and also addresses the adversarial case.
Contribution
It provides the first algorithm achieving the minimax regret bound of $\widetilde{O}(\sqrt{\alpha T})$, where $\alpha$ is the independence number of the feedback graph, for stochastic cross-learning contextual bandits with graphical feedback, resolving an open theoretical question.
Findings
Achieves an $\widetilde{O}(\sqrt{\alpha T})$ regret bound independent of the number of contexts.
Addresses both stochastic and adversarial feedback models.
Closes a key theoretical gap in bandit feedback structures.
Abstract
The cross-learning contextual bandit problem with graphical feedback has recently attracted significant attention. In this setting, there is a contextual bandit with a feedback graph over the arms, and pulling an arm reveals the loss for all neighboring arms in the feedback graph across all contexts. Initially proposed by Han et al. (2024), this problem has broad applications in areas such as bidding in first price auctions, and explores a novel frontier in the feedback structure of bandit problems. A key theoretical question is whether an algorithm with $\widetilde{O}(\sqrt{\alpha T})$ regret exists, where $\alpha$ represents the independence number of the feedback graph. This question is particularly interesting because it concerns whether an algorithm can achieve a regret bound entirely independent of the number of contexts and matching the minimax regret of vanilla graphical…
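The feedback model described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm, only a toy illustration of what one pull reveals and of the independence number of the feedback graph. The names `observed_losses`, `independence_number`, `loss`, and `graph` are hypothetical, the toy instance is made up, and the graph is assumed to include self-loops so that an arm observes its own loss.

```python
from itertools import combinations


def observed_losses(loss, graph, arm):
    """Losses revealed by pulling `arm`: every out-neighbor, in every context.

    loss[c][j] is the loss of arm j under context c (hypothetical layout).
    graph[i] is the set of arms whose loss is revealed when arm i is pulled.
    """
    revealed = {}
    for context, row in enumerate(loss):
        for neighbor in graph[arm]:
            revealed[(context, neighbor)] = row[neighbor]
    return revealed


def independence_number(graph):
    """Brute-force independence number (exponential time; tiny graphs only).

    Two distinct arms count as adjacent if an edge exists in either direction.
    """
    arms = list(graph)

    def adjacent(i, j):
        return j in graph[i] or i in graph[j]

    for size in range(len(arms), 0, -1):
        for subset in combinations(arms, size):
            if all(not adjacent(i, j) for i, j in combinations(subset, 2)):
                return size
    return 0


# Toy instance (assumed for illustration): 2 contexts, 3 arms.
loss = [
    [0.1, 0.5, 0.9],  # losses under context 0
    [0.7, 0.2, 0.4],  # losses under context 1
]
# Arms 0 and 1 observe each other; arm 2 observes only itself.
graph = {0: {0, 1}, 1: {0, 1}, 2: {2}}

revealed = observed_losses(loss, graph, arm=0)
alpha = independence_number(graph)
```

In this toy instance, pulling arm 0 reveals the losses of arms 0 and 1 under both contexts, which is the "cross-learning" part of the feedback. The independence number is the quantity the regret bound depends on in place of the number of arms or contexts.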
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Smart Grid Energy Management
