Oracle-Efficient Combinatorial Semi-Bandits
Jung-hun Kim, Milan Vojnović, Min-hwan Oh

TL;DR
This paper introduces oracle-efficient algorithms for combinatorial semi-bandits that reduce the number of oracle calls from linear to (doubly) logarithmic in the time horizon while maintaining tight regret bounds.
Contribution
It presents novel algorithms that significantly lower the number of oracle queries needed in combinatorial semi-bandit problems, with tight regret guarantees.
Findings
Achieves $\tilde{O}(\sqrt{T})$ regret with $O(\log\log T)$ oracle calls.
Develops covariance-adaptive algorithms for better regret in structured noise settings.
Extends methods to handle general non-linear reward functions.
Abstract
We study the combinatorial semi-bandit problem where an agent selects a subset of base arms and receives individual feedback. While this generalizes the classical multi-armed bandit and has broad applicability, its scalability is limited by the high cost of combinatorial optimization, requiring oracle queries at every round. To tackle this, we propose oracle-efficient frameworks that significantly reduce oracle calls while maintaining tight regret guarantees. For the worst-case linear reward setting, our algorithms achieve $\tilde{O}(\sqrt{T})$ regret using only $O(\log\log T)$ oracle queries. We also propose covariance-adaptive algorithms that leverage noise structure for improved regret, and extend our approach to general (non-linear) rewards. Overall, our methods reduce oracle usage from linear to (doubly) logarithmic in time, with strong theoretical guarantees.
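To give a feel for how a doubly logarithmic number of oracle calls can arise, here is a minimal sketch of a batched update schedule. It assumes the grid $t_k = \lceil T^{1 - 2^{-k}} \rceil$, a common choice in the batched-bandit literature. The abstract does not state which schedule the authors use, so the grid and the function name `batch_endpoints` are illustrative assumptions, not the paper's algorithm.

```python
import math


def batch_endpoints(T: int) -> list[int]:
    """Return the rounds at which each batch ends, for horizon T.

    Assumed schedule (not taken from the paper): t_k = ceil(T ** (1 - 2 ** -k)).
    The oracle would be queried once per batch, so the number of oracle
    calls equals the number of endpoints returned.
    """
    if T < 4:
        return [T]
    # After about log2(log2(T)) steps, T ** (2 ** -k) drops to about 2,
    # so the grid has reached roughly T / 2 and one final batch suffices.
    num_steps = math.ceil(math.log2(math.log2(T)))
    endpoints: list[int] = []
    for k in range(1, num_steps + 1):
        t_k = min(T, math.ceil(T ** (1.0 - 2.0 ** (-k))))
        # Skip duplicates that rounding can produce for small T.
        if not endpoints or t_k > endpoints[-1]:
            endpoints.append(t_k)
    if endpoints[-1] != T:
        endpoints.append(T)
    return endpoints


if __name__ == "__main__":
    T = 10**6
    grid = batch_endpoints(T)
    print(f"horizon T = {T}")
    print(f"batch endpoints: {grid}")
    print(f"oracle calls with batching: {len(grid)} (versus {T} with one call per round)")
```

Under this assumed schedule, the action chosen by the oracle is held fixed within each batch and recomputed only at batch boundaries, which is how the call count falls from $T$ to $O(\log\log T)$.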
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Stochastic Gradient Optimization Techniques
