Near-Optimal Regret for Efficient Stochastic Combinatorial Semi-Bandits
Zichun Ye, Runqi Wang, Xutong Liu, Shuai Li

TL;DR
This paper introduces CMOSS, a computationally efficient algorithm for stochastic combinatorial semi-bandits. It achieves near-optimal instance-independent regret bounds with no $\log T$ dependence on the time horizon $T$, and in experiments it outperforms benchmark algorithms in both regret and runtime.
Contribution
The paper proposes CMOSS, a novel algorithm for stochastic combinatorial semi-bandits that attains near-optimal instance-independent regret bounds while keeping computational cost low.
Findings
CMOSS achieves regret bounds of $O((\log k)\sqrt{kmT})$ and $O((m-k)\sqrt{\log k\,\log(m-k)\,T})$.
CMOSS eliminates the $\log T$ dependence present in previous algorithms.
Experimental results show CMOSS outperforms benchmark algorithms in regret and runtime.
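To make the horizon dependence of the two stated bounds concrete, the following sketch evaluates their leading terms numerically. It is an illustration only: the hidden constants are assumed to be 1, the function names `bound_a` and `bound_b` are hypothetical, and the values of `k` and `m` are arbitrary choices rather than settings from the paper.

```python
import math


def bound_a(k, m, T):
    # Leading term (log k) * sqrt(k * m * T); hidden constant assumed to be 1.
    return math.log(k) * math.sqrt(k * m * T)


def bound_b(k, m, T):
    # Leading term (m - k) * sqrt(log k * log(m - k) * T); constant assumed 1.
    # Requires k >= 2 and m - k >= 2 so that both logarithms are positive.
    return (m - k) * math.sqrt(math.log(k) * math.log(m - k) * T)


if __name__ == "__main__":
    k, m = 5, 50  # arbitrary illustrative values
    for T in (10**4, 10**6, 10**8):
        a = bound_a(k, m, T)
        # a / sqrt(T) does not change with T, whereas log T keeps growing.
        print(f"T={T:>9}  bound_a={a:14.1f}  "
              f"bound_a/sqrt(T)={a / math.sqrt(T):8.3f}  log T={math.log(T):6.2f}")
```

Both expressions scale exactly as $\sqrt{T}$, so the ratio to $\sqrt{T}$ stays fixed across horizons, while $\log T$ roughly doubles between $T = 10^4$ and $T = 10^8$. Any bound carrying an extra $\log T$-dependent factor would therefore drift away from these as $T$ grows.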
Abstract
The combinatorial multi-armed bandit (CMAB) is a cornerstone framework for sequential decision-making, dominated by two algorithmic families: UCB-based methods and adversarial methods such as follow the regularized leader (FTRL) and online mirror descent (OMD). However, prominent UCB-based approaches like CUCB suffer from an additional $\log T$-dependent regret factor that is detrimental over long horizons, while adversarial methods such as EXP3.M and HYBRID impose significant computational overhead. To resolve this trade-off, we introduce the Combinatorial Minimax Optimal Strategy in the Stochastic setting (CMOSS). CMOSS is a computationally efficient algorithm that achieves an instance-independent regret of $O((\log k)\sqrt{kmT})$ in one regime and $O((m-k)\sqrt{\log k\,\log(m-k)\,T})$ in the other under semi-bandit feedback, where $m$ is the number of arms and $k$ is the…
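The abstract does not spell out the CMOSS index, so the sketch below is not the paper's algorithm. It only illustrates the general idea of a MOSS-style rule in a top-$k$ semi-bandit: each arm gets the classical MOSS index (empirical mean plus a bonus based on $\log(T/(m\,n_i))$ clipped at zero), the $k$ arms with the largest indices are played, and every played arm's reward is observed. The Bernoulli rewards, the top-$k$ action set, and the names `moss_index` and `run_topk_moss` are all assumptions made for illustration.

```python
import math
import random


def moss_index(mean, pulls, horizon, n_arms):
    # Classical MOSS index (assumed here, not taken from the paper):
    # the bonus uses log(T / (m * n_i)) clipped at 0, so it vanishes
    # once an arm has been pulled at least T / m times.
    if pulls == 0:
        return float("inf")  # force one initial observation of every arm
    bonus = math.sqrt(max(math.log(horizon / (n_arms * pulls)), 0.0) / pulls)
    return mean + bonus


def run_topk_moss(means, k, horizon, seed=0):
    # Simulate a top-k semi-bandit with Bernoulli arms (illustrative setup).
    rng = random.Random(seed)
    m = len(means)
    pulls = [0] * m
    est = [0.0] * m
    best = sum(sorted(means, reverse=True)[:k])
    regret = 0.0
    for _ in range(horizon):
        idx = [moss_index(est[i], pulls[i], horizon, m) for i in range(m)]
        # Play the k arms with the largest indices.
        action = sorted(range(m), key=lambda i: idx[i], reverse=True)[:k]
        for i in action:
            # Semi-bandit feedback: the reward of each played arm is observed.
            reward = 1.0 if rng.random() < means[i] else 0.0
            pulls[i] += 1
            est[i] += (reward - est[i]) / pulls[i]  # running mean update
        regret += best - sum(means[i] for i in action)
    return regret, pulls


if __name__ == "__main__":
    regret, pulls = run_topk_moss([0.9, 0.8, 0.2, 0.1], k=2, horizon=2000)
    print("pseudo-regret:", round(regret, 2), "pulls per arm:", pulls)
```

Each round costs one index computation per arm plus a sort, which is the kind of low per-round overhead that index-based methods offer compared with FTRL or OMD updates over the action set.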
Taxonomy
Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Machine Learning and ELM
