Near-Optimal Regret for Efficient Stochastic Combinatorial Semi-Bandits

Zichun Ye; Runqi Wang; Xutong Liu; Shuai Li

arXiv:2508.06247·cs.LG·December 30, 2025

TL;DR

This paper introduces CMOSS, a computationally efficient algorithm for stochastic combinatorial semi-bandits whose near-optimal regret bounds carry no $\log T$ factor. In the reported experiments it outperforms benchmark algorithms in both regret and runtime.

Contribution

The paper proposes CMOSS, a novel algorithm for stochastic combinatorial semi-bandits that attains near-optimal instance-independent regret bounds while avoiding the computational overhead of adversarial methods.

Findings

01

CMOSS achieves regret bounds of $O((\log k)\sqrt{kmT})$ and $O((m-k)\sqrt{\log k \log(m-k)\, T})$.

02

CMOSS eliminates the $\log T$ dependence present in previous algorithms.

03

Experimental results show CMOSS outperforms benchmark algorithms in regret and runtime.
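
To give a feel for what removing the $\log T$ factor means, here is a minimal sketch that evaluates the growth terms of the two bounds in the first finding with all constants dropped. The comparison term $\sqrt{kmT\log T}$ is an assumption standing in for a UCB-style bound; it is not taken from this page, and the function names are hypothetical.

```python
import math

def cmoss_bound(k, m, T):
    """Growth term of the CMOSS bounds listed above, constants dropped.

    Assumes 2 <= k < m; with constants dropped the term is 0 when
    k == 1 or m - k == 1, which is an artifact of this sketch.
    """
    if k <= m / 2:
        return math.log(k) * math.sqrt(k * m * T)
    return (m - k) * math.sqrt(math.log(k) * math.log(m - k) * T)

def log_t_bound(k, m, T):
    """Assumed reference term sqrt(k m T log T) for a UCB-style method."""
    return math.sqrt(k * m * T * math.log(T))

if __name__ == "__main__":
    k, m = 4, 20
    for T in (10**4, 10**6, 10**8):
        ratio = log_t_bound(k, m, T) / cmoss_bound(k, m, T)
        print(f"T={T:>9}  ratio={ratio:.2f}")
```

Under these assumptions the ratio for $k \leq \frac{m}{2}$ simplifies to $\sqrt{\log T}/\log k$, so it grows slowly but without bound as the horizon increases.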

Abstract

The combinatorial multi-armed bandit (CMAB) is a cornerstone framework for sequential decision-making, dominated by two algorithmic families: UCB-based methods and adversarial methods such as follow the regularized leader (FTRL) and online mirror descent (OMD). However, prominent UCB-based approaches like CUCB suffer from an additional $\log T$ regret factor that is detrimental over long horizons, while adversarial methods such as EXP3.M and HYBRID impose significant computational overhead. To resolve this trade-off, we introduce the Combinatorial Minimax Optimal Strategy in the Stochastic setting (CMOSS). CMOSS is a computationally efficient algorithm that achieves an instance-independent regret of $O((\log k)\sqrt{kmT})$ when $k \leq \frac{m}{2}$ and $O((m-k)\sqrt{\log k \log(m-k)\, T})$ when $k > \frac{m}{2}$ under semi-bandit feedback, where $m$ is the number of arms and $k$ is the…
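
As a rough illustration of the setting in the abstract, the sketch below runs an index policy under semi-bandit feedback on a top-$k$ problem. It is not the paper's algorithm: the index is the classic single-arm MOSS form, $\hat{\mu}_i + \sqrt{\max(\log(T/(m n_i)), 0)/n_i}$, used here as an assumed stand-in, and the feasible set (all subsets of size $k$), the Bernoulli rewards, and every function name are assumptions made for this example.

```python
import math
import random

def moss_style_index(mean, pulls, horizon, num_arms):
    """Empirical mean plus a MOSS-style bonus (assumed form, not CMOSS itself).

    The bonus drops to zero once pulls >= horizon / num_arms.
    """
    if pulls == 0:
        return float("inf")  # unobserved arms are tried first
    log_term = max(math.log(horizon / (num_arms * pulls)), 0.0)
    return mean + math.sqrt(log_term / pulls)

def run_top_k(true_means, k, horizon, seed=0):
    """Simulate the index policy; return (pseudo-regret, per-arm pull counts)."""
    rng = random.Random(seed)
    m = len(true_means)
    pulls = [0] * m
    means = [0.0] * m
    best = sum(sorted(true_means, reverse=True)[:k])
    regret = 0.0
    for _ in range(horizon):
        idx = [moss_style_index(means[i], pulls[i], horizon, m) for i in range(m)]
        # Top-k oracle: exact when every size-k subset is a feasible action.
        action = sorted(range(m), key=lambda i: idx[i], reverse=True)[:k]
        for i in action:
            # Semi-bandit feedback: each chosen arm reveals its own reward.
            reward = 1.0 if rng.random() < true_means[i] else 0.0
            pulls[i] += 1
            means[i] += (reward - means[i]) / pulls[i]  # running average
        regret += best - sum(true_means[i] for i in action)
    return regret, pulls

if __name__ == "__main__":
    regret, pulls = run_top_k([0.9, 0.8, 0.2, 0.1, 0.1], k=2, horizon=2000)
    print("pseudo-regret:", round(regret, 2), "pulls:", pulls)
```

Each round costs one index evaluation per arm plus a sort, which is the kind of per-round cost that index policies have in this simple top-$k$ case; the overhead of the adversarial methods named in the abstract is not modelled here.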

Taxonomy

Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Machine Learning and ELM