Offline Learning for Combinatorial Multi-armed Bandits

Xutong Liu; Xiangxiang Dai; Jinhang Zuo; Siwei Wang; Carlee Joe-Wong; John C.S. Lui; Wei Chen

arXiv:2501.19300·cs.LG·May 30, 2025

TL;DR

This paper introduces Off-CMAB, an offline learning framework for combinatorial multi-armed bandits. It avoids the cost of online interaction by learning from pre-collected datasets, and comes with theoretical guarantees and practical applications.

Contribution

The paper proposes the combinatorial lower confidence bound (CLCB) algorithm and two data coverage conditions, giving the first offline learning framework for CMAB with a near-optimal suboptimality guarantee.

Findings

1. Under the proposed data coverage conditions, CLCB achieves a near-optimal suboptimality gap, matching the lower bound up to a logarithmic factor.
2. Off-CMAB effectively handles nonlinear rewards and out-of-distribution actions.
3. Experiments on synthetic and real datasets demonstrate superior performance.

Abstract

The combinatorial multi-armed bandit (CMAB) is a fundamental sequential decision-making framework, extensively studied over the past decade. However, existing work primarily focuses on the online setting, overlooking the substantial costs of online interactions and the readily available offline datasets. To overcome these limitations, we introduce Off-CMAB, the first offline learning framework for CMAB. Central to our framework is the combinatorial lower confidence bound (CLCB) algorithm, which combines pessimistic reward estimations with combinatorial solvers. To characterize the quality of offline datasets, we propose two novel data coverage conditions and prove that, under these conditions, CLCB achieves a near-optimal suboptimality gap, matching the theoretical lower bound up to a logarithmic factor. We validate Off-CMAB through practical applications, including learning to rank,…
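The abstract describes CLCB as pessimistic reward estimation followed by a combinatorial solver. The sketch below illustrates that idea on the simplest case and is not the paper's implementation. It rests on several assumptions not stated in the source: rewards lie in [0, 1], the objective is the sum of arm means, the feasible actions are all sets of exactly `k` arms (so the solver is a top-`k` selection), and the confidence radius is a Hoeffding-style bound. The function name `clcb_top_k` and the parameter `delta` are hypothetical.

```python
import math


def clcb_top_k(dataset, num_arms, k, delta=0.05):
    """Illustrative pessimistic selection from logged data.

    dataset: iterable of (arm_index, reward) pairs, rewards assumed in [0, 1].
    Returns the chosen arm indices (sorted) and the per-arm lower bounds.
    """
    counts = [0] * num_arms
    sums = [0.0] * num_arms
    for arm, reward in dataset:
        counts[arm] += 1
        sums[arm] += reward

    lcb = []
    for i in range(num_arms):
        if counts[i] == 0:
            # Arm never observed offline: use the most pessimistic value.
            lcb.append(0.0)
            continue
        mean = sums[i] / counts[i]
        # Hoeffding-style radius (an assumption, not the paper's exact form).
        radius = math.sqrt(math.log(2 * num_arms / delta) / (2 * counts[i]))
        lcb.append(max(0.0, mean - radius))

    # Solver step: for a sum-of-means reward with a size-k constraint,
    # the best action under the lower bounds is the k arms with largest LCB.
    chosen = sorted(range(num_arms), key=lambda i: lcb[i], reverse=True)[:k]
    return sorted(chosen), lcb


# Usage: arm 2 has a perfect empirical mean but only one sample,
# so its lower bound collapses to 0 and it is not selected.
dataset = (
    [(0, 1.0)] * 100
    + [(1, 1.0)] * 50 + [(1, 0.0)] * 50
    + [(2, 1.0)]
)
chosen, lcb = clcb_top_k(dataset, num_arms=4, k=2)
print(chosen)
```

The point of the pessimism is visible in the usage example: an arm that looks good on very little data is penalized, so the selected action relies only on arms the dataset covers well. For nonlinear rewards or richer constraints, the top-`k` step would be replaced by whatever solver fits the problem.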

Videos

Offline Learning for Combinatorial Multi-armed Bandits (SlidesLive)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Anomaly Detection Techniques and Applications