Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond

Xutong Liu; Siwei Wang; Jinhang Zuo; Han Zhong; Xuchuang Wang; Zhiyong Wang; Shuai Li; Mohammad Hajiesmaili; John C.S. Lui; Wei Chen

arXiv:2406.01386 · cs.LG · April 24, 2025 · 1 cites

Open Access

TL;DR

This paper introduces a flexible combinatorial multivariant multi-armed bandit framework with probabilistic triggering, enabling improved modeling and regret bounds for applications like episodic reinforcement learning and probabilistic coverage.

Contribution

It proposes a new CMAB-MT framework with a multivariant triggering condition, connecting episodic RL with CMAB and providing a unified approach with enhanced theoretical guarantees.

Findings

1. Achieves matching or improved regret bounds for key applications.
2. Establishes the first connection between episodic RL and CMAB.
3. Introduces a general smoothness condition for multivariant arms.

Abstract

We introduce a novel framework of combinatorial multi-armed bandits (CMAB) with multivariant and probabilistically triggering arms (CMAB-MT), where the outcome of each arm is a $d$-dimensional multivariant random variable and the feedback follows a general arm triggering process. Compared with existing CMAB works, CMAB-MT not only enhances the modeling power but also allows improved results by leveraging distinct statistical properties for multivariant random variables. For CMAB-MT, we propose a general 1-norm multivariant and triggering probability-modulated smoothness condition, and an optimistic CUCB-MT algorithm built upon this condition. Our framework can include many important problems as applications, such as episodic reinforcement learning (RL) and probabilistic maximum coverage for goods distribution, all of which meet the above smoothness condition and achieve matching or…
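
To make the setting in the abstract more concrete, the following is a minimal sketch of a generic optimistic (UCB-style) loop in which each arm has a $d$-dimensional outcome and a selected arm is observed only with some triggering probability. It is not the paper's CUCB-MT algorithm: the confidence radius, the top-k oracle, the Bernoulli outcome model, and all names (`cucb_vector_sketch`, `trigger_prob`, and so on) are assumptions chosen purely for illustration.

```python
import numpy as np

def cucb_vector_sketch(true_means, trigger_prob, k, horizon, seed=0):
    """Generic optimistic loop with vector-valued arms (illustrative only).

    true_means:   (m, d) array, mean outcome of each of m arms, in [0, 1]^d
    trigger_prob: (m,) array, chance that a selected arm is actually observed
    k:            number of arms chosen per round (hypothetical top-k oracle)
    """
    rng = np.random.default_rng(seed)
    m, d = true_means.shape
    counts = np.zeros(m)            # times each arm was observed
    mean_est = np.zeros((m, d))     # empirical mean vector per arm
    picks = np.zeros(m, dtype=int)  # times each arm was selected

    for t in range(1, horizon + 1):
        # Optimistic estimate: empirical mean plus an assumed confidence
        # radius, clipped to 1; arms never observed get the maximal value 1.
        radius = np.sqrt(1.5 * np.log(t) / np.maximum(counts, 1))
        ucb = np.where(counts[:, None] > 0,
                       np.minimum(mean_est + radius[:, None], 1.0), 1.0)
        # Toy oracle: choose the k arms with the largest summed optimistic value.
        action = np.argsort(-ucb.sum(axis=1))[:k]
        picks[action] += 1
        # Probabilistic triggering: only triggered arms reveal an outcome.
        triggered = action[rng.random(k) < trigger_prob[action]]
        for i in triggered:
            outcome = rng.binomial(1, true_means[i]).astype(float)  # d-dim draw
            counts[i] += 1
            mean_est[i] += (outcome - mean_est[i]) / counts[i]  # running mean
    return picks, mean_est

# Example call with three arms, two-dimensional outcomes, one arm per round.
picks, est = cucb_vector_sketch(
    np.array([[0.9, 0.8], [0.2, 0.1], [0.5, 0.5]]),
    np.array([1.0, 0.5, 0.8]), k=1, horizon=200)
```

In this toy version the reward of an action is simply the sum of the chosen arms' coordinates; the applications named in the abstract would replace that oracle and reward with problem-specific ones.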

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Data Stream Mining Techniques