Contextual Combinatorial Multi-output GP Bandits with Group Constraints

Sepehr Elahi; Baran Atalar; Sevda Öğüt; Cem Tekin

arXiv:2111.14778·cs.LG·July 11, 2023·1 cites

Open Access

TL;DR

This paper introduces TCGP-UCB, a Gaussian process bandit algorithm for contextual combinatorial bandits with group constraints, motivated by federated multi-armed bandit problems in which global reward must be maximized while clients' minimum privacy requirements are satisfied. In experiments, it outperforms non-GP algorithms.

Contribution

The paper proposes Thresholded Combinatorial Gaussian Process Upper Confidence Bounds (TCGP-UCB), a new double-UCB Gaussian process algorithm for combinatorial bandits with group constraints, together with a new regret measure and theoretical guarantees.

Findings

01. TCGP-UCB outperforms non-GP algorithms in experiments.

02. The algorithm effectively balances reward maximization and constraint satisfaction.

03. Theoretical regret bounds are established for the proposed method.

Abstract

In federated multi-armed bandit problems, maximizing global reward while satisfying minimum privacy requirements to protect clients is the main goal. To formulate such problems, we consider a combinatorial contextual bandit setting with groups and changing action sets, where similar base arms arrive in groups and a set of base arms, called a super arm, must be chosen in each round to maximize super arm reward while satisfying the constraints of the rewards of groups from which base arms were chosen. To allow for greater flexibility, we let each base arm have two outcomes, modeled as the output of a two-output Gaussian process (GP), where one outcome is used to compute super arm reward and the other for group reward. We then propose a novel double-UCB GP-bandit algorithm, called Thresholded Combinatorial Gaussian Process Upper Confidence Bounds (TCGP-UCB), which balances between…
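As a rough illustration of the double-UCB idea described in the abstract, the Python sketch below computes one optimistic index per outcome and uses the two indices in two stages: first to keep groups whose optimistic group reward meets a threshold, then to pick base arms by optimistic super arm reward. It is not the authors' TCGP-UCB. It assumes two independent single-output GPs with an RBF kernel in place of a two-output GP, takes a group's reward to be the mean of its members' second outcome, and selects a fixed number of base arms. The names `select_super_arm`, `thresholds`, `budget`, and `beta` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel between contexts a (n, d) and b (m, d)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise=0.1):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    if len(x_train) == 0:
        # No data yet: fall back to the prior (mean 0, standard deviation 1).
        return np.zeros(len(x_test)), np.ones(len(x_test))
    k_tt = rbf_kernel(x_train, x_train) + noise**2 * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    solved = np.linalg.solve(k_tt, k_ts)            # shape (n_train, n_test)
    mean = solved.T @ y_train
    var = 1.0 - np.sum(k_ts * solved, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

def select_super_arm(contexts, groups, history, thresholds, budget, beta=4.0):
    """Pick up to `budget` base arms using one UCB index per outcome.

    contexts:   (n_arms, d) contexts of the base arms available this round
    groups:     (n_arms,) integer group label of each base arm
    history:    (x_hist, y_super, y_group) past contexts and both observed outcomes
    thresholds: array indexed by group label (assumed form of the group constraint)
    """
    x_hist, y_super, y_group = history
    mu_s, sd_s = gp_posterior(x_hist, y_super, contexts)
    mu_g, sd_g = gp_posterior(x_hist, y_group, contexts)
    ucb_super = mu_s + np.sqrt(beta) * sd_s         # index for super arm reward
    ucb_group = mu_g + np.sqrt(beta) * sd_g         # index for group reward

    # Stage 1: keep groups whose optimistic group reward meets the threshold.
    feasible = np.zeros(len(contexts), dtype=bool)
    for g in np.unique(groups):
        members = groups == g
        if ucb_group[members].mean() >= thresholds[g]:
            feasible[members] = True

    # Stage 2: among arms in feasible groups, take the largest super arm indices.
    candidates = np.flatnonzero(feasible)
    ranked = candidates[np.argsort(-ucb_super[candidates])]
    return ranked[:budget]
```

With an empty history every index equals the prior bound `sqrt(beta)`, so only the thresholds decide which groups are eligible. As observations accumulate, the two indices tighten separately, which is what lets the selection trade off reward against constraint satisfaction.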

Taxonomy

Topics: Advanced Bandit Algorithms Research · Misinformation and Its Impacts

Methods: Balanced Selection · Gaussian Process