Gaussian Process Bandits with Aggregated Feedback

Mengyan Zhang; Russell Tsuchida; Cheng Soon Ong

arXiv:2112.13029·cs.LG·December 28, 2021

TL;DR

This paper introduces GPOO (Gaussian Process Optimistic Optimisation), an algorithm for continuum-armed bandits that recommends the best arms within a fixed budget when only aggregated feedback, such as the average reward over a subset of arms, is available.

Contribution

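The paper proposes an algorithm for continuum-armed bandit optimisation under aggregated feedback, modelling the reward function as a draw from a Gaussian Process, together with a new notion of simple regret defined with respect to aggregated feedback on the recommended arms.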

Findings

1. In simulations, GPOO optimises effectively under aggregated feedback.

2. The theoretical analysis provides regret bounds for the algorithm.

3. Single-point feedback is recovered as a special case of the aggregated setting.

Abstract

We consider the continuum-armed bandits problem, under a novel setting of recommending the best arms within a fixed budget under aggregated feedback. This is motivated by applications where the precise rewards are impossible or expensive to obtain, while an aggregated reward or feedback, such as the average over a subset, is available. We constrain the set of reward functions by assuming that they are from a Gaussian Process and propose the Gaussian Process Optimistic Optimisation (GPOO) algorithm. We adaptively construct a tree with nodes as subsets of the arm space, where the feedback is the aggregated reward of representatives of a node. We propose a new simple regret notion with respect to aggregated feedback on the recommended arms. We provide theoretical analysis for the proposed algorithm, and recover single point feedback as a special case. We illustrate GPOO and compare it with…
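The aggregated-feedback setting described in the abstract can be illustrated with a minimal sketch. This is not the GPOO algorithm itself (there is no tree search here); it only shows what an averaged observation over a node's representatives looks like and how a Gaussian Process can condition on it, using the fact that averaging is a linear operation. Everything below is an assumption made for illustration: a one-dimensional arm space, a squared-exponential kernel with unit prior variance, representatives placed at the centres of equal-width cells, and the hypothetical names `representatives`, `aggregated_feedback`, and `gp_posterior_from_averages`.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between 1-D point sets a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def representatives(lo, hi, n_reps):
    """Hypothetical choice: centres of n_reps equal-width cells of the node [lo, hi]."""
    edges = np.linspace(lo, hi, n_reps + 1)
    return 0.5 * (edges[:-1] + edges[1:])

def aggregated_feedback(f, lo, hi, n_reps, noise_std, rng):
    """Noisy average of f over the representatives of the node [lo, hi]."""
    reps = representatives(lo, hi, n_reps)
    return f(reps).mean() + noise_std * rng.normal()

def gp_posterior_from_averages(nodes, y, x_test, n_reps, noise_std):
    """GP posterior mean and variance at x_test given node-average observations.

    Averaging is linear, so y = A f(X) + noise, where each row of A puts
    weight 1 / n_reps on the representatives of one node.
    """
    X = np.concatenate([representatives(lo, hi, n_reps) for lo, hi in nodes])
    A = np.zeros((len(nodes), X.size))
    for i in range(len(nodes)):
        A[i, i * n_reps:(i + 1) * n_reps] = 1.0 / n_reps
    K_yy = A @ rbf_kernel(X, X) @ A.T + noise_std ** 2 * np.eye(len(nodes))
    K_sy = rbf_kernel(x_test, X) @ A.T
    mean = K_sy @ np.linalg.solve(K_yy, y)
    # Prior variance is 1 for this kernel; subtract the explained part.
    var = 1.0 - np.sum(K_sy * np.linalg.solve(K_yy, K_sy.T).T, axis=1)
    return mean, var

# Usage: three nodes of a toy reward function, each observed only via its average.
rng = np.random.default_rng(0)
toy_reward = lambda x: np.sin(6 * x)          # stand-in reward, not from the paper
toy_nodes = [(0.0, 0.5), (0.5, 1.0), (0.0, 0.25)]
toy_y = np.array([aggregated_feedback(toy_reward, lo, hi, 4, 0.05, rng)
                  for lo, hi in toy_nodes])
post_mean, post_var = gp_posterior_from_averages(
    toy_nodes, toy_y, np.linspace(0.0, 1.0, 5), 4, 0.05)
```

Setting `n_reps=1` makes each observation a single noisy function value, which corresponds to the single-point feedback case the abstract mentions.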

Code & Models

Repositories

Mengyanz/Mengyanz.github.io

Videos

Gaussian Process Bandits with Aggregated Feedback · Underline

Taxonomy

Topics: Advanced Bandit Algorithms Research · Advanced Multi-Objective Optimization Algorithms · Gaussian Processes and Bayesian Inference

Methods: Gaussian Process