Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds

Aya Kayal; Sattar Vakili; Laura Toni; Da-shan Shiu; Alberto Bernacchia

arXiv:2505.23673 · cs.LG · May 30, 2025

TL;DR

This paper derives tighter regret bounds for Bayesian optimization from human preference feedback. It shows that pairwise preference queries, although less informative than scalar-valued observations, achieve sample complexities of the same order as conventional Bayesian optimization.

Contribution

The paper derives improved regret bounds for BOHF under the Bradley-Terry-Luce (BTL) model. The resulting sample complexity matches that of conventional BO, even though conventional BO observes richer, scalar-valued feedback.

Findings

01

Derived regret bounds of $\tilde{O}(\sqrt{\Gamma(T) T})$

02

Showed order-optimal sample complexity for common kernels

03

Achieved near-optimal performance with limited preference feedback
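
To make the "common kernels" finding concrete, the worked substitution below plugs typical information-gain rates into a bound of the form $\tilde{O}(\sqrt{\Gamma(T) T})$. The rates for $\Gamma(T)$ are assumptions taken from the standard kernel bandit literature, not from this page, and $d$ (input dimension) and $\nu$ (Matérn smoothness) are symbols introduced here for illustration.

```latex
% Assumed information-gain rates (standard in the kernel bandit literature,
% not stated on this page):
%   squared exponential (SE):            \Gamma(T) = O(\log^{d+1} T)
%   Matern, smoothness \nu, dimension d: \Gamma(T) = \tilde{O}(T^{d/(2\nu+d)})
\begin{align*}
\text{SE:}\quad
  & \tilde{O}\big(\sqrt{\Gamma(T)\,T}\big) = \tilde{O}\big(\sqrt{T}\big), \\
\text{Mat\'ern-}\nu\text{:}\quad
  & \tilde{O}\big(\sqrt{T^{\frac{d}{2\nu+d}} \cdot T}\big)
    = \tilde{O}\big(T^{\frac{\nu+d}{2\nu+d}}\big).
\end{align*}
```

Under these assumed rates, both expressions are sublinear in $T$, so the average regret per query vanishes as the number of preference queries grows.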

Abstract

Bayesian optimization (BO) with preference-based feedback has recently garnered significant attention due to its emerging applications. We refer to this problem as Bayesian Optimization from Human Feedback (BOHF), which differs from conventional BO by learning the best actions from a reduced feedback model, where only the preference between two actions is revealed to the learner at each time step. The objective is to identify the best action using a limited number of preference queries, typically obtained through costly human feedback. Existing work, which adopts the Bradley-Terry-Luce (BTL) feedback model, provides regret bounds for the performance of several algorithms. In this work, within the same framework we develop tighter performance guarantees. Specifically, we derive regret bounds of $\tilde{O}(\sqrt{\Gamma(T) T})$, where $\Gamma(T)$ represents the maximum information…
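
The reduced feedback model described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, assuming the usual BTL form in which the probability that action x is preferred over action y is the logistic sigmoid of the utility difference f(x) - f(y). The function names and the toy utility are hypothetical and are not taken from the paper.

```python
import math
import random


def btl_preference_prob(f_x, f_y):
    """BTL probability that x is preferred over y: sigmoid(f(x) - f(y))."""
    return 1.0 / (1.0 + math.exp(-(f_x - f_y)))


def sample_preference(f, x, y, rng):
    """Simulate one preference query; returns 1 if x is preferred, else 0."""
    p = btl_preference_prob(f(x), f(y))
    return 1 if rng.random() < p else 0


def toy_utility(x):
    # Hypothetical latent utility, maximized at x = 0.3 (illustration only).
    return -(x - 0.3) ** 2


# The learner never observes toy_utility directly, only binary preferences.
rng = random.Random(0)
n_queries = 10000
wins = sum(sample_preference(toy_utility, 0.3, 0.9, rng) for _ in range(n_queries))
empirical = wins / n_queries
expected = btl_preference_prob(toy_utility(0.3), toy_utility(0.9))
print(f"empirical win rate: {empirical:.3f}, BTL probability: {expected:.3f}")
```

Each query returns a single bit, which is why preference feedback is less informative than the noisy scalar observations available in conventional BO.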

Videos

Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference

Methods: Softmax · Attention Is All You Need