Preference is More Than Comparisons: Rethinking Dueling Bandits with Augmented Human Feedback

Shengbo Wang; Hong Sun; Ke Li

arXiv:2511.09047 · cs.LG · November 13, 2025

TL;DR

This paper proposes a model-free dueling bandit approach that uses augmented human feedback to improve efficiency in interactive preference elicitation (IPE) when human feedback is sparse, and reports competitive results on several IPE benchmarks.

Contribution

The paper introduces augmented confidence bounds that integrate augmented human feedback into a model-free dueling bandit framework, improving efficiency and robustness in preference elicitation.
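To see why mixing augmented feedback into a confidence bound creates a trade-off, it helps to write one out. The sketch below is a hypothetical construction, not the authors' augmented confidence bound: it assumes augmented comparisons are down-weighted by a factor `lam` and that their win rate may deviate from the true preference probability by at most `bias`. The names `augmented_ucb`, `lam`, `bias`, and `alpha` are illustrative. The weighted Hoeffding-style radius shrinks as augmented samples are added, while the worst-case bias term grows with their weight.

```python
import math


def augmented_ucb(wins, n, aug_wins, aug_n, t, lam=0.5, bias=0.1, alpha=0.51):
    """Hypothetical optimistic estimate of P(i beats j) from real + augmented comparisons.

    wins / n         : real human comparisons of arm i vs arm j (wins of i, total count)
    aug_wins / aug_n : augmented comparisons (assumption), each weighted by lam
    bias             : assumed bound on how far the augmented win rate may be off
    """
    total_w = n + lam * aug_n
    if total_w == 0:
        return 1.0  # optimism for a pair with no data at all
    # Weighted empirical win rate: real samples have weight 1, augmented ones lam.
    mean = (wins + lam * aug_wins) / total_w
    # Hoeffding radius for a weighted mean of independent [0, 1] outcomes
    # scales with sqrt(sum of squared weights) / (sum of weights).
    var_term = (n + lam ** 2 * aug_n) / total_w ** 2
    radius = math.sqrt(alpha * math.log(max(t, 1)) * var_term)
    # Worst-case shift of the mean caused by (possibly biased) augmented feedback.
    bias_term = lam * aug_n * bias / total_w
    return min(1.0, mean + radius + bias_term)
```

With `lam=0` the function reduces to the familiar RUCB-style bound `wins / n + sqrt(alpha * ln t / n)`. Raising `lam` tightens the sampling term but enlarges the bias term, which loosely mirrors the multi-factored trade-off the abstract describes.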

Findings

1. Achieves competitive performance on IPE benchmarks
2. Effectively handles sparse human feedback
3. Provides a theoretical regret analysis of the proposed method
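For reference, one common way to measure regret in dueling bandits (used, for example, in the analysis of RUCB) charges each round by how much the two compared arms $a_{c_t}$ and $a_{d_t}$ lose to the Condorcet winner $a^*$. The exact regret notion used in the paper is not stated on this page and may differ.

```latex
\epsilon(i, j) = P(a_i \succ a_j) - \tfrac{1}{2}, \qquad
r_t = \frac{\epsilon(a^*, a_{c_t}) + \epsilon(a^*, a_{d_t})}{2}, \qquad
R_T = \sum_{t=1}^{T} r_t .
```

Because $\epsilon(a^*, j) > 0$ for every $j \neq a^*$, a round incurs zero regret only when both compared arms are $a^*$.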

Abstract

Interactive preference elicitation (IPE) aims to substantially reduce human effort while acquiring human preferences across a wide range of personalization systems. Dueling bandit (DB) algorithms enable optimal decision-making in IPE based on pairwise comparisons. However, they remain inefficient when human feedback is sparse. Existing methods address sparsity by relying heavily on parametric reward models, whose rigid assumptions are vulnerable to misspecification. In contrast, we explore an alternative perspective based on feedback augmentation and introduce critical improvements to the model-free DB framework. Specifically, we introduce augmented confidence bounds to integrate augmented human feedback under generalized concentration properties, and analyze the multi-factored performance trade-off via regret analysis. Our prototype algorithm achieves competitive performance across several IPE…
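As background for the model-free setting, the sketch below shows a simplified version of Relative Upper Confidence Bound (RUCB; Zoghi et al., 2014), a standard model-free dueling bandit baseline. It is not the authors' prototype algorithm and uses no augmented feedback. Assumptions: the human is simulated by a hypothetical preference matrix `pref`, the candidate is drawn uniformly from arms whose optimistic win probability is at least 1/2 against every other arm (the published algorithm adds extra bookkeeping here), and the challenger is chosen among arms other than the candidate.

```python
import math
import random


def rucb(pref, horizon, alpha=0.51, seed=0):
    """Simplified RUCB dueling bandit run against a simulated user.

    pref[i][j] is the assumed probability that arm i beats arm j
    (pref[i][j] + pref[j][i] == 1). Requires at least two arms.
    Returns wins, where wins[i][j] counts how often arm i beat arm j.
    """
    rng = random.Random(seed)
    k = len(pref)
    wins = [[0] * k for _ in range(k)]

    def ucb(i, j, t):
        """Optimistic estimate of P(i beats j) from the comparisons so far."""
        if i == j:
            return 0.5
        n = wins[i][j] + wins[j][i]
        if n == 0:
            return 1.0  # optimism for a pair never compared (x/0 := 1)
        return wins[i][j] / n + math.sqrt(alpha * math.log(t) / n)

    for t in range(1, horizon + 1):
        # Candidates: arms that optimistically beat or tie every other arm.
        cands = [i for i in range(k) if all(ucb(i, j, t) >= 0.5 for j in range(k))]
        c = rng.choice(cands) if cands else rng.randrange(k)
        # Challenger: the arm with the highest optimistic chance of beating c.
        d = max((j for j in range(k) if j != c), key=lambda j: ucb(j, c, t))
        # One noisy pairwise comparison from the simulated human.
        if rng.random() < pref[c][d]:
            wins[c][d] += 1
        else:
            wins[d][c] += 1
    return wins
```

For example, `rucb([[0.5, 0.8, 0.8], [0.2, 0.5, 0.6], [0.2, 0.4, 0.5]], 500)` runs 500 comparisons on a toy problem where arm 0 is the Condorcet winner. Each round consumes exactly one human comparison, so when feedback is sparse the counts in `wins` grow slowly and the confidence radii stay wide, which illustrates the inefficiency the abstract attributes to standard DB algorithms.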

Taxonomy

Topics: Advanced Bandit Algorithms Research · Recommender Systems and Techniques · Mobile Crowdsensing and Crowdsourcing