Preference is More Than Comparisons: Rethinking Dueling Bandits with Augmented Human Feedback
Shengbo Wang, Hong Sun, Ke Li

TL;DR
This paper proposes a model-free dueling bandit approach with augmented human feedback that improves efficiency in preference elicitation tasks where human feedback is sparse, and it demonstrates competitive results across several benchmarks.
Contribution
It introduces augmented confidence bounds that integrate augmented human feedback into a model-free dueling bandit framework, improving efficiency in preference elicitation and avoiding the misspecification risk of parametric reward models.
Findings
Achieves competitive performance on IPE benchmarks
Effectively handles sparse human feedback
Provides a theoretical regret analysis of the proposed method
Abstract
Interactive preference elicitation (IPE) aims to substantially reduce human effort while acquiring human preferences in a wide range of personalization systems. Dueling bandit (DB) algorithms enable optimal decision-making in IPE based on pairwise comparisons. However, they remain inefficient when human feedback is sparse. Existing methods address sparsity by relying heavily on parametric reward models, whose rigid assumptions are vulnerable to misspecification. In contrast, we explore an alternative perspective based on feedback augmentation and introduce critical improvements to the model-free DB framework. Specifically, we introduce augmented confidence bounds that integrate augmented human feedback under generalized concentration properties, and we analyze the multi-factored performance trade-off via regret analysis. Our prototype algorithm achieves competitive performance across several IPE…
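The abstract does not give the exact form of the augmented confidence bounds. As a rough illustration only, the sketch below uses the well-known RUCB-style (Relative Upper Confidence Bound) optimistic estimate from model-free dueling bandits and shows one hypothetical way extra feedback could enter it: pseudo-comparisons `aug_wins` added to the real win counts with a hypothetical down-weighting factor `lam`. Neither `aug_wins` nor `lam` comes from the paper; the point is only that extra (even discounted) comparisons shrink the confidence bonus, so weak arms are ruled out with fewer real duels.

```python
import math
import numpy as np


def ucb_matrix(wins, t, alpha=0.51, aug_wins=None, lam=0.0):
    """RUCB-style optimistic estimates u[i, j] of P(arm i beats arm j).

    wins[i, j]     -- number of real duels in which arm i beat arm j.
    aug_wins[i, j] -- HYPOTHETICAL augmented pseudo-comparisons (not the
                      paper's construction), down-weighted by lam in [0, 1].
    t              -- current round (t >= 1), used in the log term.
    """
    w = np.asarray(wins, dtype=float)
    if aug_wins is not None:
        w = w + lam * np.asarray(aug_wins, dtype=float)
    n = w + w.T  # effective number of comparisons for each pair
    with np.errstate(divide="ignore", invalid="ignore"):
        mean = np.where(n > 0, w / n, 0.5)
        bonus = np.where(n > 0, np.sqrt(alpha * math.log(t) / n), np.inf)
    u = np.minimum(mean + bonus, 1.0)  # unexplored pairs get the optimistic value 1
    np.fill_diagonal(u, 0.5)  # an arm ties with itself
    return u


def candidate_set(u):
    """Arms that are not yet confidently beaten by any other arm."""
    return [i for i in range(u.shape[0]) if np.all(u[i] >= 0.5)]


# Usage: with only 10 real duels, arm 1 cannot yet be ruled out ...
real = np.array([[0, 8], [2, 0]])
print(candidate_set(ucb_matrix(real, t=10)))
# ... but adding discounted pseudo-comparisons tightens the bound.
aug = np.array([[0, 40], [10, 0]])
print(candidate_set(ucb_matrix(real, t=10, aug_wins=aug, lam=0.5)))
```

Down-weighting with `lam` reflects the trade-off the abstract alludes to: augmented feedback is cheaper but presumably less reliable than real comparisons, so trusting it fully could bias the estimates, while ignoring it wastes information.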
Taxonomy
Topics: Advanced Bandit Algorithms Research · Recommender Systems and Techniques · Mobile Crowdsensing and Crowdsourcing
