Exploiting Transitivity for Top-k Selection with Score-Based Dueling Bandits
Matthew Groves, Juergen Branke

TL;DR
This paper introduces a new approach for top-k subset selection in score-based dueling bandits by leveraging transitivity through a Thurstonian model, improving sampling efficiency in real-world ranking tasks.
Contribution
It extends existing dueling bandit models to incorporate quantitative score information using a Thurstonian model, enhancing the efficiency of top-k selection.
Findings
The Thurstonian-style score model improves sample efficiency over binary win/loss models.
The proposed method outperforms standard POCBAm in experiments.
The approach is effective in real-world ranking scenarios.
Abstract
We consider the problem of top-k subset selection in Dueling Bandit problems with score information. Real-world pairwise ranking problems often exhibit a high degree of transitivity, and prior work has suggested sampling methods that exploit such transitivity through the use of parametric preference models like the Bradley-Terry-Luce (BTL) and Thurstone models. To date, this work has focused on cases where sample outcomes are win/loss binary responses. We extend this to selection problems where sampling results contain quantitative information by proposing a Thurstonian-style model and adapting the Pairwise Optimal Computing Budget Allocation for subset selection (POCBAm) sampling method to exploit this model for efficient sample selection. We compare the empirical performance against standard POCBAm and other competing algorithms.
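To make the idea of exploiting transitivity with score information concrete, the following is a minimal sketch and not the paper's method. It assumes each item has a latent mean and that a duel between items i and j returns a score difference drawn from a normal distribution centered on the difference of their means with a common noise level; the abstract does not specify these details. All names (`simulate_duel`, `fit_latent_means`, `top_k`) are hypothetical, and the POCBAm allocation rule itself is not implemented: duels are simply spread evenly over adjacent pairs.

```python
import random


def simulate_duel(mu, i, j, sigma, rng):
    """Return a noisy score difference for a duel between items i and j.

    Assumption: the outcome is Gaussian around mu[i] - mu[j].
    """
    return rng.gauss(mu[i] - mu[j], sigma)


def fit_latent_means(n_items, duels, sweeps=200):
    """Least-squares fit of latent means from (i, j, score_diff) duels.

    Uses simple coordinate updates: each item's estimate is set to the
    average value implied by its duels, then estimates are re-centered
    because only differences between means are identifiable.
    """
    est = [0.0] * n_items
    for _ in range(sweeps):
        for a in range(n_items):
            total, count = 0.0, 0
            for i, j, d in duels:
                if i == a:      # d estimates est[a] - est[j]
                    total += est[j] + d
                    count += 1
                elif j == a:    # d estimates est[i] - est[a]
                    total += est[i] - d
                    count += 1
            if count:
                est[a] = total / count
        mean = sum(est) / n_items
        est = [e - mean for e in est]
    return est


def top_k(est, k):
    """Indices of the k items with the largest estimated latent means."""
    return sorted(range(len(est)), key=lambda a: est[a], reverse=True)[:k]


rng = random.Random(0)
true_mu = [0.0, 1.0, 2.0, 3.0, 4.0]  # illustrative values only

# Only adjacent pairs are ever dueled; items 0 and 4 never meet directly.
duels = [
    (i, i + 1, simulate_duel(true_mu, i, i + 1, 0.1, rng))
    for i in range(4)
    for _ in range(20)
]

est = fit_latent_means(len(true_mu), duels)
selected = top_k(est, 2)
print(sorted(selected), round(est[4] - est[0], 2))
```

Because every duel informs a shared set of latent means, the fitted model yields an estimate of the gap between items 0 and 4 even though that pair is never sampled. A binary win/loss record of the same duels would carry less information per sample, which is the motivation the abstract gives for using scores.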
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Data Stream Mining Techniques
