Exploiting Transitivity for Top-k Selection with Score-Based Dueling Bandits

Matthew Groves; Juergen Branke

arXiv:2012.15637 · cs.LG · January 1, 2021

Open Access

TL;DR

This paper proposes a Thurstonian-style model for top-k subset selection in score-based dueling bandits and adapts the POCBAm sampling method to exploit the transitivity common in real-world pairwise ranking problems for more efficient sampling.

Contribution

The paper extends parametric dueling bandit models, which previously handled only binary win/loss outcomes, to quantitative score information using a Thurstonian-style model, and adapts POCBAm to exploit this model for top-k selection.

Findings

01. Thurstonian model improves sample efficiency over binary models.

02. Proposed method outperforms standard POCBAm in experiments.

03. Effective in real-world ranking scenarios.

Abstract

We consider the problem of top-k subset selection in Dueling Bandit problems with score information. Real-world pairwise ranking problems often exhibit a high degree of transitivity, and prior work has suggested sampling methods that exploit such transitivity through the use of parametric preference models like the Bradley-Terry-Luce (BTL) and Thurstone models. To date, this work has focused on cases where sample outcomes are binary win/loss responses. We extend this to selection problems where sampling results contain quantitative information by proposing a Thurstonian-style model and adapting the Pairwise Optimal Computing Budget Allocation for subset selection (POCBAm) sampling method to exploit this model for efficient sample selection. We compare the empirical performance against standard POCBAm and other competing algorithms.
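To illustrate why a parametric score model helps, here is a minimal sketch of how transitivity lets a duel between two items inform a pair that was never sampled. It assumes each item has a latent utility and that the observed score difference of a duel is normally distributed around the difference of the two utilities with a shared variance. The function names `fit_latent_utilities` and `top_k` are hypothetical, and plain least squares stands in for whatever estimator the paper actually uses; this is not the authors' POCBAm adaptation.

```python
import numpy as np


def fit_latent_utilities(n_items, duels):
    """Least-squares estimate of latent utilities from score differences.

    duels: list of (i, j, diff), where diff is the observed score of
    item i minus the score of item j in one duel.
    Assumption (illustrative): diff ~ Normal(mu_i - mu_j, sigma^2).
    """
    rows = np.zeros((len(duels) + 1, n_items))
    targets = np.zeros(len(duels) + 1)
    for r, (i, j, diff) in enumerate(duels):
        rows[r, i] = 1.0
        rows[r, j] = -1.0
        targets[r] = diff
    # Utilities are identified only up to a shift, so pin their sum to zero.
    rows[-1, :] = 1.0
    mu, *_ = np.linalg.lstsq(rows, targets, rcond=None)
    return mu


def top_k(mu, k):
    """Indices of the k items with the largest estimated utility."""
    return sorted(np.argsort(-mu)[:k].tolist())


# Items 0 and 2 never duel each other directly.
duels = [(0, 1, 1.0), (1, 2, 1.0)]
mu = fit_latent_utilities(3, duels)
print(mu[0] - mu[2])  # estimated margin for the unsampled pair (0, 2)
print(top_k(mu, 2))
```

In this toy example the margin between items 0 and 2 is inferred from the two duels they each had with item 1. A purely pairwise model with no shared latent scale would have no information about that pair, which is the kind of saving a sampling method can exploit when allocating its budget.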

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Data Stream Mining Techniques