On Learning to Rank Long Sequences with Contextual Bandits

Anirban Santara; Claudio Gentile; Gaurav Aggarwal; Shuai Li

arXiv:2106.03546 · cs.LG · September 5, 2022

TL;DR

This paper introduces a variant of the cascading bandit model for learning to rank long item sequences, analyzes upper confidence algorithms for it with tight regret bounds, and reports improved empirical performance over known cascading bandit baselines on real-world datasets.

Contribution

The paper proposes a cascading bandit variant for flexible-length sequences with varying rewards and losses, formulates two generative models for it within the generalized linear setting, and designs upper confidence algorithms with tight regret bounds.

Findings

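01 Regret bounds that, when specialized to vanilla cascading bandits, are sharper than those previously available in the literature.

02 Significantly improved empirical performance over known cascading bandit baselines on real-world datasets.

03 A model that accommodates flexible-length sequences with varying rewards and losses.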

Abstract

Motivated by problems of learning to rank long item sequences, we introduce a variant of the cascading bandit model that considers flexible length sequences with varying rewards and losses. We formulate two generative models for this problem within the generalized linear setting, and design and analyze upper confidence algorithms for it. Our analysis delivers tight regret bounds which, when specialized to vanilla cascading bandits, result in sharper guarantees than previously available in the literature. We evaluate our algorithms on a number of real-world datasets, and show significantly improved empirical performance as compared to known cascading bandit baselines.
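For readers unfamiliar with the baseline setting, the following is a minimal sketch of a vanilla cascading bandit with an upper confidence index, in the spirit of CascadeUCB1-style policies. It is not the paper's algorithm: the flexible-length sequences, varying rewards and losses, and generalized linear models described in the abstract are not modeled here. All function names, the exploration constant 1.5, and the attraction probabilities are illustrative assumptions.

```python
import math
import random


def cascade_feedback(ranked_items, attraction_prob, rng):
    """Simulate a user under the vanilla cascade model (assumed here).

    The user scans the list top-down and clicks the first attractive item.
    Returns the clicked position, or None if nothing was clicked.
    """
    for pos, item in enumerate(ranked_items):
        if rng.random() < attraction_prob[item]:
            return pos
    return None


def ucb_ranking(means, counts, t, k):
    """Return the k items with the largest upper confidence index."""
    def index(i):
        if counts[i] == 0:
            return float("inf")  # force at least one observation per item
        return means[i] + math.sqrt(1.5 * math.log(t) / counts[i])

    return sorted(range(len(means)), key=index, reverse=True)[:k]


def run_cascade_ucb(attraction_prob, k, horizon, seed=0):
    """Run the sketch for `horizon` rounds; return (means, counts)."""
    rng = random.Random(seed)
    n = len(attraction_prob)
    means = [0.0] * n
    counts = [0] * n
    for t in range(1, horizon + 1):
        ranking = ucb_ranking(means, counts, t, k)
        click = cascade_feedback(ranking, attraction_prob, rng)
        # Only positions up to the click are observed; if there was no
        # click, the user examined the whole list.
        last = click if click is not None else k - 1
        for pos in range(last + 1):
            item = ranking[pos]
            reward = 1.0 if pos == click else 0.0
            counts[item] += 1
            means[item] += (reward - means[item]) / counts[item]
    return means, counts


if __name__ == "__main__":
    # Hypothetical attraction probabilities for five items.
    probs = [0.6, 0.5, 0.2, 0.1, 0.05]
    means, counts = run_cascade_ucb(probs, k=3, horizon=2000)
    print("estimated attraction:", [round(m, 2) for m in means])
    print("observation counts:  ", counts)
```

The key feature of cascade feedback is visible in the update loop: items ranked below the click are never observed in that round, which is what makes long sequences hard to learn from and motivates the setting studied in the paper.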

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms