Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model

Cheng Chen; Canzhe Zhao; Shuai Li

arXiv:2207.05437·cs.LG·July 13, 2022

TL;DR

This paper introduces an online learning to rank (OLTR) algorithm that handles both stochastic and adversarial environments under the position-based click model, with proven regret bounds in each setting and competitive empirical performance.

Contribution

The paper develops a unified algorithm for OLTR under the position-based model (PBM), built on the follow-the-regularized-leader (FTRL) framework with Tsallis entropy, achieving optimal regret bounds in both stochastic and adversarial settings.
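To make the FTRL-with-Tsallis-entropy idea concrete, here is a minimal sketch on a plain K-armed bandit. It is not the paper's PBM algorithm: the ranking structure and the self-bounding constraint are omitted. The function names, the 1/2-Tsallis regularizer scaling, and the learning-rate schedule `1/sqrt(t)` are illustrative assumptions.

```python
import math
import random


def tsallis_ftrl_probs(cum_loss, eta, iters=80):
    """Minimize <p, cum_loss> - (4/eta) * sum_i sqrt(p_i) over the simplex.

    Stationarity gives p_i = 4 / (eta * (cum_loss_i - x))**2, where the
    scalar x < min(cum_loss) is chosen so that the p_i sum to one.
    The normalizer x is found by bisection (an illustrative choice).
    """
    k = len(cum_loss)
    low = min(cum_loss)
    lo = low - 2.0 * math.sqrt(k) / eta  # at this x the sum is <= 1
    hi = low - 2.0 / eta                 # at this x the sum is >= 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = sum(4.0 / (eta * (l - mid)) ** 2 for l in cum_loss)
        if total > 1.0:
            hi = mid
        else:
            lo = mid
    p = [4.0 / (eta * (l - lo)) ** 2 for l in cum_loss]
    s = sum(p)  # renormalize to remove residual bisection error
    return [v / s for v in p]


def run_bandit(mean_losses, horizon, seed=0):
    """Play a Bernoulli-loss bandit with FTRL; return pull counts per arm."""
    rng = random.Random(seed)
    k = len(mean_losses)
    cum_est = [0.0] * k  # cumulative importance-weighted loss estimates
    pulls = [0] * k
    for t in range(1, horizon + 1):
        eta = 1.0 / math.sqrt(t)  # assumed schedule, for illustration only
        p = tsallis_ftrl_probs(cum_est, eta)
        arm = rng.choices(range(k), weights=p)[0]
        loss = 1.0 if rng.random() < mean_losses[arm] else 0.0
        cum_est[arm] += loss / p[arm]  # unbiased estimate of the arm's loss
        pulls[arm] += 1
    return pulls
```

Calling `run_bandit([0.2, 0.8], 2000)` returns how often each arm was played. The same update needs no knowledge of whether losses are stochastic or adversarial, which is the property the paper extends to ranked lists under PBM.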

Findings

01

Achieves $O(\log T)$ regret in the stochastic environment.

02

Achieves $O(m\sqrt{nT})$ regret in the adversarial environment.

03

Matches the lower bound for adversarial PBM, improving prior results.

Abstract

Online learning to rank (OLTR) interactively learns to choose lists of items from a large collection based on certain click models that describe users' click behaviors. Most recent works for this problem focus on the stochastic environment where the item attractiveness is assumed to be invariant during the learning process. In many real-world scenarios, however, the environment could be dynamic or even arbitrarily changing. This work studies the OLTR problem in both stochastic and adversarial environments under the position-based model (PBM). We propose a method based on the follow-the-regularized-leader (FTRL) framework with Tsallis entropy and develop a new self-bounding constraint especially designed for PBM. We prove the proposed algorithm simultaneously achieves $O(\log T)$ regret in the stochastic environment and $O(m\sqrt{nT})$ regret in the adversarial environment, where $T$ is…
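For readers unfamiliar with the position-based model, the sketch below simulates click feedback under the standard PBM assumption: an item is clicked with probability equal to its attractiveness times the examination probability of the position it occupies. The function and parameter names are hypothetical, and the numbers in the usage note are made up for illustration. They are not taken from the paper.

```python
import random


def simulate_pbm_clicks(ranked_items, attractiveness, examination, rng):
    """Sample one round of clicks for a ranked list under PBM.

    ranked_items[k] is the item shown at position k.
    attractiveness[i] is the attraction probability of item i.
    examination[k] is the probability that position k is examined.
    """
    clicks = []
    for position, item in enumerate(ranked_items):
        p_click = attractiveness[item] * examination[position]
        clicks.append(1 if rng.random() < p_click else 0)
    return clicks


def expected_clicks(ranked_items, attractiveness, examination):
    """Expected number of clicks for a ranked list under PBM."""
    return sum(
        attractiveness[item] * examination[position]
        for position, item in enumerate(ranked_items)
    )
```

For example, with attractiveness `[0.5, 0.2]` and examination `[1.0, 0.5]`, the list `[0, 1]` has an expected click count of 0.5 * 1.0 + 0.2 * 0.5 = 0.6, while the reversed list `[1, 0]` yields 0.2 * 1.0 + 0.5 * 0.5 = 0.45. Placing more attractive items at more examined positions is what the learner must discover from click feedback alone.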

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Data Stream Mining Techniques