Online Learning with Diverse User Preferences

Chao Gan; Jing Yang; Ruida Zhou; Cong Shen

arXiv:1901.07924 · cs.LG · November 11, 2022

TL;DR

This paper shows that in a stochastic linear bandit setting, sufficiently diverse user preferences allow the regret to be reduced from O(log T) to a constant. The proposed W-UCB algorithm achieves this under certain conditions.

Contribution

The paper introduces a novel analysis showing constant regret in linear bandits with diverse preferences and proposes the W-UCB algorithm to achieve this.

Findings

01 W-UCB achieves constant regret with diverse user preferences.

02 Diversity in user preferences accelerates convergence of arm estimates.

03 Performance validated with synthetic data.

Abstract

In this paper, we investigate the impact of diverse user preferences on learning under the stochastic multi-armed bandit (MAB) framework. We aim to show that when the user preferences are sufficiently diverse and each arm can be optimal for certain users, the O(log T) regret incurred by exploring the sub-optimal arms under the standard stochastic MAB setting can be reduced to a constant. Our intuition is that to achieve sub-linear regret, the number of times an optimal arm is pulled should scale linearly in time; when all arms are optimal for certain users and pulled frequently, the estimated arm statistics can quickly converge to their true values, thus dramatically reducing the need for exploration. We cast the problem into a stochastic linear bandit model, where both the user preferences and the state of arms are modeled as independent and identically distributed (i.i.d.)…
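The intuition in the abstract can be illustrated with a small simulation. The sketch below is not the paper's W-UCB algorithm, whose details are not given here. It is a generic UCB-style index policy under assumptions made purely for illustration: three hypothetical arms and three user types, a reward equal to the inner product of the user preference vector and the arm state vector, and a learner that observes the noisy state vector of the pulled arm. The function name `simulate_ucb_diverse_users`, the exploration bonus, and all numeric values are likewise assumptions.

```python
import numpy as np


def simulate_ucb_diverse_users(T=3000, noise=0.05, seed=0):
    """Toy simulation of a UCB-style policy with diverse users.

    NOT the paper's W-UCB; every name and number here is illustrative.
    """
    rng = np.random.default_rng(seed)

    # Assumed arm means: arm k has mean state vector e_k (hypothetical).
    arm_means = np.eye(3)
    # Assumed user types: type j weights coordinate j most heavily,
    # so every arm is optimal for some user.
    user_types = np.array([[1.0, 0.2, 0.2],
                           [0.2, 1.0, 0.2],
                           [0.2, 0.2, 1.0]])
    K, d = arm_means.shape

    # Initialise by pulling each arm once (assumes the learner observes
    # the noisy state vector of the arm it pulls).
    estimates = arm_means + noise * rng.standard_normal((K, d))
    counts = np.ones(K, dtype=int)
    regret = np.zeros(T)

    for t in range(1, T + 1):
        # Draw an i.i.d. user preference vector, revealed before the pull.
        theta = user_types[rng.integers(len(user_types))]

        # UCB-style index: estimated reward plus an exploration bonus
        # that shrinks as an arm is pulled more often.
        bonus = np.sqrt(np.log(t + 1) / counts)
        arm = int(np.argmax(estimates @ theta + bonus))

        # Observe a noisy state and update the running mean for that arm.
        state = arm_means[arm] + noise * rng.standard_normal(d)
        counts[arm] += 1
        estimates[arm] += (state - estimates[arm]) / counts[arm]

        # Expected regret of this pull for the current user.
        expected = arm_means @ theta
        regret[t - 1] = expected.max() - expected[arm]

    return counts, np.cumsum(regret)


if __name__ == "__main__":
    counts, cumulative_regret = simulate_ucb_diverse_users()
    print("pulls per arm:", counts)
    print("regret at T/2 and T:",
          cumulative_regret[len(cumulative_regret) // 2],
          cumulative_regret[-1])
```

In this toy setup every arm is the best choice for roughly a third of the users, so each arm is pulled linearly often, its bonus shrinks quickly, and the cumulative regret should flatten after an initial phase. Making one arm optimal for no user type would be one way to see the contrast with the standard setting, where that arm is only ever pulled for exploration.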

Taxonomy

Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Data Stream Mining Techniques