Sequential Choice Bandits with Feedback for Personalizing users' experience

Anshuka Rangi; Massimo Franceschetti; Long Tran-Thanh

arXiv:2101.01572 · stat.ML · January 6, 2021

TL;DR

This paper introduces bandit algorithms for personalizing user experiences by learning user thresholds through sequential choices and feedback, aiming to maximize rewards while managing user patience and abandonment.

Contribution

It proposes novel algorithms for threshold learning in sequential choice bandits with feedback, providing regret bounds and analyzing user waiting times.

Findings

01

Regret bounds of order $\tilde{O}(N^{2/3})$ and $\tilde{\Theta}(N^{2/3})$ for the algorithms.

02

Algorithms effectively learn user thresholds to optimize platform rewards.

03

User waiting time before personalization is independent of total users $N$.

Abstract

In this work, we study sequential choice bandits with feedback. We propose bandit algorithms for a platform that personalizes users' experience to maximize its rewards. For each action directed to a given user, the platform is given a positive reward, which is a non-decreasing function of the action, if this action is below the user's threshold. Users are equipped with a patience budget, and actions that are above the threshold decrease the user's patience. When all patience is lost, the user abandons the platform. The platform attempts to learn the thresholds of the users in order to maximize its rewards, based on two different feedback models describing the information pattern available to the platform at each action. We define a notion of regret by determining the best action to be taken when the platform knows that the user's threshold is in a given interval. We then propose bandit…
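The interaction described in the abstract can be illustrated with a minimal simulation. This is only a sketch of the user model, not the paper's algorithms. Several details are assumptions made for illustration: the names `User` and `interact` are hypothetical, an action equal to the threshold is treated as acceptable, each action above the threshold costs exactly one unit of patience and earns zero reward, and the reward function defaults to the identity (which is non-decreasing).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class User:
    """Hypothetical user model: a hidden threshold and a patience budget."""
    threshold: float  # unknown to the platform
    patience: int     # remaining patience budget


def interact(
    user: User,
    actions: Iterable[float],
    reward_fn: Callable[[float], float] = lambda a: a,  # assumed non-decreasing
) -> List[float]:
    """Play actions in order until the user abandons; return per-step rewards."""
    rewards: List[float] = []
    for action in actions:
        if user.patience <= 0:
            break  # all patience is lost: the user has abandoned the platform
        if action <= user.threshold:
            # Assumption: actions at or below the threshold earn the reward.
            rewards.append(reward_fn(action))
        else:
            # Assumption: an action above the threshold costs one unit of
            # patience and earns nothing.
            user.patience -= 1
            rewards.append(0.0)
    return rewards


user = User(threshold=0.6, patience=2)
rewards = interact(user, [0.5, 0.9, 0.6, 0.8, 0.3])
print(rewards)        # [0.5, 0.0, 0.6, 0.0]
print(user.patience)  # 0
```

In this run the last action, 0.3, is never played because the user has already abandoned after two actions above the threshold. This shows the trade-off the platform faces: larger actions earn more when accepted but risk exhausting the patience budget.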

Taxonomy

Topics: Advanced Bandit Algorithms Research · Consumer Market Behavior and Pricing · Smart Grid Energy Management