Exploration, Exploitation, and Engagement in Multi-Armed Bandits with Abandonment

Zixian Yang; Xin Liu; Lei Ying

arXiv:2205.13566 · cs.LG · May 30, 2022

TL;DR

This paper introduces MAB-A, a multi-armed bandit model in which users may abandon the system depending on how engaging the recommendations are. It proposes two algorithms, ULCB and KL-ULCB, with proven logarithmic regret, and shows in simulations that they outperform traditional UCB, KL-UCB, and Q-learning.

Contribution

The paper develops the MAB-A model, which accounts for user abandonment, and proposes the ULCB and KL-ULCB algorithms, supported by theoretical regret bounds and by simulation results against standard baselines.

Findings

01

Both algorithms achieve $O(\log K)$ regret.

02

KL-ULCB's regret bound is asymptotically sharp.

03

Algorithms outperform traditional UCB, KL-UCB, and Q-learning in simulations.
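
For context on the "traditional UCB" baseline named above, the following is a minimal sketch of the standard UCB1 index rule. It is an assumption that the paper's simulations use exactly this variant; the function names `ucb1_index` and `select_arm` are hypothetical and chosen only for illustration.

```python
import math


def ucb1_index(mean_reward, pulls, total_pulls):
    """Standard UCB1 index: empirical mean plus an exploration bonus."""
    if pulls == 0:
        # An arm that has never been tried is always explored first.
        return math.inf
    return mean_reward + math.sqrt(2.0 * math.log(total_pulls) / pulls)


def select_arm(means, counts):
    """Return the position of the arm with the largest UCB1 index."""
    total = sum(counts)
    indices = [ucb1_index(m, n, total) for m, n in zip(means, counts)]
    return max(range(len(indices)), key=indices.__getitem__)
```

This rule is optimistic to the same degree regardless of the user's state. That is the property the abandonment-aware algorithms on this page are described as changing.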

Abstract

Multi-armed bandit (MAB) is a classic model for understanding the exploration-exploitation trade-off. The traditional MAB model for recommendation systems assumes the user stays in the system for the entire learning horizon. In new online education platforms such as ALEKS or new video recommendation systems such as TikTok and YouTube Shorts, the amount of time a user spends on the app depends on how engaging the recommended contents are. Users may temporarily leave the system if the recommended items cannot engage the users. To understand the exploration, exploitation, and engagement in these systems, we propose a new model, called MAB-A where "A" stands for abandonment and the abandonment probability depends on the current recommended item and the user's past experience (called state). We propose two algorithms, ULCB and KL-ULCB, both of which do more exploration (being optimistic)…
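
The abandonment mechanism described in the abstract can be illustrated with a toy environment. This is a sketch under stated assumptions, not the paper's exact MAB-A definition: the binary state (whether the user liked the previous item), the Bernoulli "like" feedback, and the names `ToyAbandonmentBandit`, `like_probs`, `abandon_probs`, and `run_visit` are all hypothetical.

```python
import random


class ToyAbandonmentBandit:
    """Toy bandit in which the user may leave after each recommendation.

    Assumptions (not taken from the paper):
      - like_probs[arm] is the probability that the user likes `arm`.
      - the state is True if the user liked the previous item.
      - abandon_probs[(state, arm)] is the probability that the user
        leaves after being shown `arm` while in `state`.
    """

    def __init__(self, like_probs, abandon_probs, seed=0):
        self.like_probs = like_probs
        self.abandon_probs = abandon_probs
        self.rng = random.Random(seed)
        self.state = True

    def reset(self):
        # Assumed convention: each visit starts in the "liked" state.
        self.state = True

    def step(self, arm):
        liked = self.rng.random() < self.like_probs[arm]
        abandoned = self.rng.random() < self.abandon_probs[(self.state, arm)]
        self.state = liked  # the next step sees this outcome as its state
        return liked, abandoned


def run_visit(env, policy, max_steps=1000):
    """Run one visit (episode) and return the number of items shown."""
    env.reset()
    steps = 0
    for _ in range(max_steps):
        arm = policy(env.state)
        _, abandoned = env.step(arm)
        steps += 1
        if abandoned:
            break
    return steps
```

A policy here is any function from the current state to an arm, for example `lambda state: 0`. The visit length returned by `run_visit` is one simple proxy for engagement in this toy setting.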

Videos

Exploration, Exploitation, and Engagement in Multi-Armed Bandits with Abandonment · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Smart Grid Energy Management