Incentivized Bandit Learning with Self-Reinforcing User Preferences

Tianchen Zhou; Jia Liu; Chaosheng Dong; Jingyuan Deng

arXiv:2105.08869 · cs.LG · June 1, 2021 · 1 citation

TL;DR

This paper introduces a multi-armed bandit model in which the agent must pay users to pull arms and well-rewarded users attract more users with similar preferences. It proposes two policies whose expected regret and expected payment are both $O(\log T)$, and validates them through simulations.

Contribution

It presents a new MAB model considering incentives and self-reinforcing user preferences, along with two policies achieving logarithmic regret and payment bounds.

Findings

1. Both policies achieve $O(\log T)$ expected regret.

2. Expected payment is also $O(\log T)$ over the time horizon $T$.

3. Simulations confirm the robustness and effectiveness of the policies.

Abstract

In this paper, we investigate a new multi-armed bandit (MAB) online learning model that considers real-world phenomena in many recommender systems: (i) the learning agent cannot pull the arms by itself and thus has to offer rewards to users to incentivize arm-pulling indirectly; and (ii) if users with specific arm preferences are well rewarded, they induce a "self-reinforcing" effect in the sense that they will attract more users of similar arm preferences. Besides addressing the tradeoff of exploration and exploitation, another key feature of this new MAB model is to balance reward and incentivizing payment. The goal of the agent is to maximize the total reward over a fixed time horizon $T$ with a low total payment. Our contributions in this paper are two-fold: (i) We propose a new MAB model with random arm selection that considers the relationship of users' self-reinforcing…
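
To make the two phenomena in the abstract concrete, here is a minimal Python sketch of a toy simulation. It is not the paper's model or either of its policies. Everything specific is an assumption made for illustration: users pick an arm with probability proportional to a preference weight, a payment adds directly to the weight of the incentivized arm, the agent pays only when the user follows the incentive, and each received reward raises the weight of the pulled arm. The names `arm_probabilities`, `simulate`, `explore_rounds`, and `payment` are hypothetical. Because this toy policy keeps paying for the whole horizon, it does not reproduce the $O(\log T)$ payment bound.

```python
import random


def arm_probabilities(weights, incentivized_arm=None, payment=0.0):
    """Probability that an arriving user pulls each arm.

    Assumption: users pick arm i with probability proportional to its
    preference weight, and a payment adds directly to the weight of the
    incentivized arm.
    """
    boosted = list(weights)
    if incentivized_arm is not None:
        boosted[incentivized_arm] += payment
    total = sum(boosted)
    return [w / total for w in boosted]


def simulate(means, horizon, explore_rounds, payment, seed=0):
    """Run a toy incentivized bandit with self-reinforcing preferences.

    Hypothetical policy: during the first `explore_rounds` steps, pay users
    to try the least-pulled arm; afterwards, pay for the arm with the best
    empirical mean.
    """
    rng = random.Random(seed)
    k = len(means)
    weights = [1.0] * k      # initial preference weights (assumed equal)
    pulls = [0] * k
    reward_sums = [0.0] * k
    total_reward = 0.0
    total_payment = 0.0

    for t in range(horizon):
        if t < explore_rounds:
            target = min(range(k), key=lambda i: pulls[i])
        else:
            target = max(range(k), key=lambda i: reward_sums[i] / max(pulls[i], 1))

        # The agent cannot pull the arm itself; the user chooses at random.
        probs = arm_probabilities(weights, target, payment)
        arm = rng.choices(range(k), weights=probs)[0]

        # Assumption: the payment is made only if the user follows the incentive.
        if arm == target:
            total_payment += payment

        # Bernoulli reward with the (unknown to the agent) mean of the pulled arm.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        reward_sums[arm] += reward
        total_reward += reward

        # Self-reinforcement: rewarded arms attract more future users.
        weights[arm] += reward

    return {"pulls": pulls, "reward": total_reward, "payment": total_payment}
```

A call such as `simulate([0.2, 0.8], horizon=1000, explore_rounds=50, payment=1.0)` returns the per-arm pull counts together with the total reward and total payment, which makes the reward-versus-payment tradeoff described above easy to inspect for different payment sizes.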

Videos

Incentivized Bandit Learning with Self-Reinforcing User Preferences · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Optimization and Search Problems