Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications
Luyue Xu, Liming Wang, Hong Xie, Mingqiang Zhou

TL;DR
This paper introduces TS-Conf, a novel contextual bandit algorithm designed to mitigate herding effects in user feedback, leading to more accurate recommendations and faster learning in biased feedback environments.
Contribution
It formulates a user feedback model for herding effects and develops TS-Conf, the first algorithm tailored to address feedback bias caused by herding in recommendation systems.
Findings
TS-Conf outperforms four benchmark algorithms in experiments.
The regret bound reveals that herding effects slow down learning.
TS-Conf effectively reduces the impact of feedback bias.
Abstract
Contextual bandits serve as a fundamental algorithmic framework for optimizing recommendation decisions online. Though extensive attention has been paid to tailoring contextual bandits for recommendation applications, the "herding effects" in user feedback have been ignored. These herding effects bias user feedback toward historical ratings, breaking down the assumption of unbiased feedback inherent in contextual bandits. This paper develops a novel variant of the contextual bandit that is tailored to address the feedback bias caused by the herding effects. A user feedback model is formulated to capture this feedback bias. We design the TS-Conf (Thompson Sampling under Conformity) algorithm, which employs posterior sampling to balance the exploration and exploitation tradeoff. We prove an upper bound for the regret of the algorithm, revealing the impact of herding effects on learning…
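To make the posterior-sampling idea concrete, here is a minimal sketch of standard linear Thompson Sampling run against herding-biased feedback. It is not TS-Conf: the abstract does not give the paper's feedback model or update rule, so everything here is an assumption made for illustration. In particular, `make_herding_feedback` is a hypothetical model in which each observed rating is a convex combination of the user's intrinsic rating and the arm's historical average, and `weight`, `noise_std`, and the fixed per-arm feature vectors are illustrative choices.

```python
import numpy as np


def make_herding_feedback(contexts, theta_true, weight=0.5, noise_std=0.1, seed=1):
    """Hypothetical herding model (an assumption, not the paper's model).

    Observed rating = (1 - weight) * intrinsic rating + weight * historical mean.
    """
    rng = np.random.default_rng(seed)
    history = {arm: [] for arm in range(len(contexts))}

    def feedback(arm, t):
        intrinsic = contexts[arm] @ theta_true + rng.normal(0.0, noise_std)
        # With no history yet, the first rating is unbiased.
        past = np.mean(history[arm]) if history[arm] else intrinsic
        y = (1.0 - weight) * intrinsic + weight * past
        history[arm].append(y)
        return float(y)

    return feedback


def linear_thompson_sampling(contexts, feedback_fn, n_rounds,
                             noise_var=1.0, prior_var=1.0, seed=0):
    """Standard linear Thompson Sampling with a Gaussian prior and likelihood.

    contexts: array of shape (n_arms, d), one fixed feature vector per arm.
    feedback_fn(arm, t): returns the (possibly biased) observed rating.
    Returns the chosen arms and the final posterior mean of theta.
    """
    rng = np.random.default_rng(seed)
    n_arms, d = contexts.shape
    precision = np.eye(d) / prior_var  # posterior precision matrix
    b = np.zeros(d)                    # accumulates x * y / noise_var
    chosen = []
    for t in range(n_rounds):
        cov = np.linalg.inv(precision)
        cov = (cov + cov.T) / 2.0      # guard against tiny asymmetries
        mean = cov @ b
        # Exploration comes from sampling a parameter from the posterior.
        theta_sample = rng.multivariate_normal(mean, cov)
        # Exploitation: act greedily with respect to the sampled parameter.
        arm = int(np.argmax(contexts @ theta_sample))
        y = feedback_fn(arm, t)
        x = contexts[arm]
        precision += np.outer(x, x) / noise_var
        b += x * y / noise_var
        chosen.append(arm)
    return chosen, np.linalg.inv(precision) @ b


if __name__ == "__main__":
    contexts = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
    theta_true = np.array([1.0, 0.2])
    feedback = make_herding_feedback(contexts, theta_true, weight=0.5)
    chosen, theta_hat = linear_thompson_sampling(contexts, feedback, n_rounds=200)
    print("pull counts per arm:", np.bincount(chosen, minlength=len(contexts)))
    print("posterior mean of theta:", theta_hat)
```

Because this baseline treats every rating as unbiased, its posterior absorbs whatever bias the historical average carries. That is the gap the abstract says TS-Conf is designed to close.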
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques
Methods: Softmax · Attention Is All You Need
