Mitigating Exposure Bias in Online Learning to Rank Recommendation: A Novel Reward Model for Cascading Bandits
Masoud Mansoury, Bamshad Mobasher, Herke van Hoof

TL;DR
This paper addresses exposure bias in online recommendation systems, especially in contextual bandit algorithms, and proposes a new reward model that improves fairness without sacrificing accuracy, backed by theoretical and empirical evidence.
Contribution
The paper introduces an Exposure-Aware reward model for linear cascading bandits that mitigates exposure bias and provides theoretical performance guarantees.
Findings
Improves exposure fairness over time on real-world datasets
Maintains recommendation accuracy while reducing exposure bias
Outperforms existing baseline models
Abstract
Exposure bias is a well-known issue in recommender systems where items and suppliers are not equally represented in the recommendation results. This bias becomes particularly problematic over time as a few items are repeatedly over-represented in recommendation lists, leading to a feedback loop that further amplifies this bias. Although extensive research has addressed this issue in model-based or neighborhood-based recommendation algorithms, less attention has been paid to online recommendation models, such as those based on top-K contextual bandits, where recommendation models are dynamically updated with ongoing user feedback. In this paper, we study exposure bias in a class of well-known contextual bandit algorithms known as Linear Cascading Bandits. We analyze these algorithms in their ability to handle exposure bias and provide a fair representation of items in the recommendation…
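To make the setting concrete, here is a minimal sketch of the standard cascade feedback assumption that cascading bandits rely on: the user scans a ranked list from the top, clicks the first attractive item, and stops. It also shows one common way to quantify how unevenly exposure is spread across items, the Gini coefficient. The function names (cascade_feedback, examined_items, gini) and the choice of Gini as the exposure measure are illustrative assumptions, not the paper's reward model or its evaluation code.

```python
from typing import List, Optional, Sequence, Set


def cascade_feedback(ranked_items: Sequence[int], attractive: Set[int]) -> Optional[int]:
    """Return the position of the first attractive item, or None if no click.

    Cascade assumption: the user examines items top to bottom and stops
    at the first item they find attractive.
    """
    for pos, item in enumerate(ranked_items):
        if item in attractive:
            return pos
    return None


def examined_items(ranked_items: Sequence[int], click_pos: Optional[int]) -> List[int]:
    """Items the user is assumed to have examined.

    With a click at position k, positions 0..k were examined and the rest
    were never seen. With no click, the whole list was examined.
    """
    if click_pos is None:
        return list(ranked_items)
    return list(ranked_items[: click_pos + 1])


def gini(counts: Sequence[float]) -> float:
    """Gini coefficient of exposure counts (hypothetical fairness measure).

    0 means every item received equal exposure; values near 1 mean
    exposure is concentrated on a few items.
    """
    n = len(counts)
    total = sum(counts)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(counts)
    weighted = sum((i + 1) * c for i, c in enumerate(ordered))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Usage: one round of feedback, then an exposure summary.
click = cascade_feedback([3, 1, 2], attractive={1})
seen = examined_items([3, 1, 2], click)
```

In this sketch, an item placed below the clicked position yields no feedback at all, which is one reason a few frequently shown items can keep accumulating evidence while others stay unobserved.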
Taxonomy
Topics: Advanced Bandit Algorithms Research · Mind wandering and attention · Online Learning and Analytics
Methods: Softmax · Attention Is All You Need
