Exposure-Aware Recommendation using Contextual Bandits
Masoud Mansoury, Bamshad Mobasher, Herke van Hoof

TL;DR
This paper investigates exposure bias in online contextual bandit recommendation algorithms, revealing their tendency to amplify disparity over time, and proposes an exposure-aware reward model to mitigate this bias while maintaining accuracy.
Contribution
The paper introduces an exposure-aware reward model for linear cascading bandits that reduces exposure bias amplification in online recommendations.
Findings
Linear cascading bandit algorithms tend to amplify exposure disparity over time.
The proposed exposure-aware reward model reduces this exposure bias.
Recommendation accuracy is maintained with the new model.
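One way to make "exposure disparity" concrete is to count how often each item appears in the recommended lists and summarize the spread with an inequality measure. The sketch below is illustrative only: the choice of the Gini coefficient and the names `exposure_counts` and `gini` are assumptions made for this example, not the metric or code used in the paper.

```python
from collections import Counter


def exposure_counts(rec_lists, n_items):
    """Count how many times each item id (0..n_items-1) was recommended.

    rec_lists: iterable of top-K lists, each a sequence of item ids.
    (Hypothetical helper; position in the list is ignored here.)
    """
    counts = Counter(item for rec in rec_lists for item in rec)
    return [counts.get(i, 0) for i in range(n_items)]


def gini(values):
    """Gini coefficient of non-negative values.

    0.0 means perfectly equal exposure; values near (n-1)/n mean
    exposure is concentrated on a single item.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula over the sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


if __name__ == "__main__":
    # Toy logs: two rounds of top-2 recommendations over 4 items.
    concentrated = [[0, 1], [0, 1]]   # same two items every round
    spread_out = [[0, 1], [2, 3]]     # every item shown once
    print(gini(exposure_counts(concentrated, 4)))
    print(gini(exposure_counts(spread_out, 4)))
```

Tracking such a score over successive rounds of an online recommender would show whether disparity grows over time; a position-weighted exposure count would be a natural refinement for cascading settings, where lower-ranked items are less likely to be examined.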
Abstract
Exposure bias is a well-known issue in recommender systems where items and suppliers are not equally represented in the recommendation results. This is especially problematic when bias is amplified over time: a few items (e.g., popular ones) are repeatedly over-represented in recommendation lists, and users' interactions with those items further amplify the bias towards them, resulting in a feedback loop. This issue has been extensively studied in the literature on model-based or neighborhood-based recommendation algorithms, but less work has been done on online recommendation models, such as those based on top-K contextual bandits, where recommendation models are dynamically updated with ongoing user feedback. In this paper, we study exposure bias in a class of well-known contextual bandit algorithms known as Linear Cascading Bandits. We analyze these algorithms on their…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Recommender Systems and Techniques · Smart Grid Energy Management
