Epsilon non-Greedy: A Bandit Approach for Unbiased Recommendation via Uniform Data
S.M.F. Sani, Seyed Abbas Hosseini, Hamid R. Rabiee

TL;DR
This paper introduces a bandit-based framework for unbiased recommendation systems that mitigates self-feedback bias by leveraging uniform data collection and sequential training, improving over existing debiasing methods.
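A standard inverse propensity scoring (IPS) estimate shows why a small uniformly collected sample helps. When every item is shown with the same known probability 1/K, reweighting logged rewards by that propensity gives an unbiased estimate of any other policy's average reward. The sketch below is a generic textbook illustration, not the paper's estimator. The toy environment, `simulate_uniform_logs`, and `ips_estimate` are hypothetical.

```python
import random

def ips_estimate(logs, target_policy, num_items):
    """Inverse propensity scoring estimate of a target policy's average reward.

    Assumes `logs` was collected by a uniform logging policy, so every item
    was shown with propensity 1 / num_items regardless of context.
    """
    propensity = 1.0 / num_items
    total = 0.0
    for context, shown_item, reward in logs:
        # Only rounds where the target policy agrees with the logged item
        # contribute, reweighted by the inverse of the logging propensity.
        if target_policy(context) == shown_item:
            total += reward / propensity
    return total / len(logs)

def simulate_uniform_logs(n, num_items, seed=0):
    """Hypothetical toy environment: only item (context % num_items) gets clicked."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        context = rng.randrange(100)
        shown = rng.randrange(num_items)  # uniform logging policy
        reward = 1.0 if shown == context % num_items else 0.0
        logs.append((context, shown, reward))
    return logs

logs = simulate_uniform_logs(20000, num_items=5)
best_policy = lambda ctx: ctx % 5
print(ips_estimate(logs, best_policy, 5))  # close to the true value 1.0
```

Because the propensity is the same constant for every item, the estimate does not inherit the bias of whichever system would otherwise have chosen what users saw.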
Contribution
It proposes a novel offline sequential training schema and a bandit approach to reduce bias in recommendation systems, addressing the feedback loop issue.
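A toy offline simulation can reproduce the feedback loop that motivates the sequential schema. Each round retrains on all data logged so far, and the retrained model's recommendations produce the next round's data. This is a minimal sketch under assumed settings: a context-free click model, the hypothetical `TRUE_CTR` values, and plain epsilon-greedy exploration with uniformly random items. It is not the authors' training schema.

```python
import random

TRUE_CTR = [0.3, 0.2, 0.1, 0.6, 0.4]  # hypothetical ground-truth click rates

def train(logs, num_items):
    """Toy 'model': empirical click-through rate per item; unseen items score 0."""
    clicks = [0] * num_items
    shows = [0] * num_items
    for item, clicked in logs:
        shows[item] += 1
        clicks[item] += clicked
    return [c / s if s else 0.0 for c, s in zip(clicks, shows)]

def sequential_training(rounds, batch, epsilon, seed=0):
    rng = random.Random(seed)
    num_items = len(TRUE_CTR)
    # Initial data logged by a previous system that only ever showed item 0.
    logs = [(0, int(rng.random() < TRUE_CTR[0])) for _ in range(batch)]
    for _ in range(rounds):
        scores = train(logs, num_items)  # retrain on everything logged so far
        greedy = max(range(num_items), key=scores.__getitem__)
        for _ in range(batch):
            # The model's own recommendations become the next round's training data.
            item = rng.randrange(num_items) if rng.random() < epsilon else greedy
            logs.append((item, int(rng.random() < TRUE_CTR[item])))
    return train(logs, num_items)

greedy_only = sequential_training(50, 500, epsilon=0.0)
explored = sequential_training(50, 500, epsilon=0.1)
```

With `epsilon=0.0` the model never observes items 1 to 4 and stays on item 0. With a small uniform exploration share, every item's estimate becomes informative and the best item (index 3 here) can surface.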
Findings
Outperforms state-of-the-art debiasing methods in experiments
Focuses exploration on items the model understands poorly, improving recommendations
Simulates real-world continuous training scenarios
Abstract
Often, recommendation systems employ continuous training, leading to a self-feedback loop bias in which the system becomes biased toward its previous recommendations. Recent studies have attempted to mitigate this bias by collecting small amounts of unbiased data. While these studies have successfully developed less biased models, they ignore the crucial fact that the recommendations generated by the model serve as the training data for subsequent training sessions. To address this issue, we propose a framework that learns an unbiased estimator using a small amount of uniformly collected data and focuses on generating improved training data for subsequent training iterations. To accomplish this, we view recommendation as a contextual multi-armed bandit problem and emphasize exploring items that the model has a limited understanding of. We introduce a new offline sequential training…
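A small selection rule illustrates the difference between uniform exploration and exploration aimed at items the model understands poorly. This is a hypothetical illustration, not the paper's algorithm. Exploration steps sample items in proportion to an assumed uncertainty proxy, 1/sqrt(1 + n), where n is how often the item has been observed for the current context. `scores` and `counts` stand in for whatever the contextual model provides.

```python
import math
import random

def epsilon_uncertainty_explore(scores, counts, epsilon, rng):
    """Pick an item for one context.

    With probability 1 - epsilon, exploit the highest-scoring item. With
    probability epsilon, explore, but instead of sampling uniformly (plain
    epsilon-greedy), sample items in proportion to an uncertainty proxy:
    1 / sqrt(1 + number of times the item was observed in this context).
    """
    if rng.random() >= epsilon:
        return max(range(len(scores)), key=scores.__getitem__)
    weights = [1.0 / math.sqrt(1 + c) for c in counts]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]

rng = random.Random(0)
scores = [0.9, 0.5, 0.4, 0.1]    # model scores for one context
counts = [1000, 1000, 1000, 0]   # item 3 has never been observed here
picks = [epsilon_uncertainty_explore(scores, counts, 1.0, rng) for _ in range(10_000)]
print(picks.count(3) / len(picks))  # roughly 0.9: exploration concentrates on item 3
```

Replacing the weights with a constant recovers ordinary epsilon-greedy. The proxy itself is an assumption, and a contextual model could supply a different uncertainty measure.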
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Recommender Systems and Techniques
