Exploitation Over Exploration: Unmasking the Bias in Linear Bandit Recommender Offline Evaluation
Pedro R. Pires, Gregorio F. Azevedo, Pietro L. Campos, Rafael T. Sereicikas, Tiago A. Almeida

TL;DR
This paper shows that offline evaluation of linear bandit algorithms in recommender systems is biased toward exploitation: greedy models with no exploration often match or outperform exploratory ones, which highlights the need for better assessment methods.
Contribution
The paper provides an extensive offline empirical comparison of linear bandits, showing that exploitation dominates exploration in these evaluations and exposing biases in current evaluation protocols.
Findings
Greedy linear models often outperform exploratory algorithms in offline tests.
Hyperparameter tuning favors configurations with minimal exploration.
Current offline evaluation methods may not accurately reflect true exploration efficacy.
Abstract
Multi-Armed Bandit (MAB) algorithms are widely used in recommender systems that require continuous, incremental learning. A core aspect of MABs is the exploration-exploitation trade-off: choosing between exploiting items likely to be enjoyed and exploring new ones to gather information. In contextual linear bandits, this trade-off is particularly central, as many variants share the same linear regression backbone and differ primarily in their exploration strategies. Although offline evaluation of MABs is prevalent, it is increasingly recognized as limited in its ability to reliably assess exploration behavior. This study conducts an extensive offline empirical comparison of several linear MABs. Strikingly, on over 90% of the datasets, a greedy linear model with no exploration of any kind consistently achieves top-tier performance, often outperforming or matching its exploratory…
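To make the "shared linear regression backbone" concrete, here is a minimal sketch, not the authors' implementation. It assumes a ridge-regression model with a LinUCB-style uncertainty bonus as one well-known exploration strategy. The class name `LinearBandit` and the parameters `alpha` and `reg` are illustrative assumptions. Setting `alpha = 0` yields the greedy model, while `alpha > 0` adds exploration on top of the same regression.

```python
import numpy as np


class LinearBandit:
    """Hypothetical ridge-regression bandit with an optional UCB bonus.

    alpha = 0 is the greedy policy; alpha > 0 adds a LinUCB-style bonus.
    """

    def __init__(self, dim: int, alpha: float = 0.0, reg: float = 1.0):
        self.alpha = alpha
        # Inverse of the regularized design matrix (reg * I + sum of x x^T).
        self.A_inv = np.eye(dim) / reg
        # Running sum of reward-weighted feature vectors.
        self.b = np.zeros(dim)

    def scores(self, arms: np.ndarray) -> np.ndarray:
        theta = self.A_inv @ self.b          # ridge-regression estimate
        mean = arms @ theta                  # predicted reward per arm
        # Per-arm uncertainty sqrt(x^T A^{-1} x).
        width = np.sqrt(np.einsum("ij,jk,ik->i", arms, self.A_inv, arms))
        return mean + self.alpha * width

    def select(self, arms: np.ndarray) -> int:
        return int(np.argmax(self.scores(arms)))

    def update(self, x: np.ndarray, reward: float) -> None:
        # Sherman-Morrison rank-one update of the inverse design matrix.
        Ax = self.A_inv @ x
        self.A_inv -= np.outer(Ax, Ax) / (1.0 + x @ Ax)
        self.b += reward * x


# Two arms with one-hot features; both models observe the same feedback.
arms = np.eye(2)
greedy = LinearBandit(dim=2, alpha=0.0)
ucb = LinearBandit(dim=2, alpha=1.0)
for model in (greedy, ucb):
    model.update(arms[0], reward=0.5)

greedy_choice = greedy.select(arms)
ucb_choice = ucb.select(arms)
```

After identical feedback on the first arm, the greedy model keeps selecting that arm, whereas the bonus term steers the exploratory model toward the arm it has not tried yet. The two policies differ only in `alpha`, which is the kind of hyperparameter the findings above report being tuned toward minimal exploration.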
