Exploitation Over Exploration: Unmasking the Bias in Linear Bandit Recommender Offline Evaluation

Pedro R. Pires; Gregorio F. Azevedo; Pietro L. Campos; Rafael T. Sereicikas; Tiago A. Almeida

arXiv:2507.18756·cs.LG·April 20, 2026

TL;DR

This paper shows that offline evaluation of linear bandit algorithms in recommender systems is biased toward exploitation: greedy models with no exploration often match or outperform exploratory ones, which points to the need for evaluation methods that can actually assess exploration.

Contribution

The paper provides an extensive offline empirical comparison of linear bandit algorithms, showing that exploitation dominates exploration under current offline evaluation protocols and exposing the bias in those protocols.

Findings

01. Greedy linear models often outperform exploratory algorithms in offline tests.

02. Hyperparameter tuning favors minimal exploration strategies.

03. Current offline evaluation methods may not accurately reflect true exploration efficacy.

Abstract

Multi-Armed Bandit (MAB) algorithms are widely used in recommender systems that require continuous, incremental learning. A core aspect of MABs is the exploration-exploitation trade-off: choosing between exploiting items likely to be enjoyed and exploring new ones to gather information. In contextual linear bandits, this trade-off is particularly central, as many variants share the same linear regression backbone and differ primarily in their exploration strategies. Despite its prevalent use, offline evaluation of MABs is increasingly recognized for its limitations in reliably assessing exploration behavior. This study conducts an extensive offline empirical comparison of several linear MABs. Strikingly, across over 90% of various datasets, a greedy linear model, with no type of exploration, consistently achieves top-tier performance, often outperforming or matching its exploratory…
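
The abstract's point that many linear bandits share one regression backbone and differ mainly in exploration can be illustrated with a minimal sketch. This is not the authors' code: the class name `LinearBandit`, the parameters `alpha` and `reg`, and the choice of a LinUCB-style confidence bonus as the exploratory variant are all assumptions made for illustration. Setting `alpha = 0` gives the greedy model; any `alpha > 0` adds an uncertainty bonus on top of the same ridge-regression estimate.

```python
import numpy as np


class LinearBandit:
    """Hypothetical ridge-regression backbone with an optional exploration bonus.

    alpha = 0 yields a greedy linear model; alpha > 0 adds a LinUCB-style
    confidence term. Names and defaults are illustrative assumptions.
    """

    def __init__(self, dim, alpha=0.0, reg=1.0):
        self.alpha = alpha
        self.A = reg * np.eye(dim)  # regularized Gram matrix: reg * I + sum of x x^T
        self.b = np.zeros(dim)      # sum of reward * x

    def scores(self, X):
        """Score each candidate item; X has shape (n_items, dim)."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b      # ridge-regression estimate (shared backbone)
        mean = X @ theta            # exploitation term
        # Per-item uncertainty sqrt(x^T A^{-1} x), used as the exploration term
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", X, A_inv, X))
        return mean + self.alpha * bonus

    def select(self, X):
        """Return the index of the highest-scoring item."""
        return int(np.argmax(self.scores(X)))

    def update(self, x, reward):
        """Incremental update with one observed (context, reward) pair."""
        self.A += np.outer(x, x)
        self.b += reward * x
```

As a usage example under the same assumptions, the two variants can disagree after seeing identical data: the greedy model keeps recommending the item it has evidence for, while the exploratory one is drawn to the item it knows least about.

```python
greedy = LinearBandit(dim=2, alpha=0.0)
explorer = LinearBandit(dim=2, alpha=2.0)

seen = np.array([1.0, 0.0])
greedy.update(seen, 1.0)
explorer.update(seen, 1.0)

candidates = np.eye(2)             # item 0 was observed, item 1 never was
print(greedy.select(candidates))   # picks the item with the best estimate
print(explorer.select(candidates)) # may pick the unobserved item instead
```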
