Offline Evaluation of Reward-Optimizing Recommender Systems: The Case of Simulation
Imad Aouali, Amine Benhalloum, Martin Bompaire, Benjamin Heymann, Olivier Jeunen, David Rohde, Otmane Sakhi, Flavian Vasile

TL;DR
This paper advocates for simulation-based offline evaluation methods for reward-optimizing recommender systems, highlighting their advantages over proxy and counterfactual metrics in providing more reliable assessments.
Contribution
It introduces simulation as a promising alternative for offline evaluation, addressing limitations of existing proxy and counterfactual metrics in real-world environments.
Findings
Simulation-based comparisons offer more reliable evaluation than traditional offline metrics.
Offline metrics like proxy and counterfactual methods often lack correlation with online performance.
Simulation can bridge the gap between offline evaluation and real-world system performance.
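The contrast between a counterfactual estimate and a simulated online run can be made concrete with a toy example. The sketch below is not the paper's method: it assumes a hypothetical simulator in which each item has a fixed click probability (`CLICK_PROB`), uses inverse propensity scoring (IPS) as one standard counterfactual estimator, and compares both estimates against the value that is known exactly only because the environment is simulated. All names and numbers are illustrative assumptions.

```python
import random

# Hypothetical simulator: every item has a fixed click probability.
# These values are assumptions for illustration, not taken from the paper.
CLICK_PROB = [0.05, 0.10, 0.20]


def sample_item(policy, rng):
    """Draw one item index according to the policy's probabilities."""
    return rng.choices(range(len(policy)), weights=policy, k=1)[0]


def simulate_click(item, rng):
    """Return 1 if the simulated user clicks the recommended item, else 0."""
    return 1 if rng.random() < CLICK_PROB[item] else 0


def simulated_value(policy, n, rng):
    """Deploy the policy inside the simulator and average the observed clicks."""
    clicks = sum(simulate_click(sample_item(policy, rng), rng) for _ in range(n))
    return clicks / n


def ips_value(target, logging, n, rng):
    """IPS estimate of the target policy from logs collected under the logging policy."""
    total = 0.0
    for _ in range(n):
        item = sample_item(logging, rng)
        click = simulate_click(item, rng)
        # Reweight each logged click by how much more (or less) often
        # the target policy would have shown this item.
        total += click * target[item] / logging[item]
    return total / n


def true_value(policy):
    """Exact expected click rate, available only because the environment is simulated."""
    return sum(p * c for p, c in zip(policy, CLICK_PROB))


rng = random.Random(0)
logging_policy = [1 / 3, 1 / 3, 1 / 3]  # uniform logging policy (assumption)
target_policy = [0.1, 0.1, 0.8]         # candidate policy under evaluation (assumption)

sim = simulated_value(target_policy, 20000, rng)
ips = ips_value(target_policy, logging_policy, 20000, rng)
truth = true_value(target_policy)
print(f"simulated: {sim:.4f}  IPS: {ips:.4f}  exact: {truth:.4f}")
```

In this toy setting both estimates land near the exact value, because the logging policy covers every item and the reward model is trivially simple. The example only illustrates the mechanics of the two evaluation styles; it says nothing about how they compare on real recommendation data.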
Abstract
In both academic and industry-based research, online evaluation methods are seen as the gold standard for interactive applications like recommendation systems. The reason is that they let us directly measure utility metrics that rely on interventions, namely the recommendations shown to users. Nevertheless, online evaluation methods are costly for a number of reasons, and a clear need remains for reliable offline evaluation procedures. In industry, offline metrics are often used as a first-line evaluation to generate promising candidate models to evaluate online. In academic work, limited access to online systems makes offline metrics the de facto approach to validating novel methods. Two classes of offline metrics exist: proxy-based methods and counterfactual methods. The first class is often poorly correlated with the online metrics we care about, and…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Recommender Systems and Techniques · Auction Theory and Applications
