Offline Evaluation of Reward-Optimizing Recommender Systems: The Case of Simulation

Imad Aouali; Amine Benhalloum; Martin Bompaire; Benjamin Heymann; Olivier Jeunen; David Rohde; Otmane Sakhi; Flavian Vasile

arXiv:2209.08642 · cs.IR · September 20, 2022 · 1 citation

Open Access

TL;DR

This paper advocates for simulation-based offline evaluation methods for reward-optimizing recommender systems, highlighting their advantages over proxy and counterfactual metrics in providing more reliable assessments.

Contribution

It introduces simulation as a promising alternative for offline evaluation, addressing limitations of existing proxy and counterfactual metrics in real-world environments.

Findings

01. Simulation-based comparisons offer more reliable evaluation than traditional offline metrics.

02. Offline metrics, both proxy-based and counterfactual, often correlate poorly with online performance.

03. Simulation can bridge the gap between offline evaluation and real-world system performance.

Abstract

In both academic and industry research, online evaluation methods are seen as the gold standard for interactive applications like recommendation systems. The reason is that they let us directly measure utility metrics that rely on interventions, namely the recommendations shown to users. Nevertheless, online evaluation methods are costly for a number of reasons, and a clear need remains for reliable offline evaluation procedures. In industry, offline metrics are often used as a first-line evaluation to generate promising candidate models to evaluate online. In academic work, limited access to online systems makes offline metrics the de facto approach to validating novel methods. Two classes of offline metrics exist: proxy-based methods and counterfactual methods. The first class is often poorly correlated with the online metrics we care about, and…

Taxonomy

Topics: Advanced Bandit Algorithms Research · Recommender Systems and Techniques · Auction Theory and Applications