The Simpson's Paradox in the Offline Evaluation of Recommendation Systems

Amir H. Jadidinejad; Craig Macdonald; Iadh Ounis

arXiv:2104.08912·cs.IR·April 20, 2021

TL;DR

This paper reveals that offline evaluation of recommendation systems is affected by Simpson's paradox due to confounding factors from deployed systems, and proposes a new evaluation method that improves correlation with true rankings.

Contribution

The paper identifies Simpson's paradox in offline recommendation evaluation and introduces a novel methodology that accounts for confounders, enhancing evaluation accuracy.

Findings

01

Stratified sampling exposes confounding effects of frequently exposed items.

02

Proposed evaluation method improves correlation with true rankings by 14-40%.

03

The method performs statistically significantly better on open-loop datasets.
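
To make "correlation with true rankings" in the second finding concrete, the sketch below compares two rankings of the same models with Kendall's tau. This summary does not say which correlation measure the paper uses, so Kendall's tau is an assumption here, and the model names and scores are hypothetical values chosen only for illustration.

```python
from itertools import combinations


def kendall_tau(scores_x, scores_y):
    """Kendall's tau-a between two score dicts over the same keys (assumes no ties)."""
    models = list(scores_x)
    concordant = discordant = 0
    for m1, m2 in combinations(models, 2):
        # Positive product: both evaluations order the pair the same way.
        product = (scores_x[m1] - scores_x[m2]) * (scores_y[m1] - scores_y[m2])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores, not taken from the paper.
true_scores = {"m1": 0.30, "m2": 0.25, "m3": 0.20, "m4": 0.10}
offline_scores = {"m1": 0.28, "m2": 0.31, "m3": 0.15, "m4": 0.12}

tau = kendall_tau(true_scores, offline_scores)
print(f"Kendall's tau = {tau:.3f}")
```

Here the offline evaluation swaps only the top two models, so five of the six model pairs are ordered consistently and one is inverted. An evaluation method that tracks the true ranking more closely would push this value toward 1.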

Abstract

Recommendation systems are often evaluated based on users' interactions that were collected from an existing, already deployed recommendation system. In this situation, users only provide feedback on the exposed items, and they may not leave feedback on other items since they have not been exposed to them by the deployed system. As a result, the collected feedback dataset that is used to evaluate a new model is influenced by the deployed system, as a form of closed-loop feedback. In this paper, we show that the typical offline evaluation of recommender systems suffers from the so-called Simpson's paradox. Simpson's paradox is the name given to a phenomenon observed when a significant trend appears in several different sub-populations of observational data but disappears or is even reversed when these sub-populations are combined. Our in-depth experiments based on stratified…
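
The reversal described in the abstract can be illustrated with a small numeric sketch. It is a generic illustration of Simpson's paradox, not the paper's experimental setup: the two systems "A" and "B", the stratum names, and all (hits, trials) counts are hypothetical.

```python
from fractions import Fraction

# Hypothetical (hits, trials) counts per stratum; not taken from the paper.
counts = {
    "A": {"stratum_1": (81, 87), "stratum_2": (192, 263)},
    "B": {"stratum_1": (234, 270), "stratum_2": (55, 80)},
}


def per_stratum_rates(system):
    """Success rate of a system inside each stratum."""
    return {s: Fraction(h, t) for s, (h, t) in counts[system].items()}


def pooled_rate(system):
    """Success rate of a system after merging all strata."""
    hits = sum(h for h, _ in counts[system].values())
    trials = sum(t for _, t in counts[system].values())
    return Fraction(hits, trials)


a_strata = per_stratum_rates("A")
b_strata = per_stratum_rates("B")

# A is better than B inside every stratum...
a_wins_every_stratum = all(a_strata[s] > b_strata[s] for s in a_strata)
# ...yet B looks better once the strata are combined.
b_wins_pooled = pooled_rate("B") > pooled_rate("A")

for s in a_strata:
    print(s, float(a_strata[s]), float(b_strata[s]))
print("pooled", float(pooled_rate("A")), float(pooled_rate("B")))
```

The reversal happens because the two systems are observed on strata of very different sizes: most of A's trials fall in the harder stratum, while most of B's fall in the easier one. That uneven allocation plays the role of the confounder.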

Code & Models

Repositories

terrierteam/stratified_recsys_eval (official)