Re-Evaluating the Netflix Prize - Human Uncertainty and its Impact on Reliability

Kevin Jasberg; Sergej Sizov

arXiv:1706.08866 · cs.HC · June 28, 2017 · 2 citations

Open Access

TL;DR

This paper investigates how human rating variability affects the reliability of recommender system evaluations, revealing that many top rankings are statistically uncertain and may be influenced by chance.

Contribution

The paper introduces a probabilistic approach to rating assessment that accounts for human uncertainty, and uses it to re-evaluate the reliability of the Netflix Prize rankings.
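To make the probabilistic approach concrete, the Python sketch below turns one user's repeated ratings of a single item into an empirical probability density over the rating scale, then summarizes it by its mean and variance. The 1 to 5 star scale, the function names `rating_density` and `mean_and_variance`, and the sample ratings are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter

STAR_SCALE = (1, 2, 3, 4, 5)  # assumption: a 1-5 star scale as used by Netflix

def rating_density(repeated_ratings, scale=STAR_SCALE):
    """Empirical probability mass function over the rating scale, estimated
    from one user's repeated ratings of a single item (hypothetical helper)."""
    if not repeated_ratings:
        raise ValueError("need at least one rating")
    counts = Counter(repeated_ratings)
    outside = set(counts) - set(scale)
    if outside:
        raise ValueError(f"ratings outside the scale: {sorted(outside)}")
    n = len(repeated_ratings)
    return {star: counts[star] / n for star in scale}

def mean_and_variance(density):
    """Expected rating and its variance under a discrete density."""
    mean = sum(star * p for star, p in density.items())
    variance = sum((star - mean) ** 2 * p for star, p in density.items())
    return mean, variance

# Hypothetical data: one user rated the same film four times.
d = rating_density([4, 5, 4, 3])
print(d)                     # {1: 0.0, 2: 0.0, 3: 0.25, 4: 0.5, 5: 0.25}
print(mean_and_variance(d))  # (4.0, 0.5)
```

The variance is one simple summary of how uncertain a user is about an item: a user who always gives the same score gets a variance of 0, while a single point rating hides this spread entirely.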

Findings

01

Users give inconsistent ratings when asked to rate the same items repeatedly.

02

Accuracy metrics can be modeled as probability densities.

03

Top rankings carry high probabilities of ranking error and may partly reflect chance.
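Finding 02 treats accuracy metrics as probability densities, which according to the paper arise from convolving the rating densities. One way to realize this for RMSE is sketched below, under the assumptions that the predictions are fixed numbers and that the true ratings of different items are independent: convolve the distributions of the per-item squared errors, then map the summed error to RMSE. The names `convolve` and `rmse_density` and the toy densities are hypothetical, and this is not necessarily how the paper computes its densities.

```python
import math
from collections import defaultdict

def convolve(pmf_a, pmf_b):
    """Distribution of X + Y for independent discrete X ~ pmf_a, Y ~ pmf_b."""
    out = defaultdict(float)
    for x, px in pmf_a.items():
        for y, py in pmf_b.items():
            out[x + y] += px * py
    return dict(out)

def rmse_density(rating_densities, predictions):
    """Exact RMSE distribution when each true rating is an independent random
    variable (one pmf per item) and the predictions are fixed numbers."""
    if not predictions or len(rating_densities) != len(predictions):
        raise ValueError("need one prediction per rating density")
    total = {0.0: 1.0}  # sum of squared errors is 0 with certainty before any item
    for density, pred in zip(rating_densities, predictions):
        # pmf of this item's squared error (r - pred)^2
        sq_err = defaultdict(float)
        for r, p in density.items():
            if p > 0:
                sq_err[(r - pred) ** 2] += p
        total = convolve(total, sq_err)
    n = len(predictions)
    rmse = defaultdict(float)
    for sse, p in total.items():
        rmse[math.sqrt(sse / n)] += p
    return dict(rmse)

# Hypothetical example: two items, each with a 50/50 rating density.
densities = [{3: 0.5, 4: 0.5}, {2: 0.5, 3: 0.5}]
dist = rmse_density(densities, predictions=[4, 3])
# RMSE is 0.0 w.p. 0.25, sqrt(0.5) w.p. 0.5, and 1.0 w.p. 0.25
```

Exact convolution keeps every distinct sum, so the support grows quickly with the number of items. For a test set as large as the Netflix Prize's, a Monte Carlo or grid-based (e.g. FFT) approximation would be the practical choice.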

Abstract

In this paper, we examine the statistical soundness of comparative assessments within the field of recommender systems in terms of reliability and human uncertainty. From a controlled experiment, we gain the insight that users provide different ratings on the same items when asked repeatedly. This volatility of user ratings justifies using probability densities instead of single rating scores. As a consequence, the well-known accuracy metrics (e.g. MAE, MSE, RMSE) yield a density themselves, which emerges from the convolution of all rating densities. When two different systems produce different RMSE distributions with significant overlap, there exists a probability of error for each possible ranking. As an application, we examine possible ranking errors of the Netflix Prize. We are able to show that all top rankings are more or less subject to high probabilities of…
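The ranking error described in the abstract can be estimated by simulation. The sketch below is a Monte Carlo stand-in for an analytic treatment, not the paper's own procedure: it repeatedly samples a set of "true" ratings from the users' densities, scores two systems against the same sample, and counts how often system B achieves a lower RMSE than system A. Scoring both systems on one shared sample matters because their RMSEs are correlated through the common ratings. The function names, the number of draws, the item-independence assumption, and the toy data are all hypothetical.

```python
import math
import random

def sample_rating(density, rng):
    """Draw one rating from a discrete pmf given as {rating: probability}."""
    values = list(density)
    weights = [density[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]

def rmse(truth, preds):
    """Root mean squared error of predictions against one set of true ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, preds)) / len(preds))

def ranking_flip_probability(rating_densities, preds_a, preds_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(RMSE_B < RMSE_A) under rating uncertainty.

    Each draw samples one rating per item (items treated as independent) and
    scores BOTH systems against that same sample, so their correlation is kept.
    """
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    b_wins = 0
    for _ in range(draws):
        truth = [sample_rating(d, rng) for d in rating_densities]
        if rmse(truth, preds_b) < rmse(truth, preds_a):  # ties count against B
            b_wins += 1
    return b_wins / draws

# Hypothetical densities for three items and predictions of two systems.
densities = [{3: 0.5, 4: 0.5}, {4: 0.5, 5: 0.5}, {2: 0.25, 3: 0.5, 4: 0.25}]
preds_a = [3.5, 4.5, 3.0]  # system A predicts each density's mean
preds_b = [3.0, 4.0, 3.0]  # system B is biased low on the first two items
p_flip = ranking_flip_probability(densities, preds_a, preds_b)
print(f"P(B beats A) ~ {p_flip:.3f}")
```

In this toy case, A predicts the mean of every density and therefore has the lower expected error, yet B comes out ahead in about a quarter of the draws. Analytically the probability is exactly 0.25, because B wins only when both of the first two items receive the lower rating.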

Taxonomy

Topics: Scientific Measurement and Uncertainty Evaluation · Advanced Statistical Process Monitoring · Forecasting Techniques and Applications