Collaborative Filtering and the Missing at Random Assumption
Benjamin Marlin, Richard S. Zemel, Sam Roweis, Malcolm Slaney

TL;DR
This paper investigates the validity of the missing at random (MAR) assumption in collaborative filtering. It shows that users' choice of which songs to rate depends on their opinion of those songs, and that modeling the missing data mechanism explicitly improves rating prediction accuracy.
Contribution
It provides empirical evidence, from a user study on an online radio service, that the MAR assumption does not hold for real user ratings, and shows that explicitly modeling the missing data mechanism improves prediction performance.
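For reference, the MAR condition can be stated with the standard missing-data notation. In this sketch the symbols are our own labels, not necessarily the paper's: $\mathbf{x}$ is a user's complete rating vector split into observed $\mathbf{x}^{o}$ and missing $\mathbf{x}^{m}$ parts, $\mathbf{r}$ holds the response indicators ($r_j = 1$ if item $j$ is rated), $\theta$ parameterizes the rating model and $\mu$ the missing data mechanism.

```latex
% MAR: the probability of the response pattern does not depend on the missing values
P(\mathbf{r} \mid \mathbf{x}^{o}, \mathbf{x}^{m}, \mu) = P(\mathbf{r} \mid \mathbf{x}^{o}, \mu)

% Under MAR the joint likelihood of what is actually seen factorizes:
P(\mathbf{r}, \mathbf{x}^{o} \mid \theta, \mu)
  = \int P(\mathbf{r} \mid \mathbf{x}^{o}, \mathbf{x}^{m}, \mu)\,
         P(\mathbf{x}^{o}, \mathbf{x}^{m} \mid \theta)\, d\mathbf{x}^{m}
  = P(\mathbf{r} \mid \mathbf{x}^{o}, \mu)\, P(\mathbf{x}^{o} \mid \theta)

% Example of a violation: a rating-dependent mechanism
P(r_j = 1 \mid x_j = v) = \mu_v \quad \text{with } \mu_v \text{ varying in } v
```

When $\theta$ and $\mu$ are distinct, the factorization means $\theta$ can be learned from the observed ratings alone while ignoring $\mathbf{r}$. A mechanism like the last line makes missingness depend on the unobserved value itself, so that shortcut is no longer valid.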
Findings
Users' opinion of a song influences whether they choose to rate it.
The random sample of ratings has markedly different properties from ratings of user-selected songs.
Modeling missing data mechanisms improves prediction accuracy.
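The first two findings can be illustrated with a small simulation. This is a hypothetical sketch, not the study's data: the rating distribution `TRUE_DIST` and the observation probabilities `MU` (the chance that a user rates a song, given its rating value) are made-up numbers chosen so that users rate songs they like more often. Comparing the user-selected ratings with a random sample exposes the bias, and because P(v | rated) ∝ P(v)·μ_v, the ratio of the two empirical distributions recovers μ_v up to a constant factor.

```python
import random
from collections import Counter

# Hypothetical rating distribution over a 1-5 scale (illustrative values only).
TRUE_DIST = {1: 0.30, 2: 0.25, 3: 0.20, 4: 0.15, 5: 0.10}
# Hypothetical observation probabilities mu_v = P(rated | rating value v):
# users are assumed to rate songs they like more often (a non-MAR mechanism).
MU = {1: 0.05, 2: 0.05, 3: 0.10, 4: 0.20, 5: 0.40}


def simulate(n_songs, seed=0):
    """Return (random_sample, user_selected) lists of ratings."""
    rng = random.Random(seed)
    values = list(TRUE_DIST)
    weights = [TRUE_DIST[v] for v in values]
    ratings = rng.choices(values, weights=weights, k=n_songs)
    # Random sample: ratings are collected regardless of their value (MAR by design).
    random_sample = ratings
    # User-selected: each rating is observed with probability MU[value].
    user_selected = [v for v in ratings if rng.random() < MU[v]]
    return random_sample, user_selected


def distribution(ratings):
    """Empirical probability of each rating value."""
    counts = Counter(ratings)
    total = len(ratings)
    return {v: counts[v] / total for v in sorted(TRUE_DIST)}


def mean(ratings):
    return sum(ratings) / len(ratings)


random_sample, user_selected = simulate(200_000)
print("random-sample mean:", round(mean(random_sample), 2))
print("user-selected mean:", round(mean(user_selected), 2))

# Bayes' rule: P(v | rated) / P(v) = mu_v / P(rated), so the ratio of the two
# empirical distributions is proportional to mu_v. Normalize by the value-5 entry.
p_rand = distribution(random_sample)
p_sel = distribution(user_selected)
ratio = {v: p_sel[v] / p_rand[v] for v in p_rand}
relative_mu = {v: ratio[v] / ratio[5] for v in ratio}
print(relative_mu)  # approximately MU[v] / MU[5] for each v
```

With these assumed numbers the random-sample mean stays near 2.5 while the user-selected mean comes out near 3.57, even though both are drawn from the same underlying ratings.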
Abstract
Rating prediction is an important application, and a popular research topic in collaborative filtering. However, both the validity of learning algorithms, and the validity of standard testing procedures rest on the assumption that missing ratings are missing at random (MAR). In this paper we present the results of a user study in which we collect a random sample of ratings from current users of an online radio service. An analysis of the rating data collected in the study shows that the sample of random ratings has markedly different properties than ratings of user-selected songs. When asked to report on their own rating behaviour, a large number of users indicate they believe their opinion of a song does affect whether they choose to rate that song, a violation of the MAR condition. Finally, we present experimental results showing that incorporating an explicit model of the missing…
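The abstract's point about testing procedures can be made concrete. If test ratings come from user-selected songs, a predictor's measured error reflects the selection mechanism as well as the predictor. The sketch below uses hypothetical numbers and simple constant predictors rather than any model from the paper. It computes the exact expected mean absolute error (MAE) under the random-sample distribution and under the selection-biased distribution P(v | rated) ∝ P(v)·μ_v.

```python
# Hypothetical, made-up numbers (not from the study): rating distribution and
# observation probabilities mu_v = P(rated | rating value v).
TRUE_DIST = {1: 0.30, 2: 0.25, 3: 0.20, 4: 0.15, 5: 0.10}
MU = {1: 0.05, 2: 0.05, 3: 0.10, 4: 0.20, 5: 0.40}


def selected_dist(true_dist, mu):
    """Distribution of ratings users choose to enter: P(v | rated) ∝ P(v) * mu_v."""
    unnorm = {v: p * mu[v] for v, p in true_dist.items()}
    z = sum(unnorm.values())
    return {v: w / z for v, w in unnorm.items()}


def expected_mae(constant_prediction, dist):
    """Expected absolute error of a predictor that always outputs the same value."""
    return sum(p * abs(v - constant_prediction) for v, p in dist.items())


selected = selected_dist(TRUE_DIST, MU)
for name, guess in [("always 4", 4), ("always 2", 2)]:
    print(f"{name}: MAE on user-selected = {expected_mae(guess, selected):.3f}, "
          f"MAE on random sample = {expected_mae(guess, TRUE_DIST):.3f}")
```

Under these assumed numbers the two predictors swap order: "always 4" has the lower error on user-selected ratings but the higher error on the random sample, so a test set built from user-selected ratings would favour the predictor that does worse on randomly chosen songs.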
Taxonomy
Topics: Recommender Systems and Techniques · Mobile Crowdsensing and Crowdsourcing · Music and Audio Processing
