Pairwise, Magnitude, or Stars: What's the Best Way for Crowds to Rate?
Alessandro Checco, Gianluca Demartini

TL;DR
This study compares five-star, pairwise, and magnitude estimation rating methods in crowdsourcing, analyzing their efficiency, accuracy, and effort through a large-scale experiment and dataset.
Contribution
It provides an unbiased comparison of three rating techniques and releases a comprehensive dataset for future research.
Findings
Pairwise comparison requires fewer ratings for accurate preferences.
Magnitude estimation achieves high accuracy with moderate effort.
Five-star ratings are easiest for users but less precise.
Abstract
We compare three popular techniques of rating content: the ubiquitous five-star rating, the less used pairwise comparison, and the recently introduced (in crowdsourcing) magnitude estimation approach. Each system has specific advantages and disadvantages in terms of required user effort, achievable user preference prediction accuracy, and number of ratings required. We design an experiment where the three techniques are compared in an unbiased way. We collected 39,000 ratings on a popular crowdsourcing platform, allowing us to release a dataset that will be useful for many related studies on user rating techniques.
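To make the three rating formats concrete, the following is a minimal Python sketch of how each kind of judgment could be turned into a per-item score and then into a ranking. It is an illustration only, not the paper's method: the function names, the tuple layouts, the toy data, and the aggregation choices (mean stars, fraction of comparisons won, and per-worker geometric-mean normalization for magnitude estimates) are all assumptions made for this example.

```python
import math
from collections import defaultdict


def scores_from_stars(ratings):
    """ratings: iterable of (worker, item, stars). Returns mean stars per item."""
    by_item = defaultdict(list)
    for _worker, item, stars in ratings:
        by_item[item].append(stars)
    return {item: sum(v) / len(v) for item, v in by_item.items()}


def scores_from_pairs(comparisons):
    """comparisons: iterable of (worker, winner, loser).
    Returns the fraction of comparisons each item won (a simple assumed aggregator)."""
    wins = defaultdict(int)
    seen = defaultdict(int)
    for _worker, winner, loser in comparisons:
        wins[winner] += 1
        seen[winner] += 1
        seen[loser] += 1
    return {item: wins[item] / seen[item] for item in seen}


def scores_from_magnitudes(estimates):
    """estimates: iterable of (worker, item, positive number).
    Each worker picks their own scale, so we divide by that worker's
    geometric mean (done in log space), then take the geometric mean per item."""
    by_worker = defaultdict(list)
    for worker, item, value in estimates:
        by_worker[worker].append((item, math.log(value)))
    centered = defaultdict(list)
    for rows in by_worker.values():
        offset = sum(log_value for _item, log_value in rows) / len(rows)
        for item, log_value in rows:
            centered[item].append(log_value - offset)
    return {item: math.exp(sum(v) / len(v)) for item, v in centered.items()}


def ranking(scores):
    """Items sorted from highest to lowest score; ties broken by item name."""
    return sorted(scores, key=lambda item: (-scores[item], item))


# Toy data (hypothetical): two workers judging items A, B, and C.
stars = [("w1", "A", 5), ("w1", "B", 3), ("w1", "C", 1),
         ("w2", "A", 4), ("w2", "B", 3), ("w2", "C", 2)]
pairs = [("w1", "A", "B"), ("w1", "B", "C"), ("w2", "A", "C")]
magnitudes = [("w1", "A", 100), ("w1", "B", 50), ("w1", "C", 10),
              ("w2", "A", 8), ("w2", "B", 3), ("w2", "C", 1)]

print(ranking(scores_from_stars(stars)))
print(ranking(scores_from_pairs(pairs)))
print(ranking(scores_from_magnitudes(magnitudes)))
```

In this toy data the two workers use very different numeric scales for magnitude estimation (up to 100 versus up to 8), which is why the sketch normalizes per worker before aggregating. Real studies often use more principled aggregators for pairwise data; the win fraction is used here only to keep the example short.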
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Information Retrieval and Search Behavior · Recommender Systems and Techniques
