IMDB-WIKI-SbS: An Evaluation Dataset for Crowdsourced Pairwise Comparisons
Nikita Pavlichenko, Dmitry Ustalov

TL;DR
This paper introduces IMDB-WIKI-SbS, a large-scale dataset for evaluating pairwise comparisons in AI, addressing the limitations of existing datasets by capturing subjective human preferences with balanced demographic annotations.
Contribution
The creation of a comprehensive, balanced, large-scale dataset for pairwise comparison tasks, facilitating better evaluation of AI models on subjective human preferences.
Findings
The dataset contains 9,150 images and 250,249 pairs.
Age and gender are balanced across the dataset.
Baseline methods demonstrate the dataset's effectiveness for model evaluation.
Abstract
Today, comprehensive evaluation of large-scale machine learning models is possible thanks to open datasets produced using crowdsourcing, such as SQuAD, MS COCO, ImageNet, and SuperGLUE. These datasets capture objective responses and assume a single correct answer, so they cannot capture subjective human perception. Pairwise comparison tasks, in which one has to choose between only two options, instead make it possible to take people's preferences into account in very challenging artificial intelligence tasks, such as information retrieval and recommender system evaluation. Unfortunately, the available datasets are either small or proprietary, which slows down progress in gathering better feedback from human users. In this paper, we present IMDB-WIKI-SbS, a new large-scale dataset for evaluating pairwise comparisons. It contains 9,150 images appearing in 250,249 pairs annotated on a…
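Pairwise judgements like these are usually aggregated into one score per item before they are used for evaluation. The sketch below is a minimal illustration under stated assumptions. It uses the classic Bradley-Terry model, in which item i is preferred over item j with probability p_i / (p_i + p_j), and fits it with Hunter's minorization-maximization (MM) updates. This is a common choice for such data, not necessarily one of the paper's baseline methods. The function name `bradley_terry`, the `(winner, loser)` input format and the toy data are hypothetical. The sketch also assumes that every item wins at least once and that the comparison graph is connected. Otherwise the maximum-likelihood scores are not well defined.

```python
from collections import defaultdict


def bradley_terry(comparisons, n_iter=100, tol=1e-9):
    """Fit Bradley-Terry scores to (winner, loser) pairs with MM updates.

    Hypothetical helper for illustration; assumes every item wins at least
    once and the comparison graph is connected.
    """
    items = sorted({x for pair in comparisons for x in pair})
    wins = defaultdict(float)   # wins[i]: how many comparisons item i won
    games = defaultdict(float)  # games[(i, j)]: comparisons between i and j
    neighbours = defaultdict(set)
    for winner, loser in comparisons:
        wins[winner] += 1
        games[(winner, loser)] += 1
        games[(loser, winner)] += 1
        neighbours[winner].add(loser)
        neighbours[loser].add(winner)

    p = {i: 1.0 for i in items}  # start from equal strengths
    for _ in range(n_iter):
        new_p = {}
        for i in items:
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = sum(games[(i, j)] / (p[i] + p[j]) for j in neighbours[i])
            new_p[i] = wins[i] / denom
        # The model is scale-invariant, so rescale scores to sum to len(items).
        total = sum(new_p.values())
        new_p = {i: v * len(items) / total for i, v in new_p.items()}
        converged = max(abs(new_p[i] - p[i]) for i in items) < tol
        p = new_p
        if converged:
            break
    return p


if __name__ == "__main__":
    # Hypothetical toy judgements: each tuple is (winner, loser) for one pair.
    toy = [("a", "b"), ("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("a", "c")]
    scores = bradley_terry(toy)
    print(sorted(scores, key=scores.get, reverse=True))
```

MM is used here because each update is closed-form and never decreases the likelihood, so no step size needs tuning. Sorting items by their fitted scores yields a ranking that can be compared against a model's predictions.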
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
