Transitivity, Time Consumption, and Quality of Preference Judgments in Crowdsourcing
Kai Hui, Klaus Berberich

TL;DR
This study investigates how preference judgment types collected via crowdsourcing differ in transitivity, time, and quality, revealing that only strict preferences are transitive and consistent.
Contribution
It provides the first comparison of strict and weak preference judgments from crowdsourcing, highlighting differences in transitivity, efficiency, and judgment quality.
Findings
Only strict preference judgments are transitive.
Weak preference judgments behave differently from strict ones in transitivity, time consumption, and quality.
Strict preferences are more reliable for relevance assessment.
Abstract
Preference judgments have been shown to be a better alternative to graded judgments for assessing the relevance of documents to queries. Existing work has verified transitivity among preference judgments collected from trained judges, which dramatically reduces the number of judgments required. Moreover, strict preference judgments and weak preference judgments, where the latter additionally allow judges to state that two documents are equally relevant for a given query, are both widely used in the literature. However, it remains unclear whether transitivity still holds when judgments are collected via crowdsourcing, and whether the two kinds of preference judgments behave similarly. In this work, we collect judgments from multiple judges using a crowdsourcing platform and aggregate them to compare the two kinds of preference judgments in terms of transitivity, time consumption, and quality. That…
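To make the notion of transitivity over aggregated judgments concrete, here is a minimal Python sketch. It is not the authors' procedure: the majority-vote aggregation, the label encoding (">", "<", "="), and the function names `majority`, `build_relation`, and `transitivity_ratio` are all assumptions chosen for illustration. It aggregates the labels from several judges per document pair and then reports the fraction of document triples whose aggregated judgments are consistent with a transitive ordering.

```python
from collections import Counter
from itertools import permutations

def majority(labels):
    """Return the most common label, or None if empty or the top two labels tie."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

def build_relation(judgments):
    """Map each ordered pair (a, b) to 1 (a preferred), -1 (b preferred), or 0 (tie).

    `judgments` maps a pair (a, b) to the list of labels from individual judges.
    Majority voting is an assumed aggregation rule, not taken from the paper.
    """
    sign = {">": 1, "<": -1, "=": 0}
    rel = {}
    for (a, b), labels in judgments.items():
        winner = majority(labels)
        if winner is None:
            continue  # no majority: leave the pair undecided
        rel[(a, b)] = sign[winner]
        rel[(b, a)] = -sign[winner]
    return rel

def transitivity_ratio(rel, docs):
    """Fraction of checkable ordered triples whose aggregated judgments are transitive."""
    checked = ok = 0
    for a, b, c in permutations(docs, 3):
        ab, bc, ac = rel.get((a, b)), rel.get((b, c)), rel.get((a, c))
        if ab is None or bc is None or ac is None:
            continue  # at least one pair of the triple is undecided
        if ab < 0 or bc < 0:
            continue  # only chains a >= b >= c constrain the pair (a, c)
        checked += 1
        # In a transitive weak order, (a, c) is strict if either link is strict.
        ok += ac == max(ab, bc)
    return ok / checked if checked else float("nan")

# Hypothetical judgments from three judges per pair.
judgments = {
    ("d1", "d2"): [">", ">", "<"],
    ("d2", "d3"): [">", ">", ">"],
    ("d1", "d3"): [">", ">", "<"],
}
ratio = transitivity_ratio(build_relation(judgments), ["d1", "d2", "d3"])
```

With strict judgments the "=" label never occurs, so the check reduces to the usual rule that a preferred over b and b preferred over c implies a preferred over c. Treating ties as an equivalence is one possible reading of transitivity for weak preferences; other definitions would change the reported ratio.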
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Information Retrieval and Search Behavior · Data Management and Algorithms
