Rethinking Crowd Sourcing for Semantic Similarity
Shaul Solomon, Adam Cohn, Hernan Rosenblum, Chezi Hershkovitz, and Ivan P. Yamshchikov

TL;DR
This paper examines the ambiguities in crowd-sourced semantic similarity labeling, emphasizing the impact of binary annotator perceptions and proposing heuristics to improve label reliability in NLP tasks.
Contribution
It identifies the dominant role of binary annotators in semantic similarity labeling and introduces heuristics to filter unreliable annotators, enhancing label quality.
Findings
Binary annotators, who treat two sentences as either similar or not similar, play the most important role in crowd-sourced labels.
Heuristics can effectively filter unreliable annotators.
The results invite further discussion of how humans perceive semantic similarity.
Abstract
Estimation of semantic similarity is crucial for a variety of natural language processing (NLP) tasks. In the absence of a general theory of semantic information, many papers rely on human annotators as the source of ground truth for semantic similarity estimation. This paper investigates the ambiguities inherent in crowd-sourced semantic labeling. It shows that annotators who treat semantic similarity as a binary category (two sentences are either similar or not similar, with no middle ground) play the most important role in the labeling. The paper offers heuristics to filter out unreliable annotators and aims to stimulate further discussion of human perception of semantic similarity.
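To make the notion of a "binary" annotator concrete, here is a minimal sketch of one way such annotators could be flagged. It is an illustration only, not the heuristics proposed in the paper, which this page does not describe. The 0 to 5 rating scale, the 0.9 threshold, and the function names `binary_fraction` and `split_annotators` are all assumptions made for this example.

```python
from collections import defaultdict


def binary_fraction(labels, low=0, high=5):
    """Share of an annotator's labels that sit at either end of the scale.

    The 0-5 scale is an assumption for this sketch, not taken from the paper.
    """
    if not labels:
        return 0.0
    extremes = sum(1 for score in labels if score in (low, high))
    return extremes / len(labels)


def split_annotators(ratings, threshold=0.9, low=0, high=5):
    """Split annotators into (binary, graded) lists of annotator ids.

    ratings: iterable of (annotator_id, score) pairs.
    threshold: hypothetical cutoff; an annotator whose share of extreme
    labels is at or above it is treated as binary.
    """
    by_annotator = defaultdict(list)
    for annotator_id, score in ratings:
        by_annotator[annotator_id].append(score)

    binary, graded = [], []
    for annotator_id, labels in sorted(by_annotator.items()):
        if binary_fraction(labels, low, high) >= threshold:
            binary.append(annotator_id)
        else:
            graded.append(annotator_id)
    return binary, graded


# Toy data: annotator "a" only uses the ends of the scale, "b" uses the middle.
ratings = [("a", 0), ("a", 5), ("a", 5), ("b", 2), ("b", 3), ("b", 5)]
binary, graded = split_annotators(ratings)
```

With the toy data above, annotator "a" lands in the binary group and "b" in the graded group. A split like this only describes labeling behavior; whether either group is reliable is a separate question that would need its own criterion.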
Taxonomy
Topics: Misinformation and Its Impacts · Sentiment Analysis and Opinion Mining · Opinion Dynamics and Social Influence
