Cascading Biases: Investigating the Effect of Heuristic Annotation Strategies on Data and Models

Chaitanya Malaviya; Sudeep Bhatia; Mark Yatskar

arXiv:2210.13439·cs.CL·January 24, 2023

TL;DR

This paper investigates how cognitive heuristics used by annotators influence data quality and model robustness in reading comprehension datasets, revealing that heuristic use correlates with biases and impacts model performance.

Contribution

It introduces a method to track annotator heuristic traces and demonstrates their effect on data bias and model generalization in NLP tasks.

Findings

1. Annotators use multiple cognitive heuristics during annotation.
2. Heuristic use correlates with increased data bias and model susceptibility.
3. Tracking heuristics can improve dataset quality and bias diagnosis.

Abstract

Cognitive psychologists have documented that humans use cognitive heuristics, or mental shortcuts, to make quick decisions while expending less effort. While performing annotation work on crowdsourcing platforms, we hypothesize that such heuristic use among annotators cascades on to data quality and model robustness. In this work, we study cognitive heuristic use in the context of annotating multiple-choice reading comprehension datasets. We propose tracking annotator heuristic traces, where we tangibly measure low-effort annotation strategies that could indicate usage of various cognitive heuristics. We find evidence that annotators might be using multiple such heuristics, based on correlations with a battery of psychological tests. Importantly, heuristic use among annotators determines data quality along several dimensions: (1) known biased models, such as partial input models, more…

Code & Models

Repositories

chaitanyamalaviya/annotator-heuristics (PyTorch, official)

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Topic Modeling · Expert finding and Q&A systems