Comparing Humans and Models on a Similar Scale: Towards Cognitive Gender Bias Evaluation in Coreference Resolution

Gili Lior; Gabriel Stanovsky

arXiv:2305.15389 · cs.CL · August 25, 2025 · 2 cites

TL;DR

This paper compares gender bias in coreference resolution between models and humans, using dual-process theory and two crowdsourcing experiments. It finds that humans are slightly more biased (~3%) than models on real data, while models are more biased (~12%) than humans on synthetic data.

Contribution

The paper introduces a framework for quantifying and comparing human and model gender biases in coreference resolution, grounded in dual-process theory and crowdsourcing experiments.

Findings

01

Humans are ~3% more biased than models on real data.

02

Models are ~12% more biased than humans on synthetic data.

03

The dual-process approach offers new insights into bias mechanisms.

Abstract

Spurious correlations were found to be an important factor explaining model performance in various NLP tasks (e.g., gender or racial artifacts), often considered to be "shortcuts" to the actual task. However, humans tend to similarly make quick (and sometimes wrong) predictions based on societal and cognitive presuppositions. In this work we address the question: can we quantify the extent to which model biases reflect human behaviour? Answering this question will help shed light on model performance and provide meaningful comparisons against humans. We approach this question through the lens of the dual-process theory for human decision-making. This theory differentiates between an automatic unconscious (and sometimes biased) "fast system" and a "slow system", which when triggered may revisit earlier automatic reactions. We make several observations from two crowdsourcing…

Code & Models

Repositories

slab-nlp/cog-gb-eval
Official

Taxonomy

Topics: Topic Modeling · Software Engineering Research · Explainable Artificial Intelligence (XAI)