WinoPron: Revisiting English Winogender Schemas for Consistency, Coverage, and Grammatical Case

Vagrant Gautam; Julius Steuer; Eileen Bingert; Ray Johns; Anne Lauscher; Dietrich Klakow

arXiv:2409.05653·cs.CL·October 8, 2024

TL;DR

This paper introduces WinoPron, a corrected and expanded version of the Winogender Schemas for evaluating gender bias in coreference resolution. The authors use it to evaluate two supervised coreference resolution systems, SpanBERT, and five sizes of FLAN-T5, and they propose a more nuanced method for evaluating pronominal bias.

Contribution

The paper identifies issues in the original Winogender Schemas, creates the improved WinoPron dataset, and introduces a new method for more detailed bias evaluation in coreference resolution.

Findings

01

Accusative pronouns are harder for all evaluated models to resolve.

02

Bias varies across different pronoun surface forms.

03

WinoPron provides more reliable bias evaluation than previous datasets.
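
The first two findings compare model performance across grammatical cases and across pronoun surface forms. A minimal sketch of that kind of breakdown is given here, assuming predictions are available as simple tuples. The record layout, the `accuracy_by` helper, the case labels other than "accusative", and every value in `records` are made up for illustration; none of them come from the paper or its repository.

```python
from collections import defaultdict

# Hypothetical toy records, NOT results from the paper:
# (grammatical case, pronoun surface form, gold antecedent, predicted antecedent)
records = [
    ("nominative", "she", "technician", "technician"),
    ("nominative", "they", "customer", "customer"),
    ("accusative", "him", "technician", "customer"),
    ("accusative", "them", "customer", "customer"),
    ("possessive", "his", "technician", "technician"),
    ("possessive", "her", "customer", "technician"),
]


def accuracy_by(records, key_index):
    """Group records by the field at key_index and return accuracy per group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        key = record[key_index]
        total[key] += 1
        # A prediction counts as correct when it matches the gold antecedent.
        correct[key] += int(record[2] == record[3])
    return {key: correct[key] / total[key] for key in total}


# Accuracy per grammatical case (field 0) and per surface form (field 1).
case_accuracy = accuracy_by(records, 0)
form_accuracy = accuracy_by(records, 1)

print(case_accuracy)
print(form_accuracy)
```

Reporting accuracy per case and per surface form separately, rather than as one pooled number, is what makes differences such as those in findings 01 and 02 visible.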

Abstract

While measuring bias and robustness in coreference resolution are important goals, such measurements are only as good as the tools we use to measure them. Winogender Schemas (Rudinger et al., 2018) are an influential dataset proposed to evaluate gender bias in coreference resolution, but a closer look reveals issues with the data that compromise its use for reliable evaluation, including treating different pronominal forms as equivalent, violations of template constraints, and typographical errors. We identify these issues and fix them, contributing a new dataset: WinoPron. Using WinoPron, we evaluate two state-of-the-art supervised coreference resolution systems, SpanBERT, and five sizes of FLAN-T5, and demonstrate that accusative pronouns are harder to resolve for all models. We also propose a new method to evaluate pronominal bias in coreference resolution that goes beyond the…

Code & Models

Repositories

uds-lsv/winopron
Official

Datasets

elidek-themis/WinoPron
dataset · 6 downloads

Videos

WinoPron: Revisiting English Winogender Schemas for Consistency, Coverage, and Grammatical Case · Underline

Taxonomy

Topics: Natural Language Processing Techniques