ProText: A benchmark dataset for measuring (mis)gendering in long-form texts

Hadas Kotek; Margit Bowler; Patrick Sonnenberg; Yu'an Yang

arXiv:2603.27838 · cs.CL · March 31, 2026

TL;DR

ProText is a benchmark dataset for evaluating gender bias and misgendering in stylistically diverse long-form English texts, particularly in text transformations such as summarization and rewrites produced by language models.

Contribution

The paper introduces a dataset that extends beyond traditional pronoun resolution benchmarks, and beyond the gender binary, to probe gendering and misgendering in text transformations by Large Language Models.

Findings

01. Systematic gender bias observed in model outputs.

02. Bias increases when inputs lack explicit gender cues.

03. Models often default to heteronormative assumptions.

Abstract

We introduce ProText, a dataset for measuring gendering and misgendering in stylistically diverse long-form English texts. ProText spans three dimensions: Theme nouns (names, occupations, titles, kinship terms), Theme category (stereotypically male, stereotypically female, gender-neutral/non-gendered), and Pronoun category (masculine, feminine, gender-neutral, none). The dataset is designed to probe (mis)gendering in text transformations such as summarization and rewrites using state-of-the-art Large Language Models, extending beyond traditional pronoun resolution benchmarks and beyond the gender binary. We validated ProText through a mini case study, showing that even with just two prompts and two models, we can draw nuanced insights regarding gender bias, stereotyping, misgendering, and gendering. We reveal systematic gender bias, particularly when inputs contain no explicit gender…
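
As a rough illustration of the three dimensions named in the abstract, the following Python sketch enumerates their combinations. The variable and function names are hypothetical, and it is an assumption that the dimensions are fully crossed; the dataset itself may not populate every cell.

```python
from itertools import product

# Dimension values are taken from the abstract.
# The identifier names below are hypothetical, chosen for illustration only.
THEME_NOUNS = ["name", "occupation", "title", "kinship term"]
THEME_CATEGORIES = [
    "stereotypically male",
    "stereotypically female",
    "gender-neutral/non-gendered",
]
PRONOUN_CATEGORIES = ["masculine", "feminine", "gender-neutral", "none"]


def design_grid():
    """Return every (theme noun, theme category, pronoun category) triple.

    Assumption: the three dimensions are fully crossed. The abstract does
    not state whether every combination appears in the dataset.
    """
    return list(product(THEME_NOUNS, THEME_CATEGORIES, PRONOUN_CATEGORIES))


if __name__ == "__main__":
    grid = design_grid()
    # 4 theme noun types * 3 theme categories * 4 pronoun categories
    print(len(grid))
    for cell in grid[:3]:
        print(cell)
```

Under the full-crossing assumption, a cell such as an occupation noun in the stereotypically female category paired with no pronouns corresponds to the "no explicit gender cue" condition that the findings single out.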
