Do They Understand Them? An Updated Evaluation on Nonbinary Pronoun Handling in Large Language Models

Xushuo Tang; Yi Ding; Zhengyi Yang; Yin Chen; Yongrui Gu; Wenke Yang; Mingchen Ju; Xin Cao; Yongfei Liu; Wenjie Zhang

arXiv:2508.00788 · cs.CL · August 4, 2025


Open Access

TL;DR

This paper evaluates how well large language models handle nonbinary pronouns, including gender-neutral pronouns and neopronouns. It finds clear progress but also ongoing challenges in gender-inclusive language understanding.

Contribution

Introduces the MISGENDERED+ benchmark, updates the evaluation of LLMs' pronoun handling, and compares five models across zero-shot, few-shot, and gender identity inference tasks, revealing both improvements and persistent gaps.

Findings

01. Notable improvements in binary and gender-neutral pronoun accuracy.
02. Inconsistent performance on neopronouns and reverse inference tasks.
03. Persistent gaps in identity-sensitive reasoning in LLMs.

Abstract

Large language models (LLMs) are increasingly deployed in sensitive contexts where fairness and inclusivity are critical. Pronoun usage, especially concerning gender-neutral and neopronouns, remains a key challenge for responsible AI. Prior work, such as the MISGENDERED benchmark, revealed significant limitations in earlier LLMs' handling of inclusive pronouns, but was constrained to outdated models and limited evaluations. In this study, we introduce MISGENDERED+, an extended and updated benchmark for evaluating LLMs' pronoun fidelity. We benchmark five representative LLMs, GPT-4o, Claude 4, DeepSeek-V3, Qwen Turbo, and Qwen2.5, across zero-shot, few-shot, and gender identity inference. Our results show notable improvements compared with previous studies, especially in binary and gender-neutral pronoun accuracy. However, accuracy on neopronouns and reverse inference tasks remains…
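The findings contrast accuracy across pronoun categories (binary, gender-neutral, neopronouns). As a rough illustration of how such a per-category score could be computed, here is a minimal Python sketch. It assumes each test item pairs a declared pronoun with the pronoun a model produced. The pronoun inventory, the `PRONOUN_FAMILY` mapping, and the `accuracy_by_family` function are hypothetical and are not taken from MISGENDERED+.

```python
from collections import defaultdict

# Hypothetical mapping from declared nominative pronoun to category;
# the benchmark's actual pronoun inventory may differ.
PRONOUN_FAMILY = {
    "he": "binary",
    "she": "binary",
    "they": "gender-neutral",
    "xe": "neopronoun",
    "ze": "neopronoun",
    "ey": "neopronoun",
}


def accuracy_by_family(examples):
    """Return accuracy per pronoun category.

    `examples` is an iterable of (declared, predicted) pairs, where
    `declared` is the pronoun stated for an individual and `predicted`
    is the pronoun the model used for them (a hypothetical format).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for declared, predicted in examples:
        family = PRONOUN_FAMILY[declared]
        total[family] += 1
        # Exact match after normalizing whitespace and case.
        if predicted.strip().lower() == declared:
            correct[family] += 1
    return {family: correct[family] / total[family] for family in total}
```

For example, `accuracy_by_family([("he", "he"), ("they", "They "), ("xe", "they"), ("xe", "xe")])` scores the binary and gender-neutral items as fully correct and the neopronoun items at 0.5, because one "xe" item was answered with "they". Grouping by category rather than reporting one overall score keeps a strong result on binary pronouns from masking weak neopronoun performance.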


Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Ethics and Social Impacts of AI