Beyond Keywords: Evaluating Large Language Model Classification of Nuanced Ableism

Naba Rizvi; Harper Strickland; Saleha Ahmedi; Aekta Kallepalli; Isha Khirwadkar; William Wu; Imani N. S. Munyaka; Nedjma Ousidhoum

arXiv:2505.20500·cs.CL·May 28, 2025

TL;DR

This paper assesses how well large language models can detect nuanced ableism in text, revealing their reliance on keywords and highlighting the importance of context for accurate identification.

Contribution

It provides a detailed evaluation of LLMs' ability to recognize nuanced ableism, comparing their performance to human judgment and analyzing their interpretative limitations.
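One common way to compare a model's binary labels with human judgment is chance-corrected agreement. The sketch below computes Cohen's kappa on toy data. It is an illustration only: the function name, the label lists, and the choice of kappa are assumptions, not the metric or data reported in the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement if both raters labelled independently
    # according to their own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels (1 = ableist, 0 = not ableist); not from the paper.
human_labels = [1, 0, 1, 1, 0, 0, 1, 0]
model_labels = [1, 0, 0, 1, 0, 1, 1, 0]

kappa = cohen_kappa(human_labels, model_labels)
print(f"Cohen's kappa: {kappa:.2f}")  # 0.50 for this toy data
```

A score like this only captures label agreement. It says nothing about whether the model's explanation reflects context or just matched a keyword, which is why the paper pairs classification results with a qualitative comparison of explanations.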

Findings

01

LLMs can identify autism-related language but often miss harmful connotations.

02

LLMs rely heavily on keyword matching, leading to context misinterpretations.

03

Both LLMs and humans agree on a binary classification scheme for ableism detection.

Abstract

Large language models (LLMs) are increasingly used in decision-making tasks like résumé screening and content moderation, giving them the power to amplify or suppress certain perspectives. While previous research has identified disability-related biases in LLMs, little is known about how they conceptualize ableism or detect it in text. We evaluate the ability of four LLMs to identify nuanced ableism directed at autistic individuals. We examine the gap between their understanding of relevant terminology and their effectiveness in recognizing ableist content in context. Our results reveal that LLMs can identify autism-related language but often miss harmful or offensive connotations. Further, we conduct a qualitative comparison of human and LLM explanations. We find that LLMs tend to rely on surface-level keyword matching, leading to context misinterpretations, in contrast to human…
