DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models
Eugenia Kim, Ioana Tanase, Christina Mallon

TL;DR
DisaBench is an evaluation framework for assessing disability-related harms in language models. It was built with community involvement and emphasizes nuanced, context-aware analysis.
Contribution
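DisaBench introduces a taxonomy of twelve disability harm categories, a dataset of 175 prompts, and a taxonomy-driven evaluation methodology, co-created with people with disabilities and red teaming experts to better evaluate subtle and intersectional harms in language models.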
Findings
Harm rates vary sharply by disability type.
Terminology-driven harm is culturally and temporally bound.
Standard safety tests miss subtle harms detectable by domain experts.
Abstract
General-purpose safety benchmarks for large language models do not adequately evaluate disability-related harms. We introduce DisaBench: a taxonomy of twelve disability harm categories co-created with people with disabilities and red teaming experts, a taxonomy-driven evaluation methodology that pairs benign and adversarial prompts across seven life domains, and a dataset of 175 prompts with human-annotated labels on 525 prompt-response pairs. Annotation by four evaluators with lived disability experience reveals three findings: harm rates vary sharply by disability type and will compound in non-text modalities, terminology-driven harm is culturally and temporally bound rather than universally assessable, and standard safety evaluation catches overt failures while missing the subtle harms that only domain expertise can recognize. Disability harm is simultaneously personal,…
