The Text Aphasia Battery (TAB): A Clinically-Grounded Benchmark for Aphasia-Like Deficits in Language Models

Nathan Roll; Jill Kries; Flora Jin; Catherine Wang; Ann Marie Finley; Meghan Sumner; Cory Shain; Laura Gwilliams

arXiv:2511.20507 · cs.CL · November 26, 2025

TL;DR

This paper introduces the Text Aphasia Battery (TAB), a text-only benchmark adapted from the Quick Aphasia Battery (QAB) to evaluate aphasia-like language deficits in large language models (LLMs). It also presents a validated automated scoring protocol for large-scale analysis.

Contribution

The paper presents TAB's design, four subtests, and scoring criteria, and validates an automated scoring protocol. Together these enable systematic, clinically grounded assessment of language deficits in LLMs.

Findings

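01. The automated scoring protocol, which uses Gemini 2.5 Flash, achieves reliability comparable to expert human raters.

02. Automated scoring makes the benchmark practical for large-scale evaluation.

03. TAB provides a clinically grounded tool for analyzing aphasia-like language deficits in artificial systems.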

Abstract

Large language models (LLMs) have emerged as a candidate "model organism" for human language, offering an unprecedented opportunity to study the computational basis of linguistic disorders like aphasia. However, traditional clinical assessments are ill-suited for LLMs, as they presuppose human-like pragmatic pressures and probe cognitive processes not inherent to artificial architectures. We introduce the Text Aphasia Battery (TAB), a text-only benchmark adapted from the Quick Aphasia Battery (QAB) to assess aphasic-like deficits in LLMs. The TAB comprises four subtests: Connected Text, Word Comprehension, Sentence Comprehension, and Repetition. This paper details the TAB's design, subtests, and scoring criteria. To facilitate large-scale use, we validate an automated evaluation protocol using Gemini 2.5 Flash, which achieves reliability comparable to expert human raters…
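The abstract says the Gemini 2.5 Flash scorer achieves reliability comparable to expert human raters, but this page does not name the agreement statistic used. The sketch below computes Cohen's kappa, a common chance-corrected agreement measure, between an automated scorer and a human rater on the same items. It is illustrative only: the function name and the example scores are hypothetical, and the paper may use a different statistic, such as an intraclass correlation or a weighted kappa for ordinal scores.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical scores to the same items.

    Hypothetical helper for illustration; not taken from the TAB paper.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters gave the same score.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, estimated from each rater's marginal score distribution.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical binary item scores (1 = correct, 0 = incorrect).
human = [1, 0, 1, 1, 0, 1, 1, 0]
automated = [1, 0, 1, 0, 0, 1, 1, 0]
print(cohens_kappa(human, automated))  # → 0.75
```

Kappa is used here rather than raw percent agreement (0.875 in this toy example) because it discounts the agreement two raters would reach by chance given how often each assigns each score.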

Taxonomy

Topics: Neurobiology of Language and Bilingualism · Artificial Intelligence in Healthcare and Education · Text Readability and Simplification