Applied Theory of Mind and Large Language Models -- how good is ChatGPT at solving social vignettes?
Anna Katharina Holl-Etten, Nina Schnaderbeck, Elizaveta Kosareva, Leonhard Aron Prattke, Ralph Krueger, Lisa Marie Warner, and Nora C. Vetter

TL;DR
This study evaluates the ability of GPT-3.5 Turbo and GPT-4 to perform social reasoning tasks and finds that GPT-4 performs at near-human levels on complex Theory of Mind assessments, indicating potential for assistive social communication tools.
Contribution
It provides a comprehensive assessment of GPT-4's capacity for applied Theory of Mind in social vignettes, comparing it to human performance and previous AI models.
Findings
GPT-4 achieves near-human accuracy on the Faux Pas Test.
GPT-4 scores comparably to neurotypical adults on the Social Stories Questionnaire.
GPT-4 exceeds neurotypical benchmarks on the Story Comprehension Test.
Abstract
The rapid development of language-based artificial intelligence (AI) offers new possibilities for psychotherapy and assistive systems, particularly for autistic individuals, who often respond well to technology. Parents of autistic persons emphasize the importance of appropriate, context-specific communication behavior. This study investigated whether GPT-3.5 Turbo and GPT-4, as language-based AI applications, are in principle capable of replicating such adequate communication behavior in the form of applied Theory of Mind (ToM). GPT-3.5 Turbo and GPT-4 were evaluated in English and German on three established higher-order ToM tasks: the Faux Pas Test, the Social Stories Questionnaire, and the Story Comprehension Test. Two independent raters scored response accuracy based on standardized manuals. In addition, responses were rated for epistemic markers as indicators of…
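The abstract states that two independent raters scored each response but does not say how their agreement was quantified. One common statistic for two raters assigning categorical scores to the same items is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below assumes binary scores (1 = correct, 0 = incorrect); the function name `cohens_kappa` and the example scores are hypothetical illustrations, not the authors' procedure or data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters gave the same score.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both raters used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical accuracy scores for ten responses (illustrative only, not study data).
rater_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]
print(round(cohens_kappa(rater_1, rater_2), 3))  # prints 0.524
```

Here the raters agree on 8 of 10 items (observed agreement 0.80), but because both mark most responses correct, chance agreement is already 0.58, so kappa is noticeably lower than raw agreement.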
