Large language models show fragile cognitive reasoning about human emotions
Sree Bhattacharyya, Evgenii Kuriabov, Lucas Craig, Tharun Dilliraj, Reginald B. Adams, Jr., Jia Li, James Z. Wang

TL;DR
This paper investigates whether large language models can reason about human emotions through cognitive appraisal dimensions. It finds that they capture some systematic relations between appraisals and emotions, but their judgments are also misaligned with human judgments and unstable across contexts.
Contribution
Introduces CoRE, a benchmark to evaluate LLMs' reasoning about emotions via cognitive appraisals, highlighting their strengths and limitations.
Findings
LLMs capture systematic relations between appraisals and emotions
LLMs show misalignment with human judgments
LLMs exhibit instability across different contexts
Abstract
Affective computing seeks to support the holistic development of artificial intelligence by enabling machines to engage with human emotion. Recent foundation models, particularly large language models (LLMs), have been trained and evaluated on emotion-related tasks, typically using supervised learning with discrete emotion labels. Such evaluations largely focus on surface phenomena, such as recognizing expressed or evoked emotions, leaving open whether these systems reason about emotion in cognitively meaningful ways. Here we ask whether LLMs can reason about emotions through underlying cognitive dimensions rather than labels alone. Drawing on cognitive appraisal theory, we introduce CoRE, a large-scale benchmark designed to probe the implicit cognitive structures LLMs use when interpreting emotionally charged situations. We assess alignment with human appraisal patterns, internal…
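The sketch below is a rough illustration of what "alignment with human appraisal patterns" and "instability" could mean as numbers. It assumes that model and human ratings are collected on the same numeric scale for each appraisal dimension and scenario. Several choices here are hypothetical and made only for illustration: the metrics (per-dimension Pearson correlation for alignment, and mean per-scenario standard deviation across prompt contexts for instability), the function names, and the dimension names. The excerpt above is truncated and does not say which measures CoRE actually uses.

```python
import math
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation between two equal-length rating lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either rater gives constant ratings
    return cov / (sx * sy)


def appraisal_alignment(model, human):
    """Hypothetical alignment score: per-dimension model-vs-human correlation.

    model, human: dicts mapping an appraisal dimension name to a list of
    ratings, one per scenario, in the same scenario order.
    """
    return {dim: pearson(model[dim], human[dim]) for dim in human if dim in model}


def instability(ratings_by_scenario):
    """Hypothetical instability score: mean spread of a model's ratings.

    ratings_by_scenario: dict mapping a scenario id to the ratings the model
    gave that scenario under different prompt contexts. Returns the mean
    population standard deviation across scenarios (0.0 means fully stable).
    """
    return mean(pstdev(ratings) for ratings in ratings_by_scenario.values())


if __name__ == "__main__":
    # Toy ratings for illustration only; these are not data from the paper.
    human = {"pleasantness": [1, 2, 4, 5], "control": [3, 3, 1, 5]}
    model = {"pleasantness": [2, 2, 5, 5], "control": [5, 3, 1, 3]}
    print(appraisal_alignment(model, human))
    print(instability({"s1": [3, 3, 3], "s2": [1, 5, 3]}))
```

With the toy ratings, pleasantness correlates at about 0.95 and control at 0.5. The instability score is 0 for the scenario rated identically in every context and about 1.63 for the one rated 1, 5, and 3, giving a mean of about 0.82. Rank-based correlation or agreement statistics would be equally plausible choices; Pearson correlation is used here only because it is short to write out.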
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI)
