CAREBench: Evaluating LLMs' Emotion Understanding by Assessing Cognitive Appraisal Reasoning
Zhaoyue Sun, Hainiu Xu, Andero Uusberg, James J. Gross, Petr Slovak, Yulan He

TL;DR
CAREBench is a novel benchmark designed to evaluate LLMs' emotion understanding by analyzing their ability to perform appraisal reasoning and capture the cognitive processes behind emotion generation.
Contribution
It introduces a comprehensive evaluation framework with complete inferential chain annotations based on appraisal theory, filling a gap in emotion understanding assessment.
Findings
Stronger models match or surpass human observers on certain tasks.
Even these stronger models fall short on appraisal reasoning and positive emotion recognition.
Performance varies across chain steps and is affected by appraisal interventions.
Abstract
Emotion understanding is a core capability for LLMs to interact effectively with humans, yet existing evaluation paradigms rely on discrete emotion label prediction and fail to capture the cognitive processes underlying emotion generation. Grounded in appraisal theory, we introduce CAREBench, the first benchmark with complete inferential chain annotations from both first- and third-person perspectives on real-world narratives, spanning appraisal reasoning, appraisal ratings, and multi-label emotion annotation. We propose a process-level evaluation framework and conduct systematic experiments across six LLMs organized around four research questions. We find that stronger models match or surpass human observers on certain tasks, yet fall short on appraisal reasoning and positive emotion recognition; performance across chain steps and sensitivity to appraisal interventions exhibit…
