SocialEval: Evaluating Social Intelligence of Large Language Models
Jinfeng Zhou, Yuxuan Chen, Yihan Shi, Xuanming Zhang, Leqi Lei, Yi Feng, Zexuan Xiong, Miao Yan, Xunzhi Wang, Yaru Cao, Jianing Yin, Shuai Wang, Quanyu Dai, Zhenhua Dong, Hongning Wang, Minlie Huang

TL;DR
SocialEval is a bilingual benchmark that evaluates the social intelligence of large language models on both the outcome and the process of social interactions. It exposes current limitations of LLMs and offers insight into their social abilities.
Contribution
The paper presents SocialEval, a script-based bilingual benchmark that assesses LLMs' social intelligence through manually crafted narrative scripts, combining outcome-oriented evaluation of goal achievement with process-oriented evaluation of interpersonal ability.
Findings
LLMs lag behind humans on both the outcome-oriented and the process-oriented social intelligence evaluations.
LLMs tend to exhibit prosocial and positive social behaviors.
Representation analysis shows LLMs develop human-like functional partitions.
Abstract
LLMs exhibit promising Social Intelligence (SI) in modeling human behavior, raising the need to evaluate LLMs' SI and their discrepancy with humans. SI equips humans with interpersonal abilities to behave wisely in navigating social interactions to achieve social goals. This presents an operational evaluation paradigm: outcome-oriented goal achievement evaluation and process-oriented interpersonal ability evaluation, which existing work fails to address. To this end, we propose SocialEval, a script-based bilingual SI benchmark, integrating outcome- and process-oriented evaluation by manually crafting narrative scripts. Each script is structured as a world tree that contains plot lines driven by interpersonal ability, providing a comprehensive view of how LLMs navigate social interactions. Experiments show that LLMs fall behind humans on both SI evaluations, exhibit prosociality, and…
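The world-tree idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or data format: the `PlotNode` class, the `walk` function, the toy script, and the ability labels ("empathy", "self-presentation") are all hypothetical, chosen only to show how a single traversal of a branching script could yield both an outcome signal (was the goal reached?) and a process signal (which interpersonal abilities drove the chosen plot line?).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PlotNode:
    """One point in a script: a situation plus the actions available there (hypothetical schema)."""
    situation: str
    # Maps an action label to (interpersonal ability it exercises, next node).
    branches: dict[str, tuple[str, "PlotNode"]] = field(default_factory=dict)
    # Only meaningful on leaf nodes: did this plot line reach the social goal?
    goal_achieved: Optional[bool] = None


def walk(root: PlotNode, choose: Callable[[PlotNode], str]) -> tuple[bool, list[str]]:
    """Follow one plot line from the root to a leaf.

    `choose` stands in for the model under evaluation: given a node, it
    returns the label of the action to take. Returns the outcome signal
    (goal reached or not) and the process signal (abilities exercised, in order).
    """
    node = root
    abilities: list[str] = []
    while node.branches:
        action = choose(node)
        ability, node = node.branches[action]
        abilities.append(ability)
    return bool(node.goal_achieved), abilities


# Toy script, invented for illustration only.
leaf_good = PlotNode("The friend accepts the apology.", goal_achieved=True)
leaf_bad = PlotNode("The friend walks away.", goal_achieved=False)
root = PlotNode(
    "You forgot a friend's birthday.",
    branches={
        "apologize sincerely": ("empathy", leaf_good),
        "make an excuse": ("self-presentation", leaf_bad),
    },
)

outcome, process = walk(root, lambda node: "apologize sincerely")
print(outcome, process)
```

Keeping the ability label on the edge rather than on the node is a design choice of this sketch: it lets the same situation branch into plot lines driven by different abilities, which matches the abstract's description of plot lines "driven by interpersonal ability".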
Taxonomy
Topics: Topic Modeling
