Evaluating and Modeling Social Intelligence: A Comparative Study of Human and AI Capabilities
Junqi Wang, Chunhui Zhang, Jiapeng Li, Yuxi Ma, Lixing Niu, Jiaheng Han, Yujia Peng, Yixin Zhu, Lifeng Fan

TL;DR
This study introduces a benchmark and theoretical framework for evaluating social intelligence in humans and AI. It finds that humans outperform GPT models and that current LLMs show limited social understanding.
Contribution
The paper develops a new benchmark and computational model for assessing social intelligence, providing a comparative analysis between human and AI capabilities.
Findings
Humans outperform GPT models in social intelligence tasks.
GPT models exhibit only basic (order-0) social intelligence.
Humans demonstrate higher adaptability and generalization in social reasoning.
Abstract
Facing the current debate on whether Large Language Models (LLMs) attain near-human intelligence levels (Mitchell & Krakauer, 2023; Bubeck et al., 2023; Kosinski, 2023; Shiffrin & Mitchell, 2023; Ullman, 2023), the current study introduces a benchmark for evaluating social intelligence, one of the most distinctive aspects of human cognition. We developed a comprehensive theoretical framework for social dynamics and introduced two evaluation tasks: Inverse Reasoning (IR) and Inverse Inverse Planning (IIP). Our approach also encompassed a computational model based on recursive Bayesian inference, adept at elucidating diverse human behavioral patterns. Extensive experiments and detailed analyses revealed that humans surpassed the latest GPT models in overall performance, zero-shot learning, one-shot generalization, and adaptability to multi-modalities. Notably, GPT models demonstrated…
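To make the phrase "recursive Bayesian inference" concrete, the following is a minimal sketch of one generic form of it, in which an observer of a given order inverts an actor of the same order, and that actor in turn reasons about an observer one order lower. It is not the paper's model: the two-goal, two-action world, the compatibility table, the uniform prior, and the names `actor` and `observer` are all assumptions made for illustration, and the meaning of "order" here may differ from the paper's definition.

```python
# Toy recursive Bayesian inference. Everything below is a hypothetical
# illustration, not the model or the tasks described in the paper.

GOALS = ["A", "B"]
ACTIONS = ["left", "middle"]

# Assumed compatibility of each action with each goal (1.0 = compatible).
BASE = {
    "A": {"left": 1.0, "middle": 1.0},
    "B": {"left": 0.0, "middle": 1.0},
}

# Assumed uniform prior over goals.
PRIOR = {"A": 0.5, "B": 0.5}


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}


def actor(order):
    """Return P(action | goal) for an actor of the given order."""
    if order == 0:
        # Order 0: pick uniformly among the compatible actions.
        return {g: normalize(BASE[g]) for g in GOALS}
    # Order k > 0: favor actions from which an order-(k-1) observer
    # would recover the true goal.
    lower = observer(order - 1)
    return {
        g: normalize({a: BASE[g][a] * lower[a][g] for a in ACTIONS})
        for g in GOALS
    }


def observer(order):
    """Return P(goal | action) by applying Bayes' rule to actor(order)."""
    policy = actor(order)
    return {
        a: normalize({g: PRIOR[g] * policy[g][a] for g in GOALS})
        for a in ACTIONS
    }


if __name__ == "__main__":
    for k in (0, 1, 2):
        belief = observer(k)["middle"]
        print(f"order {k}: P(goal=B | action=middle) = {belief['B']:.3f}")
```

In this toy world, the belief that the ambiguous action "middle" signals goal B grows as the order increases, because a higher-order observer assumes that an actor pursuing goal A would have chosen the unambiguous action "left" instead.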
Taxonomy
Topics: Cognitive Science and Mapping
Methods: Attention Is All You Need · Dense Connections · Cosine Annealing · Linear Layer · Weight Decay · Linear Warmup With Cosine Annealing · Residual Connection · Byte Pair Encoding · Adam
