How Social is It? A Benchmark for LLMs' Capabilities in Multi-user Multi-turn Social Agent Tasks
Yusen Wu, Junwu Xiong, Xiaotie Deng

TL;DR
This paper introduces a new benchmark called How Social Is It (HSII) to systematically evaluate large language models' social interaction skills in multi-user, multi-turn scenarios, addressing a key gap in current assessments.
Contribution
The paper presents a sociologically grounded framework and a comprehensive benchmark for measuring LLMs' social capabilities in complex multi-user tasks, including a dataset and evaluation metrics.
Findings
Benchmark effectively assesses LLM social skills.
Chain-of-thought reasoning improves LLMs' social performance.
COT-complexity quantifies the trade-off between efficiency and correctness.
Abstract
Expanding the application of large language models (LLMs) to societal life, rather than confining them to the role of auxiliary assistants that communicate with one person at a time, requires LLMs to independently play roles in multi-user, multi-turn social agent tasks within complex social settings. However, this capability has not yet been systematically measured by existing benchmarks. To address this gap, we first introduce an agent task leveling framework grounded in sociological principles. Alongside this framework, we propose a novel benchmark, How Social Is It (hereafter HSII), designed to assess LLMs' social capabilities in comprehensive social agent tasks, and use it to benchmark representative models. HSII comprises four stages: format parsing, target selection, target switching conversation, and stable conversation, which collectively evaluate the communication and…
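The abstract names HSII's four stages but not how each is scored. As a rough illustration only, the sketch below models the stages as an ordered enum and computes a per-stage accuracy from hypothetical (stage, correct) records. The record format, the function name `per_stage_accuracy`, and the accuracy aggregation are all assumptions made for this example, not HSII's actual data schema or evaluation metric.

```python
from collections import defaultdict
from enum import Enum


class Stage(Enum):
    # The four HSII stages named in the abstract, in benchmark order.
    FORMAT_PARSING = 1
    TARGET_SELECTION = 2
    TARGET_SWITCHING_CONVERSATION = 3
    STABLE_CONVERSATION = 4


def per_stage_accuracy(records):
    """Hypothetical aggregation: fraction of correct items per stage.

    `records` is an iterable of (Stage, bool) pairs. This input format is
    assumed for illustration and is not taken from the HSII paper.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for stage, ok in records:
        totals[stage] += 1
        correct[stage] += int(ok)
    # Report stages in benchmark order; stages with no records are skipped.
    return {s.name: correct[s] / totals[s] for s in Stage if totals[s]}


records = [
    (Stage.FORMAT_PARSING, True),
    (Stage.FORMAT_PARSING, False),
    (Stage.TARGET_SELECTION, True),
    (Stage.STABLE_CONVERSATION, True),
]
print(per_stage_accuracy(records))
```

Keeping the stages in an ordered enum makes it easy to report results stage by stage in the same order the benchmark presents them; a real evaluation would substitute whatever scoring HSII actually defines for each stage.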
