How Social is It? A Benchmark for LLMs' Capabilities in Multi-user Multi-turn Social Agent Tasks

Yusen Wu; Junwu Xiong; Xiaotie Deng

arXiv:2505.04628·cs.CL·May 9, 2025

Open Access

TL;DR

This paper introduces a new benchmark called How Social Is It (HSII) to systematically evaluate large language models' social interaction skills in multi-user, multi-turn scenarios, addressing a key gap in current assessments.

Contribution

The paper presents a sociologically grounded framework and a comprehensive benchmark for measuring LLMs' social capabilities in complex multi-user tasks, including a dataset and evaluation metrics.

Findings

01. Benchmark effectively assesses LLM social skills.

02. Chain of thought improves LLM social performance.

03. COT-complexity quantifies efficiency and correctness trade-offs.

Abstract

Expanding the application of large language models (LLMs) to societal life, rather than having them function primarily as auxiliary assistants that communicate with only one person at a time, requires LLMs to independently play roles in multi-user, multi-turn social agent tasks within complex social settings. However, this capability has not yet been systematically measured by available benchmarks. To address this gap, we first introduce an agent task leveling framework grounded in sociological principles. Concurrently, we propose a novel benchmark, How Social Is It (hereafter HSII), designed to assess LLMs' social capabilities in comprehensive social agent tasks, and we use it to benchmark representative models. HSII comprises four stages: format parsing, target selection, target switching conversation, and stable conversation, which collectively evaluate the communication and…

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications