SocialVeil: Probing Social Intelligence of Language Agents under Communication Barriers

Keyang Xuan; Pengda Wang; Chongrui Ye; Haofei Yu; Tal August; Jiaxuan You

arXiv:2602.05115 · cs.AI · February 6, 2026

TL;DR

SocialVeil introduces a social interaction environment for evaluating the social intelligence of language models in realistic, imperfect settings by simulating communication barriers such as semantic vagueness and cultural mismatch.

Contribution

This paper presents SocialVeil, a novel environment with simulated communication barriers and barrier-aware evaluation metrics for assessing LLM social intelligence under communication disruptions, addressing the idealized-communication assumption of prior benchmarks.

Findings

1. Barriers significantly reduce mutual understanding by over 45%.
2. Confusion levels increase by nearly 50% under communication barriers.
3. Human evaluations confirm the fidelity of simulated barriers.

Abstract

Large language models (LLMs) are increasingly evaluated in interactive environments to test their social intelligence. However, existing benchmarks often assume idealized communication between agents, limiting our ability to diagnose whether LLMs can maintain and repair interactions in more realistic, imperfect settings. To close this gap, we present SocialVeil, a social learning environment that can simulate social interaction under cognitive-difference-induced communication barriers. Grounded in a systematic literature review of communication challenges in human interaction, SocialVeil introduces three representative types of such disruption: semantic vagueness, sociocultural mismatch, and emotional interference. We also introduce two barrier-aware evaluation metrics, unresolved confusion and mutual understanding, to evaluate…

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

Novel Contribution: The core idea is highly relevant and timely. Moving beyond idealized "seamless" interaction to study how agents handle communication breakdowns is a critical step toward more robust and socially-aware AI. The focus on structured, cognitive barriers, as opposed to simple noise, is a significant conceptual advance.

Rigorous and Well-Structured Framework: The methodology is well-designed. The barrier taxonomy is theoretically grounded in literature from pragmatics, sociolinguis

Weaknesses

Statistical Reporting Could Be Enhanced: While the results are presented clearly in tables and figures, the paper would be strengthened by more formal statistical testing.

Table 2: The reported performance drops are descriptive (averages). Statistical significance tests (e.g., paired t-tests between baseline and each barrier condition for each metric/model) would solidify the claim that barriers "consistently impair" performance.

Table 3: The comparison between Base, Repair, and (BC+SR) conditio
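A paired t-test of the kind suggested above compares matched units (for example, the same scenario and model scored once without and once with a barrier) through their per-unit score differences. The sketch below is a minimal, standard-library-only illustration under that assumption; the function name `paired_t_statistic` and the toy scores are hypothetical and are not data or code from the paper. It returns only the t statistic and degrees of freedom; `scipy.stats.ttest_rel` computes the same statistic together with a p-value.

```python
import math
import statistics


def paired_t_statistic(baseline, barrier):
    """Return (t, df) for a paired t-test on matched score lists.

    Position i in both lists must refer to the same unit (a hypothetical
    scenario/model pair) scored under the baseline and a barrier condition.
    """
    if len(baseline) != len(barrier):
        raise ValueError("paired samples must have equal length")
    n = len(baseline)
    if n < 2:
        raise ValueError("need at least two pairs")
    # Per-unit differences: positive values mean the barrier lowered the score.
    diffs = [b - x for b, x in zip(baseline, barrier)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Toy scores for illustration only (not results from the paper).
baseline_scores = [8.0, 7.5, 9.0, 6.5]
barrier_scores = [6.0, 6.5, 7.0, 5.5]
t, df = paired_t_statistic(baseline_scores, barrier_scores)
print(f"t = {t:.3f}, df = {df}")  # prints: t = 5.196, df = 3
```

In practice such a test would be run once per metric and model, as the review suggests, and the resulting p-values would typically be corrected for multiple comparisons.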

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. This paper introduces a barrier-aware social interaction environment (SOCIALVEIL) that systematically embeds realistic communication disruptions to evaluate LLM social intelligence.
2. The paper proposes a comprehensive, automated evaluation protocol and verifies its fidelity through extensive human studies, showing strong metric alignment and reproducibility.
3. The experiment results and analysis demonstrate that communication barriers substantially impair LLMs’ mutual understanding and rel

Weaknesses

1. Evaluation: Barriers are injected with one model and GPT-4o is used as the automatic evaluator. This raises concerns about evaluator bias/overfitting to its own stylistic expectations. An ablation with multiple evaluators would strengthen claims.
2. Dataset: Generalization beyond SOTOPIA. All scenarios are adapted from SOTOPIA; it remains unclear how well the findings transfer to other interactive corpora or human-in-the-loop settings.
3. Dataset: Limited Data Points: 180 episodes for each ba

Reviewer 03 · Rating 2 · Confidence 4

Strengths

* The motivation for studying social intelligence in noisy or ambiguous communication settings is interesting.
* The paper is clearly written and organized.
* The implementation of different communication barriers is creative and could be useful for exploratory studies.

Weaknesses

1. The proposed benchmark is mainly constructed by manually designing prompting templates that inject vagueness, cultural mismatch, or emotional bias into conversations. There is no new model, algorithm, or principled framework. The whole approach remains at the level of prompt engineering rather than a genuine methodological advance in measuring social intelligence.
2. The study does not introduce a novel metric, learning method, or theoretical insight. Most of the results simply confirm what

Taxonomy

Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education · Topic Modeling