SCENE: Recognizing Social Norms and Sanctioning in Group Chats
Mateusz Jacniacki, Maksymilian Bilski

TL;DR
SCENE is a benchmark for evaluating LLM agents' ability to recognize and adapt to implicit social norms and sanctions in multi-party chat environments.
Contribution
The paper introduces SCENE, a benchmark with behavioral evaluation metrics for how well LLM-based agents recognize and adapt to implicit social norms.
Findings
Claude Opus 4.7 and Gemini 3.1 Pro outperform open-weight models in norm adaptation.
SCENE provides a new way to assess social capabilities of LLMs in dynamic interactions.
The benchmark emphasizes the importance of social responsiveness in AI chat agents.
Abstract
Online group chats are social spaces with implicit behavior patterns that, when broken, are often met with social sanctioning from the group. The ability and willingness of LLM-based agents to recognize and adapt to these norms remain mostly unexplored. We introduce SCENE, a social-interaction benchmark focused on implicit norms and social sanctioning in multi-party chat. SCENE generates plausible non-roleplay scenarios with scripted personas that follow a hidden norm, create opportunities for the subject agent to violate it, and sanction breaches when they occur. We further propose behavioral evaluation metrics for two functional adaptation abilities: responsiveness to negative sanctioning, and adapting norms from peers' behavior. We evaluate six frontier and open-weight models on SCENE. Our results show that Claude Opus 4.7 and Gemini 3.1 Pro adapt to implicit norms significantly more…
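To make the idea of "responsiveness to negative sanctioning" concrete, here is a minimal Python sketch of one way such a score could be computed. It is not the metric defined in the paper, whose definition is not given in this summary. The `Turn` record, the `violation_rate` helper, and the before/after comparison are all hypothetical choices made for illustration: the score is the drop in the subject agent's violation rate after the first sanction it receives.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """One norm-relevant opportunity for the subject agent (hypothetical schema)."""
    violated: bool    # the agent broke the hidden norm on this opportunity
    sanctioned: bool  # a scripted persona sanctioned the agent for this turn


def violation_rate(turns: List[Turn]) -> Optional[float]:
    """Fraction of opportunities that ended in a violation, or None if empty."""
    if not turns:
        return None
    return sum(t.violated for t in turns) / len(turns)


def sanction_responsiveness(turns: List[Turn]) -> Optional[float]:
    """Drop in violation rate after the first sanction (assumed definition).

    Returns None when no sanction occurred or when no opportunities
    follow the first sanction, since the comparison is undefined then.
    """
    first = next((i for i, t in enumerate(turns) if t.sanctioned), None)
    if first is None:
        return None
    before = violation_rate(turns[: first + 1])  # up to and including the sanctioned turn
    after = violation_rate(turns[first + 1:])    # everything after the sanction
    if before is None or after is None:
        return None
    return before - after


if __name__ == "__main__":
    episode = [
        Turn(violated=True, sanctioned=False),
        Turn(violated=True, sanctioned=True),
        Turn(violated=False, sanctioned=False),
        Turn(violated=True, sanctioned=False),
    ]
    print(sanction_responsiveness(episode))
```

Under this assumed definition, a positive score means the agent violated the norm less often after being sanctioned, zero means no change, and a negative score means violations became more frequent.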
