Where Norms and References Collide: Evaluating LLMs on Normative Reasoning
Mitchell Abrams, Kaveh Eskandari Miandoab, Felix Gervits, Vasanth Sarathy, Matthias Scheutz

TL;DR
This paper evaluates whether large language models can understand and apply social norms in contextually grounded tasks, revealing significant limitations in their ability to handle implicit, underspecified, or conflicting norms.
Contribution
Introduces SNIC, a diagnostic testbed for assessing LLMs' ability to reason over social norms in embodied, real-world scenarios, highlighting current model shortcomings.
Findings
LLMs struggle with implicit norms
Models have difficulty with underspecified norms
Conflicting norms pose challenges for LLMs
Abstract
Embodied agents, such as robots, will need to interact in situated environments where successful communication often depends on reasoning over social norms: shared expectations that constrain what actions are appropriate in context. A key capability in such settings is norm-based reference resolution (NBRR), where interpreting referential expressions requires inferring implicit normative expectations grounded in physical and social context. Yet it remains unclear whether Large Language Models (LLMs) can support this kind of reasoning. In this work, we introduce SNIC (Situated Norms in Context), a human-validated diagnostic testbed designed to probe how well state-of-the-art LLMs can extract and utilize normative principles relevant to NBRR. SNIC emphasizes physically grounded norms that arise in everyday tasks such as cleaning, tidying, and serving. Across a range of controlled…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsMultimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Social Robot Interaction and HRI
