Social Norm Reasoning in Multimodal Language Models: An Evaluation
Oishik Chowdhury, Anushka Debnath, Bastin Tony Roy Savarimuthu

TL;DR
This paper evaluates the ability of multimodal large language models to reason about social norms in text and image scenarios, revealing strengths in text-based reasoning and challenges with complex norms.
Contribution
It introduces an evaluation of five MLLMs' norm reasoning capabilities across multimodal social scenarios, highlighting their comparative performance and limitations.
Findings
MLLMs perform better on text scenarios than on image scenarios.
GPT-4o shows the strongest norm reasoning ability.
Complex norms remain challenging for all models.
Abstract
In Multi-Agent Systems (MAS), agents are designed with social capabilities, allowing them to understand and reason about social concepts such as norms when interacting with others (e.g., inter-robot interactions). In Normative MAS (NorMAS), researchers study how norms develop, and how violations are detected and sanctioned. However, existing research in NorMAS uses symbolic approaches (e.g., formal logic) for norm representation and reasoning, whose application is limited to simplified environments. In contrast, Multimodal Large Language Models (MLLMs) present promising possibilities to develop software used by robots to identify and reason about norms in a wide variety of complex social situations embodied in text and images. However, prior work on norm reasoning has been limited to text-based scenarios. This paper investigates the norm reasoning competence of five MLLMs by evaluating…
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling
