Social Genome: Grounded Social Reasoning Abilities of Multimodal Models
Leena Mathur, Marian Qian, Paul Pu Liang, Louis-Philippe Morency

TL;DR
This paper introduces SOCIAL GENOME, a benchmark that evaluates the grounded social reasoning abilities of multimodal models using 272 annotated videos and 1,486 human-annotated reasoning traces, and highlights current performance gaps.
Contribution
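It presents the first benchmark for fine-grained, grounded social reasoning in multimodal models and the first modeling challenge to study external knowledge in social reasoning, with metrics that evaluate the semantic and structural qualities of model-generated reasoning traces.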
Findings
State-of-the-art models show significant performance gaps.
The benchmark's metrics enable analysis of both the semantic and the structural quality of model-generated reasoning traces.
External knowledge integration remains a challenge.
Abstract
Social reasoning abilities are crucial for AI systems to effectively interpret and respond to multimodal human communication and interaction within social contexts. We introduce SOCIAL GENOME, the first benchmark for fine-grained, grounded social reasoning abilities of multimodal models. SOCIAL GENOME contains 272 videos of interactions and 1,486 human-annotated reasoning traces related to inferences about these interactions. These traces contain 5,777 reasoning steps that reference evidence from visual cues, verbal cues, vocal cues, and external knowledge (contextual knowledge external to videos). SOCIAL GENOME is also the first modeling challenge to study external knowledge in social reasoning. SOCIAL GENOME computes metrics to holistically evaluate semantic and structural qualities of model-generated social reasoning traces. We demonstrate the utility of SOCIAL GENOME through…
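To make the structure described in the abstract concrete, the following is a minimal Python sketch of how a reasoning trace and its evidence-tagged steps could be represented, together with the averages implied by the reported counts. The class names, field names, and the example trace are hypothetical and chosen only for illustration; they are not taken from the SOCIAL GENOME release, whose actual data format may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Evidence(Enum):
    """The four evidence types named in the abstract."""
    VISUAL = "visual"
    VERBAL = "verbal"
    VOCAL = "vocal"
    EXTERNAL = "external_knowledge"


@dataclass
class ReasoningStep:
    # Hypothetical schema: one step of a trace, tagged with its evidence type.
    text: str
    evidence: Evidence


@dataclass
class ReasoningTrace:
    # Hypothetical schema: an ordered list of steps tied to one video.
    video_id: str
    steps: List[ReasoningStep] = field(default_factory=list)

    def evidence_counts(self) -> Dict[Evidence, int]:
        """Count how many steps cite each evidence type."""
        counts = {e: 0 for e in Evidence}
        for step in self.steps:
            counts[step.evidence] += 1
        return counts


# Counts reported in the abstract.
NUM_VIDEOS = 272
NUM_TRACES = 1486
NUM_STEPS = 5777

avg_steps_per_trace = NUM_STEPS / NUM_TRACES    # roughly 3.89
avg_traces_per_video = NUM_TRACES / NUM_VIDEOS  # roughly 5.46

# Invented example trace, for illustration only.
example = ReasoningTrace(
    video_id="example_video",
    steps=[
        ReasoningStep("The speaker avoids eye contact.", Evidence.VISUAL),
        ReasoningStep("Their voice becomes quieter.", Evidence.VOCAL),
        ReasoningStep("Avoiding eye contact can signal discomfort.", Evidence.EXTERNAL),
    ],
)
```

On the reported counts, a trace averages just under four reasoning steps and each video has between five and six traces on average.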
