SIV-Bench: A Video Benchmark for Social Interaction Understanding and Reasoning

Fanqi Kong; Weiqin Zu; Xinyu Chen; Yaodong Yang; Song-Chun Zhu; Xue Feng

arXiv:2506.05425·cs.CV·April 29, 2026

TL;DR

SIV-Bench is a comprehensive video benchmark designed to evaluate multimodal large language models' abilities in social scene understanding, reasoning, and prediction, highlighting current limitations and guiding future research.

Contribution

The paper introduces SIV-Bench, a novel benchmark with diverse videos and questions to systematically assess social interaction understanding in MLLMs.

Findings

01 MLLMs perform well on social scene understanding but struggle with reasoning and prediction.

02 Relation inference remains a key bottleneck in social interaction understanding.

03 Audio and subtitles improve performance on social state reasoning and social dynamics prediction.
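
The findings above compare model performance across the benchmark's three dimensions: social scene understanding (SSU), social state reasoning (SSR), and social dynamics prediction (SDP). A minimal sketch of how such per-dimension accuracy could be tallied is shown here. It assumes each question is multiple-choice and tagged with one dimension; the record layout, the function name `accuracy_by_dimension`, and the sample answers are hypothetical and are not taken from the paper or its dataset.

```python
from collections import defaultdict

# Hypothetical records: (dimension, model_answer, correct_answer).
# The layout and the values are illustrative assumptions, not SIV-Bench data.
records = [
    ("SSU", "A", "A"),
    ("SSU", "B", "B"),
    ("SSR", "C", "A"),
    ("SSR", "D", "D"),
    ("SDP", "B", "C"),
]


def accuracy_by_dimension(rows):
    """Return {dimension: fraction of rows where the prediction matches the answer}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for dimension, predicted, answer in rows:
        total[dimension] += 1
        if predicted == answer:
            correct[dimension] += 1
    # Only dimensions that appear in the input are reported.
    return {dimension: correct[dimension] / total[dimension] for dimension in total}


scores = accuracy_by_dimension(records)
for dimension, score in scores.items():
    print(f"{dimension}: {score:.2f}")
```

Reporting one score per dimension, rather than a single pooled accuracy, is what makes a gap between understanding and reasoning or prediction visible.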

Abstract

Understanding social interaction, which encompasses perceiving numerous and subtle multimodal cues, inferring unobservable mental states and relations, and dynamically predicting others' behavior, is the foundation for achieving human-machine interaction. Despite rapid advances in Multimodal Large Language Models (MLLMs), the rich and multifaceted nature of social interaction has hindered the development of benchmarks that holistically evaluate and guide their social interaction abilities. Based on social relation theory, which has been widely regarded as a foundational framework for understanding social behavior, we provide SIV-Bench, a novel video benchmark for systematically evaluating MLLMs' capabilities across Social Scene Understanding (SSU), Social State Reasoning (SSR), and Social Dynamics Prediction (SDP). SIV-Bench features 2,792 originally collected video clips and 5,455…

Code & Models

Repositories

https://kfq20.github.io/sivbench
github

Datasets

Fancylalala/SIV-Bench
dataset· 1.2k dl
