Omni-MMSI: Toward Identity-attributed Social Interaction Understanding

Xinpeng Li; Bolin Lai; Hardy Chen; Shijian Deng; Cihang Xie; Yuyin Zhou; James Matthew Rehg; Yapeng Tian

arXiv:2604.00267·cs.CV·April 2, 2026

TL;DR

Omni-MMSI is a new task in which AI systems must understand social interactions from raw multi-modal data, with an emphasis on identity attribution and reasoning. The authors propose a reference-guided pipeline, Omni-MMSI-R, that outperforms existing pipelines and multi-modal LLMs.

Contribution

The paper presents Omni-MMSI-R, a reference-guided pipeline for identity attribution and social reasoning from raw multi-modal data. It targets the unreliable identity attribution that limits existing pipelines and multi-modal LLMs.

Findings

01

Omni-MMSI-R outperforms existing pipelines and multi-modal LLMs on Omni-MMSI.

02

The authors construct participant-level reference pairs and curate reasoning annotations.

03

Omni-MMSI-R improves social interaction understanding from raw multi-modal data.

Abstract

We introduce Omni-MMSI, a new task that requires comprehensive social interaction understanding from raw audio, vision, and speech input. The task involves perceiving identity-attributed social cues (e.g., who is speaking what) and reasoning about the social interaction (e.g., whom the speaker refers to). This task is essential for developing AI assistants that can perceive and respond to human interactions. Unlike prior studies that operate on oracle-preprocessed social cues, Omni-MMSI reflects realistic scenarios where AI assistants must perceive and reason from raw data. However, existing pipelines and multi-modal LLMs perform poorly on Omni-MMSI because they lack reliable identity attribution capabilities, which leads to inaccurate social interaction understanding. To address this challenge, we propose Omni-MMSI-R, a reference-guided pipeline that produces identity-attributed social…
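To make "identity-attributed social cues" concrete, the following is a minimal sketch under stated assumptions. The `Utterance` record, its field names, the participant labels such as "P1", and both helper functions are hypothetical illustrations; they are not the paper's data format and not the Omni-MMSI-R pipeline. The sketch only shows the two pieces the abstract names: a cue tied to an identity (who says what) and a referent that may still need to be inferred (whom the speaker refers to).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    """One identity-attributed cue (hypothetical schema, for illustration only)."""
    speaker_id: str                    # assumed participant label, e.g. "P1"
    text: str                          # transcribed speech
    start_s: float                     # start time in seconds
    referent_id: Optional[str] = None  # whom the speaker refers to, if known


def format_transcript(utterances: List[Utterance]) -> str:
    """Render cues in time order as 'speaker: text' lines."""
    ordered = sorted(utterances, key=lambda u: u.start_s)
    return "\n".join(f"{u.speaker_id}: {u.text}" for u in ordered)


def unresolved(utterances: List[Utterance]) -> List[Utterance]:
    """Return cues whose referent is still unknown (left to a reasoning step)."""
    return [u for u in utterances if u.referent_id is None]
```

In this sketch, a perception stage would fill in `speaker_id` and `text`, and a reasoning stage would then resolve `referent_id` for the cues returned by `unresolved`.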

Code & Models

Repositories

https://sampson-lee.github.io/omni-mmsi-project-page

Datasets

Xinpeng-Li/Omni_MMSI
dataset · 74 downloads
