SocialMirror: Reconstructing 3D Human Interaction Behaviors from Monocular Videos with Semantic and Geometric Guidance

Qi Xia; Peishan Cong; Ziyi Wang; Yujing Sun; Qin Sun; Xinge Zhu; Mao Ye; Ruigang Yang; Yuexin Ma

arXiv:2604.13581·cs.CV·April 16, 2026

TL;DR

SocialMirror is a diffusion-based framework that reconstructs 3D human interactions from monocular videos by integrating semantic cues and geometric constraints, overcoming occlusion and ambiguity challenges.

Contribution

SocialMirror introduces a semantic-guided motion infiller and a sequence-level temporal refiner to improve 3D human interaction reconstruction.

Findings

01 Achieves state-of-the-art performance on multiple benchmarks.

02 Demonstrates strong generalization to unseen datasets and in-the-wild scenarios.

03 Effectively handles occlusions and local pose ambiguities.

Abstract

Accurately reconstructing human behavior in close-interaction scenarios is crucial for enabling realistic virtual interactions in augmented reality, precise motion analysis in sports, and natural collaborative behavior in human-robot tasks. Reliable reconstruction in these contexts significantly enhances the realism and effectiveness of AI-driven interactive applications. However, human reconstruction from monocular videos in close-interaction scenarios remains challenging due to severe mutual occlusions, which lead to local motion ambiguity, disrupted temporal continuity, and errors in spatial relationships. In this paper, we propose SocialMirror, a diffusion-based framework that integrates semantic and geometric cues to effectively address these issues. Specifically, we first leverage high-level interaction descriptions generated by a vision-language model to guide a semantic-guided motion…
