SocialDirector: Training-Free Social Interaction Control for Multi-Person Video Generation
Liangyang Ouyang, Ruicong Liu, Caixin Kang, Yifei Huang, Yoichi Sato

TL;DR
SocialDirector is a training-free control method for multi-person video generation that improves social interaction accuracy by modulating cross-attention, so that the right person performs each action and directs it toward the intended target.
Contribution
It introduces a training-free interaction control framework with two modules, Social Actor Masking and Directional Reweighting, that improve social interaction fidelity in video generation.
Findings
Significantly improves interaction fidelity in generated videos.
Effectively reduces actor-action mismatch and disordered social dynamics.
Approaches the quality of real videos in social interaction accuracy.
Abstract
Video generation has advanced rapidly, producing photorealistic videos from text or image prompts. Meanwhile, film production and social robotics increasingly demand multi-person videos with rich social interactions, including conversations, gestures, and coordinated actions. However, existing models offer no explicit control over interactions, such as who performs which action, when it occurs, and toward whom it is directed. This often results in the wrong person performing an unintended action (actor-action mismatch), disordered social dynamics, and actions directed at the wrong target. To address these challenges, we present SocialDirector, a training-free interaction controller that enhances the generation model by modulating cross-attention maps. SocialDirector contains two modules: Social Actor Masking and Directional Reweighting. Social Actor Masking constrains each person's visual tokens to attend…
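The abstract describes SocialDirector as modulating cross-attention maps, with Social Actor Masking constraining each person's visual tokens. The sketch below shows generic masked cross-attention in NumPy to make the idea of constraining attention concrete. It rests on assumptions not stated in the source: that each visual token is already assigned to a person (or to the background), that each text token is labeled with the person it describes (or as shared scene text), and that a person's tokens may attend only to their own text plus shared text. The function names and labeling scheme are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def build_actor_mask(person_of_token, actor_of_text):
    """Hypothetical mask builder (assumption, not from the paper).

    person_of_token: (N_vis,) person id per visual token, -1 = background.
    actor_of_text:   (N_txt,) person id per text token, -1 = shared text.
    Returns a boolean (N_vis, N_txt) matrix: True where attention is allowed.
    """
    vis = person_of_token[:, None]
    txt = actor_of_text[None, :]
    # Background tokens see everything; shared text is visible to everyone;
    # otherwise a person's tokens see only text describing that same person.
    return (vis == -1) | (txt == -1) | (vis == txt)


def masked_cross_attention(q, k, v, allowed):
    """Scaled dot-product cross-attention with disallowed logits set to -inf.

    q: (N_vis, d) visual queries; k, v: (N_txt, d) text keys/values.
    Every row of `allowed` must contain at least one True, otherwise the
    softmax over that row would be all -inf and produce NaN.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(allowed, scores, -np.inf)
    attn = softmax(scores, axis=-1)
    return attn @ v, attn


# Toy example: visual tokens 0-1 belong to person 0, 2-3 to person 1,
# token 4 is background. Text token 0 is shared scene text, 1-2 describe
# person 0, and 3-4 describe person 1.
person_of_token = np.array([0, 0, 1, 1, -1])
actor_of_text = np.array([-1, 0, 0, 1, 1])
mask = build_actor_mask(person_of_token, actor_of_text)

rng = np.random.default_rng(0)
q = rng.standard_normal((5, 8))
k = rng.standard_normal((5, 8))
v = rng.standard_normal((5, 8))
out, attn = masked_cross_attention(q, k, v, mask)
```

Setting masked logits to negative infinity gives them exactly zero weight after the softmax, so each visual token's attention is renormalized over the text tokens it is allowed to see. In this sketch, person 0's tokens receive no weight from text describing person 1, and vice versa. Shared text and background tokens stay unmasked so that no row of the mask is empty.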
