CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation

Xiangyang Luo; Xiaozhe Xin; Tao Feng; Xu Guo; Meiguang Jin; Junfeng Ma

arXiv:2604.19636 · cs.CV · April 22, 2026


TL;DR

CoInteract is a framework for synthesizing human-object interaction (HOI) videos that improves structural fidelity and physical plausibility using a dual-stream Diffusion Transformer with spatially supervised expert routing.

Contribution

It introduces a Human-Aware Mixture-of-Experts (MoE) and a Spatially-Structured Co-Generation paradigm to improve HOI video synthesis.

Findings

1. Outperforms existing methods in structural stability.
2. Achieves more realistic and consistent interactions.
3. Adds no extra overhead at inference time.

Abstract

Synthesizing human-object interaction (HOI) videos has broad practical value in e-commerce, digital advertising, and virtual marketing. However, current diffusion models, despite their photorealistic rendering capability, still frequently fail on (i) the structural stability of sensitive regions such as hands and faces and (ii) physically plausible contact (e.g., avoiding hand-object interpenetration). We present CoInteract, an end-to-end framework for HOI video synthesis conditioned on a person reference image, a product reference image, text prompts, and speech audio. CoInteract introduces two complementary designs embedded into a Diffusion Transformer (DiT) backbone. First, we propose a Human-Aware Mixture-of-Experts (MoE) that routes tokens to lightweight, region-specialized experts via spatially supervised routing, improving fine-grained structural fidelity with minimal parameter…
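The abstract describes the Human-Aware MoE only at a high level. Below is a minimal, pure-Python sketch of spatially supervised token routing under illustrative assumptions: the three region ids, the linear router, top-1 routing, the residual gating, and every name (`HumanAwareMoESketch`, `routing_supervision_loss`, `REGIONS`) are hypothetical and are not taken from the CoInteract code. It shows one way a router can be pushed to send hand and face tokens to dedicated lightweight experts: a cross-entropy term compares the router's probabilities against per-token region labels derived from masks.

```python
import math
import random

# Hypothetical region ids; the actual region set used by CoInteract is not stated here.
REGIONS = ("other", "hand", "face")


def matvec(w, x):
    """Multiply a (rows x cols) weight matrix, stored as nested lists, by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class HumanAwareMoESketch:
    """Toy token-level MoE: one lightweight linear expert per region, top-1 routing."""

    def __init__(self, dim, num_experts=len(REGIONS), seed=0):
        rng = random.Random(seed)
        # Router: maps a token (dim) to one logit per expert.
        self.router = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(num_experts)]
        # Experts: each is a small dim x dim linear map.
        self.experts = [
            [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]

    def forward(self, tokens):
        """Return per-token outputs, routing probabilities, and chosen expert indices."""
        outputs, probs, choices = [], [], []
        for x in tokens:
            p = softmax(matvec(self.router, x))
            k = max(range(len(p)), key=p.__getitem__)  # top-1 expert
            y = matvec(self.experts[k], x)
            # Residual connection; the expert output is gated by its routing probability.
            outputs.append([xi + p[k] * yi for xi, yi in zip(x, y)])
            probs.append(p)
            choices.append(k)
        return outputs, probs, choices


def routing_supervision_loss(probs, region_labels):
    """Mean cross-entropy between router probabilities and per-token region labels."""
    eps = 1e-9
    return -sum(math.log(p[label] + eps) for p, label in zip(probs, region_labels)) / len(probs)


# Usage: three 4-dim tokens with hypothetical hand / face / other labels from masks.
moe = HumanAwareMoESketch(dim=4)
tokens = [[0.5, -0.2, 0.1, 0.3], [1.0, 0.0, -0.4, 0.2], [0.0, 0.7, 0.7, -0.1]]
labels = [1, 2, 0]
out, probs, choices = moe.forward(tokens)
loss = routing_supervision_loss(probs, labels)
```

The supervision here acts only on the router, so the experts themselves can stay small; how CoInteract weights such a term against the diffusion objective is not stated in the summary above.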


Code & Models

Repositories

luoxyhappy/CoInteract (GitHub)

Models

georgexin/cointeract (Hugging Face model; 190 downloads, 13 likes)
