ONE-SHOT: Compositional Human-Environment Video Synthesis via Spatial-Decoupled Motion Injection and Hybrid Context Integration

Fengyuan Yang; Luying Huang; Jiazhi Guan; Quanwei Yang; Dongwei Pan; Jianglin Fu; Haocheng Feng; Wei He; Kaisiyuan Wang; Hang Zhou; Angela Yao

arXiv:2604.01043·cs.CV·April 2, 2026

TL;DR

ONE-SHOT is a parameter-efficient framework for compositional human-environment video synthesis. It factorizes generation into disentangled signals and keeps long sequences consistent, and the authors report that it outperforms existing methods.

Contribution

The paper proposes a parameter-efficient, disentangled generative process with novel spatial correspondence and hybrid context mechanisms for improved video synthesis.

Findings

01. Outperforms state-of-the-art methods in structural control and diversity.

02. Supports long-horizon, minute-level video synthesis.

03. Introduces a canonical-space injection and Dynamic-Grounded-RoPE for spatial decoupling.

Abstract

Recent advances in Video Foundation Models (VFMs) have revolutionized human-centric video synthesis, yet fine-grained and independent editing of subjects and scenes remains a critical challenge. Recent attempts to incorporate richer environment control through rigid 3D geometric compositions often encounter a stark trade-off between precise control and generative flexibility. Furthermore, the heavy 3D pre-processing still limits practical scalability. In this paper, we propose ONE-SHOT, a parameter-efficient framework for compositional human-environment video generation. Our key insight is to factorize the generative process into disentangled signals. Specifically, we introduce a canonical-space injection mechanism that decouples human dynamics from environmental cues via cross-attention. We also propose Dynamic-Grounded-RoPE, a novel positional embedding strategy that establishes…
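The abstract says human dynamics are injected via cross-attention. As a rough illustration of that general idea only, the sketch below lets video tokens (queries) attend to a separate set of motion tokens (keys and values) and adds the result back as a residual. It is generic single-head cross-attention in NumPy, not the paper's canonical-space mechanism or Dynamic-Grounded-RoPE. The function name `cross_attention_inject`, the token shapes, and the zero-initialized output projection are all assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_inject(video_tokens, motion_tokens, w_q, w_k, w_v, w_o):
    """Hypothetical helper: single-head cross-attention with a residual.

    video_tokens:  (n_video, d) array, used as queries.
    motion_tokens: (n_motion, d) array, used as keys and values.
    Returns the updated video tokens and the attention weights.
    """
    q = video_tokens @ w_q
    k = motion_tokens @ w_k
    v = motion_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (n_video, n_motion)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    update = (weights @ v) @ w_o              # motion signal in video space
    return video_tokens + update, weights


rng = np.random.default_rng(0)
d = 16
video_tokens = rng.normal(size=(8, d))    # assumed: 8 video tokens
motion_tokens = rng.normal(size=(4, d))   # assumed: 4 motion tokens
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
# Assumption: a zero-initialized output projection, so the injection
# starts as an identity mapping on the video tokens.
w_o = np.zeros((d, d))

out, weights = cross_attention_inject(
    video_tokens, motion_tokens, w_q, w_k, w_v, w_o
)
print(out.shape, weights.shape)
```

With `w_o` set to zeros the output equals the input video tokens, so the motion branch only starts to influence the result once `w_o` is trained away from zero. This is one common way to attach a control branch to a pretrained model; whether ONE-SHOT does the same is not stated in the abstract.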

Code & Models

Repositories

https://martayang.github.io/ONE-SHOT