Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control

Linxi Xie; Lisong C. Sun; Ashley Neall; Tong Wu; Shengqu Cai; Gordon Wetzstein

arXiv:2602.18422·cs.CV·February 23, 2026

Open Access

TL;DR

This paper presents a human-centric video world model conditioned on tracked head and hand poses. It generates interactive egocentric virtual environments, and a human-subject evaluation shows improved task performance and higher perceived control.

Contribution

The paper introduces a novel conditioning mechanism for 3D head and hand control in video diffusion models, targeting embodied interaction in XR environments.

Findings

01 Higher perceived control in virtual actions

02 Improved task performance with the system

03 Effective 3D hand and head pose conditioning

Abstract

Extended reality (XR) demands generative models that respond to users' tracked real-world motion, yet current video world models accept only coarse control signals such as text or keyboard input, limiting their utility for embodied interaction. We introduce a human-centric video world model that is conditioned on both tracked head pose and joint-level hand poses. For this purpose, we evaluate existing diffusion transformer conditioning strategies and propose an effective mechanism for 3D head and hand control, enabling dexterous hand-object interactions. We train a bidirectional video diffusion model teacher using this strategy and distill it into a causal, interactive system that generates egocentric virtual environments. We evaluate this generated reality system with human subjects and demonstrate improved task performance as well as a significantly higher level of perceived amount…
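The abstract does not spell out the proposed conditioning mechanism, so the following is only a generic sketch of two ideas it mentions: packing per-frame head and hand poses into a conditioning vector, and the difference between a bidirectional and a causal (frame-wise) attention pattern. The function names, the 4x4 head-pose matrix, and the 21-joints-per-hand layout are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def pack_pose_condition(head_pose: np.ndarray, hand_joints: np.ndarray) -> np.ndarray:
    """Flatten per-frame poses into one conditioning vector per frame.

    Assumed (hypothetical) layouts:
      head_pose:   (T, 4, 4)    camera-to-world matrix per frame
      hand_joints: (T, 2, J, 3) 3D joint positions for two hands
    Returns an array of shape (T, 16 + 2 * J * 3).
    """
    num_frames = head_pose.shape[0]
    return np.concatenate(
        [head_pose.reshape(num_frames, -1), hand_joints.reshape(num_frames, -1)],
        axis=1,
    )


def frame_attention_mask(num_frames: int, tokens_per_frame: int, causal: bool) -> np.ndarray:
    """Boolean mask where entry [q, k] is True if query token q may attend to key token k.

    Bidirectional: every token sees every frame.
    Causal: a token sees only its own frame and earlier frames.
    """
    frame_ids = np.repeat(np.arange(num_frames), tokens_per_frame)
    if not causal:
        return np.ones((frame_ids.size, frame_ids.size), dtype=bool)
    return frame_ids[:, None] >= frame_ids[None, :]


# Toy inputs: 3 frames, identity head pose, 21 zeroed joints per hand.
head = np.tile(np.eye(4), (3, 1, 1))
hands = np.zeros((3, 2, 21, 3))
cond = pack_pose_condition(head, hands)

teacher_mask = frame_attention_mask(num_frames=3, tokens_per_frame=2, causal=False)
student_mask = frame_attention_mask(num_frames=3, tokens_per_frame=2, causal=True)
```

In this toy setup the bidirectional mask lets tokens in the first frame attend to later frames, while the causal mask does not, which is the property that allows frame-by-frame generation as new pose readings arrive.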

Taxonomy

Topics: Human Motion and Animation · Virtual Reality Applications and Impacts · Hand Gesture Recognition Systems