Dynamic Worlds, Dynamic Humans: Generating Virtual Human-Scene Interaction Motion in Dynamic Scenes

Yin Wang; Zhiying Leng; Haitian Liu; Frederick W. B. Li; Mu Li; Xiaohui Liang

arXiv:2601.19484·cs.CV·January 28, 2026

TL;DR

This paper introduces Dyn-HSI, a comprehensive cognitive architecture for virtual humans that interact dynamically with changing scenes, incorporating perception, memory, and control modules to improve motion realism and adaptability.

Contribution

The paper presents what the authors describe as the first dynamic human-scene interaction model, integrating perception, memory, and diffusion-based control, along with a new dynamic benchmark dataset.

Findings

01. Outperforms existing methods in dynamic scene interaction quality

02. Enhances motion realism and generalization in virtual humans

03. Provides a new benchmark for dynamic human-scene interaction

Abstract

Scenes in the real world change continuously. However, existing human-scene interaction generation methods typically treat the scene as static, which deviates from reality. Inspired by world models, we introduce Dyn-HSI, the first cognitive architecture for dynamic human-scene interaction, which endows virtual humans with three humanoid components. (1) Vision (human eyes): we equip the virtual human with a Dynamic Scene-Aware Navigation, which continuously perceives changes in the surrounding environment and adaptively predicts the next waypoint. (2) Memory (human brain): we equip the virtual human with a Hierarchical Experience Memory, which stores and updates experiential data accumulated during training. This allows the model to leverage prior knowledge during inference for context-aware motion priming, thereby enhancing both motion quality and generalization.…
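The "perceive, then predict the next waypoint" loop described for the Vision component can be illustrated with a toy example. The sketch below is not the paper's Dynamic Scene-Aware Navigation: it swaps in a plain breadth-first search on a small 2D grid, and every name in it (`next_waypoint`, `navigate`, `obstacle_at`) is hypothetical. It only shows the general idea of re-reading the scene at every step and re-planning, rather than planning once against a static scene.

```python
from collections import deque


def next_waypoint(start, goal, blocked, size):
    """Return the first cell on a shortest path from start to goal.

    Toy stand-in for a learned waypoint predictor (an assumption, not the
    paper's method): BFS on a size x size grid with 4-connected moves.
    Returns start unchanged if it already equals goal or no path exists.
    """
    if start == goal:
        return start
    queue = deque([start])
    parent = {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back until we reach the cell directly after start.
            while parent[cell] != start:
                cell = parent[cell]
            return cell
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if inside and nxt not in blocked and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return start  # no path right now: wait in place


def navigate(start, goal, obstacle_at, size, max_steps=50):
    """Re-perceive the obstacles at every step and re-plan one waypoint.

    obstacle_at(t) is a hypothetical perception callback that returns the
    set of blocked cells at time step t.
    """
    pos, path = start, [start]
    for t in range(max_steps):
        if pos == goal:
            break
        blocked = obstacle_at(t)  # perception: the scene may have changed
        pos = next_waypoint(pos, goal, blocked, size)  # adaptive re-planning
        path.append(pos)
    return path


if __name__ == "__main__":
    # An obstacle covers two cells for the first two steps, then shrinks.
    def moving_obstacle(t):
        return {(2, 0), (2, 1)} if t < 2 else {(2, 0)}

    print(navigate((0, 0), (4, 0), moving_obstacle, size=5))
```

Because the plan is recomputed from the current position each step, a path that opens up after the obstacle moves is picked up immediately; a planner that ran once at the start would keep following the longer detour.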

Taxonomy

Topics: Human Motion and Animation · Multimodal Machine Learning Applications · Human Pose and Action Recognition