SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation
Wenjia Wang, Liang Pan, Zhiyang Dou, Jidong Mei, Zhouyingcheng Liao, Yuke Lou, Yifan Wu, Lei Yang, Jingbo Wang, Taku Komura

TL;DR
This paper presents SIMS, a hierarchical framework that combines retrieval-augmented script generation with a physics-based control policy to simulate diverse, stylized human-scene interactions with high physical plausibility.
Contribution
The paper introduces a novel hierarchical approach that integrates large language model-based script generation with a multi-condition control policy for realistic and stylized human-scene interaction simulation.
Findings
Outperforms previous methods in task execution and generalization
Generates diverse and stylized human motions
Provides comprehensive datasets for HSI simulation
Abstract
Simulating stylized human-scene interactions (HSI) in physical environments is a challenging yet fascinating task. Prior works emphasize long-term execution but fall short in achieving both diverse style and physical plausibility. To tackle this challenge, we introduce a novel hierarchical framework named SIMS that seamlessly bridges high-level script-driven intent with a low-level control policy, enabling more expressive and diverse human-scene interactions. Specifically, we employ Large Language Models with Retrieval-Augmented Generation (RAG) to generate coherent and diverse long-form scripts, providing a rich foundation for motion planning. A versatile multi-condition physics-based control policy is also developed, which leverages text embeddings from the generated scripts to encode stylistic cues, simultaneously perceiving environmental geometries and accomplishing task goals. By…
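The retrieval step in retrieval-augmented script generation can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a toy bag-of-words cosine similarity in place of learned text embeddings, omits the language model call entirely, and every name in it (`bow`, `cosine`, `retrieve`, `build_prompt`, `CORPUS`) is hypothetical. It only shows the general idea of ranking reference excerpts against a request and placing the best matches into a prompt.

```python
import math
from collections import Counter

def bow(text):
    """Lowercased bag-of-words vector (toy stand-in for a text embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counter vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, corpus, k=2):
    """Return the k corpus entries most similar to the query."""
    q = bow(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, bow(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query, corpus, k=2):
    """Assemble a prompt from retrieved excerpts; the LLM call itself is omitted."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus, k))
    return f"Reference excerpts:\n{context}\n\nWrite a script for: {query}"

# Hypothetical reference excerpts, invented for illustration only.
CORPUS = [
    "a tired man walks slowly to the sofa and sits down",
    "a child runs happily across the room",
    "an old woman lies down on the bed carefully",
]

if __name__ == "__main__":
    print(build_prompt("tired man sits on the sofa", CORPUS, k=2))
```

In a real system the bag-of-words vectors would be replaced by embeddings from a text encoder, and the assembled prompt would be passed to a language model to produce the script.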
Taxonomy
Topics: Human Motion and Animation · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis
Methods: Attention Is All You Need · Attention Dropout · Residual Connection · Linear Warmup With Linear Decay · Linear Layer · Adam · WordPiece · Layer Normalization · Byte Pair Encoding · Softmax
