TL;DR
This paper presents a generative simulation framework that synthesizes diverse physical human-robot interaction scenarios from natural-language prompts, enabling zero-shot transfer of the learned policies to real-world assistive tasks.
Contribution
It introduces the first generative simulation pipeline for physical human-robot interaction (pHRI), using Large Language Models (LLMs) and Vision-Language Models (VLMs) to automate environment creation, data collection, and policy training.
Findings
Policies achieve over 80% success rate in real-world tasks
Zero-shot sim-to-real transfer demonstrated for assistive tasks
Synthetic data enables robust vision-based imitation learning
Abstract
Developing autonomous physical human-robot interaction (pHRI) systems is limited by the scarcity of large-scale training data to learn robust robot behaviors for real-world applications. In this paper, we introduce a zero-shot "text2sim2real" generative simulation framework that automatically synthesizes diverse pHRI scenarios from high-level natural-language prompts. Leveraging Large Language Models (LLMs) and Vision-Language Models (VLMs), our pipeline procedurally generates soft-body human models, scene layouts, and robot motion trajectories for assistive tasks. We utilize this framework to autonomously collect large-scale synthetic demonstration datasets and then train vision-based imitation learning policies operating on segmented point clouds. We evaluate our approach through a user study on two physically assistive tasks: scratching and bathing. Our learned policies successfully…
