Realistic Synthetic Household Data Generation at Scale
Siddharth Singh, Ifrah Idrees, Abraham Dauhajre

TL;DR
This paper presents a scalable generative framework for creating realistic synthetic household datasets that model the dynamic interplay between human behavior and environment, supporting Embodied AI research.
Contribution
The framework uniquely models bidirectional influence between human personas and household environments, enabling natural language-driven, large-scale synthetic data generation with rich static and temporal context.
Findings
Synthetic data aligns with real datasets, reaching a cosine similarity of 0.60
Intervention analysis confirms measurable environmental and behavioral differences
Framework enables scalable, configurable household data generation for AI development
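For readers unfamiliar with the metric cited in the findings, here is a minimal sketch of how cosine similarity between two feature vectors is computed. The vectors and the function name are hypothetical illustrations. This summary does not say how the paper represents synthetic and real data before comparing them, so the sketch shows only the metric itself, not the authors' evaluation pipeline.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical feature vectors (assumption: any fixed-length numeric
# representation of a dataset, such as an embedding or activity histogram).
synthetic = [3.0, 4.0, 0.0]
real = [3.0, 0.0, 4.0]

print(cosine_similarity(synthetic, real))  # about 0.36
```

A value of 1 means the two vectors point in the same direction, and 0 means they are orthogonal. For vectors with non-negative components the score falls between 0 and 1.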
Abstract
Advancements in foundation models have catalyzed research in Embodied AI to develop interactive agents capable of environmental reasoning and interaction. Developing such agents requires diverse, large-scale datasets. Prior frameworks generate synthetic data for long-term human-robot interactions but fail to model the bidirectional influence between human behavior and household environments. Our proposed generative framework creates household datasets at scale through loosely coupled generation of long-term human-robot interactions and environments. Human personas influence environment generation, while environment schematics and semantics shape human-robot interactions. The generated 3D data includes rich static context such as object and environment semantics, and temporal context capturing human and agent behaviors over extended periods. Our flexible tool allows users to define…
Taxonomy
Topics: Social Robot Interaction and HRI · Multimodal Machine Learning Applications · Persona Design and Applications
