On the Evaluation of Generative Robotic Simulations
Feng Chen, Botian Xu, Pu Hua, Peiqi Duan, Yanchao Yang, Yi Ma, Huazhe Xu

TL;DR
This paper introduces an evaluation framework for generative robotic simulations that covers quality, diversity, and generalization, and validates it with experiments showing that the metrics align with human assessments.
Contribution
It proposes a novel, multi-faceted evaluation framework specifically designed for generative robotic tasks, addressing a key challenge in the field.
Findings
Metrics for quality and diversity can be optimized separately.
No single method excels across all evaluation metrics.
Current models face significant challenges in zero-shot generalization.
Abstract
Due to the difficulty of acquiring extensive real-world data, robot simulation has become crucial for parallel training and sim-to-real transfer, highlighting the importance of scalable simulated robotic tasks. Foundation models have demonstrated impressive capacities in autonomously generating feasible robotic tasks. However, this new paradigm underscores the challenge of adequately evaluating these autonomously generated tasks. To address this, we propose a comprehensive evaluation framework tailored to generative simulations. Our framework segments evaluation into three core aspects: quality, diversity, and generalization. For single-task quality, we evaluate the realism of the generated task and the completeness of the generated trajectories using large language models and vision-language models. In terms of diversity, we measure both task and data diversity through text similarity…
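The abstract's mention of measuring task diversity through text similarity can be illustrated with a minimal sketch. This is not the paper's metric: the bag-of-words vectors, the cosine measure, and the function names `bag_of_words`, `cosine_similarity`, and `task_diversity` are all assumptions chosen to keep the example self-contained. A real pipeline would likely use a learned text embedding in place of word counts.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lowercased word counts; an assumed stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def task_diversity(descriptions):
    """Hypothetical score: 1 - mean pairwise similarity (higher = more diverse)."""
    vectors = [bag_of_words(d) for d in descriptions]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        # Fewer than two tasks: no pair to compare, so report no diversity.
        return 0.0
    mean_similarity = sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_similarity
```

Under these assumptions, a set of near-duplicate task descriptions scores close to 0 and a set with no shared vocabulary scores 1. Word overlap is a crude proxy, so paraphrased tasks would look more diverse than they are.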
Taxonomy
Topics: Manufacturing Process and Optimization · Modular Robots and Swarm Intelligence
