Automatic Cognitive Task Generation for In-Situ Evaluation of Embodied Agents
Xinyi He, Ying Yang, Chuanjian Fu, Sihan Guo, Songchun Zhu, Lifeng Fan, Zhenliang Zhang, Yujia Peng

TL;DR
This paper introduces TEA, a dynamic method for generating diverse, scene-specific in-situ tasks for embodied agents in unseen environments, enabling more realistic evaluation of their capabilities.
Contribution
We propose a novel two-stage task generation system, TEA, that creates and evolves in-situ tasks without external data, improving evaluation realism for embodied agents.
Findings
TEA generated 87,876 tasks across 10 unseen scenes.
Human verification confirmed tasks were reasonable and comprehensive.
State-of-the-art models perform poorly on in-situ tasks despite good benchmark results.
Abstract
As general intelligent agents are poised for widespread deployment in diverse households, evaluation tailored to each unique, unseen 3D environment has become a critical prerequisite. However, existing benchmarks suffer from severe data contamination and lack scene specificity, which makes them inadequate for assessing agent capabilities in unseen settings. To address this, we propose a dynamic in-situ task generation method for unseen environments inspired by human cognition. We define tasks through a structured graph representation and construct a two-stage interaction-evolution task generation system for embodied agents (TEA). In the interaction stage, the agent actively interacts with the environment, creating a loop between task execution and generation that allows for continuous task generation. In the evolution stage, task graph modeling allows us to recombine and reuse existing tasks to…
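The abstract states that tasks are represented as structured graphs and that the evolution stage recombines existing tasks, but it gives no further detail here. The sketch below is a hypothetical illustration of that general idea, not TEA's actual representation or algorithm. It assumes each task is a node with a set of precondition facts and a set of postcondition facts; the names `Task`, `can_chain`, `compose`, and `evolve` are invented for this example. Two tasks are treated as composable when the first task's postconditions cover all of the second task's preconditions.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Task:
    """Hypothetical task node: facts required before it runs, facts true after."""
    name: str
    pre: frozenset
    post: frozenset


def can_chain(a: Task, b: Task) -> bool:
    # b may follow a if every precondition of b is established by a
    return b.pre <= a.post


def compose(a: Task, b: Task) -> Task:
    # Simplifying assumption: facts are only added, never removed,
    # so the composite's postconditions are the union of both tasks'.
    return Task(f"{a.name} -> {b.name}", a.pre, a.post | b.post)


def evolve(tasks):
    """One round of recombination: each valid ordered pair yields a composite task."""
    new_tasks = []
    for a, b in permutations(tasks, 2):
        if can_chain(a, b):
            new_tasks.append(compose(a, b))
    return new_tasks


if __name__ == "__main__":
    pick = Task("pick up mug", frozenset({"mug on table"}), frozenset({"holding mug"}))
    place = Task("put mug in sink", frozenset({"holding mug"}), frozenset({"mug in sink"}))
    wash = Task("wash mug", frozenset({"mug in sink"}), frozenset({"mug clean"}))
    for task in evolve([pick, place, wash]):
        print(task.name)
```

Running `evolve` again on the original tasks plus the composites would produce longer chains (for example, pick, place, then wash), which is one simple way that reusing existing tasks can grow a task pool without external data.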
Taxonomy
Topics: Social Robot Interaction and HRI · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics
