Automatic Cognitive Task Generation for In-Situ Evaluation of Embodied Agents

Xinyi He; Ying Yang; Chuanjian Fu; Sihan Guo; Songchun Zhu; Lifeng Fan; Zhenliang Zhang; Yujia Peng

arXiv:2602.05249 · cs.AI · February 6, 2026

TL;DR

This paper introduces TEA, a dynamic method for generating diverse, scene-specific in-situ tasks for embodied agents in unseen environments, enabling more realistic evaluation of their capabilities.

Contribution

We propose a novel two-stage task generation system, TEA, that creates and evolves in-situ tasks without external data, improving evaluation realism for embodied agents.

Findings

01

TEA generated 87,876 tasks across 10 unseen scenes.

02

Human verification confirmed tasks were reasonable and comprehensive.

03

State-of-the-art models perform poorly on in-situ tasks despite good benchmark results.

Abstract

As general intelligent agents are poised for widespread deployment in diverse households, evaluation tailored to each unique, unseen 3D environment has become a critical prerequisite. However, existing benchmarks suffer from severe data contamination and a lack of scene specificity, which makes them inadequate for assessing agent capabilities in unseen settings. To address this, we propose a dynamic in-situ task generation method for unseen environments, inspired by human cognition. We define tasks through a structured graph representation and construct a two-stage interaction-evolution task generation system for embodied agents (TEA). In the interaction stage, the agent actively interacts with the environment, creating a loop between task execution and generation that allows for continuous task generation. In the evolution stage, task graph modeling allows us to recombine and reuse existing tasks to…
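The two stages named in the abstract can be pictured with a small toy sketch. It is not the authors' implementation: the `Step` and `Task` classes, the `interact` and `evolve` functions, and the affordance table are all hypothetical names chosen for illustration, and a task is simplified here to a path-shaped graph of (action, target) edges. The sketch also leaves out the execution-generation loop that the abstract describes; it only shows how base tasks grounded in a scene might be recombined into longer ones.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Step:
    """One edge of a hypothetical task graph: apply `action` to `target`."""
    action: str
    target: str


@dataclass(frozen=True)
class Task:
    """A task as an ordered tuple of steps (a simple path-shaped graph)."""
    name: str
    steps: tuple


def interact(scene_objects, affordances):
    """Interaction stage (sketch): one single-step task per supported
    object/action pair found in the scene."""
    tasks = []
    for obj in scene_objects:
        for action in affordances.get(obj, ()):
            tasks.append(Task(f"{action}_{obj}", (Step(action, obj),)))
    return tasks


def evolve(tasks):
    """Evolution stage (sketch): recombine every pair of existing tasks
    into a longer task by concatenating their steps."""
    evolved = []
    for a, b in combinations(tasks, 2):
        evolved.append(Task(f"{a.name}+{b.name}", a.steps + b.steps))
    return evolved


# Assumed toy scene; object names and affordances are made up.
scene = ["mug", "drawer"]
affordances = {"mug": ["pick_up"], "drawer": ["open", "close"]}

base = interact(scene, affordances)   # scene-specific single-step tasks
combined = evolve(base)               # longer tasks reusing the base ones
```

In this toy version the number of recombined tasks grows quadratically with the number of base tasks, which gives a rough intuition for how reuse of existing tasks could yield a large task set from a small scene without external data.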

Taxonomy

Topics: Social Robot Interaction and HRI · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics