Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution

Dingkang Liang; Cheng Zhang; Xiaopeng Xu; Jianzhong Ju; Zhenbo Luo; Xiang Bai

arXiv:2511.19430·cs.CV·November 25, 2025

TL;DR

This paper introduces ORS3D, a new embodied AI task that requires efficient parallel task scheduling grounded in 3D scenes and operations research (OR) knowledge. It also presents the ORS3D-60K dataset for the task and proposes the GRANT model to address it.

Contribution

The paper presents the ORS3D-60K dataset and the GRANT model, which together combine OR knowledge, 3D grounding, and scheduling for embodied agents. The authors position this combination as a new setting for task planning.

Findings

01. GRANT effectively generates efficient schedules and grounded actions.

02. ORS3D-60K enables large-scale research on parallel task execution.

03. Experiments show improved efficiency and understanding in embodied AI tasks.

Abstract

Task scheduling is critical for embodied AI, enabling agents to follow natural language instructions and execute actions efficiently in 3D physical worlds. However, existing datasets often simplify task planning by ignoring operations research (OR) knowledge and 3D spatial grounding. In this work, we propose Operations Research knowledge-based 3D Grounded Task Scheduling (ORS3D), a new task that requires the synergy of language understanding, 3D grounding, and efficiency optimization. Unlike prior settings, ORS3D demands that agents minimize total completion time by leveraging parallelizable subtasks, e.g., cleaning the sink while the microwave operates. To facilitate research on ORS3D, we construct ORS3D-60K, a large-scale dataset comprising 60K composite tasks across 4K real-world scenes. Furthermore, we propose GRANT, an embodied multi-modal large language model equipped with a…
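The efficiency idea in the abstract (starting a subtask that runs unattended, then doing other work while it finishes) can be illustrated with a small scheduling sketch. This is not the paper's method or the GRANT model: the `Subtask` type, the greedy longest-wait-first rule, and all durations below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    name: str
    active_min: float        # minutes the agent itself is busy
    wait_min: float = 0.0    # minutes the subtask then runs unattended


def sequential_time(subtasks):
    """Total time if the agent idles through every unattended wait."""
    return sum(t.active_min + t.wait_min for t in subtasks)


def greedy_parallel_schedule(subtasks):
    """Start subtasks with the longest unattended wait first, so the
    agent's later active work overlaps with those waits."""
    order = sorted(subtasks, key=lambda t: t.wait_min, reverse=True)
    clock = 0.0      # time at which the agent becomes free again
    makespan = 0.0   # time at which the last subtask finishes
    for t in order:
        clock += t.active_min
        makespan = max(makespan, clock + t.wait_min)
    return [t.name for t in order], makespan


# Hypothetical durations, not taken from ORS3D-60K.
tasks = [
    Subtask("clean the sink", active_min=4),
    Subtask("heat food in the microwave", active_min=1, wait_min=5),
    Subtask("wipe the table", active_min=3),
]

order, makespan = greedy_parallel_schedule(tasks)
print("sequential:", sequential_time(tasks))
print("parallel:  ", makespan, order)
```

With these made-up numbers, starting the microwave first lets the sink and table work overlap with its five unattended minutes, so the total drops from 13 to 8 minutes. The ORS3D task additionally requires grounding each step to object locations in the 3D scene, which this sketch does not model.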

Code & Models

Datasets

H-EmbodVis/ORS3D-60K (dataset, 38 downloads)

Videos

Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Robot Manipulation and Learning