Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution
Dingkang Liang, Cheng Zhang, Xiaopeng Xu, Jianzhong Ju, Zhenbo Luo, Xiang Bai

TL;DR
This paper introduces ORS3D, a new embodied AI task that emphasizes efficient parallel task scheduling using 3D grounding and operations research (OR) knowledge. It also releases the ORS3D-60K dataset and proposes the GRANT model to address this challenge.
Contribution
The paper presents the ORS3D-60K dataset and the GRANT model, which together integrate OR knowledge, 3D grounding, and scheduling for embodied agents. This combination is a novel approach to task planning.
Findings
GRANT effectively generates efficient schedules and grounded actions.
ORS3D-60K enables large-scale research on parallel task execution.
Experiments show improved efficiency and understanding in embodied AI tasks.
Abstract
Task scheduling is critical for embodied AI, enabling agents to follow natural language instructions and execute actions efficiently in 3D physical worlds. However, existing datasets often simplify task planning by ignoring operations research (OR) knowledge and 3D spatial grounding. In this work, we propose Operations Research knowledge-based 3D Grounded Task Scheduling (ORS3D), a new task that requires the synergy of language understanding, 3D grounding, and efficiency optimization. Unlike prior settings, ORS3D demands that agents minimize total completion time by leveraging parallelizable subtasks, e.g., cleaning the sink while the microwave operates. To facilitate research on ORS3D, we construct ORS3D-60K, a large-scale dataset comprising 60K composite tasks across 4K real-world scenes. Furthermore, we propose GRANT, an embodied multi-modal large language model equipped with a…
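To make the efficiency objective in the abstract concrete, the Python sketch below compares a naive sequential schedule with one that overlaps unattended subtasks (such as a running microwave) with work that needs the agent's attention (such as cleaning the sink). The `Subtask` type, the durations, and the overlap rule are hypothetical illustrations, not GRANT's scheduler or the ORS3D-60K data format. The sketch also assumes that starting an unattended subtask costs the agent no time and that unattended subtasks never compete for the same appliance.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    name: str
    duration: int         # minutes (hypothetical values)
    parallelizable: bool  # True if the subtask runs unattended once started


def sequential_time(tasks: list[Subtask]) -> int:
    """Naive schedule: wait for each subtask to finish before starting the next."""
    return sum(t.duration for t in tasks)


def parallel_time(tasks: list[Subtask]) -> int:
    """Overlapping schedule (illustrative assumption, not the paper's method).

    Every unattended subtask is started at t = 0 (assumed to cost the agent
    no time), so the last one finishes after the longest unattended duration.
    Meanwhile the agent works through the attended subtasks back to back.
    The total completion time is whichever of the two finishes later.
    """
    background_end = max(
        (t.duration for t in tasks if t.parallelizable), default=0
    )
    agent_end = sum(t.duration for t in tasks if not t.parallelizable)
    return max(background_end, agent_end)


# Hypothetical composite task in the spirit of the abstract's example.
tasks = [
    Subtask("heat food in microwave", 10, parallelizable=True),
    Subtask("clean the sink", 6, parallelizable=False),
    Subtask("wipe the table", 3, parallelizable=False),
]

print(sequential_time(tasks))  # 19
print(parallel_time(tasks))    # 10
```

Overlapping the microwave with the two attended chores cuts the total from 19 to 10 minutes in this toy case. Real ORS3D instances additionally require grounding each subtask to objects in a 3D scene, which this sketch does not model.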
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Robot Manipulation and Learning
