CookBench: A Long-Horizon Embodied Planning Benchmark for Complex Cooking Scenarios

Muzhen Cai; Xiubo Chen; Yining An; Jiaxin Zhang; Xuesong Wang; Wang Xu; Weinan Zhang; Ting Liu

arXiv:2508.03232 · cs.RO · August 6, 2025

TL;DR

CookBench is a new benchmark designed for long-horizon, complex cooking tasks in embodied AI, featuring high-fidelity simulation, refined action granularity, and tools for high-level planning and decision-making.

Contribution

This paper introduces CookBench, a comprehensive benchmark with a realistic simulation environment and detailed action primitives for complex cooking scenarios, addressing limitations of existing benchmarks.

Findings

01. State-of-the-art models struggle with long-horizon tasks.

02. CookBench enables evaluation of high-level planning in embodied AI.

03. The benchmark is open-sourced for future research.

Abstract

Embodied Planning is dedicated to the goal of creating agents capable of executing long-horizon tasks in complex physical worlds. However, existing embodied planning benchmarks frequently feature short-horizon tasks and coarse-grained action primitives. To address this challenge, we introduce CookBench, a benchmark for long-horizon planning in complex cooking scenarios. By leveraging a high-fidelity simulation environment built upon the powerful Unity game engine, we define frontier AI challenges in a complex, realistic environment. The core task in CookBench is designed as a two-stage process. First, in Intention Recognition, an agent needs to accurately parse a user's complex intent. Second, in Embodied Interaction, the agent should execute the identified cooking goal through a long-horizon, fine-grained sequence of physical actions. Unlike existing embodied planning benchmarks, we…
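
The two-stage task described in the abstract can be pictured with a small sketch. This is an illustration only, not CookBench's API: the recipe table, the `Action` type, and the functions `recognize_intent`, `plan_actions`, and `run_episode` are all hypothetical names invented here, and a real agent would presumably replace the keyword matcher with a learned model and execute each action in the simulator.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical recipe table: dish name -> fine-grained (verb, target) steps.
# The dishes and action names are illustrative, not taken from CookBench.
RECIPES = {
    "fried egg": [
        ("navigate", "fridge"),
        ("pick_up", "egg"),
        ("navigate", "stove"),
        ("crack", "egg"),
        ("turn_on", "stove"),
        ("wait", "egg"),
        ("turn_off", "stove"),
        ("plate", "egg"),
    ],
    "toast": [
        ("pick_up", "bread"),
        ("insert", "toaster"),
        ("turn_on", "toaster"),
        ("plate", "bread"),
    ],
}


@dataclass(frozen=True)
class Action:
    verb: str
    target: str


def recognize_intent(utterance: str) -> Optional[str]:
    """Stage 1 (toy): return the first known dish named in the request."""
    text = utterance.lower()
    for dish in RECIPES:
        if dish in text:
            return dish
    return None


def plan_actions(dish: str) -> List[Action]:
    """Stage 2 (toy): expand the goal into a fine-grained action sequence."""
    return [Action(verb, target) for verb, target in RECIPES[dish]]


def run_episode(utterance: str) -> List[Action]:
    """Chain both stages; an unrecognized request yields an empty plan."""
    dish = recognize_intent(utterance)
    return plan_actions(dish) if dish is not None else []


if __name__ == "__main__":
    for step in run_episode("Could you make me a fried egg?"):
        print(step.verb, step.target)
```

The point of the sketch is the separation of concerns: an error in the first stage (picking the wrong dish) makes the whole action sequence wrong, which is one reason a benchmark might score the two stages separately.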
