CookBench: A Long-Horizon Embodied Planning Benchmark for Complex Cooking Scenarios
Muzhen Cai, Xiubo Chen, Yining An, Jiaxin Zhang, Xuesong Wang, Wang Xu, Weinan Zhang, Ting Liu

TL;DR
CookBench is a new benchmark designed for long-horizon, complex cooking tasks in embodied AI, featuring high-fidelity simulation, fine-grained action primitives, and tools for high-level planning and decision-making.
Contribution
This paper introduces CookBench, a comprehensive benchmark with a realistic simulation environment and detailed action primitives for complex cooking scenarios, addressing limitations of existing benchmarks.
Findings
State-of-the-art models struggle with long-horizon tasks
CookBench enables evaluation of high-level planning in embodied AI
The benchmark is open-sourced for future research
Abstract
Embodied planning aims to create agents capable of executing long-horizon tasks in complex physical worlds. However, existing embodied planning benchmarks often feature short-horizon tasks and coarse-grained action primitives. To address these limitations, we introduce CookBench, a benchmark for long-horizon planning in complex cooking scenarios. Using a high-fidelity simulation environment built on the Unity game engine, we define frontier AI challenges in a complex, realistic setting. The core task in CookBench is a two-stage process. First, in Intention Recognition, an agent must accurately parse a user's complex intent. Second, in Embodied Interaction, the agent must execute the identified cooking goal through a long-horizon, fine-grained sequence of physical actions. Unlike existing embodied planning benchmarks, we…
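The two-stage task described in the abstract can be illustrated with a toy sketch. Everything below is hypothetical: the primitive names, the keyword-matching intent parser (`recognize_intent`), the `KitchenState` fields, and the success check in `run_episode` are illustrative assumptions. They are not CookBench's actual action space, API, or evaluation protocol, since the real benchmark runs in a Unity simulation. The sketch shows why long horizons are unforgiving: one invalid step, such as putting an uncut ingredient in the pan, ends the episode.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical fine-grained primitives; CookBench's real action set may differ.
PRIMITIVES = {"pick", "place_on_board", "cut", "place_in_pan", "turn_on_stove", "wait"}


@dataclass
class KitchenState:
    """Toy world state (hypothetical), standing in for a simulator."""
    holding: Optional[str] = None
    on_board: set = field(default_factory=set)
    chopped: set = field(default_factory=set)
    in_pan: set = field(default_factory=set)
    stove_on: bool = False
    cooked: set = field(default_factory=set)


def recognize_intent(request: str, recipes: dict) -> Optional[str]:
    """Stage 1 (toy stand-in for Intention Recognition): map a request to a dish."""
    text = request.lower()
    for dish in recipes:
        if dish in text:
            return dish
    return None


def apply(state: KitchenState, name: str, obj: str = "") -> bool:
    """Apply one primitive in place; return False if its precondition fails."""
    if name not in PRIMITIVES:
        return False
    if name == "pick" and state.holding is None:
        state.holding = obj
    elif name == "place_on_board" and state.holding == obj:
        state.on_board.add(obj)
        state.holding = None
    elif name == "cut" and obj in state.on_board:
        state.chopped.add(obj)
    elif name == "place_in_pan" and obj in state.chopped:
        state.on_board.discard(obj)
        state.in_pan.add(obj)
    elif name == "turn_on_stove":
        state.stove_on = True
    elif name == "wait" and state.stove_on:
        state.cooked |= state.in_pan  # everything in the pan gets cooked
    else:
        return False
    return True


def run_episode(request: str, plan: list, recipes: dict) -> tuple:
    """Stage 2 (toy stand-in for Embodied Interaction).

    Returns (success, steps): steps is the number of valid actions executed
    before the episode ended. Success requires every step to be valid and
    every ingredient of the recognized dish to end up cooked.
    """
    dish = recognize_intent(request, recipes)
    if dish is None:
        return False, 0
    state = KitchenState()
    for i, (name, obj) in enumerate(plan):
        if not apply(state, name, obj):
            return False, i
    return set(recipes[dish]) <= state.cooked, len(plan)


if __name__ == "__main__":
    recipes = {"fried potato": ["potato"], "tomato soup": ["tomato"]}
    plan = [("pick", "potato"), ("place_on_board", "potato"), ("cut", "potato"),
            ("place_in_pan", "potato"), ("turn_on_stove", ""), ("wait", "")]
    print(run_episode("Could you make me some fried potato?", plan, recipes))
```

Scoring the full sequence rather than individual actions mirrors the abstract's emphasis on long-horizon execution. Even a plan whose every step is valid fails if it cooks the wrong dish, so errors in Intention Recognition propagate into the second stage.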
