Chain-of-Cooking: Cooking Process Visualization via Bidirectional Chain-of-Thought Guidance
Mengling Xu, Ming Tao, Bing-Kun Bao

TL;DR
This paper introduces Chain-of-Cooking, a novel model for visualizing cooking processes by generating sequential images that are semantically coherent and consistent with recipe steps, using bidirectional guidance and reference patch retrieval.
Contribution
The work proposes three components to improve cooking process visualization: a Dynamic Patch Selection Module, a Semantic Evolution Module, and Bidirectional Chain-of-Thought Guidance.
Findings
Outperforms existing methods in generating coherent cooking process images.
Achieves better semantic consistency across sequential images.
Demonstrates effectiveness on the new CookViz dataset.
Abstract
Cooking process visualization is a promising task at the intersection of image generation and food analysis, which aims to generate an image for each cooking step of a recipe. However, most existing works focus on generating images of finished foods from the given recipes, and face two challenges in visualizing the cooking process. First, because the appearance of ingredients changes in various ways across cooking steps, it is difficult to generate food appearances that match the textual description, leading to semantic inconsistency. Second, because the current step may depend on the operations of previous steps, it is crucial to maintain the contextual coherence of images in sequential order. In this work, we present a cooking process visualization model, called Chain-of-Cooking. Specifically, to generate correct appearances of ingredients, we present a Dynamic Patch Selection Module to…
