CookingDiffusion: Cooking Procedural Image Generation with Stable Diffusion
Yuan Wang, Bin Zhu, Yanbin Hao, Chong-Wah Ngo, Yi Tan, Xiang Wang

TL;DR
CookingDiffusion introduces a method for generating sequential, photo-realistic cooking images aligned with recipe steps, with the aim of supporting visual cooking guidance and, potentially, cooking simulation.
Contribution
The paper presents CookingDiffusion, an approach that combines Stable Diffusion with three Memory Nets to generate consistent, high-quality procedural images from the cooking steps of a recipe.
Findings
Generates high-quality sequential cooking images, as measured by consistency metrics
Outperforms baselines in FID and procedure consistency
Demonstrates the ability to manipulate ingredients and cooking methods
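For readers unfamiliar with FID, the sketch below shows the standard Fréchet distance between two Gaussians fitted to feature sets, which is the quantity FID reports. It is a minimal illustration, not the paper's evaluation code: the function name `frechet_distance` is hypothetical, and random arrays stand in for the Inception features that a real FID computation would extract from generated and reference images.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is an array of shape (num_samples, feature_dim).
    Assumption: features are already extracted; real FID uses Inception features.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # covariance matrices; clip guards against tiny negative round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

# Stand-in features: identical sets give a distance near zero,
# and a pure mean shift adds the squared shift norm.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 4))
same = frechet_distance(feats, feats)
shifted = frechet_distance(feats, feats + 3.0)
```

Lower values mean the two feature distributions are closer, which is why a lower FID is read as better image quality.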
Abstract
Recent advancements in text-to-image generation models have excelled in creating diverse and realistic images. This success extends to food imagery, where various conditional inputs like cooking styles, ingredients, and recipes are utilized. However, a yet-unexplored challenge is generating a sequence of procedural images based on cooking steps from a recipe. This could enhance the cooking experience with visual guidance and possibly lead to an intelligent cooking simulation system. To fill this gap, we introduce a novel task called cooking procedural image generation. This task is inherently demanding, as it strives to create photo-realistic images that align with cooking steps while preserving sequential consistency. To collectively tackle these challenges, we present CookingDiffusion, a novel approach that leverages Stable Diffusion and three innovative Memory Nets…
Taxonomy
Topics: Image Retrieval and Classification Techniques
Methods: Diffusion · ALIGN
