Learning Program Representations for Food Images and Cooking Recipes
Dim P. Papadopoulos, Enrique Mora, Nadiia Chepurko, Kuan Wei Huang, Ferda Ofli, Antonio Torralba

TL;DR
This paper introduces a novel approach to model cooking recipes and food images as structured programs, enabling improved cross-modal retrieval, recipe understanding, and image generation through joint embedding and program manipulation.
Contribution
It proposes representing recipes and food images as cooking programs with a joint embedding model trained via self-supervision, enhancing retrieval, recognition, and image synthesis capabilities.
Findings
Projecting the image-recipe embeddings into programs improves cross-modal retrieval results.
Program generation improves recipe recognition accuracy.
Manipulating programs enables realistic food image synthesis.
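As a rough illustration of the retrieval finding, the sketch below ranks recipes against a food image by cosine similarity in a shared embedding space. The vectors, recipe names, and the `cosine` and `retrieve` helpers are hypothetical and chosen only for illustration; the paper's actual encoders and similarity measure are not described in this summary.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query, candidates):
    """Return candidate ids sorted by decreasing similarity to the query."""
    return sorted(candidates, key=lambda k: cosine(query, candidates[k]), reverse=True)

# Toy embeddings (assumed values, not model outputs).
image_emb = [0.9, 0.1, 0.0]
recipe_embs = {
    "pancakes": [0.8, 0.2, 0.1],
    "salad": [0.0, 0.1, 0.9],
    "omelette": [0.5, 0.5, 0.0],
}

ranking = retrieve(image_emb, recipe_embs)
print(ranking)
```

Image-to-recipe and recipe-to-image retrieval are symmetric in this setup: swapping the roles of query and candidates gives the other direction.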
Abstract
In this paper, we are interested in modeling a how-to instructional procedure, such as a cooking recipe, with a meaningful and rich high-level representation. Specifically, we propose to represent cooking recipes and food images as cooking programs. Programs provide a structured representation of the task, capturing cooking semantics and sequential relationships of actions in the form of a graph. This allows them to be easily manipulated by users and executed by agents. To this end, we build a model that is trained to learn a joint embedding between recipes and food images via self-supervision and jointly generate a program from this embedding as a sequence. To validate our idea, we crowdsource programs for cooking recipes and show that: (a) projecting the image-recipe embeddings into programs leads to better cross-modal retrieval results; (b) generating programs from images leads to…
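To make the idea of a cooking program concrete, here is a minimal sketch of one possible encoding: each step is an action applied to ingredients or to the outputs of earlier steps, which yields a graph that can also be flattened into a token sequence. The `Step` class, the token format, and the example recipe are assumptions made for illustration and are not the paper's program grammar.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One node of a hypothetical cooking program."""
    action: str
    inputs: list  # ingredient names (str) or indices (int) of earlier steps

def to_sequence(program):
    """Flatten the program into a token sequence, in step order."""
    tokens = []
    for i, step in enumerate(program):
        args = [f"<out{a}>" if isinstance(a, int) else a for a in step.inputs]
        tokens += [step.action, "("] + args + [")", f"-> <out{i}>"]
    return tokens

def edges(program):
    """Return (producer, consumer) step-index pairs, i.e. the graph edges."""
    return [(a, i) for i, s in enumerate(program)
            for a in s.inputs if isinstance(a, int)]

# Toy program (assumed example): two preparation steps feed a combine step.
program = [
    Step("chop", ["onion"]),
    Step("mix", ["flour", "egg"]),
    Step("combine", [0, 1]),
    Step("bake", [2]),
]

print(" ".join(to_sequence(program)))
print(edges(program))
```

Because the structure is explicit, a user-level manipulation such as swapping an ingredient amounts to editing one field, for example `program[0].inputs = ["leek"]`, and the sequence form can be regenerated from the edited program.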
Taxonomy
Topics: Software Engineering Research
