FIRE: Food Image to REcipe generation
Prateek Chhikara, Dhiraj Chaurasia, Yifan Jiang, Omkar Masur, Filip Ilievski

TL;DR
FIRE is a novel multimodal system that generates comprehensive recipes from food images by combining advanced vision and language models, enabling applications like personalized recipe customization and automated cooking.
Contribution
The paper introduces FIRE, a new multimodal approach that integrates BLIP, Vision Transformer, and T5 models for end-to-end food image to recipe generation.
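The composition described above can be sketched as three pluggable stages. This is a minimal illustration, not the authors' implementation: the `Recipe` type, the `generate_recipe` function, and the stub models are hypothetical names introduced here, and feeding the title and ingredients into the instruction stage is an assumption about how the stages connect. In a real system the stubs would be replaced by BLIP, a Vision Transformer based ingredient model, and T5.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Recipe:
    """Hypothetical container for the three outputs named in the summary."""
    title: str
    ingredients: List[str]
    instructions: List[str]


# Hypothetical stage signatures (assumptions, not the paper's API):
# an image is passed as raw bytes, and each stage is any callable.
TitleModel = Callable[[bytes], str]                        # stands in for BLIP
IngredientModel = Callable[[bytes], List[str]]             # stands in for a ViT-based model
InstructionModel = Callable[[str, List[str]], List[str]]   # stands in for T5


def generate_recipe(
    image: bytes,
    title_model: TitleModel,
    ingredient_model: IngredientModel,
    instruction_model: InstructionModel,
) -> Recipe:
    """Run the three stages in sequence and bundle their outputs."""
    title = title_model(image)
    ingredients = ingredient_model(image)
    # Assumption: instructions are conditioned on the title and ingredients.
    instructions = instruction_model(title, ingredients)
    return Recipe(title=title, ingredients=ingredients, instructions=instructions)


# Stub models so the sketch runs without any model weights.
def stub_title(image: bytes) -> str:
    return "tomato soup"


def stub_ingredients(image: bytes) -> List[str]:
    return ["tomato", "onion", "salt"]


def stub_instructions(title: str, ingredients: List[str]) -> List[str]:
    return [f"Prepare {item} for the {title}." for item in ingredients]


recipe = generate_recipe(b"fake-image-bytes", stub_title, stub_ingredients, stub_instructions)
print(recipe.title)
print(recipe.ingredients)
print(recipe.instructions)
```

Keeping each stage behind a plain callable means any one model can be swapped or mocked without touching the others, which is convenient for testing the pipeline logic in isolation.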
Findings
FIRE effectively generates food titles, ingredients, and instructions from images.
The approach demonstrates potential for personalized recipe adaptation.
FIRE enables automated cooking through recipe-to-code transformation.
Abstract
Food computing has emerged as a prominent multidisciplinary field of research in recent years. An ambitious goal of food computing is to develop end-to-end intelligent systems capable of autonomously producing recipe information for a food image. Current image-to-recipe methods are retrieval-based and their success depends heavily on the dataset size and diversity, as well as the quality of learned embeddings. Meanwhile, the emergence of powerful attention-based vision and language models presents a promising avenue for accurate and generalizable recipe generation, which has yet to be extensively explored. This paper proposes FIRE, a novel multimodal methodology tailored to recipe generation in the food computing domain, which generates the food title, ingredients, and cooking instructions based on input food images. FIRE leverages the BLIP model to generate titles, utilizes a Vision…
Videos
FIRE: Food Image to REcipe Generation · YouTube
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Gated Linear Unit · Dropout · Byte Pair Encoding · Adam · Position-Wise Feed-Forward Layer
