FIRE: Food Image to REcipe generation

Prateek Chhikara; Dhiraj Chaurasia; Yifan Jiang; Omkar Masur; Filip Ilievski

arXiv:2308.14391 · cs.CV · May 14, 2024

TL;DR

FIRE is a multimodal system that generates a full recipe (title, ingredients, and cooking instructions) from a food image by combining BLIP, a Vision Transformer, and T5. It enables applications such as personalized recipe customization and recipe-to-code transformation for automated cooking.

Contribution

The paper introduces FIRE, a new multimodal approach that integrates BLIP, Vision Transformer, and T5 models for end-to-end food image to recipe generation.
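
The composition of the three stages can be illustrated with a minimal sketch. It is not the authors' implementation: the class and function names below are hypothetical, the stages are stub callables standing in for BLIP, the Vision Transformer, and T5, and it assumes the instruction generator is conditioned on the predicted title and ingredients.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Recipe:
    title: str
    ingredients: List[str]
    instructions: List[str]


@dataclass
class ImageToRecipePipeline:
    """Hypothetical wrapper: each stage is a plain callable so that real
    models could be plugged in later. None of these names come from FIRE."""
    title_model: Callable[[bytes], str]                 # stands in for BLIP
    ingredient_model: Callable[[bytes], List[str]]      # stands in for the ViT stage
    instruction_model: Callable[[str, List[str]], List[str]]  # stands in for T5

    def generate(self, image: bytes) -> Recipe:
        # Title and ingredients are both predicted from the image.
        title = self.title_model(image)
        ingredients = self.ingredient_model(image)
        # Assumption: instructions are generated from the text outputs,
        # not directly from the image.
        instructions = self.instruction_model(title, ingredients)
        return Recipe(title, ingredients, instructions)


# Stub stages returning fixed values, used only to show the data flow.
def stub_title(image: bytes) -> str:
    return "tomato soup"


def stub_ingredients(image: bytes) -> List[str]:
    return ["tomato", "onion", "salt"]


def stub_instructions(title: str, ingredients: List[str]) -> List[str]:
    return [f"Prepare {title} using {', '.join(ingredients)}."]


pipeline = ImageToRecipePipeline(stub_title, stub_ingredients, stub_instructions)
recipe = pipeline.generate(b"fake-image-bytes")
print(recipe)
```

Keeping each stage behind a callable makes it possible to swap one model without touching the others, which is one plausible way to organize such a system.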

Findings

1. FIRE effectively generates food titles, ingredients, and instructions from images.

2. The approach demonstrates potential for personalized recipe adaptation.

3. FIRE enables automated cooking through recipe-to-code transformation.

Abstract

Food computing has emerged as a prominent multidisciplinary field of research in recent years. An ambitious goal of food computing is to develop end-to-end intelligent systems capable of autonomously producing recipe information for a food image. Current image-to-recipe methods are retrieval-based and their success depends heavily on the dataset size and diversity, as well as the quality of learned embeddings. Meanwhile, the emergence of powerful attention-based vision and language models presents a promising avenue for accurate and generalizable recipe generation, which has yet to be extensively explored. This paper proposes FIRE, a novel multimodal methodology tailored to recipe generation in the food computing domain, which generates the food title, ingredients, and cooking instructions based on input food images. FIRE leverages the BLIP model to generate titles, utilizes a Vision…

Code & Models

Repositories

prateekchhikara/fire (PyTorch, official)

Videos

FIRE: Food Image to REcipe Generation · YouTube

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Gated Linear Unit · Dropout · Byte Pair Encoding · Adam · Position-Wise Feed-Forward Layer