LLaVA-Chef: A Multi-modal Generative Model for Food Recipes

Fnu Mohbat; Mohammed J. Zaki

arXiv:2408.16889 · cs.CL · September 2, 2024

TL;DR

LLaVA-Chef is a multi-modal generative model trained specifically on food recipes, combining visual and textual data to generate detailed and accurate recipes, advancing the application of large language models in culinary domains.

Contribution

The paper introduces LLaVA-Chef, a model trained with a novel multi-stage approach that adapts LLaVA to the food domain for improved recipe generation.

Findings

01

LLaVA-Chef outperforms existing models in recipe detail and ingredient accuracy.

02

The model effectively integrates visual food images with textual recipe generation.

03

A custom loss function enhances the linguistic quality of the generated recipes.

Abstract

In the rapidly evolving landscape of online recipe sharing within a globalized context, there has been a notable surge in research towards comprehending and generating food recipes. Recent advancements in large language models (LLMs) like GPT-2 and LLaVA have paved the way for Natural Language Processing (NLP) approaches to delve deeper into various facets of food-related tasks, encompassing ingredient recognition and comprehensive recipe generation. Despite impressive performance and multi-modal adaptability of LLMs, domain-specific training remains paramount for their effective application. This work evaluates existing LLMs for recipe generation and proposes LLaVA-Chef, a novel model trained on a curated dataset of diverse recipe prompts in a multi-stage approach. First, we refine the mapping of visual food image embeddings to the language space. Second, we adapt LLaVA to the food…

Code & Models

Repositories

mohbattharani/LLaVA-Chef
PyTorch · Official

Taxonomy

Methods: Linear Layer · Multi-Head Attention · Cosine Annealing · Byte Pair Encoding · Softmax · Dropout · Adam · Layer Normalization · Weight Decay