FoodMLLM-JP: Leveraging Multimodal Large Language Models for Japanese Recipe Generation

Yuki Imajuku; Yoko Yamakata; Kiyoharu Aizawa

arXiv:2409.18459 · cs.CV · March 4, 2025

Open Access

TL;DR

This paper demonstrates that fine-tuned open multimodal large language models trained on Japanese recipe data outperform GPT-4o in ingredient generation and match its performance in cooking procedure generation, advancing food image understanding in Japanese cuisine.

Contribution

The study fine-tunes open MLLMs on Japanese recipe data and shows that they generate ingredients more accurately than GPT-4o and produce cooking procedures of comparable quality.

Findings

01 Open models outperform GPT-4o in ingredient generation (F1 0.531 vs 0.481)

02 Open models match GPT-4o in cooking procedure generation

03 Fine-tuning on Japanese recipes enhances food image understanding
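
The ingredient-generation finding is reported as an F1 score. As a rough illustration of how such a score can be computed for a single recipe, the sketch below compares a predicted ingredient list with a reference list using exact string matching on sets. The function name `ingredient_f1`, the exact-match rule, and the sample ingredients are assumptions made for illustration; this summary does not state how the paper normalizes or matches ingredient names.

```python
def ingredient_f1(predicted, reference):
    """Set-based F1 between predicted and reference ingredient names.

    Assumption: names match only when the stripped strings are identical.
    The paper's actual matching procedure may differ.
    """
    pred = {p.strip() for p in predicted if p.strip()}
    ref = {r.strip() for r in reference if r.strip()}
    if not pred or not ref:
        return 0.0
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 2 of 3 predictions are correct, 2 of 4 references are found.
predicted = ["玉ねぎ", "にんじん", "豚肉"]
reference = ["玉ねぎ", "豚肉", "じゃがいも", "カレールー"]
score = ingredient_f1(predicted, reference)
print(f"F1 = {score:.3f}")
```

Here precision is 2/3 and recall is 2/4, so the harmonic mean is 4/7. A dataset-level figure such as the one quoted above would typically aggregate per-recipe scores or counts, and which aggregation the authors use is not specified in this summary.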

Abstract

Research on food image understanding using recipe data has been a long-standing focus because of the diversity and complexity of the data. Moreover, food is inextricably linked to people's lives, making it a vital research area for practical applications such as dietary management. Recent Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities, both in the breadth of their knowledge and in their ability to handle language naturally. Although they are predominantly used in English, they also support other languages, including Japanese. This suggests that MLLMs could significantly improve performance in food image understanding tasks. We fine-tuned the open MLLMs LLaVA-1.5 and Phi-3 Vision on a Japanese recipe dataset and benchmarked their performance against the closed model GPT-4o. We then evaluated the content of generated recipes, including…

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques · Digital Humanities and Scholarship