Retrieval Augmented Recipe Generation
Guoshan Liu, Hailong Yin, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, Yu-Gang Jiang

TL;DR
This paper introduces a retrieval-augmented large multimodal model with stochastic retrieval and self-consistency voting to improve recipe generation from food images, achieving state-of-the-art results on Recipe1M.
Contribution
The paper proposes a retrieval-augmented approach that combines Stochastic Diversified Retrieval Augmentation (SDRA) with self-consistency voting to improve the accuracy and diversity of generated recipes.
Findings
Achieves state-of-the-art performance on the Recipe1M dataset.
Effectively reduces hallucinations in recipe generation.
Enhances diversity and relevance of generated recipes.
Abstract
Given the potential applications of generating recipes from food images, this area has garnered significant attention from researchers in recent years. Existing works for recipe generation primarily utilize a two-stage training method, first generating ingredients and then obtaining instructions from both the image and ingredients. Large Multi-modal Models (LMMs), which have achieved notable success across a variety of vision and language tasks, shed light on generating both ingredients and instructions directly from images. Nevertheless, LMMs still face the common issue of hallucinations during recipe generation, leading to suboptimal performance. To tackle this, we propose a retrieval-augmented large multimodal model for recipe generation. We first introduce Stochastic Diversified Retrieval Augmentation (SDRA) to retrieve recipes semantically related to the image from an existing…
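The abstract is cut off before it describes how SDRA or the voting step actually works, so the following is only a minimal sketch of the two general ideas named above, not the authors' method. It assumes (hypothetically) that stochastic retrieval means sampling a few recipes at random from the top-ranked neighbours of an image embedding, and that self-consistency voting means keeping the generated candidate that agrees most with the other candidates. The function names, the cosine and Jaccard similarity measures, and the `pool_size` and `k` parameters are all illustrative assumptions.

```python
import random
from typing import List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def stochastic_retrieve(
    query: Sequence[float],
    recipe_embs: List[Sequence[float]],
    pool_size: int = 5,
    k: int = 2,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Assumed scheme: rank recipes by similarity to the query, keep the
    top `pool_size`, then sample `k` of them at random so that repeated
    calls return different but still relevant retrieval sets."""
    rng = rng or random.Random()
    ranked = sorted(
        range(len(recipe_embs)),
        key=lambda i: cosine(query, recipe_embs[i]),
        reverse=True,
    )
    pool = ranked[:pool_size]
    return rng.sample(pool, min(k, len(pool)))


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard overlap, a stand-in agreement score (assumption)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def self_consistency_vote(candidates: List[str]) -> str:
    """Return the candidate with the highest total overlap with the others."""
    if not candidates:
        raise ValueError("need at least one candidate")

    def agreement(i: int) -> float:
        return sum(
            jaccard(candidates[i], other)
            for j, other in enumerate(candidates)
            if j != i
        )

    return candidates[max(range(len(candidates)), key=agreement)]
```

In a pipeline of this shape, each sampled retrieval set would condition one generation pass of the multimodal model, and the vote would pick the final recipe among those passes. The scoring function the paper uses for voting is not stated in the text above, so Jaccard overlap here is purely a placeholder.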
Taxonomy
Topics: Artificial Intelligence in Games
Methods: Softmax · Attention Is All You Need
