The Multimodal And Modular AI Chef: Complex Recipe Generation From Imagery
David Noever, Samantha Elizabeth Miller Noever

TL;DR
This paper presents a modular AI system combining image labeling and large language models to generate complex, customized recipes from refrigerator images, outperforming monolithic models in maintaining context and formatting.
Contribution
It introduces a lightweight, API-based modular approach for recipe generation that effectively integrates image recognition with advanced language models, improving over existing monolithic multimodal systems.
Findings
Achieved better than 95% mean average precision for object lists generated from images.
Successfully generated a 100-page recipe book with 30 top ingredients.
Demonstrated the system's ability to handle complex constraints and produce pragmatic recipes.
Abstract
The AI community has embraced multi-sensory or multi-modal approaches to advance this generation of AI models to resemble expected intelligent understanding. Combining language and imagery represents a familiar method for specific tasks like image captioning or generation from descriptions. This paper compares these monolithic approaches to a lightweight and specialized method based on employing image models to label objects, then serially submitting this resulting object list to a large language model (LLM). This use of multiple Application Programming Interfaces (APIs) enables better than 95% mean average precision for correct object lists, which serve as input to the latest OpenAI text generator (GPT-4). To demonstrate the API as a modular alternative, we solve the problem of a user taking a picture of ingredients available in a refrigerator, then generating novel recipe cards…
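The two-stage flow described in the abstract (label objects in an image, then pass the resulting list to a language model) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Detection` type, the `min_confidence` threshold, the prompt wording, and the sample detections are all assumptions made for the example, and the actual calls to an image-labeling API and to GPT-4 are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One labeled object, as a hypothetical image-labeling API might return it."""
    label: str
    confidence: float


def ingredient_list(detections, min_confidence=0.5):
    """Keep confident detections, deduplicate labels, preserve first-seen order."""
    seen = set()
    ingredients = []
    for det in detections:
        name = det.label.strip().lower()
        if det.confidence >= min_confidence and name not in seen:
            seen.add(name)
            ingredients.append(name)
    return ingredients


def build_recipe_prompt(ingredients, constraints=()):
    """Turn the object list into a text prompt for a language model."""
    lines = [
        "Write a recipe card using only these ingredients:",
        ", ".join(ingredients),
    ]
    if constraints:
        lines.append("Constraints: " + "; ".join(constraints))
    return "\n".join(lines)


# Made-up detections standing in for the output of the image-labeling stage.
detections = [
    Detection("Eggs", 0.97),
    Detection("spinach", 0.91),
    Detection("eggs", 0.88),
    Detection("ketchup", 0.32),
]
prompt = build_recipe_prompt(ingredient_list(detections), constraints=("vegetarian",))
print(prompt)
```

Because the only thing passed between the stages is a plain list of labels, either the image model or the language model could be swapped without touching the other, which is the modularity the abstract argues for.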
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
