FoodLMM: A Versatile Food Assistant using Large Multi-modal Model
Yuehao Yin, Huiyan Qi, Bin Zhu, Jingjing Chen, Yu-Gang Jiang, Chong-Wah Ngo

TL;DR
FoodLMM is a versatile large multi-modal model for food-related tasks, including recognition, recipe generation, and nutrition estimation. It combines task-specific tokens and heads with a two-stage training strategy.
Contribution
The paper introduces FoodLMM, a multi-task, multi-modal food assistant with novel task-specific tokens and a two-stage training process for enhanced domain-specific performance.
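As a rough illustration of the task-specific token idea, the sketch below shows one way the hidden state at a special token position could be passed to a separate head that outputs a number rather than text. The token name `<nutrition>`, the dimensions, the weights, and both function names are assumptions made for illustration; they are not taken from the paper, whose actual tokens and heads may differ.

```python
# Illustrative sketch only: the token name, dimensions, and weights are
# hypothetical and are not taken from the FoodLMM paper or its code.
import random

NUTRITION_TOKEN = "<nutrition>"  # hypothetical task-specific token
HIDDEN_DIM = 4


def linear_head(hidden, weights, bias):
    """Map one hidden-state vector to a single scalar (a regression head)."""
    return sum(h * w for h, w in zip(hidden, weights)) + bias


def route_task_tokens(tokens, hidden_states, weights, bias):
    """Apply the regression head at every position holding the task token."""
    return [
        linear_head(h, weights, bias)
        for tok, h in zip(tokens, hidden_states)
        if tok == NUTRITION_TOKEN
    ]


rng = random.Random(0)
tokens = ["The", "dish", "has", NUTRITION_TOKEN, "kcal"]
# Stand-in for a language model's final-layer hidden states.
hidden_states = [[rng.random() for _ in range(HIDDEN_DIM)] for _ in tokens]
weights = [0.5] * HIDDEN_DIM
bias = 1.0

# One scalar prediction per occurrence of the task token.
predictions = route_task_tokens(tokens, hidden_states, weights, bias)
```

The point of the design is that ordinary tokens are decoded as text, while positions holding the special token are read by a dedicated head, so one model can emit both language and numeric values.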
Findings
Achieves state-of-the-art results on multiple food benchmarks.
Effectively performs complex reasoning and multi-round dialogues in the food domain.
Demonstrates versatility across recognition, generation, and segmentation tasks.
Abstract
Large Multi-modal Models (LMMs) have made impressive progress in many vision-language tasks. Nevertheless, the performance of general LMMs in specific domains is still far from satisfactory. This paper proposes FoodLMM, a versatile food assistant based on LMMs with various capabilities, including food recognition, ingredient recognition, recipe generation, nutrition estimation, food segmentation and multi-round conversation. To enable FoodLMM to handle tasks beyond pure text output, we introduce a series of novel task-specific tokens and heads, enabling the model to predict food nutritional values and multiple segmentation masks. We adopt a two-stage training strategy. In the first stage, we utilize multiple public food benchmarks for multi-task learning by leveraging the instruction-following paradigm. In the second stage, we construct a multi-round conversation dataset and a…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
