LLMs-based Augmentation for Domain Adaptation in Long-tailed Food Datasets
Qing Wang, Chong-Wah Ngo, Ee-Peng Lim, Qianru Sun

TL;DR
This paper introduces a framework using large language models to improve food recognition by addressing domain shift, long-tailed distribution, and subtle visual variations, achieving superior performance on food datasets.
Contribution
The novel framework leverages LLMs to generate descriptive texts and align them with images in a shared space for enhanced food recognition in challenging conditions.
Findings
Outperforms existing methods on long-tailed food datasets
Effective in domain adaptation for food recognition
Improves fine-grained classification accuracy
Abstract
Training a model for food recognition is challenging because the training samples, which are typically crawled from the Internet, are visually different from the pictures captured by users in free-living environments. In addition to this domain-shift problem, real-world food datasets tend to follow a long-tailed distribution, and some dishes of different categories exhibit subtle variations that are difficult to distinguish visually. In this paper, we present a framework empowered with large language models (LLMs) to address these challenges in food recognition. We first leverage LLMs to parse food images to generate food titles and ingredients. Then, we project the generated texts and food images from different domains to a shared embedding space to maximize the pair similarities. Finally, we take the aligned features of both modalities for recognition. With this simple framework, we…
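The alignment step described in the abstract (projecting generated texts and food images into a shared embedding space and maximizing the similarity of matching pairs) can be illustrated with a small sketch. The abstract does not state which loss the authors use, so the symmetric contrastive (InfoNCE-style) objective below is an assumption, as are the function names, the `temperature` value, and the use of concatenation to combine the two modalities. Features are plain Python lists standing in for encoder outputs.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_matrix(image_feats, text_feats):
    """Cosine similarity between every image feature and every text feature."""
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def _cross_entropy_on_diagonal(logits):
    """Mean cross-entropy where row k's correct class is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[k]
    return total / len(logits)


def symmetric_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Assumed objective: low when image k is most similar to text k and vice versa.

    image_feats[k] and text_feats[k] are treated as a matching pair; every
    other combination in the batch acts as a negative.
    """
    sims = similarity_matrix(image_feats, text_feats)
    logits = [[s / temperature for s in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]
    image_to_text = _cross_entropy_on_diagonal(logits)
    text_to_image = _cross_entropy_on_diagonal(transposed)
    return 0.5 * (image_to_text + text_to_image)


def fuse(image_feat, text_feat):
    """Hypothetical fusion: concatenate the normalized features of both modalities."""
    return l2_normalize(image_feat) + l2_normalize(text_feat)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    texts = [[2.0, 0.0], [0.0, 3.0]]  # texts[k] points the same way as images[k]
    print(symmetric_contrastive_loss(images, texts))
    print(fuse(images[0], texts[0]))
```

In a trained system the lists would be replaced by outputs of image and text encoders, and the fused vector would feed a classifier; how the paper combines the aligned features is not specified in the abstract.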
Taxonomy
Topics: Nutritional Studies and Diet · Food Security and Health in Diverse Populations · Consumer Attitudes and Food Labeling
