FMiFood: Multi-modal Contrastive Learning for Food Image Classification

Xinyue Pan; Jiangpeng He; Fengqing Zhu

arXiv:2408.03922 · cs.CV · August 8, 2024

TL;DR

FMiFood introduces a multi-modal contrastive learning framework that pairs food images with contextual text descriptions, including descriptions enriched by GPT-4, to improve food image classification accuracy despite high intra-class diversity and inter-class similarity.
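As background, image-text contrastive learning is commonly trained with a symmetric InfoNCE loss in the style of CLIP. The sketch below is a generic illustration of that loss, not FMiFood's published objective. It assumes each image in the batch is paired with exactly one text description at the same index, and the function names and the temperature of 0.07 are illustrative assumptions.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def logsumexp(xs):
    # Numerically stable log(sum(exp(x))).
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def clip_style_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss; image_embs[i] is assumed to pair with text_embs[i]."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine(image_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text: row i should score its own description (column i) highest.
    loss_i2t = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # Text-to-image: column j should score its own image (row j) highest.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = sum(logsumexp(cols[j]) - cols[j][j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2
```

With toy 2-D embeddings, correctly paired image and text vectors give a lower loss than the same texts in swapped order, which is the signal that pulls matching image and description embeddings together.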

Contribution

The paper presents a novel multi-modal contrastive learning approach that integrates textual context and a flexible matching technique to enhance food image classification performance.
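The page does not spell out how the flexible matching works. To illustrate why matching at a finer granularity can help a text focus on several key items at once, the sketch below swaps in FILIP-style token-wise (late-interaction) matching and compares it with a single pooled cosine similarity. This is an assumed stand-in, not FMiFood's actual matching rule; the patch/token inputs and all function names are hypothetical.

```python
import math


def cosine(a, b):
    # Cosine similarity between two non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def global_similarity(patch_embs, token_embs):
    """Baseline: cosine between mean-pooled image and mean-pooled text embeddings."""
    dim = len(patch_embs[0])
    img = [sum(p[d] for p in patch_embs) / len(patch_embs) for d in range(dim)]
    txt = [sum(t[d] for t in token_embs) / len(token_embs) for d in range(dim)]
    return cosine(img, txt)


def token_wise_similarity(patch_embs, token_embs):
    """FILIP-style late interaction (a stand-in, not FMiFood's published rule).

    Each text token is matched to its most similar image patch and the
    per-token maxima are averaged, so several key phrases can each find
    their own region instead of being blended into one pooled vector.
    """
    best_per_token = [max(cosine(p, t) for p in patch_embs) for t in token_embs]
    return sum(best_per_token) / len(best_per_token)
```

For example, with three orthogonal patch vectors (two hypothetical food items plus background) and two text tokens that each match one food patch exactly, `token_wise_similarity` returns 1.0, while mean pooling dilutes the match to about 0.82.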

Findings

01. Improved accuracy on the UPMC-101 and VFN datasets.

02. Effective integration of GPT-4-enriched text descriptions.

03. More discriminative feature learning for food images.

Abstract

Food image classification is a fundamental step in image-based dietary assessment, which aims to estimate participants' nutrient intake from images of eating occasions. A common challenge with food images is intra-class diversity and inter-class similarity, which can significantly hinder classification performance. To address this issue, we introduce a novel multi-modal contrastive learning framework called FMiFood, which learns more discriminative features by integrating additional contextual information, such as food category text descriptions, to enhance classification accuracy. Specifically, we propose a flexible matching technique that improves the similarity matching between text and image embeddings so that it can focus on multiple pieces of key information. Furthermore, we incorporate the classification objectives into the framework and explore the use of GPT-4 to enrich the text descriptions and…
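One plausible way to fold a classification objective into an image-text framework, assumed here rather than taken from the paper, is to treat one text embedding per food category as the classifier: an image's logits are its similarities to every category description, trained with cross-entropy and added to the contrastive term with a weight. The weight `lam`, the temperature, and all function names below are hypothetical.

```python
import math


def cosine(a, b):
    # Cosine similarity between two non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def class_logits(image_emb, class_text_embs, temperature=0.07):
    # One logit per food category: similarity to that category's text embedding.
    return [cosine(image_emb, t) / temperature for t in class_text_embs]


def cross_entropy(logits, label):
    # Stable softmax cross-entropy for a single example.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[label]


def total_loss(contrastive_loss, image_embs, labels, class_text_embs, lam=1.0):
    """Hypothetical combined objective: contrastive term + lam * classification term."""
    ce = sum(cross_entropy(class_logits(e, class_text_embs), y)
             for e, y in zip(image_embs, labels)) / len(labels)
    return contrastive_loss + lam * ce


def predict(image_emb, class_text_embs):
    # Inference: pick the category whose text embedding is most similar.
    logits = class_logits(image_emb, class_text_embs)
    return max(range(len(logits)), key=lambda k: logits[k])
```

A design consequence of this sketch is that richer category descriptions (for example, ones enriched by a language model) directly change the classifier weights, since the same text embeddings serve both the contrastive and the classification terms.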

Taxonomy

Topics: Advanced Chemical Sensor Technologies · Identification and Quantification in Food · Culinary Culture and Tourism

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Dense Connections