MALM: Mask Augmentation based Local Matching for Food-Recipe Retrieval

Bhanu Prakash Voutharoja; Peng Wang; Lei Wang; Vivienne Guan

arXiv:2305.11327 · cs.CV · May 22, 2023 · 2 cites

TL;DR

This paper introduces MALM, a mask-augmentation-based local matching network for image-to-recipe retrieval. It improves cross-modality representation learning by combining local image-text matching with masked self-distillation, and it is reported to outperform state-of-the-art methods on the Recipe1M dataset.

Contribution

The paper proposes a new local matching framework with mask augmentation and self-distillation to enhance generalizable cross-modality representations in food-recipe retrieval.

Findings

01 Outperforms state-of-the-art methods on the Recipe1M dataset

02 Effectively locates fine-grained cross-modality correspondences

03 Enhances generalization through masked self-distillation

Abstract

Image-to-recipe retrieval is a challenging vision-to-language task of significant practical value. The main challenge of the task lies in the ultra-high redundancy in the long recipe and the large variation reflected in both food item combination and food item appearance. A de-facto idea to address this task is to learn a shared feature embedding space in which a food image is aligned better to its paired recipe than other recipes. However, such supervised global matching is prone to supervision collapse, i.e., only partial information that is necessary for distinguishing training pairs can be identified, while other information that is potentially useful in generalization could be lost. To mitigate such a problem, we propose a mask-augmentation-based local matching network (MALM), where an image-text matching module and a masked self-distillation module benefit each other mutually to…
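The global matching baseline that the abstract describes (a shared embedding space in which an image scores higher against its paired recipe than against other recipes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the MALM method or the authors' code: the function names, the cosine similarity, the hinge-style triplet loss, and the margin of 0.3 are all hypothetical choices made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(image_embs, recipe_embs):
    """sim[i][j] = similarity of image i and recipe j in the shared space."""
    return [[cosine(img, rec) for rec in recipe_embs] for img in image_embs]


def triplet_loss(sim, margin=0.3):
    """Bidirectional hinge loss over in-batch negatives.

    Assumes sim is square and sim[i][i] is the score of the true pair.
    The margin value is an illustrative assumption.
    """
    n = len(sim)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Image i should prefer its own recipe over recipe j.
            total += max(0.0, margin - sim[i][i] + sim[i][j])
            # Recipe i should prefer its own image over image j.
            total += max(0.0, margin - sim[i][i] + sim[j][i])
            count += 2
    return total / count


def retrieve(sim, image_index):
    """Index of the recipe with the highest similarity to the given image."""
    row = sim[image_index]
    return max(range(len(row)), key=row.__getitem__)


# Toy 2-D embeddings standing in for encoder outputs (hypothetical values).
image_embs = [[1.0, 0.0], [0.0, 1.0]]
recipe_embs = [[0.9, 0.1], [0.2, 0.8]]
sim = similarity_matrix(image_embs, recipe_embs)
```

With these toy embeddings each image retrieves its own recipe and the loss is zero, because every paired score already beats every unpaired score by more than the margin. A loss of this kind only rewards whatever separates the training pairs, which is the supervision-collapse concern the abstract raises about global matching.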

Code & Models

Repositories

myfoodchoice/malm_mask_augmentation_based_local_matching-_for-_food_recipe_retrieval
none · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning