Self-Attention and Ingredient-Attention Based Model for Recipe Retrieval from Image Queries
Matthias Fontanellaz, Stergios Christodoulidis, Stavroula Mougiakakou

TL;DR
This paper introduces a model based on self-attention and ingredient attention for retrieving recipes from food images. The retrieved recipes support nutrient estimation, the attention mechanisms focus on the most relevant instructions and ingredients, and the model shows promising results on the large-scale Recipe1M dataset.
Contribution
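The study proposes a multi-modal model that uses self-attention and ingredient-attention mechanisms for recipe retrieval from images. Because self-attention processes raw recipe text directly, the model removes the separate instruction-embedding step, which reduces training time, and its attention weights make the retrieval more interpretable.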
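To make the self-attention idea concrete, here is a minimal Python sketch of scaled dot-product self-attention over a sequence of word vectors. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy 2-d embeddings, and the use of identity query/key/value projections are hypothetical simplifications (a trained model would learn these projections, typically with several heads).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections.

    tokens: list of equal-length float vectors (one per word).
    Returns one context vector per token: a softmax-weighted mix of all tokens.
    Hypothetical simplification: queries, keys and values are the raw
    token vectors; a trained model would apply learned projections.
    """
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        # Similarity of this token (query) to every token (key), scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Weighted sum of the value vectors (here equal to the token vectors).
        outputs.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return outputs

# Toy 2-d "embeddings" for three words of an instruction sentence (hypothetical).
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx = self_attention(words)
```

Each output vector is a softmax-weighted mix of all token vectors, which is how a self-attention encoder can build context-aware representations directly from raw instruction text instead of relying on a separately trained sentence embedding.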
Findings
The model retrieves recipes effectively from image queries.
Ingredient attention highlights important instructions and ingredients.
Comparison shows improved performance over baseline methods.
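The interpretability finding rests on attention weights: each ingredient receives a normalised score, and high-scoring items can be surfaced to a user. The sketch below is a generic soft-attention pooling in Python, assuming a hypothetical context vector `query` and made-up ingredient embeddings; the function name `attention_pool` is illustrative, and the paper's exact ingredient-attention formulation may differ.

```python
import math

def attention_pool(items, query):
    """Soft-attention pooling (generic sketch, not the paper's exact method).

    items: dict mapping an ingredient name to its embedding vector.
    query: context vector the ingredients are scored against (hypothetical).
    Returns (pooled_vector, weights); weights sum to 1 and can be read
    as per-ingredient importance.
    """
    # Dot-product score of every ingredient against the context vector.
    scores = [sum(a * b for a, b in zip(vec, query)) for vec in items.values()]
    # Numerically stable softmax turns scores into attention weights.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = {name: e / total for name, e in zip(items, exps)}
    # Weighted sum of ingredient vectors gives the pooled representation.
    dim = len(query)
    pooled = [sum(weights[name] * vec[j] for name, vec in items.items()) for j in range(dim)]
    return pooled, weights

# Made-up 2-d ingredient embeddings and context vector.
ingredients = {
    "pasta": [0.9, 0.1],
    "tomato": [0.7, 0.3],
    "salt": [0.05, 0.05],
}
query = [1.0, 0.5]
pooled, weights = attention_pool(ingredients, query)
top = max(weights, key=weights.get)  # "pasta" has the highest score here
```

Sorting `weights` by value gives a ranked list of ingredients, which is the kind of output that can be used to explain why a recipe was retrieved.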
Abstract
Direct computer-vision-based nutrient content estimation is a demanding task due to the deformation and occlusion of ingredients, as well as high intra-class and low inter-class variability between meal classes. To tackle these issues, we propose a system for recipe retrieval from images. The recipe information can subsequently be used to estimate the nutrient content of the meal. In this study, we use the multi-modal Recipe1M dataset, which contains over 1 million recipes accompanied by over 13 million images. The proposed model can serve as a first step in an automatic pipeline for nutrient content estimation by providing hints related to ingredients and instructions. Through self-attention, our model can directly process raw recipe text, making the upstream instruction sentence embedding process redundant and thus reducing training time, while providing desirable…
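The retrieval step can be sketched as nearest-neighbour search in a shared image-recipe embedding space. The Python below ranks recipes by cosine similarity to an image embedding; it assumes both encoders are already trained, and the recipe ids, vectors, and function names are hypothetical placeholders rather than Recipe1M data or outputs of the paper's model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank_recipes(image_emb, recipe_embs):
    """Return recipe ids sorted from most to least similar to the image query."""
    return sorted(recipe_embs, key=lambda rid: cosine(image_emb, recipe_embs[rid]), reverse=True)

# Hypothetical embeddings, as if produced by trained image and recipe encoders.
recipes = {
    "margherita_pizza": [0.9, 0.1, 0.2],
    "caesar_salad": [0.1, 0.8, 0.3],
    "beef_stew": [0.2, 0.2, 0.9],
}
image_query = [0.8, 0.2, 0.1]
ranking = rank_recipes(image_query, recipes)  # "margherita_pizza" ranks first
```

In a pipeline like the one described above, the ingredient list of the top-ranked recipe would then be handed to a separate nutrient-estimation stage.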
