CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval

Hai X. Pham; Ricardo Guerrero; Jiatong Li; Vladimir Pavlovic

arXiv:2102.02547·cs.CV·February 5, 2021

TL;DR

This paper introduces a novel cross-modal hierarchical embedding framework for food retrieval that models complex relationships between images and recipe texts, enabling automatic identification of key recipe components and improving retrieval accuracy.

Contribution

The work presents a new hierarchical embedding model using tree-structured LSTMs for food image-recipe association, capturing complex relationships without explicit supervision.

Findings

01. Effective identification of main ingredients and actions in recipes

02. Improved cross-modal retrieval performance

03. Learned meaningful food recipe representations

Abstract

Despite the abundance of multi-modal data, such as image-text pairs, there has been little effort in understanding the individual entities and their different roles in the construction of these data instances. In this work, we endeavour to discover the entities and their corresponding importance in cooking recipes automatically as a visual-linguistic association problem. More specifically, we introduce a novel cross-modal learning framework to jointly model the latent representations of images and text in the food image-recipe association and retrieval tasks. This model allows one to discover complex functional and hierarchical relationships between images and text, and among textual parts of a recipe including title, ingredients and cooking instructions. Our experiments show that by making use of efficient tree-structured Long Short-Term Memory as the text encoder in our computational…
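
The retrieval task mentioned in the abstract can be illustrated with a minimal sketch, assuming images and recipes have already been mapped into a shared latent space. This is not the authors' implementation: the function names `cosine` and `rank_recipes`, the three-dimensional vectors, and the recipe names are all hypothetical, and cosine similarity is used here only as a common choice of ranking score.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_recipes(image_embedding, recipe_embeddings):
    """Return (recipe name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine(image_embedding, vec))
        for name, vec in recipe_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; a real system would obtain these from
# trained image and text encoders.
recipe_embeddings = {
    "tomato soup": [0.9, 0.1, 0.0],
    "chocolate cake": [0.0, 0.2, 0.9],
    "caprese salad": [0.7, 0.6, 0.1],
}
image_embedding = [0.8, 0.2, 0.1]

ranking = rank_recipes(image_embedding, recipe_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Image-to-recipe retrieval ranks recipe embeddings against an image query as above; recipe-to-image retrieval swaps the roles of the two modalities.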

Code & Models

Repositories

haixpham/CHEF
PyTorch · Official

Videos

CHEF: Cross-Modal Hierarchical Embeddings for Food Domain Retrieval · Underline

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques