Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings

Micael Carvalho; Rémi Cadène; David Picard; Laure Soulier; Nicolas Thome; Matthieu Cord

arXiv:1804.11146 · cs.CL · May 1, 2018 · 38 cites

TL;DR

This paper introduces a cross-modal retrieval model that aligns visual and textual cooking data in a shared space, enabling efficient large-scale retrieval and improving upon previous models, validated on a large recipe dataset.

Contribution

The paper presents a learning scheme for cross-modal retrieval that scales to large cooking datasets and advances the state of the art in semantic text-image embedding.

Findings

01 Outperforms previous state-of-the-art models on the Recipe1M dataset

02 Effective in large-scale retrieval tasks

03 Qualitative results demonstrate practical cooking use cases

Abstract

Designing powerful tools that support cooking activities has rapidly gained popularity due to the massive amounts of available data, as well as recent advances in machine learning that are capable of analyzing them. In this paper, we propose a cross-modal retrieval model aligning visual and textual data (like pictures of dishes and their recipes) in a shared representation space. We describe an effective learning scheme, capable of tackling large-scale problems, and validate it on the Recipe1M dataset containing nearly 1 million picture-recipe pairs. We show the effectiveness of our approach regarding previous state-of-the-art models and present qualitative results over computational cooking use cases.
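To make the idea of a shared representation space concrete, the following is a minimal sketch of retrieval once images and recipes have already been embedded. It is not the authors' implementation: the toy vectors, the function names (`l2_normalize`, `cosine_similarity`, `retrieve`), and the choice of cosine similarity as the ranking score are all assumptions made for illustration, and the learning scheme that produces the embeddings is not shown.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, candidates, k=1):
    """Return indices of the k candidates most similar to the query.

    Ties are broken by the lower index so the ranking is deterministic.
    """
    scored = [(cosine_similarity(query, c), i) for i, c in enumerate(candidates)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:k]]


# Hypothetical 3-dimensional embeddings; in practice these would come from
# trained image and recipe encoders mapping into the same space.
image_embeddings = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.1, 0.0, 0.95],
]
recipe_embeddings = [
    [1.0, 0.0, 0.1],
    [0.1, 0.8, 0.0],
    [0.0, 0.1, 1.0],
]

# Image-to-recipe retrieval: for each image, find the closest recipe.
top_recipe_per_image = [
    retrieve(img, recipe_embeddings, k=1)[0] for img in image_embeddings
]
print(top_recipe_per_image)
```

Because both modalities live in the same space, the same `retrieve` function works in the other direction (recipe-to-image) by swapping the query and candidate sets. At the scale of Recipe1M, an exhaustive scan like this would typically be replaced by a vectorized or approximate nearest-neighbor search.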

Code & Models

Repositories

Cadene/recipe1m.bootstrap.pytorch (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Image Retrieval and Classification Techniques