Learning TFIDF Enhanced Joint Embedding for Recipe-Image Cross-Modal Retrieval Service

Zhongwei Xie; Ling Liu; Yanzhao Wu; Lin Li; Luo Zhong

arXiv:2108.00724 · cs.CV · August 3, 2021

TL;DR

This paper introduces MSJE, a multi-modal joint embedding method that leverages TFIDF features and LSTM networks to improve cross-modal retrieval between recipes and images, outperforming state-of-the-art methods on the Recipe1M dataset.

Contribution

The paper proposes a novel multi-modal embedding approach that integrates TFIDF features with LSTM-based sequence modeling for better recipe-image retrieval.
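TF-IDF scores a term by how often it occurs in one recipe relative to how many recipes in the corpus contain it, so terms shared by many recipes score low and distinctive ingredients score high. The following is a minimal pure-Python sketch of the textbook formula (tf = count / recipe length, idf = log(N / df)). The tokenizer, the function names, the toy recipes, and this particular TF-IDF variant are assumptions for illustration, not the paper's actual preprocessing.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = (count of term in doc) / (number of tokens in doc)
    idf = log(N / df), where df is the number of docs containing the term.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # count each term once per document
    result = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        result.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return result


# Each "recipe" is a hypothetical concatenation of title, ingredients and instructions.
recipes = [
    "tomato soup tomato basil simmer",
    "chicken curry chicken rice simmer",
    "basil pesto pasta basil garlic",
]
weights = tfidf(recipes)
print(sorted(weights[0].items(), key=lambda kv: -kv[1])[:2])
```

In the first recipe, `tomato` outweighs `basil` because it is repeated and appears in no other recipe, while `basil` also occurs in the pesto recipe. A term present in every recipe gets weight zero.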

Findings

1. MSJE outperforms state-of-the-art methods on the Recipe1M dataset.
2. TFIDF features enhance the semantic understanding of recipes.
3. Combining TFIDF with sequence features improves retrieval accuracy.
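Once recipes and images share a common feature space, retrieval reduces to ranking one modality's embeddings by similarity to a query from the other. The following is a minimal sketch, assuming cosine similarity as the ranking score and Recall@K as the accuracy measure. Both are common choices, but this page does not state which similarity or metrics the paper uses, and the 2-D vectors are toy stand-ins for learned embeddings.

```python
import math


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_images(recipe_emb, image_embs):
    """Indices of image_embs ordered from most to least similar to recipe_emb."""
    scores = [cosine(recipe_emb, img) for img in image_embs]
    return sorted(range(len(image_embs)), key=lambda i: scores[i], reverse=True)


def recall_at_k(ranked_lists, targets, k):
    """Fraction of queries whose correct item appears in the top k results."""
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:k])
    return hits / len(targets)


# Toy 2-D embeddings standing in for learned recipe and image features.
recipe = [1.0, 0.0]
images = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
ranking = rank_images(recipe, images)
print(ranking, recall_at_k([ranking], [1], k=1))  # → [1, 2, 0] 1.0
```

The same functions work in the image-to-recipe direction by swapping the roles of query and gallery.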

Abstract

It is widely acknowledged that learning joint embeddings of recipes with images is challenging due to the diverse composition and deformation of ingredients in cooking procedures. We present a Multi-modal Semantics enhanced Joint Embedding approach (MSJE) for learning a common feature space between the two modalities (text and image), with the ultimate goal of providing high-performance cross-modal retrieval services. Our MSJE approach has three unique features. First, we extract the TFIDF feature from the title, ingredients and cooking instructions of recipes. By determining the significance of word sequences through combining LSTM-learned features with their TFIDF features, we encode a recipe into a TFIDF-weighted vector that captures significant key terms and how those key terms are used in the corresponding cooking instructions. Second, we combine the recipe TFIDF feature with the…
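The abstract describes weighting LSTM-learned word features by their TFIDF scores to obtain a single recipe vector. One simple way to realize that idea is a TFIDF-weighted average of the per-token LSTM outputs, sketched below in pure Python. The weighted-average fusion, the fallback to a plain mean, and all names are assumptions for illustration; the paper's exact combination may differ, and the hidden states here are toy numbers rather than real LSTM outputs.

```python
def tfidf_weighted_pool(hidden_states, tokens, tfidf_weights):
    """Pool per-token LSTM outputs into one recipe vector (hypothetical fusion).

    hidden_states: list of per-token vectors, assumed to come from an LSTM.
    tokens: the token at each position (same length as hidden_states).
    tfidf_weights: {term: weight}, e.g. from a TF-IDF pass over the corpus.
    Falls back to a plain mean if every token has zero weight.
    """
    if not hidden_states or len(hidden_states) != len(tokens):
        raise ValueError("need one hidden state per token")
    dim = len(hidden_states[0])
    weights = [tfidf_weights.get(tok, 0.0) for tok in tokens]
    total = sum(weights)
    if total == 0:
        weights = [1.0] * len(tokens)
        total = float(len(tokens))
    pooled = [0.0] * dim
    for w, h in zip(weights, hidden_states):
        for i in range(dim):
            pooled[i] += w * h[i]
    return [x / total for x in pooled]


# Toy example: the distinctive term "tomato" dominates the pooled vector.
states = [[1.0, 0.0], [0.0, 1.0]]
recipe_vec = tfidf_weighted_pool(states, ["tomato", "the"], {"tomato": 3.0, "the": 1.0})
print(recipe_vec)
```

Compared with a plain average, this pooling lets tokens with high TF-IDF, such as distinctive ingredients, contribute more to the recipe embedding than common words do.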

Code & Models

Repositories

Kevinnest/MSJE (PyTorch, official)

Taxonomy

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
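These taxonomy tags all belong to the LSTM cell: sigmoid activations drive the input, forget and output gates, and tanh produces the candidate cell values and squashes the cell state before output. The following is a minimal pure-Python sketch of one step of the textbook LSTM formulation, given for illustration only. The parameter layout and names are hypothetical; the official repository is PyTorch-based, and `torch.nn.LSTM` provides this cell there.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a standard LSTM cell.

    params maps each gate name ("i", "f", "o", "g") to a tuple (W, U, b):
    W multiplies the input x, U the previous hidden state, b is the bias.
    """
    def preact(gate):
        W, U, b = params[gate]
        return [a + r + bias for a, r, bias in zip(matvec(W, x), matvec(U, h_prev), b)]

    i = [sigmoid(z) for z in preact("i")]    # input gate
    f = [sigmoid(z) for z in preact("f")]    # forget gate
    o = [sigmoid(z) for z in preact("o")]    # output gate
    g = [math.tanh(z) for z in preact("g")]  # candidate cell values
    c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]  # new cell state
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]                    # new hidden state
    return h, c


# Hidden size 1, input size 1, all parameters zero: every gate outputs 0.5.
zero = ([[0.0]], [[0.0]], [0.0])
params = {gate: zero for gate in ("i", "f", "o", "g")}
h, c = lstm_step([2.0], [0.0], [1.0], params)
```

With all-zero parameters the forget gate halves the previous cell state and the candidate is zero, so the new cell state is 0.5.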