Efficient Deep Feature Calibration for Cross-Modal Joint Embedding Learning

Zhongwei Xie; Ling Liu; Lin Li; Luo Zhong

arXiv:2108.00705·cs.CV·August 10, 2021

TL;DR

This paper proposes a two-phase deep feature calibration framework for efficient cross-modal text-image embedding, improving semantic alignment and outperforming existing methods on the Recipe1M dataset.

Contribution

The paper introduces a two-phase deep feature calibration approach that separates the feature calibration performed in data preprocessing from the training of the joint embedding model, with the aim of improving semantic alignment in cross-modal learning.

Findings

01. Significant performance improvement over state-of-the-art methods.

02. Effective semantic alignment of recipes and images.

03. Robustness demonstrated on the Recipe1M dataset.

Abstract

This paper introduces a two-phase deep feature calibration framework for efficient learning of semantics-enhanced text-image cross-modal joint embedding, which clearly separates the deep feature calibration in data preprocessing from training the joint embedding model. We use the Recipe1M dataset for the technical description and empirical validation. In preprocessing, we perform deep feature calibration by combining deep feature engineering with semantic context features derived from raw text-image input data. We leverage an LSTM to identify key terms and NLP methods to produce ranking scores for those terms before generating the key term feature. We leverage wideResNet50 to extract and encode the image category semantics, which helps the semantic alignment of the learned recipe and image embeddings in the joint latent space. In joint embedding learning, we perform deep feature calibration by…

Taxonomy

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Triplet Loss
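
Triplet loss is listed among the methods, but this page does not give the exact loss formulation, margin, or negative-sampling strategy used in the paper. The following is therefore only a minimal sketch of a generic margin-based triplet loss over recipe and image embeddings, not the authors' implementation. The cosine distance, the margin of 0.3, the use of every non-matching image in the batch as a negative, and the function names (`cosine_similarity`, `triplet_loss`, `batch_triplet_loss`) are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss on cosine distance (1 - similarity).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is. The margin value is an
    assumed placeholder, not a value taken from the paper.
    """
    d_pos = 1.0 - cosine_similarity(anchor, positive)
    d_neg = 1.0 - cosine_similarity(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def batch_triplet_loss(recipe_embs, image_embs, margin=0.3):
    """Mean triplet loss over a batch of paired embeddings.

    Assumes recipe_embs[i] matches image_embs[i]; every other image
    in the batch is used as a negative for recipe i (an assumed
    sampling scheme chosen for simplicity).
    """
    losses = []
    for i, anchor in enumerate(recipe_embs):
        for j, negative in enumerate(image_embs):
            if i != j:
                losses.append(
                    triplet_loss(anchor, image_embs[i], negative, margin)
                )
    # A batch with a single pair has no negatives to compare against.
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Toy 2-D embeddings: each recipe is close to its own image.
    recipes = [[1.0, 0.0], [0.0, 1.0]]
    images = [[0.9, 0.1], [0.1, 0.9]]
    print(batch_triplet_loss(recipes, images))
```

In this toy batch each recipe already sits much closer to its own image than to the other one, so every triplet satisfies the margin and contributes no loss. A real training loop would compute the same quantity on learned embeddings and backpropagate through it with an autodiff framework.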