Intra-Modal Constraint Loss For Image-Text Retrieval

Jianan Chen; Lu Zhang; Qiong Wang; Cong Bai; Kidiyo Kpalma

arXiv:2207.05024·cs.CV·July 14, 2022

TL;DR

This paper introduces an intra-modal constraint loss for image-text retrieval that improves joint embedding learning by reducing negative pair violations within the same modality, leading to better retrieval performance.

Contribution

It proposes a novel intra-modal constraint loss function for joint embedding learning in image-text retrieval, enhancing existing cross-modal retrieval methods.

Findings

01. Outperforms state-of-the-art bi-directional image-text retrieval methods on the Flickr30K dataset

02. Outperforms state-of-the-art bi-directional image-text retrieval methods on the Microsoft COCO dataset

03. Demonstrates the effectiveness of intra-modal constraints in the embedding space

Abstract

Cross-modal retrieval has drawn much attention in both the computer vision and natural language processing domains. With the development of convolutional and recurrent neural networks, the bottleneck of retrieval across image-text modalities is no longer the extraction of image and text features but the learning of an efficient loss function in the embedding space. Many loss functions try to pull pairwise features from heterogeneous modalities closer together. This paper proposes a method for learning a joint embedding of images and texts using an intra-modal constraint loss function that reduces the violation of negative pairs from the same homogeneous modality. Experimental results show that our approach outperforms state-of-the-art bi-directional image-text retrieval methods on the Flickr30K and Microsoft COCO datasets. Our code is publicly available: https://github.com/CanonChen/IMC.
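
The abstract does not state the loss formula, so the following is only a minimal sketch of how an intra-modal constraint might sit alongside a standard cross-modal hinge triplet loss. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names (`cross_modal_loss`, `intra_modal_loss`, `total_loss`), the cosine similarity, the margin of 0.2, the `weight` parameter, the summation over all negatives, and the exact form of the intra-modal term. The official PyTorch code in the linked repository is the reference for the actual method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hinge(x):
    """max(0, x): only margin violations contribute to the loss."""
    return max(0.0, x)


def cross_modal_loss(images, texts, margin=0.2):
    """Hinge triplet loss over cross-modal negatives.

    Assumes images[i] and texts[i] form the matched (positive) pair.
    """
    total = 0.0
    for i in range(len(images)):
        pos = cosine(images[i], texts[i])
        for j in range(len(images)):
            if j == i:
                continue
            # Image i scored against a non-matching text.
            total += hinge(margin - pos + cosine(images[i], texts[j]))
            # Text i scored against a non-matching image.
            total += hinge(margin - pos + cosine(images[j], texts[i]))
    return total


def intra_modal_loss(images, texts, margin=0.2):
    """Assumed intra-modal term (hypothetical form, not from the paper).

    Penalizes a same-modality negative (another image, or another text)
    that is not at least `margin` less similar than the matched pair.
    """
    total = 0.0
    for i in range(len(images)):
        pos = cosine(images[i], texts[i])
        for j in range(len(images)):
            if j == i:
                continue
            total += hinge(margin - pos + cosine(images[i], images[j]))
            total += hinge(margin - pos + cosine(texts[i], texts[j]))
    return total


def total_loss(images, texts, margin=0.2, weight=1.0):
    """Cross-modal loss plus a weighted intra-modal term (weight is assumed)."""
    return (cross_modal_loss(images, texts, margin)
            + weight * intra_modal_loss(images, texts, margin))


# Toy batch: two images that are nearly identical in embedding space,
# paired with two clearly different texts.
toy_images = [[1.0, 0.0], [1.0, 0.1]]
toy_texts = [[1.0, 0.0], [0.0, 1.0]]
print(total_loss(toy_images, toy_texts))
```

In this toy batch the two image embeddings almost coincide, so the assumed intra-modal term is non-zero in addition to whatever the cross-modal term contributes. That is the kind of same-modality negative-pair violation the abstract says the proposed loss is meant to reduce.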

Code & Models

Repositories

canonchen/imc (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques