Intra-Modal Constraint Loss For Image-Text Retrieval
Jianan Chen, Lu Zhang, Qiong Wang, Cong Bai, Kidiyo Kpalma

TL;DR
This paper introduces an intra-modal constraint loss for image-text retrieval that improves joint embedding learning by reducing negative pair violations within the same modality, leading to better retrieval performance.
Contribution
It proposes a novel intra-modal constraint loss function for joint embedding learning in image-text retrieval, enhancing existing cross-modal retrieval methods.
Findings
Outperforms state-of-the-art bi-directional image-text retrieval methods on the Flickr30K dataset
Achieves higher retrieval accuracy than state-of-the-art methods on the Microsoft COCO dataset
Demonstrates effectiveness of intra-modal constraints in embedding space
Abstract
Cross-modal retrieval has drawn much attention in both the computer vision and natural language processing domains. With the development of convolutional and recurrent neural networks, the bottleneck of retrieval across image-text modalities is no longer the extraction of image and text features but the learning of an efficient loss function in the embedding space. Many loss functions try to pull paired features from heterogeneous modalities closer together. This paper proposes a method for learning a joint embedding of images and texts using an intra-modal constraint loss function to reduce the violation of negative pairs from the same homogeneous modality. Experimental results show that our approach outperforms state-of-the-art bi-directional image-text retrieval methods on the Flickr30K and Microsoft COCO datasets. Our code is publicly available: https://github.com/CanonChen/IMC.
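The abstract does not state the loss formula, so the following is only a minimal sketch of how an intra-modal constraint could sit alongside a standard cross-modal hinge ranking loss. Everything in it is an assumption made for illustration: the function names, the margin of 0.2, the `intra_weight` parameter, the sum-over-negatives form, and the choice to compare each same-modality negative against the matched image-text pair. It is not taken from the authors' repository and may differ from their formulation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def cross_modal_hinge(img, txt, margin=0.2):
    """Bidirectional hinge ranking loss; row i of img matches row i of txt."""
    sim = img @ txt.T                      # sim[i, j]: image i vs. text j
    pos = np.diag(sim)                     # similarity of each matched pair
    off_diag = ~np.eye(len(sim), dtype=bool)
    # Image-to-text: a wrong text j should score below the matched text i.
    cost_i2t = np.maximum(0.0, margin + sim - pos[:, None])
    # Text-to-image: a wrong image i should score below the matched image j.
    cost_t2i = np.maximum(0.0, margin + sim - pos[None, :])
    return cost_i2t[off_diag].sum() + cost_t2i[off_diag].sum()


def intra_modal_hinge(img, txt, margin=0.2):
    """Assumed intra-modal term: negatives come from the anchor's own modality.

    For anchor image i, any other image k should be less similar to it than
    its matched text i is, by at least `margin`; symmetrically for texts.
    """
    pos = np.sum(img * txt, axis=1)        # matched image-text similarity
    off_diag = ~np.eye(len(img), dtype=bool)
    cost_img = np.maximum(0.0, margin + img @ img.T - pos[:, None])
    cost_txt = np.maximum(0.0, margin + txt @ txt.T - pos[:, None])
    return cost_img[off_diag].sum() + cost_txt[off_diag].sum()


def total_loss(img, txt, margin=0.2, intra_weight=1.0):
    """Cross-modal loss plus a weighted intra-modal term (weight is hypothetical)."""
    img, txt = l2_normalize(img), l2_normalize(txt)
    return (cross_modal_hinge(img, txt, margin)
            + intra_weight * intra_modal_hinge(img, txt, margin))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(4, 8))       # stand-ins for image embeddings
    texts = rng.normal(size=(4, 8))        # stand-ins for text embeddings
    print(total_loss(images, texts))
```

In this sketch the intra-modal term is zero whenever every same-modality negative is already separated from the anchor by more than the margin relative to the matched pair, so it only contributes gradient for violating negatives. Whether the original method sums over all negatives or mines only the hardest one is not stated in the abstract.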
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
