Towards Fast and Accurate Image-Text Retrieval with Self-Supervised Fine-Grained Alignment

Jiamin Zhuang; Jing Yu; Yang Ding; Xiangyan Qu; Yue Hu

arXiv:2308.14009 · cs.CV · August 29, 2023

TL;DR

This paper introduces SelfAlign, a self-supervised alignment module that enhances image-text retrieval accuracy in independent-embedding models without increasing retrieval time or requiring extra supervision.

Contribution

SelfAlign improves retrieval accuracy by aligning image and text at concept and context levels using contrastive learning, without cross-modal interactions during training.
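As a rough illustration of what contrastive image-text alignment optimizes, the sketch below computes a generic symmetric InfoNCE loss over a batch of paired embeddings. It is not the paper's exact objective: the function names, the cosine similarity, and the temperature of 0.07 are all assumptions made for this example. The same form could in principle be applied to embeddings pooled at either the concept level or the context level.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_matrix(image_embs, text_embs):
    """sims[i][j] = cosine similarity between image i and text j."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row k is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[k]
    return total / len(logits)

def symmetric_info_nce(image_embs, text_embs, temperature=0.07):
    """Hypothetical helper: average of image-to-text and text-to-image losses.

    Pair k (image k, text k) is the positive; every other pairing in the
    batch serves as a negative. The temperature value is an assumption.
    """
    sims = cosine_matrix(image_embs, text_embs)
    logits_i2t = [[s / temperature for s in row] for row in sims]
    logits_t2i = [list(col) for col in zip(*logits_i2t)]  # transpose
    return 0.5 * (cross_entropy_diag(logits_i2t) + cross_entropy_diag(logits_t2i))
```

The loss is small when each image is most similar to its own caption and large when the pairing is scrambled, which is the signal a contrastive alignment module trains on. Note that the two encoders only meet in the similarity matrix; no joint cross-modal network is involved.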

Findings

01 Boosts state-of-the-art models' accuracy by up to 9.1% on Flickr30K.

02 Outperforms many interactive-embedding models in accuracy with less retrieval time.

03 Maintains efficiency with comparable time cost to existing models.
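To see why an independent-embedding model keeps retrieval cheap, consider the minimal sketch below. Gallery embeddings are computed once offline, so answering a query costs only one dot product per gallery item. The function names and the toy 2-D vectors are hypothetical and are not taken from the paper or its repository.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_index(image_embs):
    """Offline step: store one precomputed embedding per gallery image."""
    return [list(v) for v in image_embs]

def retrieve(text_emb, index, top_k=1):
    """Online step: rank gallery images by dot product with the query.

    No image encoder runs here; only the text query needs to be embedded
    at query time (assumed to have happened before this call).
    """
    scores = [(dot(text_emb, img), i) for i, img in enumerate(index)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:top_k]]

# Toy usage with made-up embeddings.
index = build_index([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
top_two = retrieve([0.0, 1.0], index, top_k=2)
```

An interactive-embedding model, by contrast, would have to run a joint image-text network for every candidate pair at query time, which is where the extra retrieval cost comes from. Because an alignment module of this kind acts only during training, the online step above stays the same.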

Abstract

Image-text retrieval requires the system to bridge the heterogeneous gap between vision and language for accurate retrieval while keeping the network lightweight enough for efficient retrieval. Existing trade-off solutions mainly incorporate cross-modal interactions into the independent-embedding framework or leverage stronger pretrained encoders, both of which still demand time-consuming similarity measurement or a heavyweight model structure in the retrieval stage. In this work, we propose SelfAlign, an image-text alignment module built on top of the independent-embedding framework, which improves retrieval accuracy while maintaining retrieval efficiency, without extra supervision. SelfAlign contains two collaborative sub-modules that force image-text alignment at both the concept level and the context level through self-supervised contrastive learning. It does not require…

Code & Models

Repositories

zjamie813/selfalign (PyTorch, official)