USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval

Yan Zhang; Zhong Ji; Di Wang; Yanwei Pang; Xuelong Li

arXiv:2301.06844 · cs.CV · January 18, 2023 · 1 citation

TL;DR

This paper introduces USER, a unified semantic enhancement method with momentum contrast for image-text retrieval. It improves accuracy and efficiency by leveraging global representations, transferring knowledge from CLIP, and enlarging the set of negative samples with dynamic queues.

Contribution

The paper proposes the USER framework that combines semantic enhancement modules with momentum contrastive learning, addressing limitations of existing methods in representation accuracy and negative sample scale.

Findings

01. Achieves superior retrieval accuracy on the MSCOCO and Flickr30K datasets.

02. Improves inference efficiency compared to previous approaches.

03. Effectively enlarges negative sample sets using dynamic queues.

Abstract

As a fundamental and challenging task in bridging language and vision domains, Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality, and its key challenge is to measure the semantic similarity across different modalities. Although significant progress has been achieved, existing approaches typically suffer from two major limitations: (1) Directly exploiting bottom-up-attention-based region-level features, in which each region is treated equally, hurts the accuracy of the representation. (2) Employing the mini-batch-based end-to-end training mechanism limits the scale of negative sample pairs. To address these limitations, we propose a Unified Semantic Enhancement Momentum Contrastive Learning (USER) method for ITR. Specifically, we carefully design two simple but effective Global…
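The second limitation above (negatives capped by the mini-batch size) is what momentum contrast is generally meant to relax. The following is a minimal sketch of the generic MoCo recipe, not the authors' implementation: a slowly updated key encoder, a fixed-size queue of past key embeddings used as extra negatives, and an InfoNCE loss. The function names, the toy 2-D embeddings, the queue size of 4, and the values of `m` and `temperature` are all illustrative assumptions, and plain Python lists stand in for tensors.

```python
import math
from collections import deque


def momentum_update(query_params, key_params, m=0.999):
    """EMA update of the key (momentum) encoder: k <- m * k + (1 - m) * q."""
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query: negative log-softmax of the positive logit."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max before exponentiating, for stability
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


# Fixed-size FIFO queue of past key embeddings (assumed size: 4).
# Appending a fifth key silently evicts the oldest one.
queue = deque(maxlen=4)
for key in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [0.6, 0.8]):
    queue.append(key)

# One toy step: the queued keys act as negatives for the current query.
image_query = [1.0, 0.0]
matching_text_key = [1.0, 0.0]
loss = info_nce(image_query, matching_text_key, list(queue))

# After the query encoder is trained, the key encoder follows it slowly.
key_encoder_params = momentum_update([0.5, -0.2], [0.4, -0.1])
```

Because the queue persists across steps, the number of negatives is set by the queue length rather than by the batch size, which is the property the abstract's second limitation points at. How USER itself builds and uses its queues is not described in the text above.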

Code & Models

Repositories

zhangy0822/user (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: InfoNCE · Batch Normalization · Contrastive Language-Image Pre-training · Contrastive Learning · Momentum Contrast