AsCL: An Asymmetry-sensitive Contrastive Learning Method for Image-Text Retrieval with Cross-Modal Fusion
Ziyu Gong, Chengcheng Mai, Yihua Huang

TL;DR
This paper introduces AsCL, a contrastive learning approach that addresses information asymmetry in image-text retrieval by improving fine-grained differentiation across modalities and adding cross-modal fusion, leading to improved retrieval accuracy.
Contribution
The paper proposes a novel asymmetry-sensitive contrastive learning method combined with hierarchical cross-modal fusion for better image-text retrieval.
Findings
Outperforms existing methods on the MSCOCO and Flickr30K datasets.
Effectively distinguishes fine-grained semantic differences across modalities.
Enhances concept alignment through a multimodal attention mechanism.
Abstract
The image-text retrieval task aims to retrieve relevant information from a given image or text. The main challenge is to unify multimodal representation and distinguish fine-grained differences across modalities, thereby finding similar contents and filtering irrelevant contents. However, existing methods mainly focus on unified semantic representation and concept alignment for multi-modalities, while the fine-grained differences across modalities have rarely been studied before, making it difficult to solve the information asymmetry problem. In this paper, we propose a novel asymmetry-sensitive contrastive learning method. By generating corresponding positive and negative samples for different asymmetry types, our method can simultaneously ensure fine-grained semantic differentiation and unified semantic representation between multi-modalities. Additionally, a hierarchical cross-modal…
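The idea of contrasting a matched image-text pair against generated negatives can be illustrated with a generic InfoNCE-style loss. The sketch below is not the loss defined in the paper. It is a minimal NumPy illustration that assumes pre-computed embeddings, an image-to-text direction only, and a hypothetical `extra_neg_txt` array standing in for whatever generated negative captions a method supplies. The function names and the temperature value are likewise assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def info_nce_with_extra_negatives(img, txt, extra_neg_txt, temperature=0.07):
    """Generic InfoNCE-style image-to-text loss (illustrative, not the paper's).

    img:           (B, D) image embeddings
    txt:           (B, D) text embeddings; row i is the positive for image i
    extra_neg_txt: (B, K, D) hypothetical generated negatives for each image
    """
    img = l2_normalize(img)
    txt = l2_normalize(txt)
    extra = l2_normalize(extra_neg_txt)

    # Similarity of every image to every in-batch caption: (B, B).
    in_batch = img @ txt.T
    # Similarity of each image to its own K generated negatives: (B, K).
    hard = np.einsum("bd,bkd->bk", img, extra)

    # The positive for image i sits in column i; all other columns are negatives.
    logits = np.concatenate([in_batch, hard], axis=1) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    idx = np.arange(img.shape[0])
    return float(-log_prob[idx, idx].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(4, 8))
    captions = images + 0.01 * rng.normal(size=(4, 8))  # near-matching pairs
    negatives = rng.normal(size=(4, 2, 8))              # stand-in negatives
    print(info_nce_with_extra_negatives(images, captions, negatives))
```

In this sketch the loss falls when each image sits closer to its own caption than to both the other captions in the batch and its generated negatives, which is the general mechanism by which added negatives push a model toward finer-grained distinctions.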
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
