Image-text Retrieval via Preserving Main Semantics of Vision

Xu Zhang; Xinzheng Niu; Philippe Fournier-Viger; Xudong Dai

arXiv:2304.10254 · cs.CV · May 1, 2023 · 1 citation

TL;DR

This paper introduces a Visual Semantic Loss (VSL) to improve image-text retrieval by emphasizing main image content and reducing false matches caused by secondary information, leading to better retrieval accuracy.

Contribution

The paper proposes a novel semantic optimization method, VSL, that leverages annotated texts to focus on main image content, enhancing cross-modal retrieval performance.

Findings

01. VSL improves retrieval accuracy on the MSCOCO and Flickr30K datasets.

02. The method reduces false matches caused by secondary image content.

03. In the reported experiments, the method outperforms existing approaches.

Abstract

Image-text retrieval is one of the major tasks of cross-modal retrieval. Several approaches for this task map images and texts into a common space to create correspondences between the two modalities. However, due to the content (semantics) richness of an image, redundant secondary information in an image may cause false matches. To address this issue, this paper presents a semantic optimization approach, implemented as a Visual Semantic Loss (VSL), to assist the model in focusing on an image's main content. This approach is inspired by how people typically annotate the content of an image by describing its main content. Thus, we leverage the annotated texts corresponding to an image to assist the model in capturing the main content of the image, reducing the negative impact of secondary content. Extensive experiments on two benchmark datasets (MSCOCO and Flickr30K) demonstrate the…
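
The idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' VSL formulation, which this page does not give. It assumes a common hinge-based ranking loss over cosine similarities in the shared space, plus a hypothetical auxiliary term that pulls each image embedding toward the mean of its own caption embeddings (used here as a stand-in for the image's "main content"). The function names, the `margin` and `weight` parameters, and the choice to score retrieval against caption centers are all illustrative assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two embeddings in the common space."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def caption_center(captions):
    """Mean of the normalized caption embeddings annotated for one image.

    Assumption: this mean approximates the image's main content.
    """
    normed = [l2_normalize(c) for c in captions]
    dim = len(normed[0])
    return [sum(c[i] for c in normed) / len(normed) for i in range(dim)]


def hinge_ranking_loss(images, centers, margin=0.2):
    """Sum-of-hinges ranking loss; images[i] is the match for centers[i]."""
    loss = 0.0
    n = len(images)
    for i in range(n):
        pos = cosine(images[i], centers[i])
        for j in range(n):
            if j == i:
                continue
            # Image i scored against a non-matching text j.
            loss += max(0.0, margin - pos + cosine(images[i], centers[j]))
            # Text i scored against a non-matching image j.
            loss += max(0.0, margin - pos + cosine(images[j], centers[i]))
    return loss


def main_content_penalty(image, captions):
    """Hypothetical auxiliary term: 1 - cos(image, caption center)."""
    return 1.0 - cosine(image, caption_center(captions))


def total_loss(images, captions_per_image, margin=0.2, weight=1.0):
    """Ranking loss plus the weighted auxiliary term (illustrative only)."""
    centers = [caption_center(c) for c in captions_per_image]
    rank = hinge_ranking_loss(images, centers, margin)
    aux = sum(
        main_content_penalty(img, caps)
        for img, caps in zip(images, captions_per_image)
    )
    return rank + weight * aux
```

Under these assumptions, an image embedding that also encodes secondary content the captions never mention sits further from its caption center and so incurs a larger auxiliary penalty, which is the intuition the abstract describes. For the actual loss, consult the paper and the official repository listed under Code & Models.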

Code & Models

Repositories

zhangxu0963/vsl (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques