Dynamic Visual Semantic Sub-Embeddings and Fast Re-Ranking

Wenzhang Wei; Zhipeng Gui; Changguang Wu; Anqi Zhao; Dehua Peng; Huayi Wu

arXiv:2309.08154 · cs.CV · December 22, 2023

TL;DR

This paper introduces the Dynamic Visual Semantic Sub-Embeddings (DVSE) framework, which reduces redundancy in image-text matching by encoding an image's semantic variations as a set of visual sub-embeddings, and pairs it with a fast re-ranking strategy to improve retrieval accuracy.

Contribution

It proposes a novel method for generating diverse, low-entropy visual sub-embeddings and an efficient re-ranking approach for cross-modal retrieval tasks.

Findings

01 Improved retrieval accuracy on the MSCOCO, Flickr30K, and CUB datasets.

02 Effective encoding of semantic variations in visual embeddings.

03 Enhanced efficiency with the fast re-ranking strategy.

Abstract

The core of cross-modal matching is to accurately measure the similarity between different modalities in a unified representation space. However, compared with a textual description, which reflects one particular perspective, the visual modality contains more semantic variations, so images in databases are usually associated with multiple textual captions. Although popular symmetric embedding methods have explored numerous modal interaction approaches, they often learn to increase the average expression probability of multiple semantic variations within image embeddings. Consequently, the information entropy of the embeddings increases, resulting in redundancy and decreased accuracy. In this work, we propose a Dynamic Visual Semantic Sub-Embeddings framework (DVSE) to reduce the information entropy. Specifically, we obtain a set of heterogeneous visual sub-embeddings through a dynamic orthogonal constraint loss.…
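The abstract is cut off before the loss is defined, so the following is only a minimal sketch, under stated assumptions, of the general idea: an image is represented by K sub-embeddings, a generic orthogonality penalty (the mean squared cosine similarity between distinct sub-embeddings) pushes them toward mutually orthogonal directions, and a caption is scored against the image's best-matching sub-embedding. The helper names (`orthogonal_penalty`, `image_text_score`) and the max-over-sub-embeddings score are hypothetical illustrations, not the paper's code; the "dynamic" part of the paper's loss is not described here and is not modeled.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Scale to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthogonal_penalty(subs: List[Vector]) -> float:
    """Hypothetical, fixed-weight orthogonality penalty.

    Mean squared cosine similarity over all ordered pairs of distinct
    sub-embeddings; it is 0.0 exactly when the sub-embeddings are
    mutually orthogonal and 1.0 when they all point along one line.
    """
    normed = [l2_normalize(s) for s in subs]
    k = len(normed)
    if k < 2:
        return 0.0
    total = 0.0
    for i in range(k):
        for j in range(k):
            if i != j:
                total += dot(normed[i], normed[j]) ** 2
    return total / (k * (k - 1))


def image_text_score(subs: List[Vector], text: Vector) -> float:
    """Assumed scoring rule: cosine similarity of the caption with the
    image's best-matching sub-embedding (max over sub-embeddings)."""
    t = l2_normalize(text)
    return max(dot(l2_normalize(s), t) for s in subs)


if __name__ == "__main__":
    subs = [[1.0, 0.0], [0.0, 1.0]]  # two orthogonal sub-embeddings
    print(orthogonal_penalty(subs))  # 0.0
    print(image_text_score(subs, [0.0, 2.0]))  # 1.0
```

Squaring the cosine penalizes both positive and negative alignment, so the penalty reaches zero only when the sub-embeddings are orthogonal. In training, such a term would typically be added with some weight to the cross-modal matching loss; how DVSE schedules that weight is not covered by this excerpt.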

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization

Methods: Focus