Dynamic Visual Semantic Sub-Embeddings and Fast Re-Ranking
Wenzhang Wei, Zhipeng Gui, Changguang Wu, Anqi Zhao, Dehua Peng, Huayi Wu

TL;DR
This paper introduces a Dynamic Visual Semantic Sub-Embeddings framework to reduce redundancy in image-text matching, coupled with a fast re-ranking strategy, improving semantic variation encoding and retrieval accuracy.
Contribution
It proposes a novel method for generating diverse, low-entropy visual sub-embeddings and an efficient re-ranking approach for cross-modal retrieval tasks.
Findings
Improved retrieval accuracy on MSCOCO, Flickr30K, and CUB datasets.
Effective encoding of semantic variations in visual embeddings.
Enhanced efficiency with the fast re-ranking strategy.
Abstract
The core of cross-modal matching is to accurately measure the similarity between different modalities in a unified representation space. However, compared to textual descriptions of a certain perspective, the visual modality has more semantic variations. So, images are usually associated with multiple textual captions in databases. Although popular symmetric embedding methods have explored numerous modal interaction approaches, they often learn toward increasing the average expression probability of multiple semantic variations within image embeddings. Consequently, information entropy in embeddings is increased, resulting in redundancy and decreased accuracy. In this work, we propose a Dynamic Visual Semantic Sub-Embeddings framework (DVSE) to reduce the information entropy. Specifically, we obtain a set of heterogeneous visual sub-embeddings through dynamic orthogonal constraint loss.…
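The abstract excerpt names a "dynamic orthogonal constraint loss" but is cut off before defining it. As a rough illustration of the general idea only, the sketch below computes a plain orthogonality penalty over a set of sub-embeddings: the mean squared cosine similarity between distinct pairs. This is an assumption for illustration, not the paper's loss; the function name `orthogonality_penalty`, the `(K, D)` layout, and the absence of any dynamic weighting are all hypothetical.

```python
import numpy as np


def orthogonality_penalty(sub_embeddings: np.ndarray) -> float:
    """Mean squared cosine similarity between distinct sub-embeddings.

    Hypothetical helper, not taken from the paper.
    sub_embeddings: array of shape (K, D) with K >= 2, one row per sub-embedding.
    Returns 0.0 when all rows are mutually orthogonal and 1.0 when all rows
    point in the same direction.
    """
    k = sub_embeddings.shape[0]
    if k < 2:
        raise ValueError("need at least two sub-embeddings")

    # L2-normalise each row so the Gram matrix holds cosine similarities.
    norms = np.linalg.norm(sub_embeddings, axis=1, keepdims=True)
    unit = sub_embeddings / np.clip(norms, 1e-12, None)
    gram = unit @ unit.T

    # Keep only the off-diagonal entries (similarity between different rows).
    off_diagonal = gram[~np.eye(k, dtype=bool)]
    return float(np.mean(off_diagonal ** 2))


# Mutually orthogonal rows incur no penalty; identical rows incur the maximum.
orthogonal = np.eye(3, 5)
redundant = np.ones((3, 5))
print(orthogonality_penalty(orthogonal))
print(orthogonality_penalty(redundant))
```

A penalty of this kind pushes sub-embeddings toward different directions, which matches the abstract's stated goal of a heterogeneous set. How the paper makes the constraint dynamic is not stated in the excerpt above.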
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
