Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval

Kazuya Ueki

arXiv:2105.07391 · cs.CV · September 29, 2021

Open Access

TL;DR

This survey reviews the evolution, datasets, and evaluation of visual-semantic embedding methods for zero-shot image retrieval, aiming to guide future research in bridging images and language.

Contribution

It provides a comprehensive overview of technological trends, datasets, and evaluation results in zero-shot image retrieval using visual-semantic embeddings.

Findings

01. Summarizes the historical development of image-text matching methods.

02. Compares the performance of various embedding techniques.

03. Points to implementations available on GitHub for reproducing results.

Abstract

Visual-semantic embedding is an interesting research topic because it is useful for various tasks, such as visual question answering (VQA), image-text retrieval, image captioning, and scene graph generation. In this paper, we focus on zero-shot image retrieval using sentences as queries and present a survey of the technological trends in this area. First, we provide a comprehensive overview of the history of the technology, starting with a discussion of the early studies of image-to-text matching and how the technology has evolved over time. In addition, a description of the datasets commonly used in experiments and a comparison of the evaluation results of each method are presented. We also introduce the implementation available on GitHub for use in confirming the accuracy of experiments and for further improvement. We hope that this survey paper will encourage researchers to further…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning