Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval
Kazuya Ueki

TL;DR
This survey reviews the evolution, datasets, and evaluation of visual-semantic embedding methods for zero-shot image retrieval, aiming to guide future research in bridging images and language.
Contribution
It provides a comprehensive overview of technological trends, datasets, and evaluation results in zero-shot image retrieval using visual-semantic embeddings.
Findings
Summarizes the historical development of image-text matching methods.
Compares the performance of various embedding techniques.
Introduces open-source implementations available on GitHub for reproducibility.
Abstract
Visual-semantic embedding is an interesting research topic because it is useful for various tasks, such as visual question answering (VQA), image-text retrieval, image captioning, and scene graph generation. In this paper, we focus on zero-shot image retrieval using sentences as queries and present a survey of the technological trends in this area. First, we provide a comprehensive overview of the history of the technology, starting with a discussion of the early studies of image-to-text matching and how the technology has evolved over time. In addition, we describe the datasets commonly used in experiments and compare the evaluation results of each method. We also introduce the implementation available on GitHub, which can be used to confirm the reported experimental accuracy and as a basis for further improvement. We hope that this survey paper will encourage researchers to further…
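The retrieval setting described in the abstract can be illustrated with a minimal sketch. It assumes that sentences and images have already been mapped into a shared embedding space by some trained encoder (not shown). Under that assumption, sentence-to-image retrieval reduces to ranking images by cosine similarity to the query embedding. Image-text retrieval results are commonly summarized with Recall@K, the fraction of queries whose ground-truth image appears among the top K results. The function names and the toy three-dimensional embeddings below are hypothetical and are not taken from the survey or its accompanying code.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def cosine_scores(query: Sequence[float], images: Sequence[Sequence[float]]) -> List[float]:
    """Cosine similarity between one sentence embedding and every image embedding."""
    q = l2_normalize(query)
    return [sum(a * b for a, b in zip(q, l2_normalize(img))) for img in images]


def rank_images(query: Sequence[float], images: Sequence[Sequence[float]]) -> List[int]:
    """Image indices sorted from most to least similar to the query."""
    scores = cosine_scores(query, images)
    return sorted(range(len(images)), key=lambda i: scores[i], reverse=True)


def recall_at_k(text_embs, image_embs, gt_image_ids, k: int) -> float:
    """Fraction of sentence queries whose ground-truth image is ranked in the top k."""
    hits = 0
    for query, gt in zip(text_embs, gt_image_ids):
        if gt in rank_images(query, image_embs)[:k]:
            hits += 1
    return hits / len(text_embs)


# Hypothetical toy embeddings: 3 images and 3 sentence queries in a shared 3-D space.
images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
texts = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.9], [0.8, 0.3, 0.1]]
gt = [0, 2, 1]  # index of the matching image for each sentence

print("R@1:", recall_at_k(texts, images, gt, 1))  # the third query ranks its image second
print("R@2:", recall_at_k(texts, images, gt, 2))
```

In this toy data, the third sentence is closer to image 0 than to its ground-truth image 1. It therefore counts as a miss at K=1 and as a hit at K=2, which is why Recall@K is usually reported for several values of K.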
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
