Using Text to Teach Image Retrieval

Haoyu Dong, Ze Wang, Qiang Qiu, and Guillermo Sapiro

arXiv:2011.09928 · cs.LG · November 20, 2020

TL;DR

This paper introduces an approach to image retrieval that augments the manifold of learned image features with geometrically aligned text. The added text improves retrieval accuracy, especially when only a few images are available. The paper also presents a new CLEVR-based dataset for evaluating semantic similarity.

Contribution

The paper represents the learned image feature space as a graph whose neighborhoods are defined by geodesic distances, and augments this graph with geometrically aligned text samples to improve retrieval performance.

Findings

01. Text augmentation improves image retrieval accuracy.

02. Joint embedding manifolds are more robust for retrieval tasks.

03. New CLEVR-based dataset quantifies semantic similarity between images and text.

Abstract

Image retrieval relies heavily on the quality of the data modeling and the distance measurement in the feature space. Building on the concept of image manifold, we first propose to represent the feature space of images, learned via neural networks, as a graph. Neighborhoods in the feature space are now defined by the geodesic distance between images, represented as graph vertices or manifold samples. When only a limited number of images are available, this manifold is sparsely sampled, making the geodesic computation and the corresponding retrieval harder. To address this, we augment the manifold samples with geometrically aligned text, thereby using a plethora of sentences to teach us about images. In addition to extensive results on standard datasets illustrating the power of text to help in image retrieval, a new public dataset based on CLEVR is introduced to quantify the semantic similarity between…
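The abstract's pipeline (image features as graph vertices, geodesic neighborhoods, text samples that densify a sparse manifold) can be illustrated with a toy sketch. It assumes that image and text embeddings already live in one shared, geometrically aligned space. It builds a symmetric k-nearest-neighbour graph with Euclidean edge weights and approximates geodesic distance with Dijkstra shortest paths. These construction choices, the toy coordinates, and the function names `knn_graph`, `geodesic_distances`, and `retrieve` are illustrative assumptions, not the authors' implementation.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical types for this sketch: a feature vector and an adjacency list.
Vector = Sequence[float]
Graph = Dict[int, List[Tuple[int, float]]]


def knn_graph(points: List[Vector], k: int) -> Graph:
    """Symmetric k-nearest-neighbour graph; Euclidean edge weights are an assumption."""
    graph: Graph = {i: [] for i in range(len(points))}
    seen = set()
    for i, p in enumerate(points):
        # Rank every other sample by Euclidean distance; ties are broken by index.
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in ranked[:k]:
            edge = (min(i, j), max(i, j))
            if edge not in seen:  # add each undirected edge only once
                seen.add(edge)
                graph[i].append((j, d))
                graph[j].append((i, d))
    return graph


def geodesic_distances(graph: Graph, source: int) -> Dict[int, float]:
    """Dijkstra shortest-path lengths, used here as a proxy for manifold geodesics.

    Vertices unreachable from `source` are absent from the result.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def retrieve(image_feats: List[Vector], text_feats: List[Vector],
             query: int, k: int, top_n: int) -> List[int]:
    """Rank database images by geodesic distance from image `query`.

    Text embeddings (assumed already aligned to the image space) are added
    only as extra manifold samples; they are never returned as results.
    """
    points = list(image_feats) + list(text_feats)
    dist = geodesic_distances(knn_graph(points, k), query)
    ranked = sorted(
        (dist[j], j) for j in range(len(image_feats)) if j != query and j in dist
    )
    return [j for _, j in ranked[:top_n]]


# Hypothetical toy data: two sparse image clusters and text samples between them.
images = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0), (10.0, 1.0)]
texts = [(2.5, 0.0), (4.0, 0.0), (5.5, 0.0), (7.0, 0.0), (8.5, 0.0)]

print(retrieve(images, [], query=0, k=2, top_n=5))     # [1, 2]
print(retrieve(images, texts, query=0, k=2, top_n=5))  # [1, 2, 3, 4, 5]
```

In this toy run, the image-only graph splits into two components, so only the query's own cluster is reachable. The interpolating text samples connect the clusters, which makes geodesic distances to the far images defined. A real system would use learned embeddings and an explicit text-to-image alignment step, both of which this sketch omits.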

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning