MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

Kai Zhang; Yi Luan; Hexiang Hu; Kenton Lee; Siyuan Qiao; Wenhu Chen; Yu Su; Ming-Wei Chang

arXiv:2403.19651 · cs.CV · June 26, 2024 · 1 cites

TL;DR

MagicLens introduces a self-supervised image retrieval model that uses open-ended instructions to capture a wide range of implicit semantic relations between images, surpassing traditional similarity-based methods.

Contribution

It leverages web-mined implicit relations and foundation models to enable open-ended, semantically rich image retrieval with a smaller, efficient model.

Findings

01 Achieves comparable or better results than prior methods on eight benchmarks.

02 Supports diverse search intents demonstrated through human analysis.

03 Operates with high parameter efficiency and smaller model size.

Abstract

Image retrieval, i.e., finding desired images given a reference image, inherently encompasses rich, multi-faceted search intents that are difficult to capture solely using image-based measures. Recent works leverage text instructions to allow users to more freely express their search intents. However, they primarily focus on image pairs that are visually similar and/or can be characterized by a small set of pre-defined relations. The core thesis of this paper is that text instructions can enable retrieving images with richer relations beyond visual similarity. To show this, we introduce MagicLens, a series of self-supervised image retrieval models that support open-ended instructions. MagicLens is built on a key novel insight: image pairs that naturally occur on the same web pages contain a wide range of implicit relations (e.g., inside view of), and we can bring those implicit…
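
The thesis above, that a text instruction can steer retrieval toward relations other than visual similarity, can be illustrated with a toy sketch. This is not the MagicLens architecture: the three-dimensional embeddings are made up, and `fuse_query` (an element-wise sum) is a hypothetical stand-in for whatever learned fusion a real model uses. It only shows how conditioning a query on an instruction can change which candidate ranks first.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def fuse_query(image_emb, instruction_emb):
    """Hypothetical fusion: element-wise sum of the two embeddings.

    An assumption for illustration only; a trained model would learn
    this step rather than add vectors.
    """
    return [a + b for a, b in zip(image_emb, instruction_emb)]


def retrieve(image_emb, instruction_emb, candidates, top_k=2):
    """Rank candidate images by similarity to the fused query."""
    query = fuse_query(image_emb, instruction_emb)
    scored = [(name, cosine(query, emb)) for name, emb in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Made-up embeddings: axis 0 ~ "this building", axis 1 ~ "interior view".
reference_image = [1.0, 0.0, 0.0]
instruction = [0.0, 1.0, 0.0]  # stands in for "inside view of"
candidates = {
    "same_building_exterior": [1.0, 0.0, 0.0],
    "same_building_interior": [1.0, 1.0, 0.0],
    "unrelated_image": [0.0, 0.0, 1.0],
}

results = retrieve(reference_image, instruction, candidates)

# Image-only baseline for comparison: no instruction in the query.
image_only_best = max(
    candidates, key=lambda name: cosine(reference_image, candidates[name])
)
```

With these toy vectors, the image-only baseline picks the visually identical exterior shot, while the instruction-conditioned query ranks the interior view first.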

Code & Models

Repositories

google-deepmind/magiclens (JAX, official)

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications

Methods: Sparse Evolutionary Training · Focus