CLIP-Branches: Interactive Fine-Tuning for Text-Image Retrieval

Christian Lülf; Denis Mayr Lima Martins; Marcos Antonio Vaz Salles; Yongluan Zhou; Fabian Gieseke

arXiv:2406.13322·cs.IR·June 21, 2024·2 cites

TL;DR

CLIP-Branches introduces an interactive fine-tuning method for text-image retrieval that improves search relevance by incorporating user feedback, leveraging efficient indexing to maintain fast response times.

Contribution

It presents a novel interactive fine-tuning approach for CLIP-based search engines, enhancing accuracy without sacrificing speed through efficient indexing.

Findings

01. Improved relevance and accuracy in search results after fine-tuning

02. Maintains swift response times with efficient index structures

03. Enhances traditional CLIP-based retrieval with user-guided refinement

Abstract

The advent of text-image models, most notably CLIP, has significantly transformed the landscape of information retrieval. These models enable the fusion of various modalities, such as text and images. One significant outcome of CLIP is its capability to allow users to search for images using text as a query, as well as vice versa. This is achieved via a joint embedding of images and text data that can, for instance, be used to search for similar items. Despite efficient query processing techniques such as approximate nearest neighbor search, the results may lack precision and completeness. We introduce CLIP-Branches, a novel text-image search engine built upon the CLIP architecture. Our approach enhances traditional text-image search engines by incorporating an interactive fine-tuning phase, which allows the user to further concretize the search query by iteratively defining positive…
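The joint-embedding search described in the abstract can be illustrated with a minimal sketch. It assumes that text and images have already been mapped into the same vector space; the three-dimensional vectors, the file names, and the helper `cosine_top_k` are hypothetical and chosen only for illustration. It performs an exact, brute-force cosine-similarity ranking, not the approximate nearest neighbor search or the index structures used by CLIP-Branches.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_top_k(query, items, k=2):
    """Return the names of the k items most similar to the query (brute force)."""
    q = normalize(query)
    scored = []
    for name, vec in items.items():
        v = normalize(vec)
        similarity = sum(a * b for a, b in zip(q, v))
        scored.append((similarity, name))
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical toy embeddings standing in for image embeddings.
image_embeddings = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "harbor.jpg": [0.7, 0.2, 0.3],
}

# Hypothetical embedding of a text query in the same space.
text_query_embedding = [1.0, 0.0, 0.1]

result = cosine_top_k(text_query_embedding, image_embeddings, k=2)
print(result)
```

Because both modalities share one space, the same function would serve image-to-text search by swapping which side supplies the query vector.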

Code & Models

Repositories

cluel01/clip-branches (PyTorch, official)

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Contrastive Language-Image Pre-training