Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval

Gregor Geigle; Jonas Pfeiffer; Nils Reimers; Ivan Vulić; Iryna Gurevych

arXiv:2103.11920·cs.CV·February 22, 2022·1 cites

TL;DR

This paper introduces a cooperative retrieve-and-rerank framework for cross-modal retrieval: an efficient bi-encoder performs the initial retrieval, and a cross-encoder then reranks the retrieved candidates, improving both retrieval accuracy and efficiency.

Contribution

It presents a novel fine-tuning approach that transforms pretrained multi-modal models into efficient retrieval systems using shared-weight bi-encoders and cross-encoders.

Findings

01  Improved retrieval accuracy across multiple benchmarks.

02  Significant reduction in retrieval latency.

03  Effective joint fine-tuning of components enhances performance.

Abstract

Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image. While offering unmatched retrieval performance, such models: 1) are typically pretrained from scratch and thus less scalable, 2) suffer from huge retrieval latency and inefficiency issues, which makes them impractical in realistic applications. To address these crucial gaps towards both improved and efficient cross-modal retrieval, we propose a novel fine-tuning framework that turns any pretrained text-image multi-modal model into an efficient retrieval model. The framework is based on a cooperative retrieve-and-rerank approach which combines: 1) twin networks (i.e., a bi-encoder) to separately encode all items of a corpus, enabling efficient initial retrieval,…
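The retrieve-and-rerank idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings are random toy vectors standing in for bi-encoder outputs, and `retrieve_top_k`, `rerank`, and `toy_cross_score` are hypothetical names introduced only for this example. The point it shows is that the expensive joint scorer runs on only k candidates rather than on the whole corpus.

```python
import numpy as np

def retrieve_top_k(query_emb, corpus_embs, k):
    """Stage 1: cosine-similarity search over precomputed corpus embeddings."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q
    top = np.argsort(-scores)[:k]  # indices of the k highest-scoring items
    return top, scores[top]

def rerank(query, candidates, items, cross_score):
    """Stage 2: rescore only the retrieved candidates with the joint scorer."""
    rescored = [(int(i), float(cross_score(query, items[i]))) for i in candidates]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Toy corpus: random vectors stand in for bi-encoder image embeddings (assumption).
rng = np.random.default_rng(0)
corpus_embs = rng.normal(size=(1000, 64))
items = [f"image_{i}" for i in range(1000)]

# Toy query: a noisy copy of item 42's embedding, so item 42 is the true match.
query_emb = corpus_embs[42] + 0.1 * rng.normal(size=64)

cross_calls = 0

def toy_cross_score(query, item):
    """Placeholder for a cross-encoder that scores a (query, item) pair jointly."""
    global cross_calls
    cross_calls += 1
    return 1.0 if item == "image_42" else 0.0

candidates, _ = retrieve_top_k(query_emb, corpus_embs, k=20)
ranking = rerank("a query caption", candidates, items, toy_cross_score)
print(ranking[0], "cross-scorer calls:", cross_calls)
```

In this sketch the corpus embeddings are computed once and reused for every query, while the joint scorer is invoked 20 times per query instead of 1000; the choice of k trades reranking cost against the chance that the correct item is missed in the first stage.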

Code & Models

Repositories

UKPLab/MMT-Retrieval (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning