Progressive Learning for Image Retrieval with Hybrid-Modality Queries

Yida Zhao; Yuqing Song; Qin Jin

arXiv:2204.11212·cs.CV·April 26, 2022

Open Access

TL;DR

This paper introduces a progressive learning framework for image retrieval using hybrid text-image queries, improving performance by decomposing the task into stages and adaptively weighting modalities.

Contribution

It proposes a three-stage progressive learning approach and a self-supervised adaptive weighting strategy to enhance hybrid-modality image retrieval performance.
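As a rough illustration of what weighting the two modalities of a hybrid query can look like at retrieval time, the sketch below scores each candidate image by a convex combination of its similarity to the reference-image embedding and to the text embedding. This is not the paper's method: the function names, the toy 2-D embeddings, and the fixed `text_weight` are all assumptions made for illustration, whereas the paper's weighting is adaptive and learned with self-supervision.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_scores(image_query, text_query, gallery, text_weight):
    """Score every gallery embedding against a hybrid (image + text) query.

    text_weight is a hypothetical fixed weight in [0, 1]; an adaptive
    scheme would predict it per query instead of taking it as an argument.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    return [
        (1.0 - text_weight) * cosine(image_query, g) + text_weight * cosine(text_query, g)
        for g in gallery
    ]


def rank(scores):
    """Return gallery indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy data: candidate 0 matches the reference image, candidate 1 matches
# the text, candidate 2 partially matches both.
gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_query = [1.0, 0.0]
text_query = [0.0, 1.0]

image_only = rank(hybrid_scores(image_query, text_query, gallery, 0.0))
text_only = rank(hybrid_scores(image_query, text_query, gallery, 1.0))
balanced = rank(hybrid_scores(image_query, text_query, gallery, 0.5))
```

With these toy vectors, relying on the image alone ranks candidate 0 first, relying on the text alone ranks candidate 1 first, and an even split ranks candidate 2 first, which is why the choice of weight matters for a hybrid query.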

Findings

01

Improves Recall@K by 24.9% on the Fashion-IQ dataset and by 9.5% on the Shoes dataset.

02

Outperforms state-of-the-art methods significantly.

03

Demonstrates effective knowledge transfer across domains.
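Recall@K, the metric cited in the first finding, is conventionally the fraction of queries whose correct target appears among the top K retrieved items. The following is a minimal sketch under the assumption of exactly one ground-truth target per query; the function name and the toy rankings are illustrative and do not come from the paper.

```python
def recall_at_k(rankings, targets, k):
    """Fraction of queries whose target id appears in the top-k of its ranking.

    rankings: list of ranked candidate-id lists, one per query (best first).
    targets:  list of ground-truth ids, one per query (assumed single target).
    """
    if len(rankings) != len(targets):
        raise ValueError("rankings and targets must have the same length")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not rankings:
        return 0.0
    hits = sum(1 for ranking, target in zip(rankings, targets) if target in ranking[:k])
    return hits / len(rankings)


# Toy example: three queries that all share the same target, "a",
# ranked first, second, and third respectively.
rankings = [["a", "b", "c"], ["c", "a", "b"], ["b", "c", "a"]]
targets = ["a", "a", "a"]

r1 = recall_at_k(rankings, targets, 1)
r2 = recall_at_k(rankings, targets, 2)
r3 = recall_at_k(rankings, targets, 3)
```

In this toy case the value rises from one third at K=1 to two thirds at K=2 and reaches 1.0 at K=3, since a larger K can only add hits.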

Abstract

Image retrieval with hybrid-modality queries, also known as composing text and image for image retrieval (CTI-IR), is a retrieval task where the search intention is expressed in a more complex query format, involving both vision and text modalities. For example, a target product image is searched using a reference product image along with text about changing certain attributes of the reference image as the query. It is a more challenging image retrieval task that requires both semantic space learning and cross-modal fusion. Previous approaches that attempt to deal with both aspects achieve unsatisfactory performance. In this paper, we decompose the CTI-IR task into a three-stage learning problem to progressively learn the complex knowledge for image retrieval with hybrid-modality queries. We first leverage the semantic embedding space for open-domain image-text retrieval, and then…

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques