Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval

Yucheng Suo; Fan Ma; Linchao Zhu; Yi Yang

arXiv:2403.16005 · cs.CV · March 26, 2024 · 2 cites

TL;DR

This paper introduces KEDs, a knowledge-enhanced dual-stream framework for zero-shot composed image retrieval that models detailed attributes and aligns visual and textual semantics, outperforming previous methods.

Contribution

The paper proposes a novel dual-stream framework that incorporates a database for attribute modeling and aligns pseudo-word tokens with textual concepts in a zero-shot setting.
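The alignment idea can be illustrated with a small sketch. The abstract shown here does not specify the alignment objective, so the code below assumes a symmetric InfoNCE (contrastive) loss between a batch of pseudo-word token embeddings and the text embeddings of their paired captions. This is a common choice for image-text alignment, not necessarily the one KEDs uses. The function name `info_nce` and the temperature `tau=0.07` are illustrative.

```python
import numpy as np


def info_nce(tokens, texts, tau=0.07):
    """Symmetric InfoNCE loss (an assumed alignment objective, not taken from KEDs).

    tokens: (B, d) pseudo-word token embeddings
    texts:  (B, d) text embeddings; row i is the positive match for tokens[i]
    """
    # Normalize so the dot products below are cosine similarities.
    t = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    s = texts / np.linalg.norm(texts, axis=1, keepdims=True)
    logits = t @ s.T / tau  # (B, B); the diagonal holds the positive pairs

    def cross_entropy_on_diagonal(l):
        # Numerically stable log-softmax over each row.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the token-to-text and text-to-token directions.
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits.T))
```

Minimizing this loss pulls each token toward its own caption embedding and pushes it away from the other captions in the batch, which is one way to make pseudo-word tokens behave like ordinary text concepts.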

Findings

1. KEDs outperforms previous zero-shot CIR methods on multiple benchmarks.
2. The framework effectively models detailed attributes like color and layout.
3. Explicit alignment of visual tokens with text improves retrieval accuracy.

Abstract

We study the zero-shot Composed Image Retrieval (ZS-CIR) task, which is to retrieve the target image given a reference image and a description without training on the triplet datasets. Previous works generate pseudo-word tokens by projecting the reference image features to the text embedding space. However, they focus on the global visual representation, ignoring the representation of detailed attributes, e.g., color, object number and layout. To address this challenge, we propose a Knowledge-Enhanced Dual-stream zero-shot composed image retrieval framework (KEDs). KEDs implicitly models the attributes of the reference images by incorporating a database. The database enriches the pseudo-word tokens by providing relevant images and captions, emphasizing shared attribute information in various aspects. In this way, KEDs recognizes the reference image from diverse perspectives. Moreover,…
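The database-enhanced pseudo-word token described in the abstract can be sketched as follows. The sketch assumes precomputed, CLIP-style image and caption features that share one embedding space. The neighbour count `k`, the softmax weighting, the equal averaging of image and caption knowledge, and the additive fusion of two linear projections (`W_ref`, `W_kn`) are all hypothetical choices for illustration. The paper's actual projection networks and fusion may differ.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so that dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def retrieve_topk(query, db_feats, k):
    """Return indices and cosine scores of the k database entries closest to `query`."""
    sims = l2_normalize(db_feats) @ l2_normalize(query)  # (N,)
    idx = np.argsort(-sims)[:k]                          # highest similarity first
    return idx, sims[idx]


def softmax(x, tau=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    z = np.exp((x - x.max()) / tau)
    return z / z.sum()


def enriched_pseudo_word(ref_feat, db_img_feats, db_cap_feats, W_ref, W_kn, k=3, tau=0.1):
    """Hypothetical database-enhanced pseudo-word token.

    ref_feat:     (d,)   reference-image feature
    db_img_feats: (N, d) database image features
    db_cap_feats: (N, d) features of the captions paired with those images
    W_ref, W_kn:  (d, t) illustrative linear projections into the token space
    """
    idx, scores = retrieve_topk(ref_feat, db_img_feats, k)
    weights = softmax(scores, tau)               # similarity-weighted neighbours
    img_knowledge = weights @ db_img_feats[idx]  # (d,) attributes shared by similar images
    cap_knowledge = weights @ db_cap_feats[idx]  # (d,) attributes named in their captions
    knowledge = 0.5 * (img_knowledge + cap_knowledge)
    # Assumed additive fusion of the projected reference and the projected knowledge.
    return ref_feat @ W_ref + knowledge @ W_kn
```

In a full ZS-CIR pipeline, the resulting token would be inserted into a text prompt together with the modification description and encoded by the text encoder. The composed query would then be compared with gallery image features. That step is not shown here.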

Code & Models

Repositories

suoych/keds
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications

Methods: Focus