Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval
Siting Li, Xiang Gao, Simon Shaolei Du

TL;DR
This paper introduces promptable embeddings for attribute-focused image retrieval, demonstrating improved performance over traditional global embeddings by highlighting relevant attributes, with scalable strategies for real-world deployment.
Contribution
It proposes a novel promptable embedding approach that enhances attribute-specific retrieval, addressing limitations of existing global embedding methods in handling detailed queries.
Findings
Pre-processing promptable embeddings improves Recall@5 by 15% when prompts are pre-defined.
Linear approximation of promptable embeddings yields an 8% improvement in Recall@5 when prompts are only available at inference time.
Current CLIP-like and MLLM-based retrievers struggle with attribute-focused queries.
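The findings above are reported in Recall@5. As a rough illustration of how such a number can be computed, here is a minimal sketch in plain Python. It assumes the common hit-rate definition (a query counts as a success if at least one relevant image appears in the top K by cosine similarity); the paper's exact evaluation protocol may differ. The helper names `cosine`, `top_k`, and `recall_at_k`, and the toy vectors, are hypothetical and not taken from the paper.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(query_vec, image_vecs, k):
    # Rank image ids by similarity to the query and keep the first k.
    ranked = sorted(image_vecs.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [image_id for image_id, _ in ranked[:k]]

def recall_at_k(queries, image_vecs, k=5):
    # queries: list of (query_vector, set_of_relevant_image_ids).
    # Assumed definition: fraction of queries with at least one relevant image in the top k.
    hits = sum(1 for vec, relevant in queries
               if relevant & set(top_k(vec, image_vecs, k)))
    return hits / len(queries)

# Toy data (hypothetical): three image embeddings and two text-query embeddings.
image_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
queries = [([1.0, 0.0], {"a"}), ([0.0, 1.0], {"a"})]
print(recall_at_k(queries, image_vecs, k=1))
```

In an attribute-focused setting, the same metric would be computed once with general image embeddings and once with promptable ones, and the two scores compared.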
Abstract
While an image is worth more than a thousand words, only a few provide crucial information for a given task and thus should be focused on. In light of this, ideal text-to-image (T2I) retrievers should prioritize specific visual attributes relevant to queries. To evaluate current retrievers on handling attribute-focused queries, we build COCO-Facet, a COCO-based benchmark with 9,112 queries about diverse attributes of interest. We find that CLIP-like retrievers, which are widely adopted due to their efficiency and zero-shot ability, have poor and imbalanced performance, possibly because their image embeddings focus on global semantics and subjects while leaving out other details. Notably, we reveal that even recent Multimodal Large Language Model (MLLM)-based, stronger retrievers with a larger output dimension struggle with this limitation. Hence, we hypothesize that retrieving with…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
Methods: Focus · Balanced Selection
