A Sanity Check on Composed Image Retrieval
Yikun Liu, Jiangchao Yao, Weidi Xie, Yanfeng Wang

TL;DR
This paper introduces a new benchmark and evaluation framework for Composed Image Retrieval, addressing existing limitations by controlling query variables and testing multi-round interactive performance.
Contribution
It proposes FISD, a Fully-Informed Semantically-Diverse benchmark built with generative models for precise evaluation, and an automatic multi-round evaluation framework for interactive scenarios.
Findings
FISD enables more accurate CIR evaluation across six dimensions.
The multi-round framework assesses models' adaptability in iterative retrieval.
Experiments show improved evaluation accuracy and insights into model performance.
Abstract
Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption that specifies the desired modification. Despite the rapid development of CIR models, their performance is not well characterized by existing benchmarks, which inherently contain indeterminate queries that degrade the evaluation (i.e., multiple candidate images, rather than solely the target image, meet the query criteria), and which do not consider the models' effectiveness in a multi-round system. Motivated by this, we consider improving the evaluation procedure from two aspects: 1) we introduce FISD, a Fully-Informed Semantically-Diverse benchmark, which employs generative models to precisely control the variables of reference-target image pairs, enabling a more accurate evaluation of CIR methods across six dimensions, without query ambiguity; 2)…
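To make the indeterminate-query problem concrete, the sketch below computes Recall@K on toy data in two ways: once against the single labeled target, and once against the full set of images that satisfy the query. It is an illustration only, not the paper's evaluation code; the function name `recall_at_k`, the image IDs, and the rankings are all hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(
    rankings: Dict[str, List[str]],
    targets: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of queries with at least one acceptable image in the top k."""
    hits = sum(
        1 for query, ranked in rankings.items()
        if targets[query] & set(ranked[:k])
    )
    return hits / len(rankings)


# Hypothetical model output: ranked image IDs for two queries.
rankings = {
    "q1": ["img_a", "img_b", "img_c"],
    "q2": ["img_d", "img_e", "img_f"],
}

# Benchmark labels: exactly one "target" image per query.
labeled = {"q1": {"img_b"}, "q2": {"img_d"}}

# Assumed ground truth: for q1, img_a also satisfies the query,
# so the query is indeterminate.
valid = {"q1": {"img_a", "img_b"}, "q2": {"img_d"}}

strict = recall_at_k(rankings, labeled, k=1)   # q1 counted as a miss
lenient = recall_at_k(rankings, valid, k=1)    # q1 counted as a hit

print(f"Recall@1 vs. labeled target only: {strict}")
print(f"Recall@1 vs. all valid images:    {lenient}")
```

In this toy case the model returns a valid image for both queries, yet the single-target metric scores it at 0.5. A benchmark whose queries each admit exactly one matching image removes this gap between the two scores.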
