TL;DR
HABIT is a robust progressive learning framework for composed image retrieval (CIR) that handles noisy triplet data by estimating sample quality and leveraging model collaboration. It outperforms most existing methods across a range of noise ratios.
Contribution
The paper introduces HABIT, a novel framework with modules for mutual knowledge estimation and dual-consistency progressive learning, enhancing robustness in noisy CIR scenarios.
Findings
HABIT significantly outperforms most methods under various noise ratios.
The framework demonstrates superior robustness and retrieval performance on standard CIR datasets.
Extensive experiments validate the effectiveness of HABIT in noisy environments.
Abstract
Composed Image Retrieval (CIR) is a flexible image retrieval paradigm that enables users to accurately locate the target image through a multimodal query composed of a reference image and modification text. Although this task has demonstrated promising applications in personalized search and recommendation systems, it encounters a severe challenge in practical scenarios known as the Noise Triplet Correspondence (NTC) problem. This issue primarily arises from the high cost and subjectivity involved in annotating triplet data. To address this problem, we identify two central challenges: the precise estimation of composed semantic discrepancy and the insufficient progressive adaptation to modification discrepancy. To tackle these challenges, we propose a cHrono-synergiA roBust progressIve learning framework for composed image reTrieval (HABIT), which consists of two core modules. First,…
