Collaborative Group: Composed Image Retrieval via Consensus Learning from Noisy Annotations
Xu Zhang, Zhedong Zheng, Linchao Zhu, Yi Yang

TL;DR
This paper introduces Css-Net, a consensus learning framework for composed image retrieval that mitigates triplet ambiguity by leveraging diverse compositors and inter-compositor interaction, leading to improved retrieval accuracy.
Contribution
The paper proposes Css-Net, a consensus network that combines multiple compositors with a divergence loss to address triplet ambiguity in composed image retrieval.
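To make the idea of multiple compositors and a divergence loss more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's formulation. It assumes each compositor yields a list of similarity scores over the same candidate images, averages the resulting distributions, and measures disagreement with a pairwise KL divergence. The function names, the averaging rule, and the choice of KL are all hypothetical.

```python
import math

def softmax(scores):
    """Turn raw similarity scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def consensus_and_divergence(per_compositor_scores):
    """Hypothetical helper: average the compositors' distributions and
    report the mean pairwise KL divergence between them.

    per_compositor_scores: one list of candidate scores per compositor.
    """
    probs = [softmax(s) for s in per_compositor_scores]
    n = len(probs)
    num_candidates = len(probs[0])
    # Assumed consensus rule: simple mean over compositors.
    consensus = [sum(p[j] for p in probs) / n for j in range(num_candidates)]
    # Assumed divergence term: mean KL over all ordered pairs.
    pair_kls = [
        kl_divergence(probs[a], probs[b])
        for a in range(n)
        for b in range(n)
        if a != b
    ]
    divergence = sum(pair_kls) / len(pair_kls) if pair_kls else 0.0
    return consensus, divergence

if __name__ == "__main__":
    # Three hypothetical compositors scoring the same three candidates.
    scores = [[2.0, 1.0, 0.1], [1.5, 1.4, 0.2], [0.3, 2.2, 0.1]]
    consensus, divergence = consensus_and_divergence(scores)
    print("consensus distribution:", consensus)
    print("mean pairwise KL:", divergence)
```

In this sketch the divergence term is zero when all compositors agree and grows as their rankings differ. How such a term is weighted or used during training is not specified here.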
Findings
Achieves a 2.77% increase in R@10 on FashionIQ
Attains a 6.67% boost in R@50 on benchmark datasets
Demonstrates robustness against noisy triplet annotations
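The R@10 and R@50 figures above refer to Recall@K. As a rough sketch, assuming one annotated target image per query (the usual setup for composed image retrieval benchmarks), Recall@K is the percentage of queries whose target appears among the top K retrieved candidates. The function name and the toy data below are illustrative only.

```python
def recall_at_k(ranked_ids_per_query, target_id_per_query, k):
    """Percentage of queries whose target is within the top-k results.

    ranked_ids_per_query: list of candidate-id lists, best match first.
    target_id_per_query: the single annotated target id for each query.
    """
    if not target_id_per_query:
        raise ValueError("at least one query is required")
    hits = sum(
        1
        for ranked, target in zip(ranked_ids_per_query, target_id_per_query)
        if target in ranked[:k]
    )
    return 100.0 * hits / len(target_id_per_query)

if __name__ == "__main__":
    # Toy rankings for three queries; the third target is never retrieved.
    rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
    targets = ["b", "f", "x"]
    for k in (1, 2, 3):
        print(f"R@{k} = {recall_at_k(rankings, targets, k):.2f}")
```

Under triplet ambiguity, several candidates may be equally valid for a query while only one is annotated as the target, so a metric of this kind can penalize retrievals that a human would accept.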
Abstract
Composed image retrieval extends content-based image retrieval systems by enabling users to search using reference images and captions that describe their intention. Despite great progress in developing image-text compositors to extract discriminative visual-linguistic features, we identify a hitherto overlooked issue, triplet ambiguity, which impedes robust feature extraction. Triplet ambiguity refers to a type of semantic ambiguity that arises between the reference image, the relative caption, and the target image. It is mainly due to the limited representation of the annotated text, resulting in many noisy triplets where multiple visually dissimilar candidate images can be matched to an identical reference pair (i.e., a reference image + a relative caption). To address this challenge, we propose the Consensus Network (Css-Net), inspired by the psychological concept that groups…
Taxonomy
Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
