DAFM: Dynamic Adaptive Fusion for Multi-Model Collaboration in Composed Image Retrieval
Yawei Cai, Jiapeng Mi, Nan Ji, Haotian Rong, Yawei Zhang, Zhangti Li, Wenbin Guo, Rensong Xie

TL;DR
This paper introduces DAFM, a dynamic adaptive fusion method that leverages multiple models for improved composed image retrieval, addressing limitations of single-model approaches by adaptively balancing model contributions.
Contribution
The paper proposes a novel dynamic adaptive fusion framework that exploits complementary strengths of heterogeneous models for more accurate and robust image retrieval.
Findings
Achieved 93.21% Recall@10 on the CIRR benchmark.
Surpassed recent baselines by up to 4.5% in Rmean.
Demonstrated robustness and consistent improvements across datasets.
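For readers unfamiliar with the metric, Recall@K is the fraction of queries whose ground-truth target image appears among the top K retrieved results. The snippet below is a minimal sketch of that computation on toy data; the function name `recall_at_k` and the example identifiers are illustrative assumptions, not taken from the paper or its evaluation code.

```python
def recall_at_k(ranked_ids, target_ids, k):
    """Fraction of queries whose target is among the top-k retrieved ids.

    ranked_ids: one ranked list of candidate ids per query (best first).
    target_ids: the ground-truth target id for each query.
    """
    hits = sum(
        1 for ranked, target in zip(ranked_ids, target_ids)
        if target in ranked[:k]
    )
    return hits / len(target_ids)


# Toy example (hypothetical ids): three queries, three candidates each.
rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
targets = ["b", "f", "z"]  # the third target is never retrieved

r_at_1 = recall_at_k(rankings, targets, 1)
r_at_2 = recall_at_k(rankings, targets, 2)
r_at_3 = recall_at_k(rankings, targets, 3)
```

On this toy data no target is ranked first, one of three targets is in the top 2, and two of three are in the top 3.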
Abstract
Composed Image Retrieval (CIR) is a cross-modal task that aims to retrieve target images from large-scale databases using a reference image and a modification text. Most existing methods rely on a single model to perform feature fusion and similarity matching. However, this paradigm faces two major challenges. First, a single model cannot capture global context and fine-grained detail at the same time: it must handle different subtasks with one shared set of weights, so it often misses subtle but important correspondences between image and text. Second, the absence of dynamic weight allocation prevents adaptive leveraging of complementary model strengths, so the resulting embedding drifts away from the target and misleads the nearest-neighbor search in CIR. To address these limitations, we propose Dynamic Adaptive Fusion (DAFM) for multi-model collaboration in CIR. Rather than optimizing a single method in…
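The abstract is truncated before it describes the fusion mechanism, so the following is only a generic sketch of what per-query dynamic weighting of several models could look like, not the DAFM method itself. It assumes each model returns a similarity score per candidate image, uses the gap between a model's top two scores as a stand-in confidence signal, and turns those confidences into weights with a softmax. The names `top1_margin`, `fuse_scores`, and the `temperature` parameter are all hypothetical.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def top1_margin(scores):
    """Assumed confidence proxy: gap between the best and second-best score."""
    ordered = sorted(scores, reverse=True)
    return ordered[0] - ordered[1]


def fuse_scores(per_model_scores, temperature=0.1):
    """Combine per-model candidate scores with query-dependent weights.

    per_model_scores: one list of candidate scores per model, all the same
    length. Returns the fused scores and the weight given to each model.
    """
    confidences = [top1_margin(scores) for scores in per_model_scores]
    weights = softmax(confidences, temperature)
    num_candidates = len(per_model_scores[0])
    fused = [
        sum(w * scores[i] for w, scores in zip(weights, per_model_scores))
        for i in range(num_candidates)
    ]
    return fused, weights


# Toy query (made-up scores): model A is decisive, model B is nearly flat.
model_a = [0.90, 0.20, 0.10]
model_b = [0.40, 0.45, 0.42]

fused, weights = fuse_scores([model_a, model_b])
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

In this toy case the decisive model receives almost all of the weight, so the fused ranking follows it. A static average would instead give the nearly flat model equal say on every query, which is the kind of behavior the abstract's second challenge points at.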
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
