Dynamic Weighted Combiner for Mixed-Modal Image Retrieval

Fuxiang Huang; Lei Zhang; Xiaowei Fu; Suqi Song

arXiv:2312.06179 · cs.CV · December 12, 2023 · 1 citation

TL;DR

This paper introduces a Dynamic Weighted Combiner (DWC) for mixed-modal image retrieval that effectively addresses modality contribution disparities and labeling noise, significantly improving retrieval performance on real-world datasets.

Contribution

The paper presents a novel DWC framework with an Editable Modality De-equalizer, a dynamic soft-similarity label generator, and a CLIP-based mutual enhancement module, advancing mixed-modal retrieval methods.

Findings

01 Outperforms state-of-the-art methods on real-world datasets

02 Effectively handles modality contribution disparities

03 Reduces impact of labeling noise in web datasets

Abstract

Mixed-Modal Image Retrieval (MMIR), a flexible search paradigm, has attracted wide attention. However, previous approaches achieve limited performance because two critical factors have been overlooked. 1) The image and text modalities contribute differently, yet they are incorrectly treated as equal. 2) Web datasets drawn from diverse real-world scenarios contain inherent labeling noise in the text that describes users' intentions, which gives rise to overfitting. We propose a Dynamic Weighted Combiner (DWC) to tackle these challenges, with three merits. First, we propose an Editable Modality De-equalizer (EMD) that takes the contribution disparity between modalities into account and contains two modality feature editors and an adaptive weighted combiner. Second, to alleviate labeling noise and data bias, we propose a dynamic soft-similarity label generator (SSG) to…
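
The idea of weighting the two modalities unequally can be illustrated with a small sketch. It is not the DWC implementation: the function name `adaptive_weighted_combine`, the gate vectors, and the softmax gating are all assumptions chosen for illustration, and the abstract does not specify how the actual combiner computes its weights. The sketch scores each modality from its own features, normalizes the two scores into weights that sum to one, and returns the weighted sum of the image and text features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def adaptive_weighted_combine(image_feat, text_feat, gate_image, gate_text):
    """Fuse two same-length feature vectors with input-dependent weights.

    gate_image and gate_text are hypothetical parameters (learned in a
    real model); each maps its modality's features to one scalar score.
    """
    scores = [dot(gate_image, image_feat), dot(gate_text, text_feat)]
    w_img, w_txt = softmax(scores)  # the two weights sum to 1
    fused = [w_img * i + w_txt * t for i, t in zip(image_feat, text_feat)]
    return fused, (w_img, w_txt)


# Toy inputs; real features would come from image and text encoders.
image_feat = [0.2, 0.4, 0.1]
text_feat = [0.9, 0.1, 0.5]
gate_image = [1.0, 0.0, 0.0]
gate_text = [1.0, 0.0, 0.0]

fused, weights = adaptive_weighted_combine(
    image_feat, text_feat, gate_image, gate_text
)
print(weights, fused)
```

With these toy inputs the text modality receives the larger weight because its gate score is higher. A fixed 0.5/0.5 average is the special case in which both gate scores are equal, which corresponds to treating the modalities as equal contributors.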

Code & Models

Repositories

fuxianghuang1/dwc · PyTorch · Official

Videos

Dynamic Weighted Combiner for Mixed-Modal Image Retrieval · underline

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications