Self-Training Boosted Multi-Factor Matching Network for Composed Image Retrieval

Haokun Wen; Xuemeng Song; Jianhua Yin; Jianlong Wu; Weili Guan; Liqiang Nie

arXiv:2305.09979·cs.MM·December 2, 2024·1 cites

Open Access

TL;DR

This paper introduces LIMN+, a semi-supervised multi-faceted matching network with self-training for improved composed image retrieval, effectively modeling complex query-target relations and utilizing unlabeled data.

Contribution

The work proposes a novel multi-faceted matching network combined with an iterative self-training paradigm to leverage unlabeled data in composed image retrieval.

Findings

1. LIMN+ outperforms state-of-the-art methods on three datasets.

2. The self-training approach enhances model generalization.

3. Effective modeling of multi-faceted matching factors improves retrieval accuracy.

Abstract

The composed image retrieval (CIR) task aims to retrieve the desired target image for a given multimodal query, i.e., a reference image with its corresponding modification text. Existing efforts face two key limitations: 1) they ignore the multi-faceted query-target matching factors; 2) they ignore the potential unlabeled reference-target image pairs in existing benchmark datasets. Addressing these two limitations is non-trivial because of the following challenges: 1) how to effectively model the multi-faceted matching factors in a latent way without direct supervision signals; 2) how to fully utilize the potential unlabeled reference-target image pairs to improve the generalization ability of the CIR model. To address these challenges, in this work, we first propose a muLtI-faceted Matching Network (LIMN), which consists of three key modules: multi-grained image/text…
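
To make the task setup concrete, here is a minimal sketch of the generic CIR retrieval loop described above: a reference-image embedding and a modification-text embedding are fused into one query, and candidate images are ranked by cosine similarity. This is not LIMN or LIMN+. The element-wise-sum fusion, the hand-written 3-dimensional embeddings, and every function and candidate name are assumptions made purely for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def compose_query(ref_image_emb, text_emb):
    """Fuse the two query modalities.

    ASSUMPTION: element-wise sum is a placeholder fusion, not the
    composition module proposed in the paper.
    """
    return l2_normalize([a + b for a, b in zip(ref_image_emb, text_emb)])


def rank_candidates(query_emb, candidate_embs):
    """Return candidate names sorted by cosine similarity to the query."""
    scores = {
        name: sum(q * c for q, c in zip(query_emb, l2_normalize(emb)))
        for name, emb in candidate_embs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings (hypothetical): axis 0 ~ "dress", axis 1 ~ "blue", axis 2 ~ "shoes".
reference_image = [1.0, 0.0, 0.0]    # a dress
modification_text = [0.0, 1.0, 0.0]  # "make it blue"
candidates = {
    "red_dress": [1.0, 0.0, 0.0],
    "blue_dress": [1.0, 1.0, 0.0],
    "blue_shoes": [0.0, 0.0, 1.0],
}

ranking = rank_candidates(compose_query(reference_image, modification_text), candidates)
print(ranking)
```

In this toy setting the candidate that matches both the reference content and the requested modification ranks first. A real system would obtain the embeddings from learned image and text encoders rather than hand-written vectors.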

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications