Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning
Haomiao Tang, Jinpeng Wang, Minyi Zhao, Guanghao Meng, Ruisheng Luo, Long Chen, Shu-Tao Xia

TL;DR
This paper introduces a probabilistic framework for composed image retrieval that models uncertainty at a fine-grained level. It improves robustness and discrimination by representing queries and targets as Gaussian embeddings and by estimating uncertainty heterogeneously for each.
Contribution
It proposes a heterogeneous uncertainty-guided paradigm with Gaussian embeddings and dynamic weighting, addressing limitations of previous probabilistic methods in composed image retrieval.
Findings
Outperforms state-of-the-art methods on benchmark datasets.
Effectively models multi-modal and uni-modal uncertainties.
Enhances discriminative learning through uncertainty-guided objectives.
Abstract
Composed Image Retrieval (CIR) enables image search by combining a reference image with modification text. Intrinsic noise in CIR triplets incurs intrinsic uncertainty and threatens the model's robustness. Probabilistic learning approaches have shown promise in addressing such issues; however, they fall short for CIR due to their instance-level holistic modeling and homogeneous treatment of queries and targets. This paper introduces a Heterogeneous Uncertainty-Guided (HUG) paradigm to overcome these limitations. HUG utilizes a fine-grained probabilistic learning framework, where queries and targets are represented by Gaussian embeddings that capture detailed concepts and uncertainties. We customize heterogeneous uncertainty estimations for multi-modal queries and uni-modal targets. Given a query, we capture uncertainties not only regarding uni-modal content quality but also multi-modal…
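As a rough illustration of what a Gaussian embedding means in a retrieval setting, the sketch below represents each query or target as a mean vector plus a diagonal variance and ranks targets by expected squared Euclidean distance. This is a generic probabilistic-embedding example, not the HUG method: the abstract does not specify the distance, the uncertainty estimators, or the training objective, so the names `expected_sq_distance` and `rank_targets`, the diagonal-covariance assumption, and the choice of distance are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: each embedding is an independent diagonal Gaussian,
# stored as (mean vector, per-dimension variance).
Gaussian = Tuple[Sequence[float], Sequence[float]]


def expected_sq_distance(a: Gaussian, b: Gaussian) -> float:
    """E||x - y||^2 for independent x ~ N(mu_a, diag(var_a)), y ~ N(mu_b, diag(var_b)).

    The closed form is ||mu_a - mu_b||^2 + sum(var_a) + sum(var_b), so a more
    uncertain embedding (larger variance) is pushed further from everything.
    """
    mu_a, var_a = a
    mu_b, var_b = b
    mean_term = sum((x - y) ** 2 for x, y in zip(mu_a, mu_b))
    var_term = sum(var_a) + sum(var_b)
    return mean_term + var_term


def rank_targets(query: Gaussian, targets: Sequence[Gaussian]) -> List[int]:
    """Return target indices sorted from closest to farthest from the query."""
    dists = [expected_sq_distance(query, t) for t in targets]
    return sorted(range(len(targets)), key=lambda i: dists[i])


if __name__ == "__main__":
    # Toy data (made up for illustration only).
    query = ([0.0, 0.0], [0.1, 0.1])
    targets = [
        ([3.0, 4.0], [0.1, 0.1]),  # far from the query
        ([0.0, 0.0], [0.1, 0.1]),  # same mean as the query
    ]
    print(rank_targets(query, targets))
```

In this toy setup the variance acts as a penalty: two candidates with the same mean are ordered by how confident their embeddings are. A method like the one described in the abstract would learn those variances and treat query and target uncertainty differently, which this sketch does not attempt.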
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
