G-MIXER: Geodesic Mixup-based Implicit Semantic Expansion and Explicit Semantic Re-ranking for Zero-Shot Composed Image Retrieval

Jiyoung Lim; Heejae Yang; Jee-Hyong Lee

arXiv:2604.14710 · cs.CV · April 17, 2026

TL;DR

G-MIXER is a training-free method for zero-shot composed image retrieval (ZS-CIR). It generates diverse candidate features through geodesic mixup, then re-ranks with explicit semantics, and achieves state-of-the-art results.

Contribution

It introduces a novel geodesic mixup technique for implicit semantic expansion and a re-ranking strategy using explicit semantics, all without additional training.
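
A geodesic mixup between two embeddings is commonly realized as spherical linear interpolation (slerp) on the unit hypersphere. The sketch below illustrates that general idea only. It assumes the two inputs are an image feature and a text feature from a shared embedding space, and the function names `geodesic_mixup` and `expand_candidates`, the mixing ratios, and the toy vectors are hypothetical, not taken from the paper or its repository.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def geodesic_mixup(a, b, lam):
    """Interpolate along the great-circle arc from a (lam=0) to b (lam=1).

    Both inputs are unit-normalized first, so the result also has unit norm.
    """
    a, b = normalize(a), normalize(b)
    # Clamp the cosine to guard against floating-point drift outside [-1, 1].
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    theta = math.acos(dot)
    s = math.sin(theta)
    if s < 1e-6:
        # Nearly parallel or antipodal vectors: the slerp weights are
        # numerically unstable, so fall back to linear mixing plus
        # renormalization (assumed fallback, for illustration only).
        return normalize([(1 - lam) * x + lam * y for x, y in zip(a, b)])
    wa = math.sin((1 - lam) * theta) / s
    wb = math.sin(lam * theta) / s
    return [wa * x + wb * y for x, y in zip(a, b)]


def expand_candidates(image_feat, text_feat, lams=(0.25, 0.5, 0.75)):
    """Build several candidate query features at different mixing ratios."""
    return [geodesic_mixup(image_feat, text_feat, lam) for lam in lams]


# Toy 3-d features standing in for real encoder outputs.
image_feat = [1.0, 0.0, 0.0]
text_feat = [0.0, 1.0, 0.0]
candidates = expand_candidates(image_feat, text_feat)
```

Unlike plain linear mixup, every interpolated point stays on the unit sphere, which matters when retrieval scores are cosine similarities. How G-MIXER chooses its mixing ratios and which features it mixes is not stated in this summary.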

Findings

01 Achieves state-of-the-art performance on multiple ZS-CIR benchmarks.

02 Effectively balances retrieval diversity and accuracy.

03 Does not require additional training or fine-tuning.

Abstract

Composed Image Retrieval (CIR) aims to retrieve target images by integrating a reference image with a corresponding modification text. CIR requires jointly considering the explicit semantics specified in the query and the implicit semantics embedded within its bi-modal composition. Recent training-free Zero-Shot CIR (ZS-CIR) methods leverage Multimodal Large Language Models (MLLMs) to generate detailed target descriptions, converting the implicit information into explicit textual expressions. However, these methods rely heavily on the textual modality and fail to capture the fuzzy retrieval nature that requires considering diverse combinations of candidates. This leads to reduced diversity and accuracy in retrieval results. To address this limitation, we propose a novel training-free method, Geodesic Mixup-based Implicit semantic eXpansion and Explicit semantic Re-ranking for ZS-CIR…

Code & Models

Repositories

maya0395/gmixer (GitHub)