FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data

Peng Yuan; Bingyin Mei; Hui Zhang

arXiv:2604.10297·cs.CV·April 14, 2026

TL;DR

The paper introduces FashionMV, a large-scale multi-view fashion dataset, and ProCIR, a product-level composed image retrieval framework that leverages multimodal large language models with multi-view reasoning capabilities.

Contribution

The paper defines the Multi-View CIR task, presents FashionMV as the first large-scale multi-view fashion dataset for product-level CIR, and proposes ProCIR, a modeling framework that improves product-level image retrieval using multimodal large language models.

Findings

01. Alignment is the most critical mechanism for performance.
02. Two-stage dialogue architecture is essential for effective alignment.
03. Supervised fine-tuning and chain-of-thought are partially redundant for knowledge injection.

Abstract

Composed Image Retrieval (CIR) retrieves target images using a reference image paired with modification text. Despite rapid advances, all existing methods and datasets operate at the image level -- a single reference image plus modification text in, a single target image out -- while real e-commerce users reason about products shown from multiple viewpoints. We term this mismatch View Incompleteness and formally define a new Multi-View CIR task that generalizes standard CIR from image-level to product-level retrieval. To support this task, we construct FashionMV, the first large-scale multi-view fashion dataset for product-level CIR, comprising 127K products, 472K multi-view images, and over 220K CIR triplets, built through a fully automated pipeline leveraging large multimodal models. We further propose ProCIR (Product-level Composed Image Retrieval), a modeling framework built upon a…
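
The product-level task described in the abstract can be illustrated with a toy sketch. This is not ProCIR and not the authors' method. The names `Product`, `MultiViewQuery`, `compose`, and `retrieve` are hypothetical, the embeddings are hand-written 2-D vectors, and both the mean pooling over views and the additive composition of image and text embeddings are assumptions chosen only to keep the example small. What it shows is the change in input and output shape: a query carries several reference views plus a modification, and the result is a ranked list of products rather than single images.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import sqrt


@dataclass
class Product:
    """A gallery item: one product with one embedding per view (hypothetical type)."""
    product_id: str
    view_embeddings: list[list[float]]


@dataclass
class MultiViewQuery:
    """Reference product views plus an embedded modification text (hypothetical type)."""
    reference_views: list[list[float]]
    modification: list[float]


def mean_pool(vectors: list[list[float]]) -> list[float]:
    # Assumption: a product is summarized by averaging its view embeddings.
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def compose(query: MultiViewQuery) -> list[float]:
    # Assumption: toy additive composition of pooled views and modification.
    pooled = mean_pool(query.reference_views)
    return [p + m for p, m in zip(pooled, query.modification)]


def retrieve(query: MultiViewQuery, gallery: list[Product], k: int = 1) -> list[str]:
    # Rank whole products (not individual images) by similarity to the query.
    q = compose(query)
    ranked = sorted(
        gallery,
        key=lambda p: cosine(q, mean_pool(p.view_embeddings)),
        reverse=True,
    )
    return [p.product_id for p in ranked[:k]]


# Made-up 2-D embeddings: axis 0 ~ "red", axis 1 ~ "blue".
gallery = [
    Product("red_dress", [[1.0, 0.0], [0.9, 0.1]]),
    Product("blue_dress", [[0.0, 1.0], [0.1, 0.9]]),
]
query = MultiViewQuery(
    reference_views=[[1.0, 0.0], [0.8, 0.2]],  # two views of a red dress
    modification=[-1.0, 1.5],                  # stands in for "make it blue"
)
top = retrieve(query, gallery, k=1)
print(top)
```

In standard image-level CIR, `reference_views` would hold a single vector and the gallery would hold single images. The product-level formulation keeps every view attached to its product on both sides of the comparison.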

Code & Models

Repositories

yuandaxia2001/FashionMV (GitHub)

Models

yuandaxia/ProCIR (Hugging Face model, 16 downloads, 2 likes)

Datasets

yuandaxia/FashionMV (dataset, 79 downloads)
