FashionLens: Toward Versatile Fashion Image Retrieval via Task-Adaptive Learning

Haokun Wen; Xuemeng Song; Xinghao Xie; Xiaolin Chen; Xiangyu Zhao; Weili Guan

arXiv:2605.22552·cs.CV·May 22, 2026

TL;DR

FashionLens introduces a versatile, task-adaptive framework for fashion image retrieval that unifies multiple retrieval scenarios and demonstrates state-of-the-art performance on a comprehensive new benchmark.

Contribution

The paper presents FashionLens, a framework built on multimodal large language models that uses task-specific calibrators and adaptive sampling, along with U-FIRE, a benchmark covering diverse fashion retrieval tasks.

Findings

01 FashionLens outperforms existing methods on the U-FIRE benchmark.

02 It generalizes well to unseen retrieval tasks.

03 The framework effectively handles diverse query formats and search intentions.

Abstract

Fashion image retrieval is a cornerstone of modern e-commerce systems. A unified framework that supports diverse query formats and search intentions is highly desired in practice. However, existing approaches focus on narrow retrieval tasks and do not fully capture such diversity. Therefore, in this work, we aim to develop a unified framework capable of handling diverse realistic fashion retrieval scenarios, achieving truly versatile fashion image retrieval. To establish a data foundation, we first introduce U-FIRE, a comprehensive benchmark that consolidates fragmented fashion datasets into a unified collection, supplemented by two manually curated datasets for testing generalization. Building upon this, we propose FashionLens, a unified framework based on Multimodal Large Language Models. To handle divergent matching objectives, we design a Proposal-Guided Spherical Query Calibrator…

Code & Models

Repositories

haokunwen/FashionLens (GitHub)