LookBench: A Live and Holistic Open Benchmark for Fashion Image Retrieval

Gensmo.ai; Chao Gao; Siqiao Xue; Jiwen Fu; Tingyi Gu; Shanshan Li; Fan Zhou

arXiv:2601.14706 · cs.CV · April 14, 2026

TL;DR

LookBench is a comprehensive, live benchmark for fashion image retrieval that includes real and AI-generated images, aiming to evaluate and advance models in realistic e-commerce scenarios.

Contribution

The paper introduces LookBench, a new challenging, live, and periodically updated benchmark for fashion image retrieval, with open-source tools and a leaderboard.

Findings

01

Many models achieve below 60% Recall@1 on LookBench.

02

The authors' proprietary model outperforms the other evaluated models on LookBench.

03

Both the proprietary model and the open-source model set a new state of the art on Fashion200K.
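Recall@1 is commonly defined as the fraction of queries whose top-ranked gallery item is a correct match; Recall@K relaxes this to the top K items. The sketch below is illustrative only and is not LookBench's released evaluation code. It assumes nonzero embedding vectors compared by cosine similarity and a hypothetical `relevant` mapping that lists, for each query, the gallery indices accepted as matches (for example, the exact item plus close substitutes).

```python
import math
from typing import List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_k(
    query_embs: List[Sequence[float]],
    gallery_embs: List[Sequence[float]],
    relevant: List[Set[int]],  # hypothetical: accepted gallery indices per query
    k: int = 1,
) -> float:
    """Fraction of queries with at least one accepted match among the top-k results."""
    hits = 0
    for i, query in enumerate(query_embs):
        scores = [cosine(query, g) for g in gallery_embs]
        # Gallery indices ordered by decreasing similarity to this query.
        ranked = sorted(range(len(gallery_embs)), key=lambda j: scores[j], reverse=True)
        if any(j in relevant[i] for j in ranked[:k]):
            hits += 1
    return hits / len(query_embs)


# Toy example with made-up 2-D embeddings (not LookBench data).
queries = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
gallery = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.0]]
relevant = [{0}, {1}, {1}]

r1 = recall_at_k(queries, gallery, relevant, k=1)  # 2/3: query 0's match is ranked second
r2 = recall_at_k(queries, gallery, relevant, k=2)  # 1.0: every match is within the top 2
```

A score "below 60% Recall@1" therefore means that, for more than 40% of queries, the single best-scoring gallery item is not an accepted match.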

Abstract

In this paper, we present LookBench (we use the term "look" to reflect retrieval that mirrors how people shop: finding the exact item, a close substitute, or a visually consistent alternative), a live, holistic, and challenging benchmark for fashion image retrieval in real e-commerce settings. LookBench includes both recent product images sourced from live websites and AI-generated fashion images, reflecting contemporary trends and use cases. Each test sample is time-stamped, and we intend to update the benchmark periodically, enabling contamination-aware evaluation aligned with declared training cutoffs. Grounded in our fine-grained attribute taxonomy, LookBench covers single-item and outfit-level retrieval. Our experiments reveal that LookBench poses a significant challenge to strong baselines, with many models achieving below 60% Recall@1. Our proprietary model achieves…
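Time-stamped samples make it possible to score a model only on data it could not have seen during training. The sketch below shows one way such contamination-aware filtering might look; the `Sample` record and its `sample_id` and `collected_on` fields are hypothetical stand-ins, not the benchmark's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Sample:
    sample_id: str      # hypothetical identifier field
    collected_on: date  # hypothetical time stamp field


def uncontaminated(samples: List[Sample], training_cutoff: date) -> List[Sample]:
    """Keep only samples time-stamped strictly after the model's declared training cutoff."""
    return [s for s in samples if s.collected_on > training_cutoff]


# Toy example with made-up dates (not LookBench data).
samples = [
    Sample("a", date(2025, 3, 1)),
    Sample("b", date(2025, 9, 15)),
    Sample("c", date(2026, 1, 10)),
]
kept = uncontaminated(samples, training_cutoff=date(2025, 6, 30))
kept_ids = [s.sample_id for s in kept]  # ["b", "c"]
```

Under this reading, each periodic update adds newer samples, so models with later declared cutoffs still have a post-cutoff subset to be evaluated on.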


Code & Models

Datasets

srpone/look-bench
dataset · 137 downloads
