Contrastive language and vision learning of general fashion concepts
Patrick John Chia, Giuseppe Attanasio, Federico Bianchi, Silvia Terragni, Ana Rita Magalhães, Diogo Goncalves, Ciro Greco, Jacopo Tagliabue

TL;DR
This paper introduces FashionCLIP, a CLIP-like contrastive learning model for the fashion industry. The authors showcase it on product retrieval, classification, and grounding, with the aim of providing more transferable product representations for online shopping applications.
Contribution
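The paper presents FashionCLIP, a CLIP-like contrastive model trained for the fashion domain, and releases the model and code to the community. It argues that practitioners benefit more from transferable product representations than from casting each use case as a specialized supervised learning problem.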
Findings
FashionCLIP effectively retrieves fashion products.
It improves classification accuracy for fashion items.
The model demonstrates strong grounding capabilities.
Abstract
The steady rise of online shopping goes hand in hand with the development of increasingly complex ML and NLP models. While most use cases are cast as specialized supervised learning problems, we argue that practitioners would greatly benefit from more transferable representations of products. In this work, we build on recent developments in contrastive learning to train FashionCLIP, a CLIP-like model for the fashion industry. We showcase its capabilities for retrieval, classification and grounding, and release our model and code to the community.
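As a rough illustration of the contrastive objective that CLIP-like models are generally trained with, the sketch below computes a symmetric cross-entropy loss over a batch of paired image and text embeddings. It is a minimal sketch in pure Python, not the authors' released code: the function names, the toy two-dimensional embeddings, and the temperature value of 0.07 are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_row(logits, target):
    """Cross-entropy of one row of logits against the index of the true match."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; row k of each input is assumed to be a matching pair.

    The temperature default is an illustrative assumption, not a value from the paper.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]

    # Image-to-text direction: each image should pick out its own caption.
    loss_i2t = sum(cross_entropy_row(logits[k], k) for k in range(n)) / n

    # Text-to-image direction: same computation on the transposed matrix.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = sum(cross_entropy_row(columns[k], k) for k in range(n)) / n

    return 0.5 * (loss_i2t + loss_t2i)


# Toy batch: aligned pairs should score a much lower loss than swapped pairs.
aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

In a real training setup the embeddings would come from an image encoder and a text encoder, and the loss would be minimized with a deep learning framework; the sketch only shows how matching pairs are pulled together and mismatched pairs pushed apart within a batch.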
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques
Methods: FashionCLIP · Contrastive Learning
