OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data

Giuseppe Cartella; Alberto Baldrati; Davide Morelli; Marcella Cornia; Marco Bertini; Rita Cucchiara

arXiv:2309.05551·cs.CV·September 12, 2023

TL;DR

OpenFashionCLIP introduces a vision-and-language contrastive learning framework trained solely on open-source fashion data, demonstrating strong out-of-domain generalization and outperforming existing methods in accuracy and recall across multiple benchmarks.

Contribution

The paper presents a contrastive learning approach trained on open-source fashion data from diverse domains, improving generalization and performance on fashion-related multimodal tasks.

Findings

01. Significant out-of-domain generalization capability.

02. Consistent improvements over state-of-the-art in accuracy.

03. Enhanced recall in multimodal retrieval tasks.
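
The recall finding refers to retrieval quality, which is commonly reported as Recall@K. The sketch below shows one usual way to compute it, not the paper's evaluation code. It assumes a square similarity matrix in which the correct item for query `q` sits at index `q`; the function name `recall_at_k` and the toy scores are hypothetical.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose ground-truth item appears in the top-k results.

    Assumption: similarity[q][j] is the score between query q and item j,
    and the ground-truth item for query q is item q.
    """
    hits = 0
    for q, row in enumerate(similarity):
        # Rank item indices by descending similarity and keep the first k.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        if q in top_k:
            hits += 1
    return hits / len(similarity)


# Toy scores for three text queries against three images (made-up values).
toy_similarity = [
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.8],
    [0.1, 0.7, 0.6],
]
r1 = recall_at_k(toy_similarity, 1)
r2 = recall_at_k(toy_similarity, 2)
```

On this toy matrix only the first query ranks its own item first, while all three place it within the top two, so Recall@2 is higher than Recall@1.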

Abstract

The inexorable growth of online shopping and e-commerce demands scalable and robust machine learning-based solutions to accommodate customer requirements. In the context of automatic tagging classification and multimodal retrieval, prior works either defined a low generalizable supervised learning approach or more reusable CLIP-based techniques while, however, training on closed source data. In this work, we propose OpenFashionCLIP, a vision-and-language contrastive learning method that only adopts open-source fashion data stemming from diverse domains, and characterized by varying degrees of specificity. Our approach is extensively validated across several tasks and benchmarks, and experimental results highlight a significant out-of-domain generalization capability and consistent improvements over state-of-the-art methods both in terms of accuracy and recall. Source code and trained…
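
As a rough illustration of what "vision-and-language contrastive learning" usually means in CLIP-style methods, the sketch below computes a symmetric contrastive (InfoNCE) loss over a batch of paired image and text embeddings. The abstract does not give the exact objective, so treat this as the generic formulation. The function names and the `temperature=0.07` default are assumptions, not details taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss; pair i of images and texts is the positive pair.

    Assumption: temperature is a fixed illustrative value here, whereas
    CLIP-style models typically learn it during training.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: scaled cosine similarity between image i and text j.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Image-to-text direction: each row should peak on its diagonal entry.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: the same, applied to the columns.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


# Toy 2-D embeddings: matched pairs point the same way, swapped pairs do not.
images = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = clip_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
loss_swapped = clip_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each image is most similar to its own caption and large when the pairing is swapped, which is the signal that pulls matching image and text embeddings together during training.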

Taxonomy

Topics: Cancer-related molecular mechanisms research · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning