ENCLIP: Ensembling and Clustering-Based Contrastive Language-Image Pretraining for Fashion Multimodal Search with Limited Data and Low-Quality Images

Prithviraj Purushottam Naik; Rohit Agarwal

arXiv:2411.16096 · cs.CV · November 26, 2024

TL;DR

ENCLIP enhances CLIP's performance for fashion multimodal search by ensembling models and clustering images, effectively addressing limited data and low-quality images to improve search accuracy.

Contribution

This paper introduces ENCLIP, a novel ensembling and clustering-based method to improve CLIP's effectiveness in fashion search with scarce and low-quality data.

Findings

01. Improved search accuracy in fashion multimodal tasks.

02. Effective handling of limited data and low-quality images.

03. Demonstrated superiority over baseline models.

Abstract

Multimodal search has revolutionized the fashion industry, providing a seamless and intuitive way for users to discover and explore fashion items. Users can search for products by combining text and image information, based on their preferences, style, or specific attributes. Text-to-image search lets users describe products in natural language or find visually similar items. This paper presents ENCLIP, an approach for enhancing the performance of the Contrastive Language-Image Pretraining (CLIP) model in multimodal search for the fashion intelligence domain. The method addresses the challenges posed by limited data availability and low-quality images. The proposed algorithm trains and ensembles multiple instances of the CLIP model and leverages clustering techniques to group similar…
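
The abstract is cut off before it says what is clustered or how the ensemble members are combined, so the following is only a minimal sketch of the two general ideas it names, not the paper's algorithm. It assumes that each ensemble member produces its own text and image embeddings (plain NumPy arrays stand in for CLIP encoder outputs here), that members are combined by averaging cosine similarities, and that clustering means k-means over image embeddings. The function names `ensemble_similarity` and `kmeans` are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def ensemble_similarity(text_embs, image_embs):
    """Average text-to-image cosine similarity over ensemble members.

    Assumption (not from the paper): members are combined by a plain mean.
    text_embs:  list of (d_m,) arrays, one query embedding per model.
    image_embs: list of (n_images, d_m) arrays, one gallery per model.
    Returns an (n_images,) array of averaged scores.
    """
    scores = [
        l2_normalize(imgs) @ l2_normalize(txt)
        for txt, imgs in zip(text_embs, image_embs)
    ]
    return np.mean(scores, axis=0)


def kmeans(points, k, n_iters=20, seed=0):
    """Basic k-means (Lloyd's algorithm) over embedding vectors.

    Assumption (not from the paper): k-means stands in for the unspecified
    clustering technique. Returns (labels, centroids).
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (skip empty clusters).
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids
```

To rank a gallery for one query under these assumptions, `np.argsort(-ensemble_similarity(text_embs, image_embs))` gives image indices from best to worst match. Averaging scores rather than concatenating embeddings lets the members have different embedding sizes.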

Taxonomy

Topics: Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications

Methods: Contrastive Language-Image Pre-training