Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models
Anil Osman Tur, Alessandro Conti, Cigdem Beyan, Davide Boscaini, Roberto Larcher, Stefano Messelodi, Fabio Poiesi, Elisa Ricci

TL;DR
This paper introduces the MIMEX dataset for fine-grained retail product classification, benchmarks current vision-language models, and proposes an ensemble and class adaptation method to improve zero-shot classification performance in retail settings.
Contribution
The paper presents a new dataset, benchmarks existing models, and develops a novel ensemble and class adaptation approach for enhanced zero-shot retail product classification.
Findings
VLMs perform poorly on fine-grained retail classification
Ensemble approach outperforms individual VLMs
Class adaptation improves performance with limited data
Abstract
In smart retail applications, the large number of products and their frequent turnover necessitate reliable zero-shot object classification methods. The zero-shot assumption is essential to avoid the need for re-training the classifier every time a new product is introduced into stock or an existing product undergoes rebranding. In this paper, we make three key contributions. Firstly, we introduce the MIMEX dataset, comprising 28 distinct product categories. Unlike existing datasets in the literature, MIMEX focuses on fine-grained product classification and includes a diverse range of retail products. Secondly, we benchmark the zero-shot object classification performance of state-of-the-art vision-language models (VLMs) on the proposed MIMEX dataset. Our experiments reveal that these models achieve unsatisfactory fine-grained classification performance, highlighting the need for…
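To make the zero-shot and ensemble ideas in the abstract concrete, the following is a minimal sketch, not the authors' method. It assumes each VLM yields an image embedding and one text embedding per class prompt, scores classes by cosine similarity, and combines models by averaging their class probabilities. The averaging rule, the temperature value, the function names, the toy embeddings, and the class names are all hypothetical; the source does not specify how its ensemble is built.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores, temperature):
    """Numerically stable softmax over temperature-scaled scores."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_probs(image_emb, text_embs, temperature=0.01):
    """Class probabilities for one model: similarity of the image
    embedding to each class-prompt embedding. The temperature is an
    assumed value, not one taken from the paper."""
    return softmax([cosine(image_emb, t) for t in text_embs], temperature)

def ensemble_predict(per_model_probs, class_names):
    """Average the per-model probabilities (an assumed combination
    rule) and return the top class with the averaged distribution."""
    n_models = len(per_model_probs)
    avg = [
        sum(probs[i] for probs in per_model_probs) / n_models
        for i in range(len(class_names))
    ]
    best = max(range(len(class_names)), key=lambda i: avg[i])
    return class_names[best], avg

# Toy data standing in for two different VLMs (hypothetical values).
class_names = ["product_a", "product_b"]
model_a = zero_shot_probs([1.0, 0.0], [[0.9, 0.1], [0.8, 0.6]])
model_b = zero_shot_probs([0.0, 1.0], [[0.2, 0.9], [0.9, 0.3]])
label, avg_probs = ensemble_predict([model_a, model_b], class_names)
print(label, avg_probs)
```

Because no classifier weights are trained, adding a new product under these assumptions only requires embedding one more class prompt, which is the property the zero-shot setting relies on.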
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Contrastive Language-Image Pre-training
