Convolutional Neural Networks and Vision Transformers for Fashion MNIST Classification: A Literature Review

Sonia Bbouzidi; Ghazala Hcini; Imen Jdey; Fadoua Drira

arXiv:2406.03478 · cs.CV · June 6, 2024 · 2 cites

Open Access

TL;DR

This literature review compares CNNs and Vision Transformers for Fashion MNIST classification, analyzing their strengths, limitations, and potential for combined use to improve e-commerce image recognition accuracy.

Contribution

It provides a comprehensive analysis of CNNs and ViTs in fashion image classification, highlighting factors influencing their performance and proposing combined architectures for better results.

Findings

01. ViTs effectively capture global context in images.

02. CNNs excel at recognizing local patterns.

03. Combining CNNs and ViTs can enhance classification accuracy.

Abstract

Our review explores the comparative analysis between Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in the domain of image classification, with a particular focus on clothing classification within the e-commerce sector. Utilizing the Fashion MNIST dataset, we delve into the unique attributes of CNNs and ViTs. While CNNs have long been the cornerstone of image classification, ViTs introduce an innovative self-attention mechanism enabling nuanced weighting of different input data components. Historically, transformers have primarily been associated with Natural Language Processing (NLP) tasks. Through a comprehensive examination of existing literature, our aim is to unveil the distinctions between ViTs and CNNs in the context of image classification. Our analysis meticulously scrutinizes state-of-the-art methodologies employing both architectures, striving to identify…
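The self-attention weighting mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a random 28x28 array standing in for one Fashion MNIST image, a hypothetical patch size of 7, and identity query/key/value projections in place of the learned projections a real ViT would use. The helper names (`split_into_patches`, `self_attention`) are illustrative only.

```python
import math
import random


def split_into_patches(image, patch):
    """Split a square image (list of rows) into flattened patch vectors."""
    size = len(image)
    patches = []
    for top in range(0, size, patch):
        for left in range(0, size, patch):
            patches.append([image[top + r][left + c]
                            for r in range(patch) for c in range(patch)])
    return patches


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a trained ViT learns separate projections.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        # Every patch is scored against every other patch (global context).
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted mix of all patch vectors.
        outputs.append([sum(w * value[i] for w, value in zip(row, tokens))
                        for i in range(dim)])
    return outputs, weights


random.seed(0)
# Stand-in for one 28x28 grayscale image (not real Fashion MNIST data).
image = [[random.random() for _ in range(28)] for _ in range(28)]
patches = split_into_patches(image, patch=7)   # 4 x 4 grid of 7x7 patches
attended, weights = self_attention(patches)
print(len(patches), len(patches[0]))           # number of tokens, token size
print(round(sum(weights[0]), 6))               # each attention row sums to 1
```

Each row of `weights` says how strongly one patch attends to every other patch, including distant ones. A convolution, by contrast, only mixes pixels inside its small kernel window, which is the local-versus-global distinction the findings above draw.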

Taxonomy

Topics: Consumer Perception and Purchasing Behavior

Methods: Focus