Mask Usage Recognition using Vision Transformer with Transfer Learning and Data Augmentation
Hensel Donato Jahja, Novanto Yudistira, Sutrisno

TL;DR
This paper shows that a Vision Transformer (ViT) fine-tuned from pre-trained weights, combined with data augmentation, classifies mask usage on the MaskedFace-Net dataset more accurately than a CNN baseline (ResNet).
Contribution
It introduces a novel application of Vision Transformers with transfer learning and data augmentation for mask classification, achieving high accuracy and surpassing CNN performance.
Findings
Best results with ViT Huge-14 using transfer learning and augmentation
Achieved over 95% accuracy on test data
Outperformed ResNet in mask classification tasks
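In ViT naming, the "14" in ViT Huge-14 (ViT-H/14) is the patch size. Each input image is cut into non-overlapping 14×14 patches, and each patch becomes one token. The sketch below computes the resulting sequence length. The 224×224 input resolution is an assumption (a common ViT pre-training resolution), because the resolution used in this paper is not stated here. The function name `vit_sequence_length` is hypothetical.

```python
def vit_sequence_length(image_size: int, patch_size: int, class_token: bool = True) -> int:
    """Number of tokens a ViT processes for a square image split into square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    # Standard ViT prepends one learnable [class] token whose output is used for classification.
    return num_patches + (1 if class_token else 0)


# Assumed 224x224 input with ViT-H/14: 16 x 16 = 256 patches, plus 1 class token.
print(vit_sequence_length(224, 14))  # 257
```

Smaller patches produce longer sequences. Because self-attention cost grows quadratically with sequence length, a /14 model at a given resolution is more expensive than ViT models with 16×16 or 32×32 patches.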
Abstract
The COVID-19 pandemic has disrupted many levels of society. Wearing a mask is essential to preventing the spread of COVID-19, which makes it useful to identify from an image whether a person is wearing one. Only 23.1% of people wear masks correctly, and Artificial Neural Networks (ANNs) can help classify correct mask usage to help slow the spread of the COVID-19 virus. However, training an ANN that classifies mask usage correctly requires a large dataset. MaskedFace-Net is a suitable dataset, consisting of 137,016 digital images with 4 class labels: Mask, Mask Chin, Mask Mouth Chin, and Mask Nose Mouth. Mask classification training uses the Vision Transformer (ViT) architecture with transfer learning from weights pre-trained on ImageNet-21k, together with random augmentation. In addition, the training hyper-parameters include 20 epochs and a Stochastic Gradient Descent (SGD) optimizer with a…
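The abstract mentions "random augmentation" without further detail. Assume it refers to a RandAugment-style policy, which samples a fixed number of operations uniformly at random for each image and applies them all at one shared magnitude. Under that assumption, the pure-Python sketch below shows the sampling logic on a toy grayscale image stored as nested lists. Several details are hypothetical choices for illustration: the operation set, the 0 to 10 magnitude scale, and the names `rand_augment`, `identity`, `brightness`, and `solarize`. A real pipeline would use an image library's implementation instead.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Toy grayscale image: rows of integer pixel values in [0, 255].
Image = List[List[int]]


def identity(img: Image, magnitude: int) -> Image:
    # Return an unchanged copy (RandAugment's op list includes an identity op).
    return [row[:] for row in img]


def brightness(img: Image, magnitude: int) -> Image:
    # Brighten every pixel by magnitude * 10, clamped to [0, 255].
    delta = magnitude * 10
    return [[min(255, max(0, p + delta)) for p in row] for row in img]


def solarize(img: Image, magnitude: int) -> Image:
    # Invert pixels at or above a threshold; a larger magnitude lowers the
    # threshold, so more pixels get inverted (magnitude 0 changes nothing).
    threshold = 256 - magnitude * 25
    return [[255 - p if p >= threshold else p for p in row] for row in img]


# Hypothetical operation set; every op takes (image, magnitude).
OPS: Dict[str, Callable[[Image, int], Image]] = {
    "identity": identity,
    "brightness": brightness,
    "solarize": solarize,
}


def rand_augment(
    img: Image,
    num_ops: int = 2,
    magnitude: int = 5,
    rng: Optional[random.Random] = None,
) -> Tuple[Image, List[str]]:
    """Apply num_ops ops drawn uniformly with replacement, all at one magnitude."""
    rng = rng or random.Random()
    chosen = [rng.choice(list(OPS)) for _ in range(num_ops)]
    for name in chosen:
        img = OPS[name](img, magnitude)
    return img, chosen


image = [[0, 100], [200, 250]]
augmented, ops_used = rand_augment(image, rng=random.Random(0))
print(ops_used, augmented)
```

Each image draws its own operations, so the same training image can look different in each of the 20 epochs, which acts as a regularizer. RandAugment's appeal is that the whole policy reduces to two parameters (`num_ops` and `magnitude`) rather than a searched per-operation policy.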
Taxonomy
Topics: COVID-19 diagnosis using AI
