COVID-VIT: Classification of COVID-19 from CT chest images based on vision transformer models
Xiaohong Gao, Yu Qian, Alice Gao

TL;DR
This paper develops and evaluates vision transformer models for classifying COVID-19 from chest CT images, reporting a higher F1 score than a DenseNet CNN baseline in the MIA-COV19 challenge.
Contribution
Introduces the application of vision transformer models to COVID-19 CT image classification, showing superior performance compared to DenseNet CNNs.
Findings
ViT achieved an F1 score of 0.76
ViT outperformed DenseNet, which achieved an F1 score of 0.72
Models are aimed at rapid, accurate, and explainable COVID-19 diagnosis
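For readers unfamiliar with the metric behind these findings, the sketch below shows how an F1 score can be computed for a binary COVID / non-COVID classifier. It is illustrative only: the function names `f1_score` and `macro_f1` and the toy labels are hypothetical, and the page does not state whether the reported 0.76 and 0.72 are macro-averaged, so the averaging step is an assumption.

```python
from typing import Sequence


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Per-class F1 = 2*TP / (2*TP + FP + FN); 0.0 when the class never occurs."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], labels=(0, 1)) -> float:
    """Unweighted mean of per-class F1 (assumed averaging; not confirmed by the source)."""
    scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # missed instances of c
        scores.append(f1_score(tp, fp, fn))
    return sum(scores) / len(scores)


# Toy labels (hypothetical): 1 = COVID, 0 = non-COVID
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(round(macro_f1(y_true, y_pred), 4))
```

Because F1 balances precision and recall, it is less sensitive to class imbalance than plain accuracy, which is one reason challenge leaderboards often report it.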
Abstract
This paper responds to the MIA-COV19 challenge to classify COVID from non-COVID cases based on CT lung images. The COVID-19 virus has devastated the world in the last eighteen months, infecting more than 182 million people and causing over 3.9 million deaths. The overarching aim is to predict the diagnosis of the COVID-19 virus from chest radiographs, through the development of explainable vision transformer deep learning techniques, leading to population screening in a more rapid, accurate and transparent way. In this competition, there are 5381 three-dimensional (3D) datasets in total, including 1552 for training, 374 for evaluation and 3455 for testing. While most of the data volumes are in axial view, a number of subjects' data are in coronal or sagittal views, with 1 or 2 slices in axial view. Hence, while 3D data based classification is investigated, in this…
Taxonomy
Topics: COVID-19 diagnosis using AI · Anomaly Detection Techniques and Applications · Machine Learning in Healthcare
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Layer Normalization · 1x1 Convolution · Dropout · Vision Transformer · Max Pooling · Convolution
