CNN-based Local Vision Transformer for COVID-19 Diagnosis
Hongyan Xu, Xiu Su, Dadong Wang

TL;DR
This paper introduces COVT, a CNN-enhanced Vision Transformer (ViT) architecture that aims to improve COVID-19 diagnosis accuracy on small datasets by combining CNN-based local feature extraction with average pooling for global information.
Contribution
It proposes a novel hybrid structure that integrates CNNs with ViT, specifically tailored for small COVID-19 datasets, enhancing feature richness and training stability.
Findings
Effective on two COVID-19 datasets
Improves feature extraction over pure ViT
Also evaluated on ImageNet, suggesting the design is not specific to COVID-19 data
Abstract
Deep learning can serve as an assistive technology that helps doctors identify COVID-19 infections quickly and accurately. Recently, Vision Transformer (ViT) has shown great potential for image classification due to its global receptive field. However, because it lacks the inductive biases inherent to CNNs, a ViT-based structure suffers from limited feature richness and is difficult to train. In this paper, we propose a new structure called Transformer for COVID-19 (COVT) to improve the performance of ViT-based architectures on small COVID-19 datasets. It uses a CNN as a feature extractor to effectively extract local structural information, and introduces average pooling into ViT's Multilayer Perceptron (MLP) module for global information. Experiments show the effectiveness of our method on two COVID-19 datasets and the ImageNet dataset.
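As a rough illustration of the two ideas named in the abstract, the NumPy sketch below builds patch tokens with a convolution and then applies an MLP block whose hidden activations are augmented with their average over all tokens. This is not the authors' implementation: the single strided convolution standing in for the CNN feature extractor, the point at which the average pooling is applied, the residual connection, and every name and size (`conv_patch_features`, `mlp_with_avg_pool`, the 32x32 input) are assumptions made for illustration only.

```python
import numpy as np


def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def conv_patch_features(image, kernels, stride):
    """Hypothetical CNN stem: one strided convolution that yields patch tokens.

    image:   (H, W, C_in) array
    kernels: (k, k, C_in, D) array
    returns: (N, D) array, one D-dimensional token per window position
    """
    k = kernels.shape[0]
    height, width, _ = image.shape
    tokens = []
    for i in range(0, height - k + 1, stride):
        for j in range(0, width - k + 1, stride):
            patch = image[i:i + k, j:j + k, :]
            # contract the (k, k, C_in) window against every output filter
            tokens.append(np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2])))
    return np.stack(tokens)


def mlp_with_avg_pool(tokens, w1, b1, w2, b2):
    """Assumed MLP variant: token-wise MLP plus a pooled global summary."""
    hidden = gelu(tokens @ w1 + b1)               # (N, H) per-token features
    pooled = hidden.mean(axis=0, keepdims=True)   # (1, H) average over all tokens
    hidden = hidden + pooled                      # broadcast global context to each token
    return tokens + hidden @ w2 + b2              # residual connection (assumed)


rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32, 1))              # stand-in for a small grayscale scan
kernels = 0.1 * rng.normal(size=(4, 4, 1, 16))    # 4x4 filters, 16 output channels
w1, b1 = 0.1 * rng.normal(size=(16, 32)), np.zeros(32)
w2, b2 = 0.1 * rng.normal(size=(32, 16)), np.zeros(16)

tokens = conv_patch_features(image, kernels, stride=4)  # 8 x 8 = 64 tokens
out = mlp_with_avg_pool(tokens, w1, b1, w2, b2)
print(tokens.shape, out.shape)
```

In this sketch the pooled term is what lets a purely token-wise MLP see information from the whole image: changing one token shifts the average and therefore the output for every other token. The convolution supplies the locality that a plain linear patch embedding lacks.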
Taxonomy
Topics: COVID-19 diagnosis using AI · Digital Imaging for Blood Diseases · Brain Tumor Detection and Classification
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Adam · Dropout · Layer Normalization
