CNN-based Local Vision Transformer for COVID-19 Diagnosis

Hongyan Xu; Xiu Su; Dadong Wang

arXiv:2207.02027·eess.IV·July 6, 2022·1 cites

Open Access

TL;DR

This paper introduces a CNN-enhanced Vision Transformer architecture called COVT, designed to improve COVID-19 diagnosis accuracy from limited datasets by combining local feature extraction with global information processing.

Contribution

It proposes a novel hybrid structure that integrates CNNs with ViT, specifically tailored for small COVID-19 datasets, enhancing feature richness and training stability.

Findings

01 Effective on COVID-19 datasets

02 Improves feature extraction over pure ViT

03 Demonstrates robustness across datasets

Abstract

Deep learning can serve as an assistive technology that helps doctors identify COVID-19 infections quickly and accurately. Recently, the Vision Transformer (ViT) has shown great potential for image classification due to its global receptive field. However, because it lacks the inductive biases inherent to CNNs, a ViT-based structure yields limited feature richness and is difficult to train. In this paper, we propose a new structure called Transformer for COVID-19 (COVT) to improve the performance of ViT-based architectures on small COVID-19 datasets. It uses a CNN as a feature extractor to effectively extract local structural information, and introduces average pooling into ViT's Multilayer Perceptron (MLP) module to capture global information. Experiments show the effectiveness of our method on two COVID-19 datasets and the ImageNet dataset.
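
The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how average pooling enters the MLP module, so the sketch assumes one plausible reading, in which the mean of the hidden activations over all tokens is added back to every token. The function names `conv_stem` and `mlp_with_avg_pool`, the single-layer convolution, the residual connection, and all shapes are hypothetical, and the attention layers are omitted.

```python
import numpy as np


def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def conv_stem(image, kernels, stride):
    """Hypothetical CNN feature extractor: one strided, unpadded convolution.

    image:   (H, W, C_in)
    kernels: (k, k, C_in, C_out)
    returns: (N, C_out) tokens, one per output location.
    """
    k = kernels.shape[0]
    height, width, _ = image.shape
    out_h = (height - k) // stride + 1
    out_w = (width - k) // stride + 1
    tokens = []
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k, :]
            # Contract the patch with every filter to get C_out local features.
            tokens.append(np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2])))
    return np.stack(tokens)


def mlp_with_avg_pool(tokens, w1, b1, w2, b2):
    """MLP block with an assumed average-pooling branch for global context."""
    hidden = gelu(tokens @ w1 + b1)                   # (N, D_hidden)
    global_ctx = hidden.mean(axis=0, keepdims=True)   # (1, D_hidden), pooled over tokens
    hidden = hidden + global_ctx                      # share the pooled summary with each token
    return tokens + (hidden @ w2 + b2)                # residual connection (assumed)


rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 1))              # stand-in for a grayscale scan
kernels = rng.standard_normal((4, 4, 1, 16)) * 0.1
w1, b1 = rng.standard_normal((16, 32)) * 0.1, np.zeros(32)
w2, b2 = rng.standard_normal((32, 16)) * 0.1, np.zeros(16)

tokens = conv_stem(image, kernels, stride=4)
out = mlp_with_avg_pool(tokens, w1, b1, w2, b2)
print(out.shape)  # (64, 16)
```

In this sketch the pooled term is the only path through which one token influences another inside the MLP, which is one way a per-token layer could pick up global information.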

Taxonomy

Topics: COVID-19 diagnosis using AI · Digital Imaging for Blood Diseases · Brain Tumor Detection and Classification

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Adam · Dropout · Layer Normalization