Developing a Dual-Stage Vision Transformer Model for Lung Disease Classification
Anirudh Mazumder, Jianguo Liu

TL;DR
This paper introduces a dual-stage vision transformer model that combines a Vision Transformer (ViT) and a Swin Transformer to classify 14 lung diseases from X-ray images, reaching 92.06% label-level accuracy on a held-out test subset.
Contribution
It presents a novel dual-stage transformer architecture specifically designed for lung disease classification from X-ray scans, integrating two transformer models for improved accuracy.
Findings
The model achieved 92.06% label-level accuracy on an unseen test subset of the dataset.
It classifies 14 different lung diseases from X-ray scans.
The authors report that it shows promise for supporting the diagnosis of lung disease.
Abstract
Lung diseases have become a prevalent problem throughout the United States, affecting over 34 million people. Accurate and timely diagnosis of the different types of lung diseases is critical, and Artificial Intelligence (AI) methods could speed up these processes. In this research, a dual-stage vision transformer is built by integrating a Vision Transformer (ViT) and a Swin Transformer to classify 14 different lung diseases from patients' X-ray scans. After data preprocessing and training, the proposed model achieved a label-level accuracy of 92.06% when making predictions on an unseen testing subset of the dataset. The model showed promise for accurately classifying lung diseases and diagnosing patients who suffer from them.
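The abstract reports accuracy "on a label-level" without defining the term. One plausible reading, assumed here and not confirmed by the source, is that each of the 14 per-disease decisions for each image is scored separately. The sketch below illustrates that reading in plain Python and contrasts it with a stricter exact-match score. The function names, the 0.5 threshold, and the toy data (3 labels instead of 14, for brevity) are all hypothetical.

```python
def label_level_accuracy(probs, targets, threshold=0.5):
    """Fraction of individual (image, label) decisions that are correct.

    Assumed definition: every per-label prediction is scored on its own.
    """
    correct = 0
    total = 0
    for p_row, t_row in zip(probs, targets):
        for p, t in zip(p_row, t_row):
            pred = 1 if p >= threshold else 0  # threshold is an assumption
            correct += int(pred == t)
            total += 1
    return correct / total if total else 0.0


def exact_match_accuracy(probs, targets, threshold=0.5):
    """Fraction of images for which every label is predicted correctly."""
    if not probs:
        return 0.0
    hits = 0
    for p_row, t_row in zip(probs, targets):
        preds = [1 if p >= threshold else 0 for p in p_row]
        hits += int(preds == list(t_row))
    return hits / len(probs)


# Toy data: 2 images, 3 labels each (the paper's setting has 14 labels).
probs = [[0.9, 0.2, 0.1],
         [0.4, 0.7, 0.3]]
targets = [[1, 0, 0],
           [1, 1, 0]]

label_acc = label_level_accuracy(probs, targets)  # 5 of 6 decisions correct
exact_acc = exact_match_accuracy(probs, targets)  # 1 of 2 images fully correct
```

Under this reading, a label-level score can sit well above an exact-match score on the same predictions, so the two figures should not be compared directly.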
Taxonomy
Topics: Brain Tumor Detection and Classification · COVID-19 diagnosis using AI
Methods: Attention Is All You Need · Layer Normalization · Adam · Linear Layer · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Vision Transformer
