MVC: A Multi-Task Vision Transformer Network for COVID-19 Diagnosis from Chest X-ray Images
Huyen Tran, Duc Thanh Nguyen, John Yearwood

TL;DR
This paper introduces MVC, a multi-task vision transformer model that simultaneously classifies COVID-19 from chest X-rays and identifies affected regions, improving performance over existing methods.
Contribution
The paper presents a novel multi-task vision transformer framework that unifies disease classification and affected region identification in chest X-ray analysis.
Findings
MVC outperforms baseline models in classification accuracy.
MVC effectively identifies affected regions in X-ray images.
The approach demonstrates superior results on benchmark COVID-19 datasets.
Abstract
Medical image analysis using computer-based algorithms has attracted considerable attention from the research community and achieved tremendous progress in the last decade. With recent advances in computing resources and availability of large-scale medical image datasets, many deep learning models have been developed for disease diagnosis from medical images. However, existing techniques focus on sub-tasks, e.g., disease classification and identification, individually, while there is a lack of a unified framework enabling multi-task diagnosis. Inspired by the capability of Vision Transformers in both local and global representation learning, we propose in this paper a new method, namely Multi-task Vision Transformer (MVC) for simultaneously classifying chest X-ray images and identifying affected regions from the input data. Our method is built upon the Vision Transformer but extends its…
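The general idea in the abstract, a single transformer backbone feeding both a classification output and a region output, can be illustrated with a small sketch. This is not the authors' architecture, which the truncated abstract does not specify. It is a hypothetical single-layer example in NumPy with random weights, and every name in it (multitask_forward, init_params, the three-class output, the per-patch sigmoid head) is an assumption made for illustration.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W) image into flattened, non-overlapping patches."""
    h, w = image.shape
    gh, gw = h // patch, w // patch
    x = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch)
    return x.transpose(0, 2, 1, 3).reshape(gh * gw, patch * patch), (gh, gw)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product attention with a residual connection."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return tokens + attn @ v

def init_params(rng, n_patches, patch=16, dim=32, n_classes=3):
    """Random weights; n_classes=3 is an arbitrary, hypothetical choice."""
    s = 0.02
    return {
        "embed": rng.normal(0, s, (patch * patch, dim)),
        "cls": rng.normal(0, s, (1, dim)),
        "pos": rng.normal(0, s, (n_patches + 1, dim)),
        "wq": rng.normal(0, s, (dim, dim)),
        "wk": rng.normal(0, s, (dim, dim)),
        "wv": rng.normal(0, s, (dim, dim)),
        "w_cls": rng.normal(0, s, (dim, n_classes)),
        "w_region": rng.normal(0, s, (dim,)),
    }

def multitask_forward(image, params, patch=16):
    patches, grid = patchify(image, patch)
    tokens = patches @ params["embed"]                    # patch embeddings, (N, D)
    tokens = np.vstack([params["cls"], tokens]) + params["pos"]  # prepend CLS, add positions
    tokens = self_attention(tokens, params["wq"], params["wk"], params["wv"])
    # Task 1: image-level class probabilities read from the CLS token.
    class_probs = softmax(tokens[0] @ params["w_cls"])
    # Task 2: one score in [0, 1] per patch, reshaped to the patch grid.
    region_scores = 1.0 / (1.0 + np.exp(-(tokens[1:] @ params["w_region"])))
    return class_probs, region_scores.reshape(grid)

rng = np.random.default_rng(0)
image = rng.random((64, 64))                # stand-in for a grayscale X-ray
params = init_params(rng, n_patches=16)     # 64 / 16 = 4 patches per side
class_probs, region_map = multitask_forward(image, params)
print(class_probs.shape, region_map.shape)
```

Both outputs come from the same token sequence, so in a trained model a classification loss and a region loss could be summed and back-propagated through shared weights. Whether MVC combines its tasks this way is not stated in the text available here.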
Taxonomy
Topics: COVID-19 diagnosis using AI · AI in cancer detection · Radiomics and Machine Learning in Medical Imaging
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Vision Transformer · Linear Layer · Label Smoothing · Absolute Position Encodings · Adam · Residual Connection · Focus
