LungX: A Hybrid EfficientNet-Vision Transformer Architecture with Multi-Scale Attention for Accurate Pneumonia Detection
Mansur Yerzhanuly

TL;DR
LungX is a hybrid deep learning model combining EfficientNet, CBAM attention, and Vision Transformers, achieving state-of-the-art accuracy in pneumonia detection from chest X-rays with interpretable lesion localization.
Contribution
The paper introduces LungX, a novel hybrid architecture that integrates multi-scale features, attention mechanisms, and global context modeling for improved pneumonia diagnosis.
Findings
Achieves 86.5% accuracy and 0.943 AUC on chest X-ray datasets.
Outperforms the EfficientNet-B0 baseline by 6.7% in AUC.
Provides interpretable attention maps for lesion localization.
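For readers unfamiliar with the two headline metrics, the following is a minimal pure-Python sketch of how accuracy and ROC AUC are typically computed from binary labels and model scores. It is not the paper's evaluation code; the function names, the 0.5 decision threshold, and the toy data are assumptions for illustration. The last helper shows the baseline AUC implied by the reported 0.943 under two possible readings of "6.7%", since the summary does not say whether the gain is relative or in absolute percentage points.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases where the thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation of AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def implied_baseline_auc(model_auc, gain, relative):
    """Baseline AUC implied by a reported gain (both readings are assumptions)."""
    return model_auc / (1.0 + gain) if relative else model_auc - gain


# Toy data, invented for illustration only (1 = pneumonia, 0 = normal).
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

toy_acc = accuracy(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
baseline_if_relative = implied_baseline_auc(0.943, 0.067, relative=True)
baseline_if_absolute = implied_baseline_auc(0.943, 0.067, relative=False)
```

Accuracy depends on the chosen threshold while AUC does not, which is one reason a model can report a moderate accuracy alongside a high AUC.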
Abstract
Pneumonia remains a leading cause of death worldwide, and timely diagnosis is critical. We introduce LungX, a novel hybrid architecture that combines EfficientNet's multi-scale features, CBAM attention mechanisms, and the Vision Transformer's global context modeling for enhanced pneumonia detection. Evaluated on 20,000 curated chest X-rays from RSNA and CheXpert, LungX achieves state-of-the-art performance (86.5% accuracy, 0.943 AUC), a 6.7% AUC improvement over EfficientNet-B0 baselines. Visual analysis demonstrates superior lesion localization through interpretable attention maps. Future directions include multi-center validation and architectural optimizations targeting 88% accuracy for clinical deployment as an AI diagnostic aid.
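The pipeline the abstract describes (convolutional features, then CBAM attention, then a transformer encoder over the resulting feature map) can be sketched in PyTorch as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: `HybridSketch` is a hypothetical name, a tiny two-layer convolutional stem stands in for the EfficientNet backbone, only a single feature scale is used, positional embeddings are omitted, and all layer sizes are arbitrary.

```python
import torch
import torch.nn as nn


class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention, then spatial."""

    def __init__(self, channels: int, reduction: int = 8, kernel_size: int = 7):
        super().__init__()
        # Shared MLP applied to the average- and max-pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )
        # Convolution over the 2-channel [mean, max] map yields a spatial mask.
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)  # reweight channels
        pooled = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        return x * torch.sigmoid(self.spatial(pooled))  # reweight locations


class HybridSketch(nn.Module):
    """Hypothetical CNN -> CBAM -> transformer classifier (binary logit)."""

    def __init__(self, channels: int = 64, heads: int = 4, depth: int = 2):
        super().__init__()
        # Assumption: small stand-in for an EfficientNet feature extractor.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, channels, 3, stride=2, padding=1),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )
        self.cbam = CBAM(channels)
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads, dim_feedforward=2 * channels, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.cbam(self.backbone(x))         # (B, C, H, W)
        tokens = feats.flatten(2).transpose(1, 2)   # (B, H*W, C): one token per location
        tokens = self.encoder(tokens)               # global context via self-attention
        return self.head(tokens.mean(dim=1))        # (B, 1) pneumonia logit


model = HybridSketch().eval()
with torch.no_grad():
    logits = model(torch.randn(2, 1, 64, 64))  # two fake grayscale 64x64 images
```

In a sketch like this, the spatial mask produced inside `CBAM` is one natural candidate for the kind of attention map the abstract mentions, although the summary does not specify how the paper's maps are generated.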
Taxonomy
Topics: COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning · AI in cancer detection
