# Scalable Unimodal and Multimodal Deep Learning for Multi-Label Chest Disease Detection: A Comparative Analysis

**Authors:** Diğdem Orhan, Murat Ucan, Reda Alhajj, Mehmet Kaya

PMC · DOI: 10.3390/diagnostics16050734 · Diagnostics · 2026-03-01

## TL;DR

This paper compares unimodal (image-only) and multimodal (image plus clinical metadata) deep learning models for multi-label chest disease detection on chest X-rays, finding that adding clinical metadata improves label-based AUROC across all tested architectures and both dataset sizes.

## Contribution

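The study systematically compares unimodal and multimodal deep learning models for multi-label chest disease detection across three CNN architectures (ResNet50, EfficientNetB3, and DenseNet121) and two data scales (a small random sample and the full NIH Chest X-ray Dataset).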

## Key findings

- Multimodal models outperformed unimodal models across all architectures and data scales.
- Larger datasets improved model generalization and reduced performance variance, especially for rare diseases.
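
The difference between the two configurations can be illustrated with a minimal sketch. It assumes feature-level fusion by concatenation followed by one linear layer with independent sigmoid outputs; the summary does not state the paper's actual fusion strategy, so this is an assumption, and the names `init_layer` and `predict_multilabel`, the feature sizes, and the label count are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[List[List[float]], List[float]]:
    """Randomly initialised linear layer (hypothetical stand-in for a trained head)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def predict_multilabel(
    image_features: Sequence[float],
    clinical_features: Sequence[float],
    weights: List[List[float]],
    bias: List[float],
) -> List[float]:
    """Return one probability per disease label.

    Assumed fusion: concatenate image and clinical features.
    Passing an empty clinical vector gives the unimodal (image-only) case.
    """
    fused = list(image_features) + list(clinical_features)
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weights, bias)]
    # Independent sigmoids (not softmax): several diseases can be present at once.
    return [sigmoid(z) for z in logits]


rng = random.Random(0)
N_IMAGE, N_CLINICAL, N_LABELS = 8, 3, 14  # illustrative sizes only

image_vec = [0.5] * N_IMAGE      # stand-in for CNN backbone output
clinical_vec = [0.63, 1.0, 0.0]  # stand-in for encoded clinical metadata

w_multi, b_multi = init_layer(N_IMAGE + N_CLINICAL, N_LABELS, rng)
w_uni, b_uni = init_layer(N_IMAGE, N_LABELS, rng)

multimodal_probs = predict_multilabel(image_vec, clinical_vec, w_multi, b_multi)
unimodal_probs = predict_multilabel(image_vec, [], w_uni, b_uni)
```

The only structural change between the two calls is the width of the input to the classification head; the multi-label output layer is the same in both cases.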

## Abstract

**Background/Objectives:** Early and accurate diagnosis of chest diseases is a critical challenge in clinical practice, particularly in scenarios where multiple pathologies may coexist. While deep learning-based medical image analysis has shown promising results, most existing studies rely on unimodal data and fixed-scale datasets, limiting their generalizability and clinical relevance. In this study, we present a comprehensive comparative analysis of unimodal and multimodal deep learning models for multi-label chest disease classification using chest X-ray images and associated clinical metadata.

**Methods:** A total of twelve models were developed based on three widely used convolutional neural network architectures (ResNet50, EfficientNetB3, and DenseNet121) under both unimodal (image-only) and multimodal (image + clinical data) configurations. To systematically investigate the impact of data scale, experiments were conducted on two distinct versions: the Random Sample of NIH Chest X-ray Dataset and the NIH Chest X-ray Dataset, containing 5606 and 112,120 samples, respectively. Model performance was evaluated using label-based Area Under the Receiver Operating Characteristic Curve (AUROC) metrics.

**Results:** Experimental results demonstrate that multimodal fusion consistently outperforms unimodal approaches across all architectures and data scales, with more pronounced improvements observed in large-scale settings. Furthermore, increasing data volume leads to improved generalization and reduced performance variance, particularly for rare pathologies.

**Conclusions:** These findings highlight the effectiveness of multimodal, multi-label learning in enhancing diagnostic accuracy and support the development of robust clinical decision support systems for chest disease assessment.
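
To make the evaluation metric concrete, the following is a minimal sketch of label-based AUROC: one AUROC per disease label, optionally averaged into a macro score. It uses the rank-sum (Mann-Whitney U) identity in plain Python. The function names and toy data are illustrative assumptions, and the paper's own evaluation code may differ, for instance in how it handles labels with no positive samples.

```python
from typing import List, Optional, Sequence


def binary_auroc(y_true: Sequence[int], y_score: Sequence[float]) -> Optional[float]:
    """AUROC for a single label via the rank-sum (Mann-Whitney U) identity.

    Returns None when the label has no positives or no negatives,
    because the ROC curve is undefined in that case.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None

    # Sort sample indices by score and give tied scores their average rank.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    pos_rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def label_based_auroc(
    y_true: Sequence[Sequence[int]], y_score: Sequence[Sequence[float]]
) -> List[Optional[float]]:
    """Per-label AUROC for inputs shaped (n_samples, n_labels)."""
    n_labels = len(y_true[0])
    return [
        binary_auroc([row[j] for row in y_true], [row[j] for row in y_score])
        for j in range(n_labels)
    ]


def macro_auroc(per_label: Sequence[Optional[float]]) -> Optional[float]:
    """Unweighted mean over labels whose AUROC is defined."""
    valid = [a for a in per_label if a is not None]
    return sum(valid) / len(valid) if valid else None


# Toy example: 4 samples, 2 labels (values are made up for illustration).
toy_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_score = [[0.9, 0.2], [0.1, 0.8], [0.8, 0.3], [0.3, 0.7]]
per_label = label_based_auroc(toy_true, toy_score)
macro = macro_auroc(per_label)
```

Computing the score per label matters for rare pathologies: a single pooled score can hide a weak label, whereas a per-label breakdown shows it directly.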

## Full-text entities

- **Diseases:** Chest Disease (MESH:D002637)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12984696/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12984696/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/PMC12984696/full.md

---
Source: https://tomesphere.com/paper/PMC12984696