# Evidence-Guided Diagnostic Reasoning for Pediatric Chest Radiology Based on Multimodal Large Language Models

**Authors:** Yuze Zhao, Qing Wang, Yingwen Wang, Ruiwei Zhao, Rui Feng, Xiaobo Zhang

PMC · DOI: 10.3390/jimaging12030111 · Journal of Imaging · 2026-03-06

## TL;DR

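This paper introduces a two-stage system for pediatric chest X-ray diagnosis: a fine-tuned vision–language model first extracts radiological findings, and a multimodal large language model then combines the radiograph, those findings, patient demographics, and retrieved medical knowledge to produce the final diagnosis.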

## Contribution

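The approach couples a vision–language model fine-tuned on pediatric data (evidence extraction) with a multimodal large language model (diagnosis), so that the final diagnosis is constrained by explicit radiological findings and by medical domain knowledge retrieved through RAG.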

## Key findings

- The proposed method achieves 90.1% diagnostic accuracy on the VinDr-PCXR dataset.
- It outperforms state-of-the-art baselines by up to 13.1% in diagnosis accuracy.
- On the same dataset, it reaches an AUC of 82.5% and an F1-score of 70.9%.
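For readers less familiar with the three reported metrics, the following minimal sketch shows how accuracy, F1-score, and AUC are conventionally computed for a binary task. The labels and scores are invented toy values, not data from the paper, and this summary does not say how the authors averaged the metrics across diagnosis classes, so the binary setting is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a positive case outranks a negative one
    (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented values, for illustration only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [int(s >= 0.5) for s in scores]  # threshold at 0.5

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(acc, f1, auc)
```

Accuracy and F1 depend on the decision threshold, whereas AUC is computed from the ranking of scores alone, which is one reason the three numbers can differ considerably on imbalanced data.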

## Abstract

Pediatric respiratory diseases are a leading cause of hospital admissions and childhood mortality worldwide, highlighting the critical need for accurate and timely diagnosis to support effective treatment and long-term care. Chest radiography remains the most widely used imaging modality for pediatric pulmonary assessment. Consequently, reliable AI-assisted diagnostic methods are essential for alleviating the workload of clinical radiologists. However, most existing deep learning-based approaches are data-driven and formulate diagnosis as a black-box image classification task, resulting in limited interpretability and reduced clinical trustworthiness. To address these challenges, we propose a trustworthy two-stage diagnostic paradigm for pediatric chest X-ray diagnosis that closely aligns with the radiological workflow in clinical practice, in which the diagnostic procedure is constrained by evidence. In the first stage, a vision–language model fine-tuned on pediatric data identifies radiological findings from chest radiographs, producing structured and interpretable diagnostic evidence. In the second stage, a multimodal large language model integrates the radiograph, the extracted findings, patient demographic information, and external medical domain knowledge, retrieved through a retrieval-augmented generation (RAG) mechanism, to generate the final diagnosis. Experiments conducted on the VinDr-PCXR dataset demonstrate that our method achieves 90.1% diagnostic accuracy, 70.9% F1-score, and 82.5% AUC, representing up to a 13.1% increase in diagnostic accuracy over state-of-the-art baselines. These results validate the effectiveness of combining multimodal reasoning with explicit medical evidence and domain knowledge, and indicate the strong potential of the proposed approach for trustworthy pediatric radiology diagnosis.
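The two-stage flow described in the abstract can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: `extract_findings`, `retrieve`, `build_prompt`, `diagnose`, the keyword-overlap retriever, and the toy knowledge base are all hypothetical stand-ins, and the knowledge snippets are placeholders rather than clinical guidance.

```python
import re
from dataclasses import dataclass


@dataclass
class Patient:
    age_months: int
    sex: str


# Hypothetical toy knowledge base (placeholder text, not clinical guidance).
KNOWLEDGE_BASE = [
    "Peribronchial thickening in infants is often seen with bronchiolitis.",
    "Patchy consolidation in young children may indicate bronchopneumonia.",
    "A radiograph without abnormal findings argues against acute lung disease.",
]


def extract_findings(image_path):
    """Stage 1 stand-in: a fine-tuned vision-language model would run here.
    This stub returns a fixed finding so the sketch is runnable."""
    return ["peribronchial thickening"]


def tokenize(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(findings, knowledge_base, k=2):
    """Toy retriever: rank snippets by word overlap with the findings.
    A real RAG setup would more likely use embedding similarity."""
    query = set().union(*(tokenize(f) for f in findings))
    ranked = sorted(knowledge_base,
                    key=lambda doc: len(query & tokenize(doc)),
                    reverse=True)
    return ranked[:k]


def build_prompt(findings, patient, snippets):
    """Assemble the evidence-constrained prompt for the stage 2 model."""
    lines = [
        f"Age: {patient.age_months} months",
        f"Sex: {patient.sex}",
        "Radiological findings: " + "; ".join(findings),
        "Reference knowledge:",
        *[f"- {s}" for s in snippets],
        "Give the most likely diagnosis, citing the findings above.",
    ]
    return "\n".join(lines)


def diagnose(image_path, patient, mllm):
    """Stage 2: `mllm` is any callable taking (prompt, image_path)."""
    findings = extract_findings(image_path)
    snippets = retrieve(findings, KNOWLEDGE_BASE)
    prompt = build_prompt(findings, patient, snippets)
    return mllm(prompt, image_path)


# Usage with a fake model that simply echoes the first prompt line.
fake_mllm = lambda prompt, image_path: prompt.splitlines()[0]
result = diagnose("example.png", Patient(age_months=8, sex="F"), fake_mllm)
print(result)
```

Keeping the stage 1 output as explicit text is what makes the final answer auditable: a reader can check whether the stated diagnosis actually follows from the listed findings and retrieved snippets.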

## Full-text entities

- **Diseases:** visual abnormalities (MESH:D014786), respiratory disease (MESH:D012140), Other (MESH:D058497), MLLM (MESH:D007806), COVID-19 (MESH:D000086382), injury to (MESH:D014947), hallucination (MESH:D006212), Bronchiolitis (MESH:D001988), respiratory infections (MESH:D012141), deaths (MESH:D003643), Broncho-pneumonia (MESH:D011014), Bronchitis (MESH:D001991), pulmonary diseases (MESH:D008171)
- **Chemicals:** GPT-4o (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13028563/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13028563/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/PMC13028563/full.md

---
Source: https://tomesphere.com/paper/PMC13028563