# Training medical students’ diagnostic reasoning skills using multivariate analysis

**Authors:** Fábio A. Schaberle, João Pestana, Luiz M. Santiago, Alberto A. C. C. Pais

PMC · DOI: 10.1186/s12909-026-08604-1 · BMC Medical Education · 2026-01-30

## TL;DR

This paper introduces a new method for teaching medical students to diagnose diseases by using multivariate analysis, combining clinical reasoning with data science skills.

## Contribution

A novel approach to medical education using multivariate analysis for diagnostic training, integrating data science and clinical reasoning.

## Key findings

- Multivariate analysis helps students visualize disease-symptom relationships through biplots and dendrograms.
- The method encourages comprehensive patient evaluation by highlighting rare diagnoses and prompting further inquiry.
- Combining programming with clinical training improves diagnostic reasoning and data analysis skills.

## Abstract

Contemporary medical education must address both information overload and the need to develop robust diagnostic reasoning skills for common and less prevalent diseases. While traditional methods remain foundational, integrating self-directed learning skills is critical to prepare future clinicians. To address these challenges, we propose employing multivariate analysis as a structured approach to diagnostic training, enabling students to systematically evaluate symptoms and signs, environmental contexts, and risk factors. This method is adaptable to both classroom and self-directed learning. It offers a pragmatic tool for training clinical decision-making and reinforces self-directed learning by integrating data science with diagnostic reasoning training.

The proposed method involves creating a structured database of diseases, their core symptoms and signs, and associated risk or environmental factors. Students then use this database as input for multivariate analysis (principal component analysis, PCA, and hierarchical cluster analysis, HCA) through guided R scripting exercises. By entering relevant symptoms and risk factors, learners can simulate diagnostic processes. The analysis generates: (1) biplots visualizing relationships between clinical features and diseases, and (2) dendrograms clustering clinically related conditions for comparative analysis.
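
As an illustration of the PCA step, the following is a minimal sketch and not the authors' R script. It assumes a tiny binary disease-by-feature table with hypothetical entries (not ICPC-2 codes, not taken from the paper's database, and not clinical guidance) and uses only the Python standard library. The disease scores and feature loadings it returns are the two ingredients a biplot would draw as points and arrows.

```python
import math

# Hypothetical toy table: 1 = feature typically present. Illustration only.
FEATURES = ["fever", "cough", "sore_throat", "dyspnea", "travel_endemic"]
DATABASE = {
    "influenza":   [1, 1, 1, 0, 0],
    "common_cold": [0, 1, 1, 0, 0],
    "pneumonia":   [1, 1, 0, 1, 0],
    "malaria":     [1, 0, 0, 0, 1],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center_columns(rows):
    """Subtract each feature's mean so PCA describes variation around it."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of the features (columns)."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def top_eigenpair(matrix, iterations=500):
    """Leading eigenvalue and unit eigenvector by power iteration."""
    p = len(matrix)
    v = [float(i + 1) for i in range(p)]          # fixed, non-uniform start
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:                          # no variance left
            break
        v = [x / norm for x in w]
    eigenvalue = dot(v, [dot(row, v) for row in matrix])
    return eigenvalue, v

def pca(rows, n_components=2):
    """Return (scores, loadings, explained variance ratios)."""
    centered = center_columns(rows)
    cov = covariance(centered)
    total = sum(cov[i][i] for i in range(len(cov)))
    loadings, variances = [], []
    for _ in range(n_components):
        lam, v = top_eigenpair(cov)
        loadings.append(v)
        variances.append(lam)
        # Deflate: remove the component just found before the next pass.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    scores = [[dot(row, v) for v in loadings] for row in centered]
    return scores, loadings, [lam / total for lam in variances]

names = list(DATABASE)
scores, loadings, explained = pca([DATABASE[n] for n in names])

for name, (pc1, pc2) in zip(names, scores):
    print(f"{name:12s} PC1={pc1:+.3f} PC2={pc2:+.3f}")
for k, component in enumerate(loadings, start=1):
    print(f"PC{k} loadings:", dict(zip(FEATURES, (round(x, 3) for x in component))))
```

Power iteration with deflation is used here only to keep the sketch dependency-free; in R, a function such as `prcomp` would normally handle this step.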

We implemented an example database using ICPC-2 nomenclature, containing diseases with their associated symptoms, signs, and risk factors. Diagnostic simulations demonstrated the application of the method across diverse clinical presentations, generating correlation plots and dendrograms for each case analysis. The outputs effectively showed both common diagnoses and rare conditions suggested by the input symptoms and signs, prompting the student to consider additional diagnostic factors. Notably, the system highlighted cases where uncommon diagnoses warranted further patient history (e.g., travel to endemic regions, family history), demonstrating its potential to train diagnostic reasoning by expanding students’ differential diagnosis considerations.
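
A diagnostic simulation of this kind can be sketched as follows. This is a hedged illustration, not the paper's implementation: the toy database is hypothetical (not ICPC-2 coded, not clinical guidance), the rule "keep every disease that shares at least one entered finding" is an assumption about how inputs select candidates, and average linkage on Jaccard distance is one common choice for hierarchical clustering that the abstract does not confirm. The merge list it produces is the information a dendrogram displays.

```python
from itertools import combinations

# Hypothetical toy database: disease -> set of findings. Illustration only.
DATABASE = {
    "influenza":   {"fever", "cough", "sore_throat"},
    "common_cold": {"cough", "sore_throat"},
    "pneumonia":   {"fever", "cough", "dyspnea"},
    "malaria":     {"fever", "travel_endemic"},
}

def jaccard_distance(a, b):
    """1 - |intersection| / |union|; 0 means identical finding sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def select_candidates(database, findings):
    """Assumed rule: keep diseases sharing at least one entered finding."""
    return {name: f for name, f in database.items() if f & findings}

def average_linkage(database):
    """Agglomerative clustering; returns merges as (left, right, distance)."""
    clusters = [(name,) for name in sorted(database)]
    merges = []

    def dist(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(jaccard_distance(database[a], database[b])
                   for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters under average linkage.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], dist(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

findings = {"fever"}                                  # what the student enters
candidates = select_candidates(DATABASE, findings)
merges = average_linkage(candidates)

# Findings present among the candidates but not yet asked about.
follow_up = sorted(set().union(*candidates.values()) - findings)

for left, right, d in merges:
    print(f"merge {left} + {right} at distance {d:.2f}")
print("consider asking about:", follow_up)
```

With fever as the only input, the rare candidate stays in the tree and its distinguishing risk factor appears in the follow-up list, which mirrors the prompt to take further history before narrowing the differential.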

The process of building disease databases and performing multivariate analysis appears valuable for medical education, offering students opportunities to explore disease-symptom relationships while developing complementary programming skills. This approach may help medical students recognize how similar symptoms can lead to diagnostic challenges, encouraging more comprehensive patient evaluation. The methodology presented here could serve as a potential teaching tool for medical curricula, combining diagnostic reasoning practice with data analysis skills in a clinically relevant framework.

The online version contains supplementary material available at 10.1186/s12909-026-08604-1.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12934077/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12934077/full.md

## References

6 references — full list in the complete paper: https://tomesphere.com/paper/PMC12934077/full.md

---
Source: https://tomesphere.com/paper/PMC12934077