# Layer by Layer: Assessing AI Diagnostic Accuracy With Incremental Case Information in Neuroradiology

**Authors:** Golnaz Lotfian, Miral Jhaveri, Sumeet G Dua, Pokhraj P Suthar

PMC · DOI: 10.7759/cureus.85874 · Cureus · 2025-06-12

## TL;DR

This study evaluates how the diagnostic accuracy of Google Gemini, a large language model, changes in neuroradiology cases as more case information is provided.

## Contribution

The study assesses AI diagnostic performance in neuroradiology as case data are revealed incrementally, from clinical history alone to complete imaging.

## Key findings

- Gemini's diagnostic accuracy increased from 3.5% with clinical history alone to 45.7% once complete imaging was supplied.
- Spine cases had the highest accuracy (51.9%), followed by head and neck (45.5%) and brain (44.0%).
- The improvement in performance over time was statistically significant (p < 0.0000000001).

## Abstract

Aim

Artificial intelligence (AI) has shown tremendous potential in improving diagnostic accuracy and efficiency in radiology. This study assesses the diagnostic performance of Google Gemini (version 1.5 Flash; Google DeepMind, Mountain View, California, USA), a proprietary large language model, in interpreting challenging diagnostic cases from the American Journal of Neuroradiology’s (AJNR) "Case of the Month" series.

Materials and methods

We analyzed 143 neuroradiology cases spanning the brain, head and neck, and spine. Each case evolved over four weeks, starting with clinical history and followed by incremental imaging findings. Google Gemini was prompted with the question, "What is the diagnosis?" Its accuracy was assessed at each level and across specialty categories. The data used were publicly available, and no ethical approval was necessary.
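The incremental protocol described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: `build_prompts`, `accuracy_by_stage`, `toy_model`, the toy case, and the substring-based grading rule are all assumptions introduced here, and the summary does not say how answers were actually graded.

```python
from typing import Callable, Dict, List

QUESTION = "What is the diagnosis?"  # the prompt quoted in the abstract


def build_prompts(stages: List[str]) -> List[str]:
    """One cumulative prompt per stage: history first, then each added finding."""
    return ["\n".join(stages[: i + 1]) + "\n\n" + QUESTION for i in range(len(stages))]


def accuracy_by_stage(cases: List[Dict], ask_model: Callable[[str], str]) -> List[float]:
    """Fraction of cases answered correctly at each stage."""
    n_stages = len(cases[0]["stages"])
    correct = [0] * n_stages
    for case in cases:
        for i, prompt in enumerate(build_prompts(case["stages"])):
            answer = ask_model(prompt)
            # Assumed grading rule: case-insensitive substring match.
            if case["diagnosis"].lower() in answer.lower():
                correct[i] += 1
    return [c / len(cases) for c in correct]


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    return "Meningioma" if "dural tail" in prompt else "Uncertain"


# Hypothetical four-stage case, invented purely for illustration.
toy_cases = [{
    "stages": [
        "History: headache.",
        "CT: extra-axial mass.",
        "MRI: dural tail.",
        "Follow-up imaging.",
    ],
    "diagnosis": "meningioma",
}]
stage_accuracy = accuracy_by_stage(toy_cases, toy_model)
```

In a real evaluation, `toy_model` would be replaced by a call to the model under test, and grading would more plausibly be done by a radiologist than by string matching.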

Results

Gemini's diagnostic accuracy improved with new case data, from 3.5% with history alone to 45.7% after complete imaging was supplied. Accuracy by category was highest in spine cases (51.9%), followed by head and neck (45.5%) and brain (44.0%). A chi-square test for trend confirmed that the performance increase over time was statistically significant (p < 0.0000000001).
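For readers unfamiliar with a chi-square test for trend, the sketch below implements the Cochran-Armitage form of the test in plain Python. The per-stage counts are illustrative placeholders, not the study's data (this summary reports only the first and last accuracies), and the function name `chi_square_trend` and the stage scores 1 to 4 are assumptions made here.

```python
import math
from typing import List, Tuple


def chi_square_trend(successes: List[int], totals: List[int],
                     scores: List[float]) -> Tuple[float, float]:
    """Cochran-Armitage chi-square test for linear trend in proportions (1 df).

    Returns (chi-square statistic, two-sided p-value).
    """
    n_total = sum(totals)
    p_bar = sum(successes) / n_total  # pooled proportion under the null
    # Score-weighted deviation of observed from expected successes.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, successes, totals))
    sum_ns = sum(n * s for n, s in zip(totals, scores))
    sum_ns2 = sum(n * s * s for n, s in zip(totals, scores))
    variance = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_total)
    chi2 = t_stat ** 2 / variance
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Placeholder counts of correct answers per stage (NOT taken from the paper).
correct_per_stage = [5, 20, 40, 65]
cases_per_stage = [143, 143, 143, 143]
chi2, p_value = chi_square_trend(correct_per_stage, cases_per_stage, [1, 2, 3, 4])
```

The standard test treats the stages as independent groups. Because the same cases are re-evaluated at every stage here, that assumption is only approximate.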

Conclusion

Google Gemini shows moderate diagnostic accuracy that improves as information accumulates. While encouraging, its shortcomings underline the need for continual validation and transparency. This study highlights the growing relevance of AI in neuroradiology and the need for comprehensive evaluation before clinical integration.

## Full-text entities

- **Genes:** NINL (ninein like) [NCBI Gene 22981] {aka NLP}
- **Diseases:** Stroke (MESH:D020521), ICH (MESH:D020300), edema (MESH:D004487), HAT (MESH:D006470), LLM (MESH:D007806), thromboembolism (MESH:D013923), lesion (MESH:D009059), disorders of the brain, head and neck, and spine (MESH:D006258), brain injury (MESH:D001930), brain malignancies (MESH:D001932), ischemic stroke (MESH:D002544), neurological decline (MESH:D009461), hemispheric infarction (MESH:D007238), AI (MESH:C538142), tumor (MESH:D009369), intracerebral hemorrhage (MESH:D002543), MCA infarcts (MESH:D020244), traumatic brain injury (MESH:D000070642), PE (MESH:D011655), AJNR (MESH:D006478), COVID-19 (MESH:D000086382), spinal disorders (MESH:D013118), cerebral edema (MESH:D001929), glioma (MESH:D005910), multiple sclerosis lesion (MESH:D009103)
- **Chemicals:** Gemini (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12255534/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12255534/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/PMC12255534/full.md

---
Source: https://tomesphere.com/paper/PMC12255534