# Artificial Intelligence for Detecting Aortic Arch Calcification on Chest Radiographs: A Systematic Review

**Authors:** Krzysztof Żerdziński, Julita Janiec, Maja Dreger, Piotr Dudek, Iga Paszkiewicz, Adam Mitręga, Michał Bielówka, Alicja Nawrat, Jakub Kufel, Marcin Rojek

PMC · DOI: 10.3390/diagnostics16020243 · Diagnostics · 2026-01-12

## TL;DR

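This systematic review of three retrospective studies finds that CNN-based models detect aortic arch calcification on chest radiographs with high discrimination (AUROC 0.81–0.99), but low-certainty evidence and heterogeneous validation strategies currently limit reliable clinical use.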

## Contribution

The review systematically evaluates the diagnostic accuracy of AI models for detecting aortic arch calcification on chest radiographs, and identifies heterogeneous model architectures and validation strategies as the main obstacles to comparing models.

## Key findings

- AI models showed high diagnostic discrimination (AUROC 0.81–0.99) for detecting aortic arch calcification.
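- Sensitivity–specificity trade-offs were pronounced: one model reached 95.9% recall, while another paired near-perfect specificity (0.99) with markedly low sensitivity (0.22). Performance estimates were often attenuated in external cohorts.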
- Current limitations include low GRADE certainty due to methodological heterogeneity and lack of cross-sectional imaging reference standards.
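
To make the reported sensitivity and specificity figures concrete, the following minimal sketch computes both from a confusion matrix. The counts are hypothetical, chosen only so that they reproduce the 0.22 sensitivity and 0.99 specificity quoted above; they are not taken from any included study, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary classifier's outcomes against a reference standard."""
    tp: int  # calcification present, model says present
    fn: int  # calcification present, model says absent
    tn: int  # calcification absent, model says absent
    fp: int  # calcification absent, model says present


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true AAC cases the model detects (also called recall)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of AAC-free radiographs the model correctly clears."""
    return c.tn / (c.tn + c.fp)


# HYPOTHETICAL counts (assumption, not study data): 100 radiographs with AAC
# and 900 without, arranged to match the 0.22 / 0.99 operating point.
conservative = ConfusionCounts(tp=22, fn=78, tn=891, fp=9)

print(f"sensitivity = {sensitivity(conservative):.2f}")
print(f"specificity = {specificity(conservative):.2f}")
```

Under these assumed counts the model raises few false alarms (9 of 900) but misses 78 of 100 true cases, which is why specificity alone says little about usefulness for opportunistic screening.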

## Abstract

Background/Objectives: Aortic-arch calcification (AAC) is a robust predictor of cardiovascular events often overlooked on routine chest radiographs (CXR). This systematic review aimed to evaluate the diagnostic accuracy of artificial intelligence (AI) models for detecting AAC on CXR and assess their potential for clinical implementation.

Methods: The review followed PRISMA 2020 guidelines (PROSPERO: CRD420251208627). A search of Embase, PubMed, Scopus, and Web of Science was conducted (Jan 2020–Oct 2025) for studies evaluating AI models detecting AAC in adults. Bias was assessed using QUADAS-2. Due to methodological heterogeneity, a narrative synthesis was performed instead of a meta-analysis.

Results: Out of 115 records, three retrospective studies (2022–2024) utilizing CNNs across ~2.7 million images were included. Models demonstrated high diagnostic discrimination (AUROC 0.81–0.99), though performance estimates were often attenuated in external cohorts. Pronounced sensitivity–specificity trade-offs occurred: one model achieved 95.9% recall, while another exhibited near-perfect specificity (0.99) despite markedly low sensitivity (0.22). Although the risk of bias was predominantly low, the overall GRADE certainty remained low due to methodological heterogeneity and the absence of cross-sectional imaging reference standards.

Conclusions: Deep learning-based models reliably detect AAC on routine CXR, offering a scalable tool for opportunistic cardiovascular risk stratification. However, significant heterogeneity in model architectures and validation strategies currently limits broad comparability. Future research requires standardized annotation protocols and external validation to ensure clinical generalizability.
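
The abstract reports AUROC alongside sensitivity and specificity, which measure different things: AUROC summarizes how well scores rank positives above negatives, while sensitivity and specificity depend on the decision threshold applied to those scores. The sketch below illustrates this with hypothetical model scores (an assumption for illustration, not data from the reviewed studies). It computes AUROC as the probability that a randomly chosen positive outscores a randomly chosen negative, and shows how two thresholds on the same scores give very different operating points.

```python
def auroc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sens_spec_at(threshold: float,
                 pos_scores: list[float],
                 neg_scores: list[float]) -> tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    sens = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    spec = sum(s < threshold for s in neg_scores) / len(neg_scores)
    return sens, spec


# HYPOTHETICAL model outputs (assumption): predicted AAC probability for
# radiographs with and without calcification according to the reference read.
pos = [0.95, 0.80, 0.60, 0.40, 0.30]
neg = [0.70, 0.35, 0.20, 0.10, 0.05]

print(f"AUROC = {auroc(pos, neg):.2f}")
for t in (0.25, 0.75):
    sens, spec = sens_spec_at(t, pos, neg)
    print(f"threshold {t:.2f}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these assumed scores the AUROC stays fixed while the lower threshold favors sensitivity and the higher one favors specificity. This is one reason models with similar AUROC can still report very different sensitivity and specificity values, and why operating points are hard to compare without a shared thresholding protocol.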

## Full-text entities

- **Diseases:** AAC (MESH:D001015)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12839748/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12839748/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/PMC12839748/full.md

---
Source: https://tomesphere.com/paper/PMC12839748