Automatic segmentation of coronary plaques in coronary CT angiography using neural networks
Mahdi Moosavi, Keno Bressem, Rafael Adolf, Anastasiya Valentik, Albrecht Will, Eva Hendrich, Martin Hadamitzky

TL;DR
This paper introduces a machine learning model that automatically segments coronary plaques in coronary CT angiography (CCTA) scans, with the aim of making the diagnosis of coronary artery disease more accurate and efficient.
Contribution
A novel neural network-based approach for automated coronary plaque segmentation in CCTA with high sensitivity and specificity.
Findings
The model achieved 84.8% sensitivity and 82.3% precision for plaque detection.
Vessel-level sensitivity was 94.7% and specificity was 84.9%.
Small, non-calcified plaques and artifacts remain challenging for the model.
Abstract
Rapid and accurate detection of coronary plaques on coronary computed tomography angiography (CCTA) is critical for timely diagnosis of coronary artery disease (CAD) but is limited by reader workload and interobserver variability. Our objective was to evaluate the effectiveness of machine learning (ML)-based automated segmentation of coronary plaques in CCTA. We retrospectively analyzed CCTA scans from 1,642 patients (4,711 training vessels, 1,112 test vessels, plus 1,613 negative vessels) using an nnU-Net 3D full-resolution architecture. Hyperparameters (batch size, learning-rate scheduler, epochs) were optimized on a 10% subset of the training dataset, and the final model was trained with 5-fold cross-validation on positive cases. Test performance was assessed at the plaque, vessel, and exam levels. Of 2,090 ground-truth plaques, the model achieved 1,772 true positives (TP) for 84.8% sensitivity (95% CI 83.2–86.3%), 382 false positives (FP) for…
Introduction
Coronary artery disease (CAD) remains a major cause of preventable mortality, particularly among older individuals, with its incidence continuing to rise globally [1,2]. Accurate estimation of disease burden is crucial for identifying patients at high risk of major adverse cardiac events [3]. Coronary computed tomography angiography (CCTA) has emerged as a powerful noninvasive diagnostic tool for quantifying CAD [4,5], with recent guidelines emphasizing its role as the primary modality for assessing CAD in symptomatic patients [6].
Despite CCTA’s capability for volumetric CAD quantification, clinical practice continues to rely on visual assessment to determine the composition and extent of CAD [7,8]. Quantitative plaque assessment remains labor-intensive, as it requires evaluation of multiple coronary segments and plaque components by trained readers, which limits scalability in routine clinical practice [9]. This limitation highlights the need for more efficient and accessible diagnostic strategies.
The integration of deep learning (DL) into cardiovascular imaging has shown potential to improve diagnostic accuracy and patient outcomes [10,11]. Machine learning (ML) algorithms can rapidly and accurately analyze large datasets and provide physicians with clinically useful insights [12–14]. Deep learning has been successfully applied to various aspects of cardiovascular imaging, including the diagnosis of acute ischemic stroke from CT angiography data [15] and the automation of complex image segmentation tasks essential for identifying vascular pathologies [16,17].
By automating plaque detection, ML-based approaches may support more efficient image interpretation workflows and facilitate earlier assessment of coronary artery disease. Although image acquisition represents a fixed cost, automated plaque detection can reduce downstream interpretation time and labor-related expenses, consistent with economic analyses reported for automated pulmonary nodule detection [18]. However, the application of deep learning models to automated coronary plaque segmentation remains underexplored. At the time of this study, no directly comparable coronary plaque segmentation model with a publicly available implementation could be applied to our dataset for a head-to-head evaluation. One established approach is the use of semantic segmentation frameworks such as nnU-Net, which is specifically designed for medical image segmentation and automatically adapts to diverse biomedical imaging datasets without manual parameter tuning [19]. This research aims to investigate the efficacy of a neural network-based approach for automating coronary plaque segmentation and to evaluate its impact on the diagnostic accuracy and accessibility of CCTA. By focusing on this ML application, we seek to contribute to ongoing efforts to optimize cardiovascular diagnostic workflows, potentially reducing healthcare costs and improving patient outcomes through earlier detection of, and intervention for, coronary artery disease.
Materials and methods
To reduce selection bias, we enrolled 1,642 consecutive patients who underwent CCTA for suspected coronary artery disease (CAD) between 2014 and 2018 at the German Heart Center of Munich, TUM University Hospital. Patients with stent implantations or coronary bypass grafts were excluded from the analysis, as the scope of the study was to detect plaques in individuals without known CAD. All patients gave written informed consent before the investigation. The data acquisition protocol was approved by the Clinical Ethics Committee of the University Hospital rechts der Isar, Munich. All procedures were carried out in accordance with the relevant guidelines and regulations.
Post hoc analyses within the hospital do not require patient consent under local legislation (Bayerisches Krankenhausgesetz, Art. 24).
Image acquisition
CCTA examinations were performed on a 2 × 192-slice dual-source CT scanner (SOMATOM Force, Siemens Medical Solutions, Erlangen, Germany). Patients were placed in the supine position and, barring contraindications, received intravenous metoprolol if their heart rate exceeded 60 beats/min and nitroglycerin if their systolic blood pressure was above 100 mmHg. Prospectively ECG-synchronized CTA of the coronary arteries was performed during inspiration at the end-diastolic phase (70% of the RR interval). Tube voltage was set to 120 kV, with tube current adapted automatically to body size using CARE Dose. After the circulation time was determined with a test bolus, contrast medium (Imeron 350, Bracco Imaging GmbH, Konstanz, Germany) was administered, followed by a 50 ml saline chaser, at a flow rate of 5 ml/s. If this acquisition failed to provide diagnostic images, the patient underwent a second acquisition using a prospectively ECG-gated method that acquired multiple segments of the coronary arteries over several diastolic phases. This method offered more reliable image acquisition but could produce step artifacts because the volume was assembled from multiple acquisitions. CCTA images were reconstructed with an in-plane matrix of 512 × 512 pixels, with the number of axial slices varying according to the scan field of view.
Annotations and database preparation
The coronary artery tree was initially segmented using commercially available software (Syngo.via, Siemens Healthineers, Erlangen, Germany) and manually corrected by an experienced radiologist. Vessel regions containing non-calcified and partially calcified plaques were manually annotated. Calcified plaques were automatically annotated using a simple HU-based thresholding algorithm along vessel centerlines; however, this approach captured only the high-density calcified component and did not delineate the full plaque extent [10]. Individual coronary arteries were then extracted from the coronary tree and used as independent 3D inputs for model training and evaluation.
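The thresholding step for calcified plaques can be sketched as follows, assuming a Hounsfield-unit volume and a boolean vessel mask around the centerline are already available. This is a minimal illustration rather than the authors' algorithm: the cut-off they used is not reported, so threshold_hu is a required, hypothetical parameter, and the function name is illustrative. The sketch also shows why the approach captures only the dense calcium core: neighboring non-calcified plaque tissue falls below the threshold and keeps the vessel label.

```python
import numpy as np


def label_vessel_with_calcium(hu: np.ndarray, vessel_mask: np.ndarray, threshold_hu: float) -> np.ndarray:
    """Build a 0/1/2 label map: 0 background, 1 vessel, 2 calcified component.

    threshold_hu is hypothetical: the cut-off used in the study is not reported.
    Only voxels inside vessel_mask (e.g. a region around the centerline) are considered.
    """
    vessel = vessel_mask.astype(bool)
    labels = np.zeros(hu.shape, dtype=np.uint8)
    labels[vessel] = 1
    # Only the bright calcium core passes the threshold; the surrounding
    # non-calcified part of the same plaque stays labelled as vessel (1).
    labels[vessel & (hu >= threshold_hu)] = 2
    return labels
```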
Patients’ examinations were fully anonymized and exported using commercially available software (Syngo.via, Siemens Healthineers, Erlangen, Germany) so that they could not be traced back to individual patients. Annotation labels ranged from 0 to 2: 0 for background, 1 for non-pathological coronary artery segments, and 2 for plaques. The database was divided into a negative database without any plaques and a positive database in which each examination contained at least one plaque (Figs 1 and 2).
Annotation example. Non-pathological segments are labeled 1 and the plaque is labeled 2.
Example of stretched reconstruction of coronary arteries. (a) 2D view of the five largest arteries of a patient. (b) Annotation overlay on the same arteries, with light gray indicating segments with plaques and dark gray indicating non-pathological segments.
Data splitting
Because the nnU-Net framework requires annotations for all training cases, only scans containing coronary plaques were included in the training set. The positive database was divided into training and test datasets with an 80%/20% split at the patient level (fixed random seed 42) to prevent data leakage. The training dataset comprised axial 3D reconstructions of 4,711 individual coronary arteries, and the test dataset included 1,112 individual coronary arteries. To better evaluate negative predictive performance, 1,613 individual coronary arteries from negative cases were added to the test dataset.
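A patient-level split of this kind can be sketched as below. It is a minimal illustration under stated assumptions, not the authors' code: the function name split_by_patient and the vessel/patient identifiers are hypothetical, and Python's random.Random(42) will not reproduce the study's actual partition, because the text does not say which random number generator received the seed. The key property is that all vessels of one patient land on the same side of the split.

```python
import random


def split_by_patient(vessel_ids, patient_of, test_fraction=0.2, seed=42):
    """Split vessel IDs so that every patient's vessels fall entirely in train or test.

    vessel_ids: list of vessel identifiers (hypothetical format).
    patient_of: dict mapping vessel ID -> patient ID.
    """
    # Sort first so the shuffle depends only on the seed, not on input order.
    patients = sorted({patient_of[v] for v in vessel_ids})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = round(len(patients) * test_fraction)
    test_patients = set(patients[:n_test])
    train = [v for v in vessel_ids if patient_of[v] not in test_patients]
    test = [v for v in vessel_ids if patient_of[v] in test_patients]
    return train, test


# Usage with toy identifiers: 10 patients with 3 vessels each.
vessels = [f"p{p}_v{k}" for p in range(10) for k in range(3)]
patient_of = {v: v.split("_")[0] for v in vessels}
train, test = split_by_patient(vessels, patient_of)
```

Splitting by patient rather than by vessel is what prevents leakage: vessels from the same scan share acquisition conditions and anatomy, so putting them on both sides would inflate test performance.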
Model training and hyperparameter tuning
The experiments were conducted on a Linux system (Ubuntu 18.04) equipped with 32 GB RAM, an 8-core CPU (Intel® Core™ i7-9700K @ 3.60 GHz), and an NVIDIA GeForce RTX 2080 Ti GPU. We used nnU-Net with Python 3.11, PyTorch 2.1, and CUDA 11.8, versions compatible with nnU-Net's requirements. Data preprocessing, integrity checks, and resampling to 1 mm isotropic spacing (1 mm³ voxels) were performed with nnU-Net's built-in pipeline using CT-specific normalization.
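The CT-specific normalization can be illustrated with a simplified sketch. To our understanding, nnU-Net derives dataset-wide statistics from foreground voxels of the training cases (the 0.5th and 99.5th intensity percentiles plus the mean and standard deviation), clips every image to that percentile range, and then applies a z-score with the foreground mean and standard deviation. The sketch below follows that scheme but is not nnU-Net's code: it uses all foreground voxels rather than a sample, and the function names are hypothetical.

```python
import numpy as np


def ct_fingerprint(images, foreground_masks):
    """Dataset-wide foreground statistics: clip bounds, mean, and std."""
    fg = np.concatenate([img[mask > 0].ravel() for img, mask in zip(images, foreground_masks)])
    lower, upper = np.percentile(fg, [0.5, 99.5])
    return float(lower), float(upper), float(fg.mean()), float(fg.std())


def ct_normalize(image, lower, upper, mean, std):
    """Clip to the dataset percentiles, then z-score with the foreground mean and std."""
    clipped = np.clip(image, lower, upper)
    # Guard against a degenerate (zero) standard deviation.
    return (clipped - mean) / max(std, 1e-8)
```

Because the statistics are shared across the whole dataset rather than computed per image, a given HU value maps to the same normalized value in every scan, which preserves the physical meaning of CT intensities.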
The “3d_fullres” configuration was used, implementing a PlainConvUNet model with a batch size of 20 and input patches of 48 × 48 × 48 voxels. The network had an encoder-decoder architecture with four stages, each containing two convolutional layers, and three downsampling steps per axis, with downsampling kernel sizes ranging from 1 × 1 × 1 to 2 × 2 × 2. Convolutional layers in both encoder and decoder stages used 3 × 3 × 3 kernels, with the number of feature maps capped at 320. Data resampling and normalization were performed with nnU-Net's CT-specific preprocessing pipeline, which resamples images to the target isotropic spacing using spline interpolation and applies global intensity normalization consisting of clipping followed by z-score normalization [19]. During training, image augmentation was applied using the default nnU-Net data augmentation pipeline, which includes standard spatial and intensity-based transformations [19]. These settings follow the nnU-Net design, which selects patch size and network depth to balance anatomical context against GPU memory constraints for a given dataset [19].
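The effect of the patch size and downsampling schedule on the encoder can be worked through with a short calculation. The sketch assumes nnU-Net's default base width of 32 feature maps, doubling at each stage up to the cap of 320, and a per-stage stride list of (1, 1, 1) followed by three (2, 2, 2) steps, which is one reading of the 1 × 1 × 1 to 2 × 2 × 2 range given in the text. The function is illustrative and not part of nnU-Net.

```python
def stage_shapes(patch=(48, 48, 48),
                 strides=((1, 1, 1), (2, 2, 2), (2, 2, 2), (2, 2, 2)),
                 base_features=32, max_features=320):
    """Spatial size and feature count per encoder stage of a plain 3D U-Net.

    Stage i first downsamples by strides[i]; its two 3x3x3 convolutions keep
    the spatial size ("same" padding) and use min(base * 2**i, max) channels.
    """
    shapes, size = [], tuple(patch)
    for i, stride in enumerate(strides):
        size = tuple(d // s for d, s in zip(size, stride))
        shapes.append((size, min(base_features * 2 ** i, max_features)))
    return shapes


for i, (size, feats) in enumerate(stage_shapes()):
    print(f"stage {i}: {size} voxels, {feats} feature maps")
```

Under these assumptions the 48-voxel patch shrinks to 6 voxels per axis at the deepest stage, which at 1 mm spacing means each bottleneck voxel summarizes roughly an 8 mm neighborhood; the widest stage has 256 feature maps, so the 320 cap would only bind in a deeper network.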
Hyperparameter tuning and training
A 10% random subset of the training data (a single fold) was used for tuning (Table 1). After each parameter change, performance was evaluated with the Dice similarity coefficient (DSC), the standard overlap metric between the ground-truth segmentation $A$ and the predicted segmentation $B$:

$$\mathrm{DSC} = \frac{2\,|A \cap B|}{|A| + |B|} \qquad (1)$$

Hyperparameter tuning was restricted to parameters exposed by the nnU-Net framework, while core optimization settings such as optimizer type, initial learning rate, and weight decay were retained at their default nnU-Net values.
Table 1: Hyperparameter tuning.
Training
The final model configuration used a batch size of 20 and a cosine annealing learning-rate scheduler. We trained the nnU-Net model with standard 5-fold cross-validation for 1,000 epochs per fold.
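The difference between the chosen cosine annealing schedule and nnU-Net's default polynomial decay can be seen from their closed forms. The sketch assumes nnU-Net's default initial learning rate of 1e-2 (which the text says was retained), a minimum learning rate of 0, and a polynomial exponent of 0.9 for the default schedule; how the cosine scheduler was wired into the trainer is not described, so the function names here are hypothetical. PyTorch's torch.optim.lr_scheduler.CosineAnnealingLR follows the same cosine formula.

```python
import math


def cosine_annealing_lr(epoch, max_epochs=1000, initial_lr=1e-2, min_lr=0.0):
    """Closed-form cosine annealing from initial_lr down to min_lr over max_epochs."""
    return min_lr + 0.5 * (initial_lr - min_lr) * (1.0 + math.cos(math.pi * epoch / max_epochs))


def poly_lr(epoch, max_epochs=1000, initial_lr=1e-2, exponent=0.9):
    """Polynomial decay, shown for comparison with nnU-Net's default schedule."""
    return initial_lr * (1.0 - epoch / max_epochs) ** exponent


for epoch in (0, 250, 500, 750, 999):
    print(epoch, f"{cosine_annealing_lr(epoch):.5f}", f"{poly_lr(epoch):.5f}")
```

Cosine annealing keeps the learning rate higher than the polynomial schedule through the first part of training and then decays it smoothly toward zero at the end.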
Cross-fold evaluation
To assess the stability of model performance across validation splits, we compared the folds using the non-parametric Kruskal–Wallis test for all major metrics (Dice, precision, sensitivity, specificity). The Kruskal–Wallis statistic quantifies rank-based differences between validation sets and was interpreted together with the corresponding p-value [20]. All reported p-values exceeded the conventional significance threshold (α = 0.05), indicating that no statistically significant differences in model performance were detected across validation folds (Table 2).
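A cross-fold comparison of this type can be run with scipy.stats.kruskal, which takes one sample per group and returns the H statistic and its p-value. The sketch below uses synthetic per-case Dice scores for five folds purely for illustration; they are not study data, and the helper name compare_folds is hypothetical.

```python
import numpy as np
from scipy.stats import kruskal


def compare_folds(per_fold_scores, alpha=0.05):
    """Kruskal-Wallis H test across folds; returns (H, p, significant)."""
    result = kruskal(*per_fold_scores)
    return float(result.statistic), float(result.pvalue), bool(result.pvalue < alpha)


# Illustrative synthetic Dice scores for five validation folds (not study data).
rng = np.random.default_rng(0)
folds = [np.clip(rng.normal(0.85, 0.08, size=200), 0.0, 1.0) for _ in range(5)]
h, p, significant = compare_folds(folds)
print(f"H = {h:.2f}, p = {p:.3f}, significant at 0.05: {significant}")
```

Because the test is rank-based, it does not assume normally distributed Dice scores, which are typically skewed toward 1 with a tail of poorly segmented cases.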
Table 2: Cross-fold performance across validation splits.
Inference and evaluation
For inference, nnU-Net used an ensemble of all five cross-validation models, averaging their predictions across folds [19]. Post-processing applied binary opening (a morphological erosion followed by a dilation) with a 3 × 3 × 3 structuring element for three iterations to remove small spurious predictions. An ablation analysis in which the number of opening iterations was varied from 1 to 6 on a validation set generated during training indicated that three iterations offered a good trade-off between false-positive suppression and preservation of plaque integrity.
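The ensembling and post-processing steps can be sketched with NumPy and scipy.ndimage.binary_opening. This is a simplified illustration, not nnU-Net's inference code: it assumes per-fold softmax maps of shape (folds, classes, z, y, x), label 2 for plaque as in the annotation scheme, and a full 3 × 3 × 3 cube as the structuring element (the text gives only its size). With iterations=k, SciPy applies k erosions followed by k dilations, so any plaque component that does not survive three erosions is removed entirely.

```python
import numpy as np
from scipy import ndimage

PLAQUE = 2  # label convention from the annotation scheme: 0 background, 1 vessel, 2 plaque


def ensemble_argmax(fold_probs: np.ndarray) -> np.ndarray:
    """Average softmax maps (folds, classes, z, y, x) over folds, then take the argmax class."""
    return fold_probs.mean(axis=0).argmax(axis=0)


def open_plaque_mask(labels: np.ndarray, iterations: int = 3) -> np.ndarray:
    """Binary opening of the plaque class with an assumed full 3x3x3 structuring element."""
    structure = np.ones((3, 3, 3), dtype=bool)
    # k erosions remove components thinner than about 2k + 1 voxels;
    # k dilations then restore the shape of components that survived.
    return ndimage.binary_opening(labels == PLAQUE, structure=structure, iterations=iterations)
```

One consequence is worth noting: at 1 mm isotropic spacing, three iterations with a cube element remove plaque components narrower than about 7 voxels in any direction, which is consistent with the trade-off against preserving small plaques described above.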
Evaluation methodology
Because of the complex anatomy of the coronary arteries, and because each extracted artery runs through the middle of its 3D matrix, we applied a 3D mask obtained by dilating the vessel centerline by a fixed radius. This mask preserved the target artery around its centerline while removing noise and partial annotations from neighboring vessels. Using the label function of SciPy's ndimage module (scipy.ndimage.label), individual connected segmentations were then separated and labeled in both the ground truths and the predictions.
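The masking and lesion-matching logic can be sketched as follows. Several details are assumptions because the text does not specify them: the dilation radius (radius_vox, in voxels), the use of a Euclidean distance transform to dilate the centerline, SciPy's default 6-connectivity for labeling, and the matching rule (a ground-truth lesion counts as detected if any predicted voxel overlaps it, and a predicted component with no overlap counts as a false positive, in line with how false positives are described in the Results). The function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def centerline_mask(centerline: np.ndarray, radius_vox: float) -> np.ndarray:
    """Voxels within radius_vox (Euclidean, in voxels) of any centerline voxel."""
    # distance_transform_edt measures distance to the nearest zero, so invert the centerline.
    dist = ndimage.distance_transform_edt(~centerline.astype(bool))
    return dist <= radius_vox


def match_lesions(gt: np.ndarray, pred: np.ndarray, centerline: np.ndarray, radius_vox: float):
    """Count (TP, FN, FP) lesions after centerline masking and connected-component labeling."""
    roi = centerline_mask(centerline, radius_vox)
    gt_lab, n_gt = ndimage.label(gt.astype(bool) & roi)
    pred_lab, n_pred = ndimage.label(pred.astype(bool) & roi)
    # A ground-truth lesion is a TP if any predicted voxel overlaps it, otherwise an FN.
    tp = sum(1 for i in range(1, n_gt + 1) if (pred_lab[gt_lab == i] > 0).any())
    # A predicted component that touches no ground-truth lesion is an FP.
    fp = sum(1 for j in range(1, n_pred + 1) if not (gt_lab[pred_lab == j] > 0).any())
    return tp, n_gt - tp, fp
```

For a straightened vessel whose centerline runs along the central axis, the mask reduces to a cylinder, which is why structures at the edge of the volume (typically neighboring vessels) are discarded before lesions are counted.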
Results
Plaque-level evaluation
Of the 2,090 plaques across 1,112 vessels in our test dataset, the neural network model correctly identified 1,772 (true positives, TP; Fig 3), corresponding to a sensitivity of 84.8%. It missed 318 plaques (false negatives, FN; Fig 4) and marked 382 coronary artery regions that had no overlap with ground-truth annotations as plaques (false positives, FP), corresponding to a precision of 82.3%.
Examples of predictions with Dice scores above 0.88. The model performed well in detecting medium to large plaque annotations.
Example of small plaque annotations (< 99 mm³). The model failed to identify some smaller non-calcified plaques.
Vessel-level evaluation
Among 1,613 negative vessels, the model correctly identified 1,369 as normal and misclassified 244 as pathological (specificity 84.9%, 95% CI 83.0–86.5%). Of 1,112 positive vessels, the model correctly marked 1,053 as positive but failed to detect plaques in 59 (sensitivity 94.7%, 95% CI 93.2–95.9%). At the vessel level, the model achieved a positive predictive value (PPV) of 81.2% (95% CI 79.0–83.2%), where

$$\mathrm{PPV} = \frac{TP}{TP + FP} \qquad (2)$$
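The reported proportions can be recomputed from the counts. The text does not name the confidence-interval method; the Wilson score interval is assumed here, and with the reported counts it yields bounds that agree with the reported 95% CIs after rounding for the vessel-level metrics and for plaque-level sensitivity. The helper names are hypothetical.

```python
import math


def wilson_ci(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (assumed CI method)."""
    p = k / n
    denom = 1.0 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def report(name, k, n):
    lo, hi = wilson_ci(k, n)
    return f"{name}: {100 * k / n:.1f}% (95% CI {100 * lo:.1f}-{100 * hi:.1f}%)"


# Counts taken from the text: (successes, total).
for name, k, n in [
    ("Vessel sensitivity", 1053, 1112),   # TP / (TP + FN)
    ("Vessel specificity", 1369, 1613),   # TN / (TN + FP)
    ("Vessel PPV", 1053, 1053 + 244),     # TP / (TP + FP)
    ("Plaque sensitivity", 1772, 2090),   # detected / all ground-truth plaques
]:
    print(report(name, k, n))
```

The Wilson interval is a common choice for proportions close to 0 or 1 because, unlike the simple normal (Wald) approximation, it never extends beyond [0, 1] and has better coverage at high sensitivities.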
Examination-level evaluation
Among 231 patients with coronary artery disease, the model identified 225 as positive (at least one true-positive vessel) and missed 6. Among negative patients, the model correctly classified 530 as negative but incorrectly marked 135 as having CAD. Across examinations, the per-patient numbers of FPs and FNs had standard deviations of 1.19 and 1.2, respectively.
Among negative patients, 79.7% had no FP plaques, while 20.3% had one or more FP plaques (mostly one or two), indicating over-detection in a subset of examinations. At the examination level, the model achieved a negative predictive value (NPV) of 98.8% (95% CI 97.6–99.5%), where

$$\mathrm{NPV} = \frac{TN}{TN + FN} \qquad (3)$$
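The examination-level rule (a patient with disease counts as detected when at least one plaque is found, and a plaque-free patient counts as a false positive when any plaque is predicted) can be sketched by aggregating per-vessel counts per patient. The row format (patient_id, gt_plaques, tp_plaques, fp_plaques) and the function names are hypothetical, and the text defines detection via at least one true-positive vessel, which this sketch approximates with plaque-level counts.

```python
from collections import Counter, defaultdict


def exam_outcomes(vessel_rows):
    """Aggregate per-vessel plaque counts into one outcome per examination.

    vessel_rows: iterable of (patient_id, gt_plaques, tp_plaques, fp_plaques).
    """
    totals = defaultdict(lambda: [0, 0, 0])  # patient -> [gt, tp, fp]
    for pid, gt, tp, fp in vessel_rows:
        t = totals[pid]
        t[0] += gt
        t[1] += tp
        t[2] += fp
    outcomes = Counter()
    for gt, tp, fp in totals.values():
        if gt > 0:
            # Diseased patient: detected if any plaque was found.
            outcomes["TP" if tp > 0 else "FN"] += 1
        else:
            # Plaque-free patient: a single predicted plaque flags the exam.
            outcomes["FP" if fp > 0 else "TN"] += 1
    return outcomes


def npv(outcomes):
    """Negative predictive value, TN / (TN + FN)."""
    return outcomes["TN"] / (outcomes["TN"] + outcomes["FN"])
```

Because a single missed plaque no longer matters once another plaque in the same patient is found, examination-level sensitivity and NPV are typically much higher than their plaque-level counterparts, while a single false-positive plaque is enough to flag a healthy patient.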
Segmentation performance and error analysis
We evaluated the model’s plaque segmentation performance by examining Dice score distributions stratified by segmentation size and calcification volume. This analysis showed consistent trends across subgroups. Non-calcified plaques (calcification volume 0 mm³) had a higher false-negative rate (39.4%) and a median Dice of 0.66 (95% CI 0.59–0.74), whereas segmentations containing more than 5 mm³ of calcification had a false-negative rate of 0% and a Dice of 0.93 (95% CI 0.92–0.93) (Table 3). In the size-based analysis, small plaque segmentations (< 200 mm³) accounted for 97.16% of false negatives, whereas segmentations larger than approximately 400 mm³ had no false negatives in our dataset (Table 4). Across all plaque volumes, the model achieved a median Dice score of 0.88 (95% CI 0.87–0.89) (Fig 5).
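Stratified statistics of the kind summarized in Tables 3 and 4 can be computed from per-plaque records. The sketch is hypothetical: it assumes each ground-truth plaque has a volume (voxel count times voxel volume) and a Dice score, treats a Dice of 0 (no overlap) as a false negative, and takes the median Dice over detected plaques only, since the text does not say whether missed plaques enter the median. The bin edges of 200 and 400 mm³ follow the size thresholds in the text; the same function applies to calcification volume with other edges.

```python
import numpy as np


def volume_mm3(mask, spacing_mm):
    """Lesion volume = voxel count x voxel volume (product of the spacings in mm)."""
    return float(mask.sum()) * float(np.prod(spacing_mm))


def stratify(volumes, dices, edges):
    """False-negative rate and median Dice of detected plaques per volume bin.

    Bin 0 holds volumes < edges[0]; bin k holds edges[k-1] <= v < edges[k];
    the last bin holds volumes >= edges[-1].
    """
    volumes = np.asarray(volumes, dtype=float)
    dices = np.asarray(dices, dtype=float)
    bins = np.digitize(volumes, edges)
    out = {}
    for b in range(len(edges) + 1):
        sel = bins == b
        if not sel.any():
            continue
        d = dices[sel]
        detected = d[d > 0]  # assumption: Dice == 0 means the plaque was missed
        out[b] = {
            "n": int(sel.sum()),
            "fn_rate": float(np.mean(d == 0)),
            "median_dice": float(np.median(detected)) if detected.size else float("nan"),
        }
    return out
```

Reporting the false-negative rate separately from the Dice of detected plaques keeps two different failure modes apart: missing a plaque altogether versus delineating a found plaque imprecisely.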
Table 3: Detection performance stratified by calcification volume.
Table 4: Detection performance stratified by plaque volume.
Illustration of the model’s performance. Dice score distribution by plaque annotation volume (top). Frequency curve of plaque volumes in the dataset (bottom).
Discussion
In this study, we developed and evaluated a neural network model based on nnU-Net for automated detection and segmentation of coronary plaques in CCTA images.
Our results compare favorably with previously reported performance of automated plaque detection and segmentation methods. Liu et al. reported a 3D convolutional network that categorized coronary plaques as calcified, mixed, or non-calcified, achieving per-vessel Dice scores of 0.83, 0.68, and 0.73, respectively [21]. Our model achieved higher Dice scores, particularly for larger plaques, which suggests improved segmentation accuracy, although the two studies used different datasets. Masuda et al. demonstrated a convolutional neural network with an accuracy of 0.86 and an F1 score of 0.85 for plaque detection [22]. For binary segmentation tasks, the Dice similarity coefficient is mathematically equivalent to the F1 score, which allows comparison with these results. Zreik et al. reported an accuracy of 0.80 for detecting coronary stenosis [23]. Our model showed comparable performance in these respects.
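The equivalence between the Dice coefficient and the F1 score follows directly from the definitions of precision $P$ and recall $R$, provided both are computed from the same $TP$, $FP$, and $FN$ counts (for example, voxel counts of a binary segmentation with ground truth $A$ and prediction $B$, so that $|A \cap B| = TP$, $|A| = TP + FN$, and $|B| = TP + FP$):

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$

$$F_1 = \frac{2PR}{P + R}
= \frac{2\,TP^2 / \big[(TP + FP)(TP + FN)\big]}{TP\,\big[(TP + FN) + (TP + FP)\big] / \big[(TP + FP)(TP + FN)\big]}
= \frac{2\,TP}{2\,TP + FP + FN}
= \frac{2\,|A \cap B|}{|A| + |B|} = \mathrm{DSC}$$

The comparison is therefore exact only when both metrics are computed at the same level (voxel, lesion, or vessel); a detection-level F1 and a voxel-level Dice measure different things even though they share this formula.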
Dey et al. [14] used ML to predict lesion-specific ischemia by integrating various quantitative plaque measures from coronary CTA. Their model achieved an area under the curve (AUC) of 0.84, outperforming individual CTA measures. Although our study did not assess ischemia prediction, the accuracy of our model in plaque detection and segmentation suggests that it could serve as an input to such ischemia prediction models and might further improve their performance.
Hong et al. [24] developed a deep learning approach for stenosis quantification from coronary CTA, achieving excellent correlations with expert readers for minimal luminal area, percent diameter stenosis, and percent contrast density difference. While their focus was on stenosis quantification, our model’s strength lies in comprehensive plaque detection and segmentation. The high performance of both approaches suggests that deep learning methods can effectively automate various aspects of CCTA analysis, potentially complementing each other in comprehensive CAD assessment.
The high negative predictive value of our model suggests its potential utility as a cost-effective exclusion test for CAD. This aligns with the findings of van Velzen et al. [25], who reported high performance in assigning cardiovascular disease risk categories from cardiac CT, although they investigated calcium score CT only. Our model’s ability to accurately identify patients without plaques and thus without coronary artery disease could be particularly valuable in clinical settings for rapidly triaging patients and potentially reducing the need for unnecessary invasive procedures.
Limitations
Several limitations merit discussion.
First, the detection of smaller non-calcified plaques remains challenging, as evidenced by a higher occurrence of false negatives in this subgroup. Such plaques represent early atherosclerotic changes. The subtle intensity differences and variable arterial geometry complicate accurate segmentation. Addressing this challenge will require enriched training datasets emphasizing small plaque examples, as well as advanced loss functions or attention mechanisms to prioritize detection of smaller non-calcified components. For example, focal loss has been shown to improve sensitivity for under-represented targets by down-weighting easy background examples [26], while attention mechanisms can help networks focus on subtle, localized features [27]. Although such approaches have shown promise in prior work, they are not part of the standard nnU-Net framework, which relies on Dice and cross-entropy loss.
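For reference, the standard binary focal loss mentioned above can be written in a few lines. This is a generic NumPy sketch of the well-known formulation, not something used in this study and not part of nnU-Net; gamma = 2 and alpha = 0.25 are the commonly quoted defaults and are assumptions here. Setting gamma to 0 and disabling alpha recovers ordinary binary cross-entropy, which makes the down-weighting effect easy to check.

```python
import numpy as np


def binary_focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    With gamma=0 and alpha=None it reduces to binary cross-entropy.
    """
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    y = np.asarray(targets)
    p_t = np.where(y == 1, p, 1.0 - p)  # probability assigned to the true class
    alpha_t = 1.0 if alpha is None else np.where(y == 1, alpha, 1.0 - alpha)
    # (1 - p_t)**gamma shrinks the loss of confidently correct (easy) voxels,
    # such as the abundant background, so rare hard voxels dominate the gradient.
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```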
Second, image quality factors, including motion or step artifacts and severe vessel narrowing at branch points, contributed to false positives. We found that approximately 30% of false positives did not correspond to actual pathology; the remaining false positives were due to artifacts such as step or motion artifacts (Fig 6). Although experienced radiologists can correctly identify most artifactual detections as non-pathological (e.g., step artifacts), some coronary segments may remain non-diagnostic because of limited image quality (e.g., blooming from calcifications or motion-related artifacts). In such cases, further evaluation using alternative methods, such as invasive coronary angiography or stress imaging, may be warranted.

Third, the model was trained and evaluated using data from a single center, and no publicly available annotated CCTA dataset of coronary plaques was accessible for external validation at the time of the study. To support independent assessment, we provided a sample dataset and pipelines for database conversion and inference, which enable other centers to prepare their internal data for model testing.
False-positive prediction. A motion artifact that was falsely predicted as plaque.
Furthermore, our reliance on stretched (straightened) reconstructions, although standard on many vendor platforms, limits direct applicability to native curved MPR images. While stretched reconstructions may not be the primary viewing method, they serve as an additional analytical tool that complements, rather than replaces, traditional visualization [28,29]. Our method can be integrated into existing workflows, with results mapped back to conventional views by inverting the geometric transformation used for straightening. We plan to extend our pipeline into a fully automatic segmentation tool that operates directly on the original MPR data, extracts individual coronary arteries, applies segmentation, and subsequently maps all identified plaques back onto the original MPR images.
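The back-mapping step can be sketched if the straightening is stored as a sampling grid, that is, the original-volume voxel coordinates at which each voxel of the stretched volume was sampled. This is a hypothetical sketch, not the vendor's or the authors' implementation: the coords array layout, nearest-voxel rounding, and the function names straighten and map_back are assumptions, since the internal transformation of the reconstruction software is not described. scipy.ndimage.map_coordinates performs the forward resampling.

```python
import numpy as np
from scipy import ndimage


def straighten(volume: np.ndarray, coords: np.ndarray) -> np.ndarray:
    """Sample `volume` at coords of shape (3, S, U, V), given in original voxel indices (linear interpolation)."""
    return ndimage.map_coordinates(volume, coords, order=1)


def map_back(mask_straight: np.ndarray, coords: np.ndarray, original_shape) -> np.ndarray:
    """Scatter a straightened mask back to the original grid via the stored sampling coordinates."""
    out = np.zeros(original_shape, dtype=bool)
    idx = np.rint(coords[:, mask_straight]).astype(int)  # (3, n_selected), nearest original voxel
    inside = np.all((idx >= 0) & (idx < np.array(original_shape)[:, None]), axis=0)
    out[tuple(idx[:, inside])] = True
    return out
```

Nearest-voxel scattering can leave gaps where the straightened grid is coarser than the original one; a production pipeline might close them with a small dilation or use an inverse resampling instead.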
Conclusion
Our nnU-Net-based model demonstrates strong segmentation performance for medium-to-large and calcified coronary plaques in CCTA, with a high negative predictive value for ruling out significant disease. Performance remains limited for small and non-calcified lesions, and variable image quality reduces reliability in challenging cases. Given these limitations, clinical translation will require further work, including improved detection of these lesion types, training and validation on multi-center datasets, and systematic comparison against expert radiologists. We believe that the presented results could encourage multi-center collaborations and support future progress toward semi-automated plaque detection in coronary CTA.
References
- [1] Mathers CD, Stevens GA, Boerma T, White RA, Tobias MI. Causes of international increases in older age life expectancy. Lancet. 2015;385(9967):540–8. doi: 10.1016/S0140-6736(14)60569-9. PMID: 25468166.
- [2] GBD 2017 Causes of Death Collaborators. Global, regional, and national age-sex-specific mortality for 282 causes of death in 195 countries and territories, 1980-2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392(10159):1736–88. doi: 10.1016/S0140-6736(18)32203-7. PMID: 30496103. PMCID: PMC6227606.
- [3] Jávorszky N, Homonnay B, Gerstenblith G, Bluemke D, Kiss P, Török M, et al. Deep learning-based atherosclerotic coronary plaque segmentation on coronary CT angiography. Eur Radiol. 2022;32(10):7217–26. doi: 10.1007/s00330-022-08801-8. PMID: 35524783.
- [4] Knuuti J, Wijns W, Saraste A, Capodanno D, Barbato E, Funck-Brentano C, et al. 2019 ESC Guidelines for the diagnosis and management of chronic coronary syndromes. Eur Heart J. 2020;41(3):407–77. doi: 10.1093/eurheartj/ehz425. PMID: 31504439.
- [5] Moss AJ, Williams MC, Newby DE, Nicol ED. The Updated NICE Guidelines: Cardiac CT as the First-Line Test for Coronary Artery Disease. Curr Cardiovasc Imaging Rep. 2017;10(5):15. doi: 10.1007/s12410-017-9412-6. PMID: 28446943. PMCID: PMC5368205.
- [6] Leipsic J, Abbara S, Achenbach S, Cury R, Earls JP, Mancini GJ, et al. SCCT guidelines for the interpretation and reporting of coronary CT angiography: a report of the Society of Cardiovascular Computed Tomography Guidelines Committee. J Cardiovasc Comput Tomogr. 2014;8(5):342–58. doi: 10.1016/j.jcct.2014.07.003. PMID: 25301040.
- [7] Kolossváry M, Szilveszter B, Merkely B, Maurovich-Horvat P. Plaque imaging with CT-a comprehensive review on coronary CT angiography based risk assessment. Cardiovasc Diagn Ther. 2017;7(5):489–506. doi: 10.21037/cdt.2016.11.06. PMID: 29255692. PMCID: PMC5716945.
- [8] Haase R, Schlattmann P, Gueret P, Andreini D, Pontone G, Alkadhi H, et al. Diagnosis of obstructive coronary artery disease using computed tomography angiography in patients with stable chest pain depending on clinical probability and in clinically important subgroups: meta-analysis of individual patient data. BMJ. 2019;365:l1945. doi: 10.1136/bmj.l1945. PMID: 31189617. PMCID: PMC6561308.
