# Predicting incident cardio-metabolic disease among persons with and without depressive and anxiety disorders: a machine learning approach

**Authors:** Arja O. Rydin, George Aalbers, Wessel A. van Eeden, Femke Lamers, Yuri Milaneschi, Brenda W. J. H. Penninx

PMC · DOI: 10.1007/s00127-025-02857-9 · Social Psychiatry and Psychiatric Epidemiology · 2025-02-18

## TL;DR

This study used machine learning to predict the development of cardio-metabolic diseases in people with and without depression or anxiety, but found limited predictive accuracy.

## Contribution

The study evaluates the added value of psychiatric and biological variables in predicting CMD using machine learning in a high-risk population.

## Key findings

- Machine learning models achieved only modest discrimination for CMD onset (AUC-ROC up to 0.669).
- Detailed psychiatric variables contributed little to prediction, while age and hypertension were most important.
- Combining domains did not significantly improve performance over demographics alone.

## Abstract

There is a global increase in cardiovascular disease and diabetes (cardio-metabolic diseases, CMD). Depression and anxiety disorders increase the probability of developing CMD. In this study we used machine learning (ML) to test a wide array of predictors of CMD onset, evaluating whether adding detailed psychiatric or biological variables increases predictive performance.

We analysed data from the Netherlands Study of Depression and Anxiety, a longitudinal cohort study (N = 2071), using 368 predictors covering four domains (demographic, lifestyle & somatic, psychiatric, and biological markers). CMD onset over a 9-year follow-up (24% incidence) was defined using self-reported stroke, heart disease, diabetes with high fasting glucose levels, and use of antithrombotic, cardiovascular, or diabetes medication (ATC codes C01DA, C01-C05A-B, C07-C09A-B, C01DB, B01, A10A-X). Using four ML methods (logistic regression, support vector machine, random forest, and XGBoost), we tested the predictive performance of single domains and domain combinations.
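The medication part of the outcome definition relies on ATC codes, which are hierarchical: a short code such as B01 covers every longer code that begins with it. The following is a minimal sketch of how such prefix matching could work, not the authors' implementation. The function name `uses_cmd_medication`, the participant records, and the choice of prefixes are all hypothetical. Only the unambiguous codes from the list above are used, because the exact expansion of the ranges (for example C01-C05A-B) is not given in this summary.

```python
# Illustrative subset of the ATC prefixes named in the abstract (an assumption:
# the range entries such as "C01-C05A-B" are deliberately not expanded here).
CMD_ATC_PREFIXES = ("C01DA", "C01DB", "B01")


def uses_cmd_medication(atc_codes, prefixes=CMD_ATC_PREFIXES):
    """Return True if any recorded ATC code starts with one of the prefixes."""
    normalized = (code.strip().upper() for code in atc_codes)
    # str.startswith accepts a tuple and checks each prefix in turn.
    return any(code.startswith(prefixes) for code in normalized)


# Hypothetical participants with their recorded medication codes.
participants = {
    "p1": ["N06AB04"],             # citalopram, an antidepressant: no match
    "p2": ["b01ac06", "N05BA01"],  # acetylsalicylic acid falls under B01: match
    "p3": [],                      # no medication recorded: no match
}

flags = {pid: uses_cmd_medication(codes) for pid, codes in participants.items()}
print(flags)
```

Normalising case and whitespace before matching guards against inconsistently entered codes. How the medication flag is combined with the self-report and glucose criteria is not specified here, so the sketch stops at the flag itself.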

The classifiers performed similarly, so the simplest classifier (logistic regression) was selected. The Area Under the Receiver Operating Characteristic Curve (AUC-ROC) achieved by single domains ranged from 0.569 to 0.649. The combination of demographics, lifestyle & somatic indicators, and psychiatric variables performed best (AUC-ROC = 0.669), but did not significantly outperform demographics alone. Age and hypertension contributed most to prediction; detailed psychiatric variables added relatively little.
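To put these values in context, the AUC-ROC equals the probability that a randomly chosen participant who developed CMD receives a higher predicted score than a randomly chosen participant who did not. A value of 0.5 corresponds to chance and 1.0 to perfect ranking, so 0.669 means the model orders roughly two out of three such pairs correctly. The sketch below computes the metric directly from that pairwise definition. The function name `auc_roc` and the toy labels and scores are illustrative assumptions, not data from the study.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (case, non-case) pairs in which the case
    receives the higher score; tied scores count as half a pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: two non-cases (label 0) and two cases (label 1).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
example_auc = auc_roc(labels, scores)
print(example_auc)  # 3 of the 4 case/non-case pairs are ordered correctly: 0.75
```

The double loop is quadratic in sample size, which is fine for illustration. For a cohort of this size a rank-based implementation from a statistics library would be the more practical choice.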

In this longitudinal study, ML classifiers were not able to accurately predict 9-year CMD onset in a sample enriched for participants with psychopathology. Detailed psychiatric and biological information did not substantially increase predictive performance.

The online version contains supplementary material available at 10.1007/s00127-025-02857-9.

## Linked entities

- **Diseases:** cardiovascular disease (MONDO:0004995), diabetes (MONDO:0005015), stroke (MONDO:0005098), heart disease (MONDO:0005267)

## Full-text entities

- **Diseases:** Cardio-Metabolic diseases (MESH:D008659), heart disease (MESH:D006331), Depression (MESH:D003866), stroke (MESH:D020521), Anxiety (MESH:D001007), psychiatric (MESH:D001523), hypertension (MESH:D006973), cardiovascular disease (MESH:D002318), diabetes (MESH:D003920), CMD (MESH:C565145), anxiety disorders (MESH:D001008)
- **Chemicals:** glucose (MESH:D005947)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12162734/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12162734/full.md

## References

4 references — full list in the complete paper: https://tomesphere.com/paper/PMC12162734/full.md

---
Source: https://tomesphere.com/paper/PMC12162734