# Stratifying high-risk prediabetes clusters using blood-based epigenetic markers

**Authors:** Amandeep Singh, Reiner Jumpertz-von Schwartzenberg, Robert Wagner, Leontine Sandforth, Arvid Sandforth, Markus Jähnert, Marlene Ganslmeier, Stefan Kabisch, Nikolaos Perakakis, Hubert Preißl, Andreas Fritsche, Norbert Stefan, Dirk Walter, Meriem Ouni, Andreas L. Birkenfeld, Annette Schürmann

PMC · DOI: 10.1186/s40364-025-00887-8 · Biomarker Research · 2026-01-17

## TL;DR

This study uses blood-based DNA methylation markers to assign individuals to prediabetes clusters, which may offer a simpler alternative to intensive clinical phenotyping for identifying those at high risk of type 2 diabetes and its complications.

## Contribution

A machine learning workflow was developed to identify blood-based epigenetic markers that distinguish between previously defined prediabetes clusters.

## Key findings

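- In a discovery cohort (n = 187), 1,557 CpG sites were identified as predictors for clusters 2, 3, 5, and 6; in an independent replication cohort (n = 146), they distinguished the high-risk clusters 3, 5, and 6 with 92% accuracy.
- Between 300 and 339 CpG sites were specific to each cluster, with the corresponding genes linked to TGF-β receptor and calcium signaling (cluster 3), MAPK cascade and ECM organization (cluster 5), and Wnt/SMAD signaling (cluster 6), mirroring the metabolic deterioration observed in each cluster.
- Blood-based epigenetic signatures may identify individuals at high risk without the complex clinical measurements that the cluster classification otherwise requires.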
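
The idea of cluster-specific CpG sites in the findings above can be illustrated with a small set computation. This is a minimal sketch under one assumption: that "specific" means a CpG appears in the predictor set of exactly one cluster. The summary does not state the authors' exact criterion, and the probe names below are hypothetical placeholders, not real array identifiers.

```python
from collections import Counter

def cluster_specific(predictors_by_cluster):
    """Return, per cluster, the CpGs that occur in that cluster's set only.

    Assumption: a CpG is 'cluster-specific' when exactly one cluster lists it.
    """
    # Count in how many clusters each CpG appears.
    counts = Counter(
        cpg for cpgs in predictors_by_cluster.values() for cpg in set(cpgs)
    )
    return {
        cluster: {cpg for cpg in cpgs if counts[cpg] == 1}
        for cluster, cpgs in predictors_by_cluster.items()
    }

# Hypothetical predictor sets for the three high-risk clusters.
predictors_by_cluster = {
    3: {"cg_a", "cg_b", "cg_shared"},
    5: {"cg_c", "cg_shared"},
    6: {"cg_d", "cg_e", "cg_b"},
}

specific = cluster_specific(predictors_by_cluster)
for cluster, cpgs in sorted(specific.items()):
    print(cluster, sorted(cpgs))
```

In this toy input, `cg_shared` and `cg_b` each occur in two clusters, so neither counts as specific to any cluster.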

## Abstract

Previously, we identified six prediabetes clusters, three at moderate and three at high risk for type 2 diabetes and/or complications. While this novel classification could enable earlier and improved disease prevention, it relies on intensive clinical phenotyping. Here, we developed a machine learning workflow to identify blood-based epigenetic markers to distinguish between prediabetes clusters.

DNA methylation was profiled in blood cells from several cohorts that included individuals belonging to clusters 2 (low risk), 3, 5, and 6 (each high risk), and the data were subjected to a machine learning workflow.

In a discovery cohort (n = 187), we identified 1,557 CpG sites as predictors for clusters 2, 3, 5, and 6. These CpGs were sufficient to distinguish between individuals belonging to the high-risk clusters 3, 5, and 6 in an independent replication cohort (n = 146) with an accuracy of 92%. Between 300 and 339 CpG sites were specific to each cluster, and the corresponding genes were linked to TGF-β receptor and calcium signaling (cluster 3), MAPK cascade and ECM organization (cluster 5), and Wnt/SMAD signaling (cluster 6), mirroring the metabolic deterioration observed in each cluster.
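
The discovery/replication design described above can be sketched in a few lines of NumPy. This is an illustration only, not the authors' workflow: the summary does not name their feature-selection method or classifier, so an ANOVA-style F-statistic ranking and a nearest-centroid classifier stand in for them. The data are synthetic beta values, the cohort sizes and CpG counts are arbitrary, and all four clusters appear in both simulated cohorts for simplicity.

```python
import numpy as np

CLUSTERS = np.array([2, 3, 5, 6])  # cluster labels named in the abstract

def make_shifts(n_cpgs, per_cluster, delta=0.15):
    """Hypothetical effect sizes: each cluster shifts its own block of CpGs."""
    shifts = np.zeros((len(CLUSTERS), n_cpgs))
    for i in range(len(CLUSTERS)):
        shifts[i, i * per_cluster:(i + 1) * per_cluster] = delta
    return shifts

def simulate_cohort(rng, n_per_cluster, shifts, noise_sd=0.05):
    """Synthetic methylation beta values in [0, 1] with cluster labels."""
    idx = np.repeat(np.arange(len(CLUSTERS)), n_per_cluster)
    X = 0.5 + shifts[idx] + rng.normal(0.0, noise_sd, size=shifts[idx].shape)
    return np.clip(X, 0.0, 1.0), CLUSTERS[idx]

def select_top_cpgs(X, y, k):
    """Rank CpG columns by a one-way ANOVA F statistic; keep the top k."""
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    msb = between / (len(classes) - 1)
    msw = within / (len(X) - len(classes)) + 1e-12  # guard against zero
    return np.argsort(msb / msw)[::-1][:k]

def fit_centroids(X, y):
    """Mean methylation profile per cluster."""
    classes = np.unique(y)
    return classes, np.vstack([X[y == c].mean(axis=0) for c in classes])

def predict(X, classes, centroids):
    """Assign each sample to the cluster with the nearest centroid."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

rng = np.random.default_rng(0)
shifts = make_shifts(n_cpgs=500, per_cluster=20)
X_disc, y_disc = simulate_cohort(rng, 40, shifts)  # stand-in discovery cohort
X_rep, y_rep = simulate_cohort(rng, 30, shifts)    # stand-in replication cohort

# Select CpGs and fit on the discovery cohort only, then score on replication.
selected = select_top_cpgs(X_disc, y_disc, k=80)
classes, centroids = fit_centroids(X_disc[:, selected], y_disc)
pred = predict(X_rep[:, selected], classes, centroids)
accuracy = float((pred == y_rep).mean())
print(f"replication accuracy: {accuracy:.2f}")
```

The point of the sketch is the separation of roles: feature selection and model fitting use only the discovery cohort, so the replication accuracy is not inflated by information leaking from the evaluation samples.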

Without the need for complex clinical measurements, the identified blood-based epigenetic signatures may improve the detection of individuals at high risk of developing diabetes and its complications, and they point to potential molecular mechanisms underlying the heterogeneity in prediabetes. These markers highlight the potential of the blood epigenome as an effective proxy for predicting future complications; they could make extensive clinical assessments unnecessary and enable the identification of clusters in larger populations.

The online version contains supplementary material available at 10.1186/s40364-025-00887-8.

## Linked entities

- **Diseases:** type 2 diabetes (MONDO:0005148)

## Full-text entities

- **Genes:** CRP (C-reactive protein) [NCBI Gene 1401] {aka PTX1}, CDK5 (cyclin dependent kinase 5) [NCBI Gene 1020] {aka LIS7, PSSALRE}, Prdm16 (PR domain containing 16) [NCBI Gene 70673] {aka 5730557K01Rik, csp1, mel1}, Htra1 (HtrA serine peptidase 1) [NCBI Gene 56213] {aka HTRA, L56, Prss11, RSPP11}, RUNX2 (RUNX family transcription factor 2) [NCBI Gene 860] {aka AML3, CBF-alpha-1, CBFA1, CCD, CCD1, CLCD}, SOX8 (SRY-box transcription factor 8) [NCBI Gene 30812], SLC17A5 (solute carrier family 17 member 5) [NCBI Gene 26503] {aka AST, ISSD, NSD, SD, SIALIN, SIASD}, Tgfb1 (transforming growth factor, beta 1) [NCBI Gene 21803] {aka TGF-beta1, TGFbeta1, Tgfb, Tgfb-1}, LEF1 (lymphoid enhancer binding factor 1) [NCBI Gene 51176] {aka ECTD1, ECTD17, LEF-1, TCF10, TCF1ALPHA, TCF7L3}, KCNQ1 (potassium voltage-gated channel subfamily Q member 1) [NCBI Gene 3784] {aka ATFB1, ATFB3, JLNS1, KCNA8, KCNA9, KVLQT1}, SFRP1 (secreted frizzled related protein 1) [NCBI Gene 6422] {aka FRP, FRP-1, FRP1, FrzA, SARP2}, GPT (glutamic--pyruvic transaminase) [NCBI Gene 2875] {aka AAT1, ALT, ALT1, GPT1, SGPT}, INS (insulin) [NCBI Gene 3630] {aka IDDM, IDDM1, IDDM2, ILPR, IRDN, MODY10}, IGFBP5 (insulin like growth factor binding protein 5) [NCBI Gene 3488] {aka IBP5}, RAB27A (RAB27A, member RAS oncogene family) [NCBI Gene 5873] {aka GS2, HsT18676, RAB27, RAM}
- **Diseases:** UMAP (MESH:C567162), T2D (MESH:D003924), cancer (MESH:D009369), dilated cardiomyopathy (MESH:D002311), myocardial fibrosis (MESH:D005355), beta cell failure (MESH:D051437), MOD (MESH:C564833), obesity (MESH:D009765), cardiac dysfunction (MESH:D006331), heart and kidney diseases (MESH:D007674), inflammation (MESH:D007249), Metabolic dysfunction (MESH:D008659), chronic kidney disease (MESH:D051436), MASLD (MESH:D008107), age-related diabetes (MESH:D048909), PLIS (MESH:D011236), death (MESH:D003643), cardiovascular disease (MESH:D002318), weight loss (MESH:D015431), impaired insulin secretion (MESH:D007333), gestational diabetes (MESH:D016640), impaired fasting glucose and/or glucose tolerance (MESH:D018149), insulin-deficient diabetes (MESH:D003922), Diabetes (MESH:D003920), ASCVD (MESH:D050197)
- **Chemicals:** creatinine (MESH:D003404), Ca2+ (-), glucose (MESH:D005947), lipid (MESH:D008055), calcium (MESH:D002118), triglycerides (MESH:D014280), blood glucose (MESH:D001786), Bisulfite (MESH:C042345), cholesterol (MESH:D002784)
- **Species:** Mus musculus (house mouse, species) [taxon 10090]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12829285/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12829285/full.md

## References

52 references — full list in the complete paper: https://tomesphere.com/paper/PMC12829285/full.md

---
Source: https://tomesphere.com/paper/PMC12829285