Real-World Utility of GWAS-Based Diabetes Mellitus Panel Testing
In Hwa Jeong, Kyung-Won Hong, Ja-Eun Choi, Bo-Kyung Shine

TL;DR
This study shows that a genetic test can predict diabetes risk in healthy people and suggests that smoking can increase risk in those already genetically predisposed.
Contribution
The study demonstrates the clinical utility of a PRS-based diabetes panel and identifies gene-smoking interactions in risk prediction.
Findings
The very high-risk group had significantly higher fasting blood glucose and HbA1c levels and a 4.5-fold increased risk of developing diabetes.
Smoking increased diabetes risk by 25% in the very high-risk genetic group.
HHEX variants in the high-risk group showed the greatest susceptibility, especially among current smokers.
Abstract
This study evaluated the clinical utility of a polygenic risk score (PRS)-based multigene panel test for predicting diabetes mellitus (DM) in a healthy population. A total of 302 individuals underwent genetic testing using the HelloGene™ DM panel, which includes four DM-related single nucleotide polymorphisms (CDKAL1, HHEX, KCNQ1, and TCF7L2). PRS values were calculated using an algorithm developed from the Korean Genome and Epidemiology Study (KoGES; n = 39,605), and participants were classified into four genetic risk groups (low, moderate, high, and very high). Fasting blood glucose, glycated hemoglobin (HbA1c), and body mass index were assessed at baseline and after at least three years of follow-up, and lifestyle factors including smoking, alcohol consumption, and exercise status were recorded. No significant differences in age, sex, or lifestyle habits were observed among PRS…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenetic Associations and Epidemiology · BRCA gene mutations in cancer · Nutrition, Genetics, and Disease
1. Introduction
Type 2 diabetes mellitus (T2DM) is a chronic metabolic disorder characterized by hyperglycemia and insulin resistance, which arises from a complex interplay of genetic, environmental, and behavioral factors [1]. The key contributors to T2DM include dietary habits, physical inactivity, obesity, sleep disturbances, and socioeconomic status. In South Korea, the prevalence of diabetes in 2022 was 12.5% among individuals aged 19 years and older and 14.8% among those aged 30 years and older. Although the age-standardized prevalence has remained relatively stable over the past decade, the absolute number of individuals with diabetes continues to increase due to population aging, and the projected prevalence is expected to rise substantially by 2051 [2]. This epidemiologic pattern underscores the need for effective strategies to identify high-risk individuals and implement targeted preventive measures.
Genetic predisposition plays a pivotal role in the development of T2DM, by contributing to both insulin resistance and beta-cell dysfunction [1]. Since the first genome-wide association study (GWAS) by the Wellcome Trust Case Control Consortium in 2007 [3], numerous studies have identified T2DM-associated genetic variants in various populations. Among these, four loci, CDKAL1 (rs7754840), HHEX (rs1111875), KCNQ1 (rs2237892), and TCF7L2 (rs7903146) have consistently shown strong associations with T2DM in Western and Asian populations, including Koreans. Variants in the CDKAL1 and HHEX loci have been associated with increased T2DM risk, with per-allele odds ratios typically reported in the range of 1.10–1.25, while KCNQ1 and TCF7L2 variants show comparable effect sizes across multiple ancestries. The CDKAL1 locus has been linked to an increased risk of T2DM in the Japanese population [4], with C allele carriers exhibiting reduced first-phase insulin release and impaired proinsulin conversion [5,6]. Similar associations have been reported in Ashkenazi Jewish, African American, and Han Chinese populations [7,8,9]. Likewise, HHEX is significantly associated with T2DM in Caucasian, Japanese, and Korean populations, and has also been linked to impaired fasting glucose in Chinese Han individuals [10,11,12,13]. The KCNQ1 locus strongly associated with insulin secretion following glucose intake and has been validated as a T2DM-associated gene in the Chinese and Korean populations [11,14,15]. Finally, TCF7L2 is recognized as one of the most robust predictors of T2DM risk, with consistent associations across Japanese and Korean cohorts, including gestational diabetes cases [16,17,18].
Incorporating these genes into polygenic risk scores (PRS) offers a promising tool for stratifying individuals according to their genetic susceptibility to T2DM. By combining the effect sizes of multiple genetic variants into a single risk estimate, the PRS enables the identification of high-risk individuals and supports targeted prevention and personalized treatment strategies. Despite significant progress in understanding the genetic underpinnings of T2DM, an important gap persists in the assessment of the clinical utility of PRS in real-world settings. In particular, whether healthy individuals with high PRS are at a consistent risk of developing T2DM in healthy populations has not been thoroughly explored. This study aimed to address this gap by evaluating the predictive power of the PRS for diabetes onset, bridging the gap between genetic risk prediction and its practical implications for disease prevention and management.
2. Results
2.1. Selection of PRS Model
Overall study design is described in Figure 1. To evaluate polygenic risk score models with different levels of genetic complexity and clinical applicability, we intentionally constructed three complementary PRS models. Model 1 represents a genome-wide approach incorporating a large number of variants to maximize population-level predictive performance. Model 2 reflects an ancestry-specific PRS based on SNPs identified in East Asian genome-wide association studies. Model 3 was designed as a targeted PRS using a limited number of well-established diabetes-associated SNPs, with the aim of assessing its potential utility in a clinically applicable genetic testing setting.
Using the KoGES dataset, three PRS models were developed to evaluate the diabetes risk, and the proportion of diabetes cases for each model is illustrated in Figure 2. The analysis showed that model 3 exhibited a relatively higher proportion of diabetes cases within the ‘Q4 risk quartile’ compared to models 1 and 2. The detailed distribution of diabetes cases across PRS quartile groups in the validation set is presented in Supplementary Table S2. Supplementary Figure S1 provides a comprehensive overview of the population characteristics of the KoGES cohort and the distribution of the PRS scores across the dataset. In Model 3, the targeted HelloGene™ PRS was calculated using beta coefficients estimated from the KoGES training set for the four selected diabetes-associated SNPs.
2.2. Baseline Characteristics of the Study Population
The study population included 258 males (85.4%) and 44 females (14.6%) with a median age of 48 years. The participants were categorized into four PRS risk groups: ‘low risk’ (59/302), ‘moderate risk’ (111/302), ‘high risk’ (70/302), and ‘very high risk’ (61/302). No significant differences were observed in age, sex, or habitual traits (e.g., smoking, alcohol consumption, and exercise habits) among the PRS categories. The follow-up FBG levels were significantly higher in the ‘very high risk’ group compared to that in the other groups (p = 0.001). The initial and follow-up HbA1c levels were also significantly higher in the ‘very high risk’ group compared to that in the other groups (p = 0.002 and 0.001, respectively) (Table 1).
2.3. Impact of HelloGene™ DM Panel to Total Study Population and ‘Current Smoker’ Group
The overall analysis showed that participants in the ‘very high risk’ HelloGene™ DM panel PRS quartile were approximately 4.5 times more likely to have diabetes compared to those in the lowest quartile (Odds ratio [OR] = 4.495, 95% confidence interval [CI]: 1.420–13.133, p = 0.004) (Figure 3A). A subgroup analysis based on the smoking status showed that the association was strongest among ‘current smokers,’ who were four times more likely to have diabetes than non-smokers (OR = 4.023, 95% CI: 1.292–12.524, p = 0.016), whereas the effect was less pronounced in other smoking groups (Figure 3B). Furthermore, when examining the proportion of diabetes cases by smoking habits across the HelloGene™ DM panel PRS quartile groups, approximately 25% of individuals in the ‘very high risk’ group who smoked were identified as having diabetes (Figure 3C).
The CDKAL1 and TCF7L2 genotypes were the most prevalent in moderate- and high-risk groups. Specifically, CDKAL1 was predominant in the ‘high-risk’ group, while TCF7L2 was most common in the ‘moderate risk’ group. Among current smokers, HHEX genotype carriers accounted for 25% of the total population (Supplementary Table S3). When the risk of developing DM was analyzed for each gene, only TCF7L2 showed a statistical significance (p = 0.008). In the smoking subgroup, HHEX showed a borderline association with T2DM risk, which should be interpreted as an exploratory finding.
2.4. Clinical Utility of HelloGene™ Test
When moderate risk was selected for positive cutoff for clinical significance, the accuracy was 26.46% (95% CI: 21.57–31.82%). When high risk was selected for positive cutoff for clinical significance, the accuracy increased to 59.88% (95% CI: 54.12–65.46%). The selection of the very high-risk group as positive cutoff showed the strongest clinical significance: 80.48% accuracy with a 95% CI of 75.5–84.79% (Supplementary Table S4).
3. Discussion
This study evaluated the clinical utility of the HelloGene T2DM panel test in predicting T2DM risk in a real-world healthy population. By integrating PRS with follow-up metabolic and habitual data, we showed that a targeted PRS model based on four key T2DM-associated SNPs was able to stratify individuals into relative risk categories. The very high-risk group exhibited higher levels of metabolic markers and a greater proportion of diabetes cases during follow-up, suggesting relative enrichment of metabolic risk rather than definitive risk prediction.
The clinical utility of the PRS in patients with T2DM has been explored in several studies. In five cohorts encompassing diverse ancestries, T2DM PRS was significantly associated with the future development of T2DM in women with gestational diabetes mellitus [20]. In an Asian population, the integration of genome-wide PRS data with metabolic indicators, such as homeostatic model assessment of beta-cell function and insulin resistance, demonstrated enhanced predictive power, as shown in a study by Kang et al. [21]. Furthermore, the clinical applicability of T2DM PRS was examined in another East Asian cohort that identified its relevance in predicting the progression of glucose tolerance states, from normal glucose regulation to overt diabetes mellitus. We also conducted a more in-depth longitudinal follow-up to evaluate the predictive ability of T2DM in a larger cohort. Notably, individuals with T2DM in the top decile are more likely to be treated with insulin than those in the remaining PRS groups [22].
In our study, participants in the ‘very high-risk group’ exhibited significantly higher follow-up FBG and HbA1c levels. An association between smoking status and PRS scores was also observed, with a higher prevalence of T2DM among HHEX genotype carriers in the smoking subgroup. These findings suggest a potential combined influence of genetic susceptibility and lifestyle factors on disease risk, although formal statistical testing of gene–environment interactions was beyond the scope of the present study. Gene-specific analyses identified TCF7L2 as the most significant predictor of T2DM in the overall population, confirming its well-established role as a robust genetic marker. In the smoking subgroup, HHEX showed a subgroup-specific and borderline association with T2DM risk, which should be interpreted as an exploratory finding. Together, these observations are consistent with prior evidence indicating that lifestyle modifications, such as smoking cessation, may help reduce disease risk in genetically susceptible individuals and warrant further investigation in studies specifically designed to evaluate gene–lifestyle relationships.
Clinical utility analysis showed that applying higher-risk cutoffs improved classification performance, with the ‘very high-risk’ group achieving an accuracy of 80.48% in identifying individuals who developed diabetes during follow-up. These findings reinforce the value of the PRS as a tool for identifying individuals who would benefit most from targeted interventions, such as intensive glucose monitoring or early lifestyle modifications. Importantly, the clinical utility discussed in this study refers to early risk stratification and identification of metabolic vulnerability in a real-world screening context, rather than long-term prediction of diabetes incidence, which would require substantially longer follow-up. However, the moderate accuracy at lower cutoff thresholds highlights the need to integrate the PRS with additional clinical and behavioral risk factors to further enhance risk classification performance.
Despite these promising findings, several limitations should be acknowledged. First, the real-world DONG-A-DM cohort was relatively small and derived from a single tertiary hospital–based health screening program, which may limit generalizability. In addition, the cohort was predominantly male and relatively young, reflecting the demographic characteristics of individuals undergoing voluntary health screening, and caution is therefore warranted when extrapolating these findings to women and older populations. Second, the follow-up duration of approximately three years, while sufficient to capture early metabolic changes and incident diabetes cases, may not fully reflect long-term T2DM risk. Third, important diabetes-related risk factors, including family history, detailed dietary patterns, precise physical activity measures, and socioeconomic status, were not comprehensively captured, and residual confounding cannot be excluded. Finally, subgroup analyses involving lifestyle factors were exploratory in nature and should be interpreted cautiously. These limitations highlight the need for larger and more diverse cohorts, as well as longer follow-up studies integrating comprehensive clinical, behavioral, and socioeconomic data.
Despite these limitations, this study provides meaningful evidence that a targeted 4-SNP PRS can stratify individuals into relative diabetes risk categories in a real-world health screening context. Rather than aiming for definitive risk prediction, the strength of this approach lies in its simplicity, interpretability, and feasibility for early risk stratification and identification of metabolic vulnerability. The observed enrichment of adverse metabolic profiles and diabetes cases in the very high-risk group supports the potential role of simplified PRS models as an initial screening tool to guide more intensive monitoring or lifestyle-focused interventions.
4. Materials and Methods
4.1. Study Design
The study population was divided into two groups: (1) the large-scale Korean cohort dataset from the Korean Genome and Epidemiology Study (KoGES), used to develop and validate the PRS for the HelloGene™ DM panel, and (2) the real-world dataset from Dong-A University Hospital (DONG-A-DM), used to evaluate the risk of diabetes onset and assess the clinical utility of the PRS. A schematic overview of the study methodology is shown in Figure 1. Detailed information on the KoGES samples, including sample size, demographic composition, and collection methods, is available in previous studies [23] and Supplementary Table S1.
4.2. Polygenic Risk Score Algorithm Construction
HelloGene™ Diabetes Mellitus Panel (THERAGEN Health Co., Ltd., Seongnam, Republic of Korea) was used to assess the genetic risk based on four SNPs: CDKAL1 (rs7754840), HHEX (rs1111875), TCF7L2 (rs7903146), and KCNQ1 (rs2237892). PRS calculations were performed using Pruning and Thresholding (P+T) and Clumping and Thresholding (C+T) algorithms in the PLINK software (version 1.9) [19], with additional statistical analyses conducted using R software (version 4.2.2).
The KoGES dataset initially comprised 58,701 participants. However, after excluding individuals with cancer, thyroid disease, or missing data, the final sample size was 39,605 participants. Of these, 70% (30,465 participants) were allocated to the training set for the PRS model development, while 30% (9140 participants) were designated as the validation set. These exclusions were applied to minimize potential confounding, as cancer and thyroid disease—and their related treatments—can substantially influence glucose metabolism and insulin sensitivity, thereby obscuring the association between polygenic risk scores and type 2 diabetes risk.
Three PRS models were constructed for comparison: model 1 (genome-wide PRS based on 7,891,819 SNPs from GWAS summary statistics), model 2 (East Asian PRS derived from 2723 SNPs identified in East Asian GWAS) and model 3 (targeted PRS using four SNPs identified by THERAGEN Health). Linkage disequilibrium (LD) clumping was used to select independent SNPs, resulting in 232, 30, and 4 SNPs for models 1, 2, and 3, respectively. The PRS models were validated by applying them to the KoGES validation set, calculating individual PRS scores, and stratifying the participants into quartiles (Q1: low risk, Q2: moderate risk, Q3: high risk, and Q4: very high risk). The proportion of diabetes cases within each quartile was used to evaluate the predictive accuracy (Figure 2).
For each PRS model, individual PRS values were calculated as the weighted sum of risk alleles. The beta coefficients (β) used for polygenic risk score calculation represent log-odds effect sizes estimated from the KoGES training dataset (n = 30,465). These coefficients were derived using logistic regression models implemented in PLINK, with type 2 diabetes status as the outcome variable.
4.3. Study Subjects
The DONG-A-DM cohort was established as a retrospective observational cohort based on a routine health screening program at Dong-A University Hospital. The DONG-A-DM study population included 302 healthy individuals who underwent health examinations at the Dong-A University Hospital Health Examination Center between 2016 and 2021 and provided informed consent for genetic testing. As part of the routine health examination process at a tertiary university hospital, participants had previously provided informed consent for the use of their clinical, laboratory, and genetic data for research purposes. Although this consent was not restricted to diabetes-specific outcomes, all participants included in the present analysis were free of diabetes at baseline and were followed longitudinally to assess incident diabetes during follow-up. Participants returned for follow-up examinations 3 years after their baseline assessments, allowing for a longitudinal analysis of diabetes risk. Incident type 2 diabetes during follow-up was defined according to standard diagnostic criteria, including fasting plasma glucose ≥ 126 mg/dL, HbA1c ≥ 6.5%, physician-diagnosed diabetes, or initiation of antidiabetic medication, consistent with the American Diabetes Association and World Health Organization guidelines. Ethical approval was obtained from the Institutional Review Board (IRB) of Dong-A University (IRB No: DAUHIRB-EXP-24-212), and all participants provided written informed consent.
4.4. Clinical and Genomic Data Collection
Demographic and lifestyle data, including age, sex, smoking status, alcohol consumption, and exercise habits, were collected using structured questionnaires. The body mass index (BMI) was calculated using height and weight. Fasting blood glucose (FBG) levels were measured using the enzymatic UV method (AU5800, Beckman Coulter Inc., Brea, CA, USA), and glycated hemoglobin (HbA1c) levels were assessed using ion-exchange high-performance liquid chromatography (Tosoh HLC-723 G11 analyzer, Tosoh Corp., Tokyo, Japan).
Blood samples (3 mL) were collected in EDTA tubes for genetic testing. DNA was extracted from whole blood using the Exgene™ Blood Extraction SV kit (GeneAll, Seoul, Republic of Korea). Prior to genotyping, DNA concentration and purity were assessed using a Tecan F200 microplate reader (Tecan, Männedorf, Switzerland) to ensure quality. Genotyping was performed using the TaqMan assay on the OpenArray platform, adhering to the manufacturer’s protocol for the QuantStudio™ 12 K Flex Accufill System (Thermo Fisher Scientific, Waltham, MA, USA).
Genetic quality control procedures were applied prior to PRS construction. Samples with a genotyping call rate below 95% were excluded, and SNPs with call rates below 95% were removed. Hardy–Weinberg equilibrium testing was performed in the KoGES training dataset, and variants with significant deviation from equilibrium (p < 1 × 10^−6^) were excluded. These quality control steps were implemented to ensure robust and reliable genotype data for subsequent PRS analyses.
4.5. Statistical Analyses
Genetic data were coded 0 (non-risk allele homozygote), 1 (heterozygote), or 2 (risk allele homozygote). PRS was calculated as the weighted sum of risk alleles: [24], where N represents the number of SNPs included in the score, βi is the log-odds effect size of variant i estimated from the KoGES training dataset, and dosage_ij_ is the number of risk alleles for variant i in individual j. PRS values were categorized into quartiles (Q1–Q4) based on their distribution in the KoGES dataset. The associations between PRS quartiles and diabetes risk were analyzed using ANOVA and logistic regression, adjusted for age, sex, smoking, alcohol consumption, and exercise. Interaction analyses stratified participants according to smoking status, alcohol consumption, and exercise status habit. Interaction and subgroup analyses stratified participants according to lifestyle factors were performed to explore potential effect modification.
A post hoc power analysis was conducted to evaluate whether the available sample size provided sufficient power to detect clinically meaningful differences in diabetes risk across PRS categories. Based on the observed incidence of diabetes, the study had adequate power to detect differences between the highest and lowest PRS quartiles. Primary analyses evaluating the association between PRS risk groups and diabetes outcomes were prespecified, whereas subgroup analyses according to lifestyle factors were considered exploratory. Accordingly, results from subgroup analyses were interpreted cautiously, with emphasis placed on effect size estimates and consistency of associations rather than strict multiple testing correction.
Statistical analyses were performed using SPSS version 18, with significance set at p < 0.05.
5. Conclusions
This study evaluated the performance of the HelloGene™ DM panel test for stratifying genetic risk of type 2 diabetes mellitus in a real-world health screening cohort. Using a targeted polygenic risk score based on four T2DM-associated SNPs, individuals were categorized into relative risk groups. Participants in the very high-risk group showed a higher prevalence of diabetes and less favorable metabolic profiles during follow-up, suggesting an enrichment of metabolic vulnerability. Overall, these findings indicate the feasibility of a simplified, disease-specific PRS approach for early risk stratification in routine screening settings. Although further large-scale and prospective studies are warranted to confirm long-term predictive value and broader clinical applicability, the present results provide preliminary evidence supporting the potential role of targeted PRS models in diabetes risk assessment.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Kahn S.E. Cooper M.E. Del Prato S. Pathophysiology and treatment of type 2 diabetes: Perspectives on the past, present, and future Lancet 20143831068108310.1016/S 0140-6736(13)62154-624315620 PMC 4226760 · doi ↗ · pubmed ↗
- 2Ha K.H. Kim D.J. Changes in the epidemiological landscape of diabetes in South Korea: Trends in prevalence, incidence, and healthcare expenditures Endocrinol. Metab.20243966967710.3803/En M.2024.207339322188 PMC 11525706 · doi ↗ · pubmed ↗
- 3Genome-wide association study of 14,000 cases of seven common diseases and 3000 shared controls Nature 200744766167810.1038/nature 0591117554300 PMC 2719288 · doi ↗ · pubmed ↗
- 4Horikawa Y. Miyake K. Yasuda K. Enya M. Hirota Y. Yamagata K. Hinokio Y. Oka Y. Iwasaki N. Iwamoto Y. Replication of genome-wide association studies of type 2 diabetes susceptibility in Japan J. Clin. Endocrinol. Metab.2008933136314110.1210/jc.2008-045218477659 · doi ↗ · pubmed ↗
- 5Kirchhoff K. Machicao F. Haupt A. Schäfer S.A. Tschritter O. Staiger H. Stefan N. Häring H.U. Fritsche A. Polymorphisms in the TCF 7L 2, CDKAL 1 and SLC 30A 8 genes are associated with impaired proinsulin conversion Diabetologia 20085159760110.1007/s 00125-008-0926-y 18264689 · doi ↗ · pubmed ↗
- 6StancákováA. Pihlajamäki J. Kuusisto J. Stefan N. Fritsche A. Häring H. Andreozzi F. Succurro E. Sesti G. Boesgaard T.W. Single-nucleotide polymorphism rs 7754840 of CDKAL 1 is associated with impaired insulin secretion in nondiabetic offspring of type 2 diabetic subjects and in a large sample of men with normal glucose tolerance J. Clin. Endocrinol. Metab.2008931924193010.1210/jc.2007-221818285412 · doi ↗ · pubmed ↗
- 7Bao X.Y. Peng B. Yang M.S. Replication study of novel risk variants in six genes with type 2 diabetes and related quantitative traits in the Han Chinese lean individuals Mol. Biol. Rep.2012392447245410.1007/s 11033-011-0995-821643948 · doi ↗ · pubmed ↗
- 8Bronstein M. PisantéA. Yakir B. Darvasi A. Type 2 diabetes susceptibility loci in the Ashkenazi Jewish population Hum. Genet.200812410110410.1007/s 00439-008-0520-x 18516622 · doi ↗ · pubmed ↗
