# New composite phenotypes enhance chronic kidney disease classification and genetic associations

**Authors:** Kim Ngan Tran, Heidi G. Sutherland, Andrew J. Mallett, Lyn R. Griffiths, Rodney A. Lea, David Buchner

PMC · DOI: 10.1371/journal.pgen.1011718 · PLOS Genetics · 2025-05-23

## TL;DR

Composite phenotypes built from combinations of CKD biomarkers classify CKD better than eGFR alone and reveal a genetic association (the SH2B3 locus) that eGFR alone misses at the same sample size.

## Contribution

A novel combinatorial PCA method generates composite phenotypes that enhance CKD classification and genetic discovery.
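The cPCA idea can be sketched as follows, under stated assumptions: each biomarker is z-scored, each composite phenotype (CP) is taken to be the first principal component (PC1) of one subset of biomarkers, and every subset of two or more biomarkers is enumerated. These choices and all function names are illustrative assumptions, not the authors' implementation. Under the one-CP-per-subset assumption, 21 phenotypes give `2**21 - 1 - 21 = 2097130` subsets, which is consistent with the "over 2 million" CPs reported in the abstract.

```python
import itertools
from math import comb
from typing import Iterator, Optional, Sequence, Tuple

import numpy as np


def n_composite_phenotypes(n_traits: int, min_size: int = 2) -> int:
    """Count subsets of size >= min_size (assumes one CP per subset)."""
    return sum(comb(n_traits, k) for k in range(min_size, n_traits + 1))


def first_pc_scores(x: np.ndarray) -> np.ndarray:
    """PC1 scores of the columns of x after z-scoring each column."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    # Rows of vt are the principal axes (loadings) of the standardized data.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[0]


def combinatorial_pca(
    data: np.ndarray,
    names: Sequence[str],
    min_size: int = 2,
    max_size: Optional[int] = None,
) -> Iterator[Tuple[Tuple[str, ...], np.ndarray]]:
    """Yield (phenotype subset, PC1 scores) for every subset of columns."""
    n_traits = data.shape[1]
    top = n_traits if max_size is None else max_size
    for k in range(min_size, top + 1):
        for cols in itertools.combinations(range(n_traits), k):
            yield tuple(names[c] for c in cols), first_pc_scores(data[:, list(cols)])


if __name__ == "__main__":
    print(n_composite_phenotypes(21))  # 2097130, i.e. 2**21 - 1 - 21
    rng = np.random.default_rng(0)
    toy = rng.normal(size=(500, 3))  # hypothetical random data, not UK Biobank
    for subset, scores in combinatorial_pca(toy, ["trait_a", "trait_b", "trait_c"]):
        print(subset, scores.shape)
```

The number of subsets grows as 2^n, so a run over 21 traits would score each CP as it is produced rather than holding all of them in memory; that is why `combinatorial_pca` is written as a generator.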

## Key findings

- A composite phenotype combining seven biomarkers achieved an AUC of 0.878 for CKD classification, outperforming eGFR alone (AUC 0.830).
- The SH2B3 locus (rs3184504) was identified in composite phenotypes but not in eGFR at the same sample size, and showed strong evidence of colocalization with eGFR.
- Nearly 50,000 of the more than 2 million composite phenotypes showed significantly higher classification power for clinical CKD than individual CKD biomarkers.
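The AUC comparisons in these findings rest on standard ROC analysis. A minimal sketch, assuming NumPy and binary CKD case labels: the AUC equals the Mann-Whitney U statistic divided by the number of case-control pairs, and the 95% interval comes from a percentile bootstrap. The bootstrap is a stand-in chosen for simplicity; the paper's confidence intervals may have been computed differently (for example with DeLong's method). Because the sign of a PC1 score is arbitrary and eGFR falls in CKD, scores are oriented once on the full sample so that higher values indicate cases. Function names are hypothetical.

```python
import numpy as np


def auc_mann_whitney(scores, labels) -> float:
    """ROC AUC via the Mann-Whitney U statistic; tied scores count as half a win."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    # Average 1-based rank of each distinct value, mapped back to every observation.
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))


def orient(scores, labels) -> np.ndarray:
    """Flip the sign so that higher scores indicate cases."""
    scores = np.asarray(scores, dtype=float)
    return scores if auc_mann_whitney(scores, labels) >= 0.5 else -scores


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Return (AUC, (lower, upper)) with a percentile-bootstrap interval."""
    labels = np.asarray(labels, dtype=bool)
    scores = orient(scores, labels)  # orientation fixed once, on the full sample
    rng = np.random.default_rng(seed)
    replicates = []
    for _ in range(n_boot):
        idx = rng.integers(0, scores.size, size=scores.size)
        resampled = labels[idx]
        if resampled.all() or not resampled.any():
            continue  # AUC is undefined without both cases and controls
        replicates.append(auc_mann_whitney(scores[idx], resampled))
    lower, upper = np.quantile(replicates, [alpha / 2, 1 - alpha / 2])
    return auc_mann_whitney(scores, labels), (float(lower), float(upper))
```

Reusing the same resampled indices for two score vectors (say, a CP and eGFR) would give a paired bootstrap interval for their AUC difference.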

## Abstract

Chronic kidney disease (CKD) is a multifactorial condition driven by diverse etiologies that lead to a gradual loss of kidney function. Although genome-wide association studies (GWAS) have identified numerous genetic loci linked to CKD, a large portion of its genetic basis remains unexplained. This knowledge gap may arise partly from reliance on single biomarkers, such as estimated glomerular filtration rate (eGFR), to assess kidney function. To address this limitation, we developed and applied a novel multi-phenotype approach, combinatorial Principal Component Analysis (cPCA), to better understand the complex genetic architecture of CKD. Using the UK Biobank dataset (n = 337,112), we analyzed 21 CKD-related phenotypes, generating over 2 million composite phenotypes (CPs) through cPCA. Nearly 50,000 of these CPs demonstrated significantly higher classification power for clinical CKD than individual biomarkers. The top-ranked CP, a combination of albumin, cystatin C, eGFR, gamma-glutamyltransferase, HbA1c, low-density lipoprotein, and microalbuminuria, achieved an AUC of 0.878 (95% CI: 0.873–0.882), significantly outperforming eGFR alone (AUC: 0.830, 95% CI: 0.825–0.835). Genetic association analysis of the ~50,000 high-performing CPs identified all major eGFR-associated loci. In contrast, the SH2B3 locus (rs3184504, a loss-of-function variant) was identified only in CPs (p = 3.1 × 10⁻⁵⁶) and not in eGFR at the same sample size. In addition, the SH2B3 locus showed strong evidence of colocalization with eGFR, supporting its role in kidney function. These results highlight the power of the multi-phenotype cPCA approach in understanding the genetic basis of CKD, with potential applications to other complex diseases.
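The genetic association step tests each variant against a CP. The sketch below is deliberately simplified and is not the authors' pipeline: it fits an additive ordinary-least-squares model of one phenotype on allele dosage (0, 1, or 2 copies of the effect allele), with no covariates and no correction for relatedness or population structure, and converts the test statistic to a two-sided p-value with a normal approximation that is only reasonable at biobank-scale sample sizes. The function name is hypothetical.

```python
import math

import numpy as np


def additive_association(phenotype, dosage):
    """OLS test of phenotype ~ intercept + beta * dosage; returns (beta, se, p).

    Simplified: no covariates, no relatedness or population-structure correction,
    and a normal approximation to the t distribution (large n assumed).
    """
    y = np.asarray(phenotype, dtype=float)
    g = np.asarray(dosage, dtype=float)  # 0, 1 or 2 copies of the effect allele
    x = np.column_stack([np.ones_like(g), g])
    coef, _, _, _ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ coef
    dof = y.size - x.shape[1]
    sigma2 = float(resid @ resid) / dof  # residual variance estimate
    se = math.sqrt(sigma2 * np.linalg.inv(x.T @ x)[1, 1])
    beta = float(coef[1])
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, p


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    g = rng.binomial(2, 0.3, size=5000)  # simulated genotypes, not real data
    y = 0.5 * g + rng.normal(size=5000)  # simulated CP with a true effect of 0.5
    print(additive_association(y, g))
```

Genome-wide significance is conventionally declared at p < 5 × 10⁻⁸; a GWAS repeats this test for every variant, usually with dedicated software that also handles covariates and relatedness.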

## Author summary

Chronic kidney disease (CKD) can result from diverse underlying causes, such as diabetes, high blood pressure, infections, and lifestyle factors. However, most CKD studies rely on single measurements, such as estimated glomerular filtration rate (eGFR), which assesses kidney filtration but may not fully capture the complexity of the disease. Here, we applied a novel approach to explore CKD from a broader perspective. Using a large dataset of over 300,000 individuals, we combined 21 kidney-related health measures into millions of new composite traits, providing a more comprehensive view of kidney function. One of these composite traits, which combined albumin, cystatin C, eGFR, gamma-glutamyltransferase, HbA1c, low-density lipoprotein, and microalbuminuria, proved significantly more effective at identifying CKD than any single measurement. Additionally, we identified key genetic factors associated with CKD, including the SH2B3 gene. By integrating multiple measurements, our work offers a clearer understanding of the genetic basis of CKD and paves the way for similar approaches to unravel other complex diseases, ultimately aiding in their prevention and treatment.

## Linked entities

- **Genes:** SH2B3 (SH2B adaptor protein 3) [NCBI Gene 10019]
- **Diseases:** chronic kidney disease (MONDO:0005300), diabetes (MONDO:0005015)

## Full-text entities

- **Genes:** CST3 (cystatin C) [NCBI Gene 1471] {aka ADLDWA, ARMD11, HEL-S-2}, GGT1 (gamma-glutamyltransferase 1) [NCBI Gene 2678] {aka CD224, D22S672, D22S732, GGT, GGT 1, GGTD}, ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}, SH2B3 (SH2B adaptor protein 3) [NCBI Gene 10019] {aka IDDM20, LNK}
- **Diseases:** loss of kidney function (MESH:D007680), CKD (MESH:D051436)
- **Mutations:** rs3184504

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12133187/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12133187/full.md

## References

51 references — full list in the complete paper: https://tomesphere.com/paper/PMC12133187/full.md

---
Source: https://tomesphere.com/paper/PMC12133187