# Development of a k-Nearest Neighbors Model for the Prediction of Late-Onset Alzheimer’s Risk by Combining Polygenic Risk Scores and Phenotypic Variables

**Authors:** Sandra Ferreiro López, Rosana Ferrero, Jorge Blom-Dahl, Marta Alonso-Bernáldez, Adán González, Guillermo Pérez-Solero, Jair Tenorio-Castano

PMC · DOI: 10.3390/genes16040377 · 2025-03-26

## TL;DR

This paper presents a k-nearest neighbors (KNN) model that combines polygenic risk scores with phenotypic variables to predict the risk of late-onset Alzheimer’s disease (LOAD), reporting a sensitivity of 0.80 and an AUC of 0.71.

## Contribution

The contribution is the integration of polygenic risk scores and phenotypic variables in a single KNN model with fine-tuned parameters, reaching an AUC of 0.71 compared with the 0.66 reported in 2019 for a KNN model.

## Key findings

- The model achieved a sensitivity of 0.80 and an AUC of 0.71 for predicting LOAD risk.
- Polygenic genetic risk, APOE haplotype, and age were the most influential factors in the model’s predictions.
- The model’s AUC of 0.71 exceeds the 0.66 reported in 2019 for an earlier KNN model.
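
For readers less familiar with the two metrics quoted in these findings, the following is a minimal sketch of how sensitivity and AUC can be computed from labels and risk scores. The function names, the 0.5 decision threshold, and the toy data are assumptions made for illustration; none of them come from the paper, and the printed values say nothing about the study’s results.

```python
def sensitivity(y_true, y_pred):
    """Fraction of true cases (label 1) that were predicted as cases."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true, scores):
    """Probability that a random case scores above a random control.

    Ties count as half a win (the rank-based definition of ROC AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = case, 0 = control) and risk scores; illustrative only.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.75, 0.5, 0.4, 0.3, 0.1]

# Assumed threshold: scores of 0.5 or more are called "at risk".
y_pred = [1 if s >= 0.5 else 0 for s in scores]

toy_sensitivity = sensitivity(y_true, y_pred)
toy_auc = auc(y_true, scores)
print(toy_sensitivity, toy_auc)
```

Sensitivity depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.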

## Abstract

Introduction: Alzheimer’s disease (AD), and more specifically late-onset Alzheimer’s disease (LOAD), poses a considerable challenge for early diagnosis and timely treatment. Early diagnosis is crucial to improving the efficacy of therapies and patients’ quality of life. The current challenge is to accurately identify at-risk individuals before the first symptoms of AD appear.

Methods and results: Here, we present an improved model for LOAD risk prediction that applies the k-nearest neighbors (KNN) algorithm. The model achieved a sensitivity of 0.80 and an area under the curve (AUC) of 0.71, an improvement over the AUC of 0.66 reported in 2019 for a KNN model.

Discussion: A mathematical model that combines genetic and clinical covariates predicted AD/LOAD risk well, with polygenic genetic risk, APOE haplotype, and age carrying the highest weights. Compared to previous studies, our model integrates and correlates genetic prediction with phenotypic information, fine-tuning the model parameters to achieve the best performance. The algorithm can be applied in the general population and does not require any symptoms to have manifested. Thus, we present an advanced model for LOAD risk prediction.
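
To make the KNN idea in the abstract concrete, here is a minimal sketch of a nearest-neighbor risk score over three inputs. It is not the authors’ pipeline: the abstract does not state the feature encoding, scaling, distance metric, or value of k, so the z-score scaling, Euclidean distance, `k=3`, the column layout (polygenic risk score, APOE ε4 allele count, age), and all toy values below are assumptions chosen for illustration.

```python
import math


def zscore_params(rows):
    """Per-feature mean and standard deviation computed from training rows."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    # A constant feature would give std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
        for col, m in zip(columns, means)
    ]
    return means, stds


def scale(row, means, stds):
    """Standardize one row so every feature contributes on a comparable scale."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def knn_risk(train_x, train_y, query, k):
    """Fraction of the k nearest training samples (Euclidean) labelled as cases."""
    means, stds = zscore_params(train_x)
    q = scale(query, means, stds)
    dists = sorted(
        (math.dist(scale(x, means, stds), q), y) for x, y in zip(train_x, train_y)
    )
    return sum(y for _, y in dists[:k]) / k


# Hypothetical columns: [polygenic risk score, APOE e4 allele count, age in years].
train_x = [
    [1.2, 2, 78], [0.9, 1, 74], [1.5, 1, 81], [0.7, 1, 76],      # toy cases
    [-0.8, 0, 66], [-0.3, 0, 70], [-1.1, 0, 64], [0.1, 0, 68],   # toy controls
]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]

high = knn_risk(train_x, train_y, [1.0, 1, 77], k=3)
low = knn_risk(train_x, train_y, [-0.9, 0, 65], k=3)
print(high, low)
```

Scaling matters here because unscaled age (tens of years) would otherwise dominate the distance over a risk score that varies by about one unit. In practice k and the distance metric would be tuned by cross-validation, which is one plausible reading of the parameter fine-tuning the abstract mentions.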

## Linked entities

- **Diseases:** Alzheimer’s disease (MONDO:0004975)

## Full-text entities

- **Genes:** APOE (apolipoprotein E) [NCBI Gene 348] {aka AD2, APO-E, ApoE4, LDLCQ5, LPG}
- **Diseases:** AD (MESH:D000544)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12027161/full.md

---
Source: https://tomesphere.com/paper/PMC12027161