# Machine learning models for risk prediction of age-related macular degeneration in Fujian eye study

**Authors:** Yang Li, Bin Wang, Xiangdong Luo, Mingqin Zhang, Qinrui Hu, Xiaoxin Li

PMC · DOI: 10.1371/journal.pone.0335620 · 2025-11-04

## TL;DR

This study compares seven machine learning models for predicting age-related macular degeneration (AMD) risk using Fujian Eye Study data. Logistic regression performed best, with a balanced accuracy of 0.6364, and educational background was the most influential feature.

## Contribution

The study identifies logistic regression as the most effective of the seven models compared for AMD risk prediction and highlights educational background as the most influential feature.

## Key findings

- Logistic regression achieved the highest balanced accuracy of 0.6364 for AMD risk prediction.
- Educational background was the most influential feature with an average SHAP value of 0.8199.
- Outdoor time (SHAP value 0.6474) and left eye spherical equivalent (OSSE, SHAP value 0.6377) were the next most influential predictive features.
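
Balanced accuracy, the metric reported in the first finding, is the mean of per-class recall, which makes it less sensitive to class imbalance than plain accuracy. The sketch below shows the calculation in plain Python. The labels are made-up toy data, not values from the study, and the function name `balanced_accuracy` is chosen here for illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[cls] / total[cls] for cls in total]
    return sum(recalls) / len(recalls)


# Toy data (illustrative only): 1 = AMD, 0 = no AMD, with few positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("plain accuracy:   ", plain_accuracy)
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

On imbalanced toy data like this, plain accuracy is dominated by the majority class, while balanced accuracy weights recall on each class equally. For a two-class problem, a value of 0.5 corresponds to chance-level performance.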

## Abstract

Age-related macular degeneration (AMD) is a retinal disorder that significantly impairs vision. This study investigates various machine learning models for predicting AMD risk, laying the groundwork for further research using big data and determining the most effective predictive model.

Utilizing data from 8211 records with 39 features from the Fujian Eye Study, a cross-sectional epidemiological investigation, several machine learning models were developed and assessed. The models included logistic regression (LR), K-nearest neighbors (KNN), support vector machine (SVM), decision tree (DT), random forest (RF), light gradient boosting machine (LightGBM), and extreme gradient boosting (XGBoost). Data preprocessing, feature selection, and model training were all key components of the study.

After evaluating multiple models, the logistic regression model emerged as the most accurate, achieving a balanced accuracy of 0.6364. Among the predictive features, educational background had the highest influence on the model’s predictions, with an average SHAP (SHapley Additive exPlanations) value of 0.8199. Other significant factors included outdoor time and left eye spherical equivalent (OSSE), with SHAP values of 0.6474 and 0.6377, respectively.
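
To illustrate how a per-feature "average SHAP value" can rank features for a logistic regression model, the following is a minimal sketch under stated assumptions rather than the authors' pipeline. It assumes the reported average is the mean absolute SHAP value across samples, and it uses the closed form for linear models with independent features, where the contribution of feature j to the log-odds of sample i is w_j * (x_ij - mean(x_j)). The feature names, weights, and rows are hypothetical.

```python
def linear_shap_values(weights, rows):
    """SHAP values on the log-odds scale for a linear model.

    Assumes independent features, so phi[i][j] = w_j * (x_ij - mean_j).
    """
    n = len(rows)
    means = [sum(row[j] for row in rows) / n for j in range(len(weights))]
    return [
        [w * (x - m) for w, x, m in zip(weights, row, means)]
        for row in rows
    ]


def mean_abs_shap(shap_rows):
    """Average the absolute SHAP value of each feature across samples."""
    n = len(shap_rows)
    n_features = len(shap_rows[0])
    return [sum(abs(row[j]) for row in shap_rows) / n for j in range(n_features)]


# Hypothetical feature names and made-up coefficients (not from the paper).
feature_names = ["education_level", "outdoor_hours", "left_eye_se"]
weights = [-0.8, 0.3, 0.1]
rows = [
    [0, 2.0, -1.0],
    [1, 4.0, 0.0],
    [2, 6.0, 1.0],
    [3, 8.0, 2.0],
]

shap_rows = linear_shap_values(weights, rows)
importance = mean_abs_shap(shap_rows)

# Rank features from most to least influential.
ranking = sorted(zip(feature_names, importance), key=lambda p: p[1], reverse=True)
for name, value in ranking:
    print(f"{name}: {value:.4f}")
```

Because each sample's SHAP values sum to the difference between its log-odds and the average log-odds, the mean absolute value measures how strongly a feature moves predictions away from the baseline, regardless of direction. It does not by itself indicate whether a feature raises or lowers predicted risk.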

This study confirms that logistic regression is the most effective machine learning model for predicting AMD risk, with educational background identified as the most critical risk factor.

## Linked entities

- **Diseases:** age-related macular degeneration (MONDO:0005150)

## Full-text entities

- **Diseases:** retinal disorder (MESH:D012173), AMD (MESH:D008268)

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12585047/full.md

---
Source: https://tomesphere.com/paper/PMC12585047