# Machine learning-based prediction of knee pain risk using lipid metabolism biomarkers: a prospective cohort study from CHARLS

**Authors:** Biao Guo, Yuan Li, Weihang Peng, Yabin Liu, Fei He, Zhe Zhai

PMC · DOI: 10.3389/fphys.2025.1607276 · Frontiers in Physiology · 2025-06-25

## TL;DR

This study applies machine learning to lipid metabolism biomarkers to predict knee pain risk in middle-aged and older adults, and finds that composite metabolic indices predict risk better than single lipid markers.

## Contribution

The study combines composite lipid metabolism indices with interpretable machine learning (SHAP) to improve knee pain risk prediction in aging populations.

## Key findings

- Composite metabolic indices such as LAP, TyG, and TyG-BMI outperformed traditional single lipid markers in predicting knee pain risk.
- The Stacked Ensemble model performed best of the five models evaluated, with an AUC of 0.85 and a Brier score of 0.13.
- SHAP analysis identified LAP and TyG-related indices as the most influential predictors of knee pain risk.
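
To make the SHAP finding concrete, the sketch below computes exact Shapley values for a single instance by enumerating feature coalitions. It is only an illustration of the underlying idea: the `toy_risk` scoring function, its weights, and the feature values are hypothetical, and the paper's actual models, features, and SHAP configuration are not shown in this summary.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are replaced by their baseline value
    (a simplifying assumption; SHAP explainers use faster approximations
    or model-specific algorithms).
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = {f: (x[f] if f in subset else baseline[f]) for f in names}
                with_f = dict(without)
                with_f[name] = x[name]
                # Marginal contribution of `name` to this coalition.
                total += weight * (predict(with_f) - predict(without))
        phi[name] = total
    return phi


def toy_risk(f):
    # Hypothetical linear score, NOT the paper's model.
    return 0.02 * f["lap"] + 0.10 * f["tyg"] + 0.01 * f["hdl"]


x = {"lap": 60.0, "tyg": 9.0, "hdl": 45.0}         # hypothetical participant
baseline = {"lap": 30.0, "tyg": 8.5, "hdl": 50.0}  # hypothetical reference values
phi = shapley_values(toy_risk, x, baseline)
```

The values in `phi` sum to `toy_risk(x) - toy_risk(baseline)`, which is the additivity property that lets SHAP rank predictors by their contribution to an individual prediction.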

## Abstract

Knee pain significantly impairs health and quality of life among middle-aged and older adults. However, the predictive utility of lipid metabolism biomarkers for knee pain risk remains inadequately explored.

This study used data from the China Health and Retirement Longitudinal Study (CHARLS, 2011–2013) to investigate the association between lipid-related metabolic indicators and the risk of knee pain. Multiple lipid biomarkers and composite indices, including the lipid accumulation product (LAP), the triglyceride-glucose (TyG) index, and TyG-BMI, were incorporated. Five machine learning models were developed and evaluated for predictive performance. Model interpretation was conducted using SHAP (SHapley Additive exPlanations) to identify the most influential predictors.
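
The composite indices named above can be sketched as follows. The formulas are the commonly used definitions from the wider literature; the paper's exact formulas and units are not shown in this summary, so treat them as assumptions. Inputs are assumed to be triglycerides and fasting glucose in mg/dL, waist circumference in cm, and BMI in kg/m². The function names are illustrative.

```python
import math

# Conventional conversion factor: 1 mmol/L triglycerides = 88.57 mg/dL.
TG_MGDL_PER_MMOLL = 88.57


def tyg_index(tg_mgdl, glucose_mgdl):
    """TyG = ln(TG [mg/dL] * fasting glucose [mg/dL] / 2)."""
    return math.log(tg_mgdl * glucose_mgdl / 2.0)


def tyg_bmi(tg_mgdl, glucose_mgdl, bmi):
    """TyG-BMI = TyG * BMI."""
    return tyg_index(tg_mgdl, glucose_mgdl) * bmi


def lap(waist_cm, tg_mgdl, sex):
    """LAP = (waist [cm] - offset) * TG [mmol/L]; offset is 65 for men, 58 for women."""
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    offset = 65.0 if sex == "male" else 58.0
    return (waist_cm - offset) * (tg_mgdl / TG_MGDL_PER_MMOLL)


# Hypothetical participant, for illustration only.
example_tyg = tyg_index(150.0, 100.0)
example_tyg_bmi = tyg_bmi(150.0, 100.0, 24.0)
example_lap = lap(90.0, 150.0, "female")
```

Each index folds two or three routine measurements into one number, which is why they can carry more signal than any single lipid marker.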

A higher prevalence of knee pain was observed in high-altitude, cold regions such as Qinghai and Sichuan provinces. Composite metabolic indices (LAP, TyG, and TyG-BMI) exhibited stronger predictive power than traditional single lipid markers. Among the models, the Stacked Ensemble algorithm achieved the best performance, with an AUC of 0.85 and a Brier score of 0.13. SHAP analysis highlighted LAP and TyG-related indices as the top contributors to prediction outcomes.
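
For readers unfamiliar with the two reported metrics, here is a minimal pure-Python sketch of how AUC (discrimination) and the Brier score (calibration and accuracy of predicted probabilities) are computed. The labels and probabilities are made up for illustration and are not the study's data; in practice a library implementation would normally be used.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 is perfect)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random positive case is scored above a random negative one.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a small illustration.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up outcomes (1 = knee pain) and predicted risks.
y_true = [1, 0, 1, 0, 1]
y_prob = [0.9, 0.2, 0.6, 0.7, 0.8]

auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, while a lower Brier score indicates predicted probabilities closer to the observed outcomes.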

These findings emphasize the importance of lipid metabolism indicators in the early identification of knee pain risk. The integration of interpretable machine learning approaches and composite metabolic indices offers a promising strategy for personalized prevention in aging populations.

## Full-text entities

- **Diseases:** Knee pain (MESH:D046788)
- **Chemicals:** glucose (MESH:D005947), TyG (-), triglyceride (MESH:D014280), lipid (MESH:D008055)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12239094/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12239094/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/PMC12239094/full.md

---
Source: https://tomesphere.com/paper/PMC12239094