# Harnessing Clinical and Biochemical Data for Personalized Cardiovascular Risk Prediction: a Machine Learning Approach Toward Precision Nutrition

**Authors:** Joyeta Ghosh, Tinni Chaudhuri, Jose Arturo Molina Mora, Jyoti Taneja, Ravi Kant

PMC · DOI: 10.1016/j.tjnut.2026.101363 · 2026-01-13

## TL;DR

This study applies machine learning to predict elevated cardiovascular disease (CVD) risk in 458 rural postmenopausal women in West Bengal, India. Random Forest reached 98.91% accuracy and XGBoost an AUC of 0.998, with waist circumference, blood pressure, and fasting glucose as the strongest predictors.

## Contribution

The study applies interpretable machine learning models to predict CVD risk in rural postmenopausal women from clinical and biochemical data, a use the summary describes as novel.

## Key findings

- Random Forest achieved 98.91% accuracy (95% CI: 97.8%, 99.6%), and XGBoost performed comparably with an AUC of 0.998 (95% CI: 0.993, 1.000), precision of 97.2%, and recall of 98.3%.
- Waist circumference, blood pressure, and fasting glucose were identified as the strongest predictors of elevated CVD risk.
- The study demonstrates the feasibility of AI-driven tools for low-cost, early CVD risk detection in resource-limited settings.

## Abstract

Cardiovascular disease (CVD) is a leading cause of morbidity and mortality among postmenopausal women in rural India, where healthcare resources remain limited.

This study aimed to leverage artificial intelligence (AI) and machine learning (ML) approaches to predict CVD risk in rural elderly women, identify key clinical predictors, and assess model performance using interpretable AI tools.

This observational cross-sectional study was conducted in Singur Block (West Bengal) and Amdanga Block (North 24 Parganas District) between March 2014 and August 2018. Data from 458 rural postmenopausal women were analyzed. The outcome variable was the presence or absence of elevated cardiovascular disease risk, defined using composite International Diabetes Federation and American Heart Association criteria. Predictors included waist circumference, blood pressure, fasting blood glucose, HDL cholesterol, triglycerides, and vitamin D concentrations. Seven ML models [Random Forest, Gradient Boosting, Ensemble (Voting Classifier), Extra Trees, Support Vector Machine, Neural Network, and Logistic Regression] were developed and compared. Model evaluation employed 5-fold cross-validation with metrics including accuracy, AUC, precision, recall, and F1 score.
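The evaluation protocol described above (5-fold cross-validation scored by accuracy, AUC, precision, recall, and F1) can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the authors' pipeline: the abstract does not say which software was used, the helper names `stratified_kfold`, `binary_metrics`, and `roc_auc` are hypothetical, stratifying the folds by class is an assumption, and the data are synthetic noise rather than study measurements. Only the sample size (458) and the number of elevated-risk cases (171) are taken from the abstract.

```python
import random
from statistics import mean


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that keep the class ratio in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin across folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the chance a positive outscores a negative (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic stand-in data: 458 labels with 171 positives (counts from the abstract).
# The scores are the label plus Gaussian noise, NOT the study's measurements.
rng = random.Random(42)
labels = [1] * 171 + [0] * 287
scores = [y + rng.gauss(0, 0.4) for y in labels]

fold_results = []
for train_idx, test_idx in stratified_kfold(labels, k=5):
    # A real pipeline would fit a model on train_idx here; this toy "model"
    # just thresholds the synthetic score at 0.5.
    y_true = [labels[i] for i in test_idx]
    y_score = [scores[i] for i in test_idx]
    y_pred = [int(s > 0.5) for s in y_score]
    metrics = binary_metrics(y_true, y_pred)
    metrics["auc"] = roc_auc(y_true, y_score)
    fold_results.append(metrics)

# Average each metric over the five held-out folds.
summary = {name: mean(r[name] for r in fold_results) for name in fold_results[0]}
print(summary)
```

In practice each of the seven classifiers would be fitted inside the loop on the training indices and scored on the held-out fold, so that every participant is used for testing exactly once.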

Among the 458 participants, 171 (37.3%) exhibited elevated CVD risk. The Random Forest model achieved an accuracy of 98.91% (95% CI: 97.8%, 99.6%), whereas eXtreme Gradient Boosting (XGBoost) demonstrated comparable performance with an AUC of 0.998 (95% CI: 0.993, 1.000), precision of 97.2%, and recall of 98.3%. Feature-importance analysis revealed waist circumference, blood pressure, and fasting glucose as the strongest predictors, with HDL cholesterol and vitamin D contributing modestly but significantly.
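As a quick arithmetic check on the figures above, the reported prevalence follows directly from the counts, and the precision and recall quoted for XGBoost imply an approximate F1 score. The F1 value below is derived here and is not stated in the abstract; if the reported precision and recall are averages over folds, the F1 in the full paper may differ slightly.

$$
\text{prevalence} = \frac{171}{458} \approx 0.373
$$

$$
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} = \frac{2 \times 0.972 \times 0.983}{0.972 + 0.983} \approx 0.977
$$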

ML models, particularly Random Forest and XGBoost, demonstrated high accuracy and interpretability in predicting CVD risk among rural postmenopausal women. These findings highlight the potential of AI-driven, low-cost predictive tools for early CVD risk detection and personalized preventive healthcare in resource-limited rural settings.

## Linked entities

- **Diseases:** cardiovascular disease (MONDO:0004995)

## Full-text entities

- **Diseases:** CVD (MESH:D002318), Diabetes (MESH:D003920)
- **Chemicals:** vitamin D (MESH:D014807), triglycerides (MESH:D014280), glucose (MESH:D005947)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13014513/full.md

---
Source: https://tomesphere.com/paper/PMC13014513