# A supervised machine learning approach with feature selection for sex-specific biomarker prediction

**Authors:** Luke Meyer, Danielle Mulder, Joshua Wallace

PMC · DOI: 10.1038/s41540-025-00523-z · NPJ Systems Biology and Applications · 2025-07-01

## TL;DR

This study shows that training machine learning models on sex-stratified data predicts clinical biomarkers more accurately than training on combined data, whether or not sex is included as an input feature.

## Contribution

The study introduces a supervised ML approach with feature selection for sex-specific biomarker prediction.

## Key findings

- Predictions for the nine biomarkers fell within 5–10% error of actual values; among the best-predicted biomarkers at the 10% threshold, male-only models scored higher than female-only models.
- Stratifying data by sex improved model accuracy compared to using combined data with or without sex as a feature.
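
The "within 5–10% error" figures can be read as the share of predictions whose relative error stays under a tolerance. The sketch below is a minimal illustration under that assumption; the summary does not give the paper's exact metric definition, and the function name `fraction_within_tolerance` and the sample values are hypothetical, not taken from the study.

```python
def fraction_within_tolerance(actual, predicted, tolerance=0.10):
    """Return the share of predictions whose absolute relative error
    is at most `tolerance` (assumed definition, not the paper's)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    hits = 0
    for a, p in zip(actual, predicted):
        if a == 0:
            raise ValueError("relative error is undefined when the actual value is 0")
        if abs(p - a) / abs(a) <= tolerance:
            hits += 1
    return hits / len(actual)


# Hypothetical waist-circumference values in cm (illustrative only).
actual = [80.0, 95.0, 102.0, 88.0]
predicted = [83.0, 93.0, 120.0, 95.0]

share_10 = fraction_within_tolerance(actual, predicted, tolerance=0.10)
share_5 = fraction_within_tolerance(actual, predicted, tolerance=0.05)
```

With these made-up values, three of the four predictions fall within 10% and two fall within 5%, so tightening the tolerance lowers the score for the same set of predictions.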

## Abstract

Biomarkers are crucial aids in disease diagnosis, prognosis, and treatment selection. Machine learning (ML) has emerged as an effective tool for identifying novel biomarkers and enhancing predictive modelling. However, sex-based bias in ML algorithms remains a concern. This study developed a supervised ML model to predict nine common clinical biomarkers: triglycerides, BMI, waist circumference, systolic blood pressure, blood glucose, uric acid, urinary albumin-to-creatinine ratio, high-density lipoproteins, and albuminuria. The model's predictions were within 5–10% error of actual values. For predictions within 10% error, the top-performing models were those for waist circumference, albuminuria, BMI, blood glucose, and systolic blood pressure. Across these, models trained on male data scored higher than those trained on female data, followed by the combined data set with sex as an input feature; the combined data set without sex as an input feature performed the poorest. This study highlights the benefits of stratifying data according to sex for ML-based models.
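
The abstract compares four data configurations: male only, female only, combined with sex as a feature, and combined without it. A minimal sketch of how such configurations could be assembled from a list of records follows. The helper `build_configurations`, the column names, and the sample rows are hypothetical, and dropping the constant sex column from the stratified subsets is an assumption rather than a detail confirmed by this summary.

```python
def build_configurations(records, sex_key="sex"):
    """Split records into the four training configurations named in the abstract.

    Assumption: within a sex-stratified subset the sex column is constant,
    so it is dropped there as well as in the 'combined_without_sex' set.
    """
    configs = {
        "male": [],
        "female": [],
        "combined_with_sex": [],
        "combined_without_sex": [],
    }
    for row in records:
        without_sex = {k: v for k, v in row.items() if k != sex_key}
        configs["combined_with_sex"].append(dict(row))
        configs["combined_without_sex"].append(without_sex)
        if row[sex_key] in ("male", "female"):
            configs[row[sex_key]].append(without_sex)
    return configs


# Hypothetical records (illustrative only, not study data).
records = [
    {"sex": "male", "age": 54, "bmi": 27.1},
    {"sex": "female", "age": 61, "bmi": 24.8},
    {"sex": "male", "age": 47, "bmi": 30.2},
]

configs = build_configurations(records)
sizes = {name: len(rows) for name, rows in configs.items()}
```

Each configuration would then be used to train and score a separate model, so that the stratified and combined setups can be compared on the same tolerance-based metric.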

## Full-text entities

- **Genes:** ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}
- **Diseases:** albuminuria (MESH:D000419)
- **Chemicals:** creatinine (MESH:D003404), blood glucose (MESH:D001786), uric acid (MESH:D014527), triglycerides (MESH:D014280)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12219308/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12219308/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/PMC12219308/full.md

---
Source: https://tomesphere.com/paper/PMC12219308