# Regularized regression outperforms trees for predicting cognitive function in the Health and Retirement Study

**Authors:** Kyle Masato Ishikawa, Deborah Taira, Joseph Keaweʻaimoku Kaholokula, Matthew Uechi, James Davis, Eunjung Lim

PMC · DOI: 10.1016/j.mlwa.2025.100694 · Machine learning with applications · 2025-11-27

## TL;DR

Using 2018 to 2020 Health and Retirement Study data, this study found that elastic net regression (RMSE = 3.520, R² = 0.435) predicted cognitive function more accurately than tree-based models while remaining interpretable.

## Contribution

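The study compares three linear regression models with three tree-based models on survey-weighted data and demonstrates that regularized regression outperforms the tree-based models in predicting cognitive outcomes while maintaining interpretability.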

## Key findings

- Elastic net regression had the best performance with RMSE = 3.520 and R² = 0.435.
- Baseline cognitive function and computer use frequency were the most influential predictors.
- Regularized regression models provided better interpretability and predictive performance than tree-based models.

## Abstract

Generalized linear models have been favored in healthcare research due to their interpretability. In contrast, tree-based models, such as random forest or boosted trees, are often preferred in machine learning (ML) and commercial settings due to their strong predictive performance. However, for clinical applications, model interpretability remains essential for actionable results and patient understanding. This study used ML to detect cognitive decline for the purpose of timely screening and uncovering associations with psychosocial determinants. All models were interpreted to enhance transparency and understanding of their predictions.

Data from the 2018 to 2020 Health and Retirement Study were used to create three linear regression models and three tree-based models. Ten percent of the sample was withheld for estimating performance, and model tuning used five-fold cross-validation with two repeats. Survey frequency weights were applied during tuning, training, and final evaluation. Model performance was evaluated using RMSE and R², and interpretability was assessed via coefficients, variable importance, and decision trees.
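
The evaluation arithmetic described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that survey weights act as per-respondent multipliers in both metrics. It also assumes that R² is one minus the ratio of the weighted residual sum of squares to the weighted total sum of squares; the paper may use a different definition, such as a squared correlation. The function names `weighted_rmse`, `weighted_r2`, and `repeated_kfold`, and the toy numbers, are hypothetical.

```python
import math
import random


def weighted_rmse(y_true, y_pred, weights):
    """Root mean squared error where respondent i counts weights[i] times."""
    total_w = sum(weights)
    sq_err = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return math.sqrt(sq_err / total_w)


def weighted_r2(y_true, y_pred, weights):
    """1 - weighted residual SS / weighted total SS (assumed definition)."""
    total_w = sum(weights)
    mean_y = sum(w * t for t, w in zip(y_true, weights)) / total_w
    ss_res = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    ss_tot = sum(w * (t - mean_y) ** 2 for t, w in zip(y_true, weights))
    return 1.0 - ss_res / ss_tot


def repeated_kfold(n_rows, n_folds=5, n_repeats=2, seed=0):
    """Yield (train_idx, valid_idx) pairs; rows are reshuffled for each repeat."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_rows))
        rng.shuffle(order)
        for k in range(n_folds):
            valid = order[k::n_folds]  # every n_folds-th row forms one fold
            valid_set = set(valid)
            train = [i for i in order if i not in valid_set]
            yield train, valid


# Toy data (made up): observed scores, predictions, and frequency weights.
y_obs = [10.0, 12.0, 15.0, 20.0]
y_hat = [11.0, 12.0, 14.0, 18.0]
w = [1, 2, 1, 1]
print("weighted RMSE:", weighted_rmse(y_obs, y_hat, w))
print("weighted R2:", weighted_r2(y_obs, y_hat, w))
print("resamples:", sum(1 for _ in repeated_kfold(20)))
```

With five folds and two repeats, each candidate hyperparameter setting is scored on ten resamples, and the withheld ten percent is touched only once, for the final estimate.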

The elastic net model had the best performance (RMSE = 3.520, R² = 0.435), followed by standard linear regression, boosted trees, random forest, multivariate adaptive regression splines, and lastly, decision trees. Across all models, baseline cognitive function and frequency of computer use were the most influential predictors.

Elastic net regression outperformed tree-based models, suggesting that cognitive outcomes may be best modeled with additive linear relationships. Its ability to remove correlated and weak predictors contributed to its balance of interpretability and predictive performance for this particular dataset.
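
For reference, the elastic net estimate is commonly written as a penalized weighted least-squares problem. The notation below is a standard parameterization assumed here for illustration, not taken from the paper: $w_i$ are observation weights, $\lambda \ge 0$ is the overall penalty strength, and $\alpha \in [0, 1]$ mixes the lasso and ridge penalties.

$$
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\; \frac{1}{2\sum_{i} w_i}\sum_{i=1}^{n} w_i\,\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2 \;+\; \lambda\Bigl[\alpha\,\lVert\beta\rVert_1 + \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Bigr]
$$

The $\ell_1$ term can set coefficients of weak predictors exactly to zero, while the $\ell_2$ term shrinks the coefficients of correlated predictors toward each other. The model stays additive and linear in the predictors, which is what keeps its coefficients directly interpretable.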

## Full-text entities

- **Diseases:** cognitive decline (MESH:D003072)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12652623/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12652623/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12652623/full.md

---
Source: https://tomesphere.com/paper/PMC12652623