# Considerations for evaluating the practical utility of machine learning in suicide risk estimation: the role of cost and equity

**Authors:** Christopher Kitchen, Anas Belouali, Paul S Nestadt, Holly C Wilcox, Hadi Kharrazi

PMC · DOI: 10.21203/rs.3.rs-8216032/v1 · Research Square · 2025-12-30

## TL;DR

This paper evaluates how well machine learning models predict suicide death, focusing on the precision-recall tradeoff, cost balance, and fairness for hypothetical clinical decision support.

## Contribution

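The study applies AUPRC-maxima optimization (model settings tuned to maximize the area under the precision-recall curve) to machine learning models that predict suicide death. The authors describe this as, to their knowledge, its first use for this task.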
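The summary does not spell out how AUPRC-maxima optimization works. One plausible reading, sketched here as an assumption, is choosing among candidate model settings by the AUPRC of their cross-validated scores. AUPRC (computed below as step-wise average precision) suits rare outcomes such as suicide death because a no-skill model scores roughly the outcome prevalence, not 0.5. The functions `average_precision` and `pick_by_auprc`, the setting names, and the toy data are all hypothetical, not taken from the paper.

```python
from typing import Mapping, Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC (average precision) for binary labels (1 = suicide death)."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("AUPRC is undefined when there are no positive cases")
    # Rank cases from highest to lowest risk score; Python's stable sort breaks
    # ties by original position (a simplification of proper tie handling).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    true_pos = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_pos += 1
            total += true_pos / rank  # precision at the recall step this case adds
    return total / total_pos


def pick_by_auprc(y_true: Sequence[int], candidates: Mapping[str, Sequence[float]]) -> str:
    """Hypothetical selector: name of the candidate setting with the highest AUPRC."""
    return max(candidates, key=lambda name: average_precision(y_true, candidates[name]))


# Hypothetical toy data: 2 deaths among 10 patients (prevalence 0.2), with
# out-of-fold risk scores from two made-up model settings.
y = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
candidates = {
    "setting_a": [0.95, 0.40, 0.30, 0.90, 0.20, 0.10, 0.05, 0.15, 0.25, 0.35],
    "setting_b": [0.60, 0.70, 0.20, 0.50, 0.10, 0.65, 0.05, 0.15, 0.25, 0.30],
}
for name, s in candidates.items():
    print(name, round(average_precision(y, s), 3))
print("selected:", pick_by_auprc(y, candidates))
```

Here `setting_a` ranks both deaths first (AUPRC 1.0) while `setting_b` ranks them third and fourth (AUPRC 5/12, about 0.417), so `setting_a` is selected. In practice a library routine such as scikit-learn's `average_precision_score`, which handles tied scores more carefully, would replace the hand-rolled version.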

## Key findings

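- AUPRC-optimized models outperformed logistic regression on cross-validated AUPRC, especially XGBoost on hospital discharge records (AUPRC 0.667; PPV 0.941) and commercial claims records (AUPRC 0.558; PPV 0.857).
- The best model depends on the priority: XGBoost is among the most efficient when precision is preferred (e.g., 99.9th percentile), while random forest and MLP perform better when sensitivity is preferred (90th percentile or lower).
- No algorithmic bias was found by age, sex, or race, but performance changed significantly with certain clinical characteristics.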
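The precision-versus-sensitivity tradeoff in these findings is what the F-beta statistic summarizes: $F_\beta = (1+\beta^2)\,P R / (\beta^2 P + R)$, where $\beta < 1$ favors precision $P$ and $\beta > 1$ favors recall (sensitivity) $R$. The sketch below assumes, purely for illustration, that a "percentile" means a risk-score cutoff, flagging everyone above it; the summary does not define the term. `metrics_at_percentile` and the toy cohort are hypothetical.

```python
import math
from typing import Sequence, Tuple


def metrics_at_percentile(
    y_true: Sequence[int], scores: Sequence[float], percentile: float, beta: float
) -> Tuple[float, float, float]:
    """Precision, recall, and F-beta when flagging cases above a score percentile.

    Assumption: "percentile" is read as a risk-score cutoff, so the top
    (100 - percentile)% of cases (at least one) are flagged as high risk.
    """
    n = len(scores)
    k = max(1, math.ceil(n * (100 - percentile) / 100))  # number of flagged cases
    order = sorted(range(n), key=lambda i: -scores[i])
    true_pos = sum(1 for i in order[:k] if y_true[i] == 1)
    precision = true_pos / k
    recall = true_pos / sum(y_true)
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    # beta < 1 weights precision more heavily; beta > 1 weights recall (sensitivity).
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


# Hypothetical toy cohort: 3 deaths among 20 patients.
y = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
s = [0.99, 0.90, 0.10, 0.60, 0.20, 0.30, 0.05, 0.15, 0.55, 0.40,
     0.12, 0.08, 0.35, 0.22, 0.18, 0.02, 0.03, 0.25, 0.07, 0.01]
for pct in (99.9, 90, 50):
    for beta in (0.5, 2.0):
        p, r, f = metrics_at_percentile(y, s, pct, beta)
        print(f"pct={pct} beta={beta}: precision={p:.2f} recall={r:.2f} F={f:.3f}")
```

With beta = 0.5 the narrow 99.9th-percentile cutoff scores best (F ≈ 0.714); with beta = 2 the broad 50th-percentile cutoff does (F ≈ 0.682). This is a toy version of why the preferred operating point, and hence the preferred model, depends on whether precision or sensitivity matters more.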

## Abstract

A key vulnerability in modeling suicide death is a lack of precision; as a result, risk estimates are thought to be ultimately unhelpful to clinicians, even with more advanced or nuanced machine learning (ML) techniques. We sought to fill several conceptual gaps by assessing performance across multiple techniques, focusing on the precision-recall tradeoff, with ad hoc contextualization for sensitivity, cost balance, and fairness. Our objective was to identify robust differences in performance across a cross-section of ML techniques on a suicide risk task, emphasizing overall maximization of the area under the precision-recall curve (AUPRC) and its downstream effects on hypothetical decision support.

Using the Maryland Suicide Datawarehouse (MSDW), we selected a retrospective cohort of patients who received care, or who died according to the Office of the Chief Medical Examiner (OCME), between 2017 and 2020. AUPRC-optimized settings yielded cross-validated AUPRC significantly higher than logistic regression, especially for XGBoost in both hospital discharge records (AUPRC: 0.667; PPV: 0.941) and commercial claims records (AUPRC: 0.558; PPV: 0.857). F-beta statistics revealed that when precision is preferred (e.g., the 99.9th percentile), XGBoost is among the most efficient tools, while random forest and multilayer perceptron (MLP) models perform better when sensitivity is preferred (90th percentile or lower). No algorithmic bias was identified by age, sex, or race, but performance changed significantly with certain clinical characteristics.

To our knowledge, this is the first use of AUPRC-maxima optimization for ML tools predicting suicide death. We discuss how the utility of suicide risk models in clinical decision support is tied to the innate class imbalance challenges of model training, and provide recommendations for better evaluating performance.
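The abstract reports no algorithmic bias by age, sex, or race but does not describe the check here. A common descriptive starting point, shown as an assumption rather than the authors' procedure, is to compare PPV and sensitivity across subgroups at a fixed alert threshold and inspect the largest between-group gap. `subgroup_rates`, `max_gap`, the group labels, and the data are hypothetical. The sketch reports raw gaps only; deciding whether a gap is significant would need something like a bootstrap or permutation test, which it omits.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def subgroup_rates(
    y_true: Sequence[int], flagged: Sequence[bool], groups: Sequence[str]
) -> Dict[str, Tuple[float, float]]:
    """Per-group (PPV, sensitivity) for a fixed set of flagged (alerted) cases."""
    # counts[g] = [true positives, flagged cases, actual positives]
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0])
    for y, f, g in zip(y_true, flagged, groups):
        c = counts[g]
        if f:
            c[1] += 1
            if y == 1:
                c[0] += 1
        if y == 1:
            c[2] += 1
    rates = {}
    for g, (tp, n_flagged, n_pos) in counts.items():
        ppv = tp / n_flagged if n_flagged else float("nan")
        sensitivity = tp / n_pos if n_pos else float("nan")
        rates[g] = (ppv, sensitivity)
    return rates


def max_gap(rates: Dict[str, Tuple[float, float]], index: int) -> float:
    """Largest between-group difference in one metric (0 = PPV, 1 = sensitivity).

    Assumes every group has a defined (non-NaN) value for that metric.
    """
    values = [r[index] for r in rates.values()]
    return max(values) - min(values)


# Hypothetical example: two subgroups, four patients each.
y = [1, 1, 0, 0, 1, 1, 0, 0]
alerts = [True, False, True, False, True, True, False, False]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]
rates = subgroup_rates(y, alerts, group)
print(rates)
print("sensitivity gap:", max_gap(rates, 1))
```

In this toy case group A has PPV and sensitivity of 0.5 while group B has 1.0 for both, so both gaps are 0.5; with real cohorts, small subgroups make such gaps noisy, which is why a significance test matters.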

## Full-text entities

- **Diseases:** died (MESH:D003643)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12772703/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12772703/full.md

## References

56 references — full list in the complete paper: https://tomesphere.com/paper/PMC12772703/full.md

---
Source: https://tomesphere.com/paper/PMC12772703