# Navigating extreme class imbalance in suicide risk prediction

**Authors:** Christopher Kitchen, Anas Belouali, Paul S. Nestadt, Holly C. Wilcox, Hadi Kharrazi

PMC · DOI: 10.3389/fpsyt.2025.1679618 · Frontiers in Psychiatry · 2026-01-12

## TL;DR

This study examines how class imbalance and other factors affect the performance of suicide risk prediction models, showing that AUPRC improves with realistic training conditions.

## Contribution

The paper demonstrates how realistic training conditions and clinical cohorts improve AUPRC in suicide risk prediction models.

## Key findings

- AUPRC increased with greater sample imbalance for training or outcome horizon.
- AUPRC was significantly higher for patients in emergency rooms or inpatient settings.
- Performance was worse for patients under 18 years old.

## Abstract

The implementation of suicide risk models is challenging because the conditions in which they are developed often do not reflect those in which they are used. Setting an arbitrary classification threshold limits the interpretability of predictions and their associated performance statistics. This work explores different class imbalance ratios across training sample compositions, time horizons and patient characteristics to understand how the degree of imbalance affects the performance of regression-based predictive models of suicide.

The study population included 1,649,577 patients who were selected from the Maryland Suicide Data Warehouse (MSDW) between 2016 and 2020. The MSDW contains clinical and demographic features derived from claims (Maryland Health Care Commission, MHCC) and hospital discharge records (Health Services Cost Review Commission, HSCRC) for decedents and living patients within the state of Maryland. Suicide death was our primary outcome of interest in a cross-validated framework stratified by sources of data in the MSDW.

Cross-validated AUROC did not vary consistently with training sample imbalance or time horizon, but both were directly associated with AUPRC. Indeed, AUPRC increased with greater sample imbalance for training or outcome horizon (AUPRC 0.246; 0.246; 0.593 for all decedents, HSCRC, and MHCC respectively). Stratified samples showed no significant difference in cross-validated AUROC relative to the overall sample (0.832; 0.913; 0.927 for decedents, HSCRC and MHCC). However, AUPRC was significantly greater when limiting our HSCRC and MHCC samples to patients seen in the emergency room (AUPRC 0.417; 0.782) or in inpatient settings (0.371; 0.773), or to patients who had ICD-10-CM coded social needs (0.479, HSCRC only). Performance was significantly worse when restricting samples to patients aged less than 18 years (AUPRC 0.036; 0.208, HSCRC and MHCC respectively).
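As a rough illustration of why the two metrics can diverge, the sketch below computes AUROC and a step-wise AUPRC estimate (average precision) on a toy score set, then repeats every negative to make the evaluation set more imbalanced. The scores, the helper names (`auroc`, `average_precision`, `evaluate`) and the choice of average precision as the AUPRC estimator are assumptions made for illustration; this is not the study's data, model or analysis, and it varies the class ratio of the evaluation set rather than the training sample composition examined in the paper.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise AUPRC estimate: mean precision at the rank of each positive.

    Assumes no positive shares a score with a negative.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


def evaluate(neg_copies):
    """Score a toy set in which every negative is repeated `neg_copies` times."""
    pos_scores = [0.9, 0.6]                            # hypothetical case scores
    neg_scores = [0.8, 0.4, 0.3, 0.2] * neg_copies     # hypothetical control scores
    labels = [1] * len(pos_scores) + [0] * len(neg_scores)
    scores = pos_scores + neg_scores
    return auroc(labels, scores), average_precision(labels, scores)


for copies in (1, 10, 100):
    roc, ap = evaluate(copies)
    print(f"negatives x{copies:<3} AUROC={roc:.3f}  AUPRC={ap:.3f}")
```

In this toy setting the ranking quality, and therefore AUROC, is identical at every class ratio, while average precision falls as negatives are added. That is one reason AUPRC values are hard to compare across samples with different outcome prevalence.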

A low precision for estimated suicide risk can be understood as a consequence of tradeoffs made during model development, particularly training models with matched cases, with balanced classes or within short time horizons. This work demonstrates the improved AUPRC of regression models in a cross-validated framework when these conditions are made more realistic with respect to class imbalance, or less restrictive with respect to time horizon. Additionally, we illustrate, using the same data, that training models in certain clinical cohorts (e.g., defined by age, care utilization and social need) can lead to robustly different estimates of precision and recall, but not AUROC.
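The dependence of precision on outcome prevalence follows from Bayes' rule, and a short sketch can make the arithmetic concrete. The function name `precision_at_prevalence` and the sensitivity, specificity and prevalence values are hypothetical, chosen only for illustration; none of them come from the paper.

```python
def precision_at_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value for a fixed operating point, via Bayes' rule."""
    true_pos = sensitivity * prevalence                   # expected share of true positives
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # expected share of false positives
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point: 80% sensitivity, 90% specificity.
for prevalence in (0.5, 0.1, 0.01, 0.001):
    ppv = precision_at_prevalence(0.80, 0.90, prevalence)
    print(f"prevalence={prevalence:<6} precision={ppv:.4f}")
```

With the classifier's sensitivity and specificity held fixed, precision drops sharply as the outcome becomes rarer. Under this simplified view, a model evaluated on a balanced or matched sample would be expected to report a precision that does not carry over to a population in which the outcome is rare.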

## Full-text entities

- **Diseases:** Suicide death (MESH:D003643)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12833411/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12833411/full.md

## References

50 references — full list in the complete paper: https://tomesphere.com/paper/PMC12833411/full.md

---
Source: https://tomesphere.com/paper/PMC12833411