# Identifying past-year self-reported suicidality in outpatients with somatic symptom disorder using an interpretable machine-learning model: a multicenter study with an online calculator

**Authors:** Xing Wang, Shuixiu Lai, Peng Wang, Yibo Li, Yunhui Zhong, Tieshi Zhu

PMC · DOI: 10.1186/s12888-026-07901-9 · BMC Psychiatry · 2026-02-18

## TL;DR

This study developed an interpretable machine-learning model to identify SSD outpatients who reported past-year suicidality, using cross-sectional registry data from 899 patients at 3 hospitals. The best model, a random forest, reached a test-set AUC of 0.978.

## Contribution

An interpretable machine-learning model that classifies past-year self-reported suicidality in SSD outpatients, with a Shiny web calculator that illustrates potential use in outpatient settings.

## Key findings

- The random forest model (RANGER implementation) showed high test-set discrimination (AUC 0.978; 95% CI, 0.955–1.000) for classifying past-year self-reported suicidality in SSD outpatients.
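- SHAP ranked insomnia severity (Insomnia Severity Index) as the leading contributor, followed by mindfulness (Five Facet Mindfulness Questionnaire) and neuropsychological status (Repeatable Battery for the Assessment of Neuropsychological Status).
- The model showed good calibration and favorable net clinical benefit on decision-curve analysis across relevant decision thresholds.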

## Abstract

Somatic symptom disorder (SSD) is associated with an elevated risk of suicidality. However, clinically implementable tools to identify outpatients with SSD who may warrant prioritized suicidality assessment remain limited. We therefore aimed to develop an interpretable model using routinely available outpatient data to stratify the likelihood of past-year self-reported suicidality.

We analyzed a multicenter cross-sectional registry from 3 hospitals in Ganzhou including adults aged 18–60 years with DSM-5–defined SSD. Past-year self-reported suicidality was assessed using a prespecified binary (yes/no) item. Data were split 70/30 into training/test sets. Candidate predictors were retained only if selected by both the least absolute shrinkage and selection operator (LASSO) and Boruta. Eight algorithms were trained with repeated 5-fold cross-validation and compared primarily by area under the receiver operating characteristic curve (AUC) and Brier score; the top model underwent calibration and decision-curve analysis (DCA). Shapley additive explanations (SHAP) provided model explanations; a Shiny web calculator was implemented.
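The two primary comparison metrics named above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (the abstract points to R tooling such as RANGER and Shiny), and the function names and toy labels and probabilities are assumptions made up for illustration. AUC is computed as the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case, and the Brier score as the mean squared difference between predicted probability and observed outcome.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared error between predicted probability and 0/1 outcome."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data (hypothetical, NOT from the study): 1 = reported suicidality.
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]

auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
print(f"AUC={auc:.3f}  Brier={brier:.3f}")
```

Higher AUC indicates better ranking of cases over non-cases, while a lower Brier score indicates predicted probabilities closer to the observed outcomes, which is why the two are reported together.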

Of 899 participants (median age, 33 years; 64.4% female), 19.9% reported past-year suicidality. All models showed high discrimination in the test set (AUCs > 0.900). The random forest (RANGER implementation) performed best (AUC, 0.978; 95% CI, 0.955–1.000; area under the precision–recall curve, 0.960; Brier score, 0.028; accuracy, 0.967; sensitivity, 0.927; specificity, 0.977), with good calibration and favorable net clinical benefit on decision-curve analysis (DCA). SHAP ranked the Insomnia Severity Index as the leading contributor, followed by the Five Facet Mindfulness Questionnaire and the Repeatable Battery for the Assessment of Neuropsychological Status.
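For readers unfamiliar with decision-curve analysis, the net benefit it plots can be sketched as follows. This uses the standard definition, net benefit = TP/n − (FP/n) × p_t/(1 − p_t) at threshold probability p_t, compared against a "flag everyone" strategy. The data and function names are hypothetical and do not reproduce the study's results.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit of flagging patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of flagging every patient regardless of predicted risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy cohort (hypothetical): 10 patients, 2 with the outcome.
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_risks = [0.90, 0.60, 0.70, 0.10, 0.20, 0.05, 0.30, 0.15, 0.10, 0.25]

for pt in (0.2, 0.5):
    nb_model = net_benefit(toy_labels, toy_risks, pt)
    nb_all = net_benefit_treat_all(toy_labels, pt)
    print(f"threshold={pt:.1f}  model={nb_model:.3f}  treat-all={nb_all:.3f}")
```

A model is said to offer clinical net benefit at a given threshold when its curve lies above both the treat-all strategy and the treat-none strategy, whose net benefit is zero by definition.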

In SSD outpatients, an interpretable RANGER-based model showed strong internal performance for classifying participants with past-year self-reported suicidality and yielded favorable net clinical benefit across relevant decision thresholds. A web-based calculator illustrates potential usability in outpatient settings; external validation and prospective implementation studies are warranted before routine adoption.

The online version contains supplementary material available at 10.1186/s12888-026-07901-9.

## Full-text entities

- **Diseases:** SSD (MESH:D000071896), insomnia (MESH:D007319)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13020323/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13020323/full.md

## References

3 references — full list in the complete paper: https://tomesphere.com/paper/PMC13020323/full.md

---
Source: https://tomesphere.com/paper/PMC13020323