# Ensuring generalizability and clinical utility in mental health care applications: Robust artificial intelligence‐based treatment predictions in diverse psychosis populations

**Authors:** Fiona Coutts, Sergio Mena, Esin Ucur, W Wolfgang Fleischhacker, Rene Kahn, Jeffrey Lieberman, Alkomiet Hasan, Oliver Howes, Christoph Correll, Nikolaos Koutsouleris, Paris Alexandros Lalousis

PMC · DOI: 10.1111/pcn.13914 · Psychiatry and Clinical Neurosciences · 2025-11-06

## TL;DR

This paper presents AI models that predict antipsychotic treatment outcomes in people with psychosis. The models perform moderately well and generalize across two cohorts, but accuracy varies across demographic and treatment subgroups, pointing to the need for more diverse clinical samples.

## Contribution

A robust framework for training and validating AI models in psychiatry, with generalizable results across different psychosis populations.

## Key findings

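- Models predicted total symptom severity (r = 0.4–0.68) and symptomatic remission (BAC = 62.4%–69%) with moderate accuracy, and held up when validated in the opposing cohort (r = 0.4–0.5, BAC = 63.5%–65.7%).
- Performance remained significant with only 8–9 key variables (r = 0.53 for total symptom severity, BAC = 65.3% for symptomatic remission), suggesting the models could work with a short set of clinical inputs.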
- Model accuracy varied across sex, ethnicity, and medication subgroups, indicating potential equity issues.

## Abstract

Artificial Intelligence (AI)‐based prediction models of treatment response promise to revolutionize psychiatric care by enabling personalized treatment, but very few have been thoroughly tested in different samples or compared to current clinical standards. Here we present models predicting antipsychotic response and assess their clinical utility in a robust methodological framework.

Machine learning models were trained and cross‐validated on clinical and sociodemographic data from 594 individuals with established schizophrenia (NCT00014001) and 323 individuals with first episode psychosis (NCT03510325). Models predicted four measures of antipsychotic response at 3 months after baseline. Clinical utility was assessed using decision curve and calibration curve analyses. Model performance was tested in a reduced feature space and across sex, ethnicity, antipsychotic, and symptom change subgroups to investigate model fairness.
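As a rough illustration of what a decision curve analysis measures, the sketch below computes the standard net benefit at a single risk threshold and compares it with a "treat everyone" strategy. The function names and the toy data are hypothetical and are not taken from the paper; the authors' actual implementation is not described in this summary.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on every case with predicted probability >= threshold.

    Standard decision-curve definition: TP/N - (FP/N) * threshold / (1 - threshold).
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # relative cost of a false positive
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that flags every case."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (hypothetical): 1 = outcome occurred, 0 = it did not.
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.6, 0.7, 0.2]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
```

A model is said to have a net benefit at a given threshold when its value exceeds both the "treat everyone" value and zero (the "treat no one" value). A decision curve repeats this comparison over a range of thresholds.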

Models predicting total symptom severity (r = 0.4–0.68) and symptomatic remission (BAC = 62.4%–69%) performed well in both samples and externally validated successfully in the opposing cohort (r = 0.4–0.5, BAC = 63.5%–65.7%). Performance remained significant when the models were reduced to 8–9 key variables (r = 0.53 for total symptom severity, BAC = 65.3% for symptomatic remission). Models predicting symptomatic remission had a net benefit across risk thresholds of 0.5–0.9 and were moderately well‐calibrated (ECE = 0.16–0.18). Model performance differed across sex, ethnicity and medication subgroups.
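For readers unfamiliar with the two classification metrics reported here, the following sketch shows one common way to compute balanced accuracy (BAC) and expected calibration error (ECE) for a binary outcome. It is an assumption-laden illustration: the equal-width binning, the default of 10 bins, and the function names are choices made for this example, and the paper's exact ECE definition is not given in this summary.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (0/1)."""
    pos = [p for y, p in zip(y_true, y_pred) if y == 1]
    neg = [p for y, p in zip(y_true, y_pred) if y == 0]
    sensitivity = sum(1 for p in pos if p == 1) / len(pos)
    specificity = sum(1 for p in neg if p == 0) / len(neg)
    return (sensitivity + specificity) / 2


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Weighted mean gap between predicted and observed event rates.

    Assumes equal-width probability bins; other binning schemes exist.
    """
    n = len(y_true)
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += (len(members) / n) * abs(observed - predicted)
    return ece


# Toy data (hypothetical), not drawn from the study cohorts.
bac = balanced_accuracy([1, 0, 1, 0], [1, 0, 0, 0])
ece = expected_calibration_error([1, 0], [0.8, 0.2])
```

BAC of 50% corresponds to chance for a binary outcome regardless of class imbalance, and an ECE of 0 means predicted probabilities match observed frequencies within every bin.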

We present a robust framework for training and assessing the clinical utility of prediction models in psychiatry. Our models generalize across different psychosis populations and show promising calibration and net benefit. However, performance disparities across demographic and treatment subgroups highlight the need for more diverse clinical samples to ensure equitable prediction.

## Linked entities

- **Diseases:** schizophrenia (MONDO:0005090), psychosis (MONDO:0005485)

## Full-text entities

- **Diseases:** schizophrenia (MESH:D012559), psychiatric (MESH:D001523), psychosis (MESH:D011618)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12757767/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12757767/full.md

## References

65 references — full list in the complete paper: https://tomesphere.com/paper/PMC12757767/full.md

---
Source: https://tomesphere.com/paper/PMC12757767