# A clinical prediction model for schizophrenia based on machine learning algorithms

**Authors:** Weifeng Jin, Shuzi Chen, Qiong Gao, Dan Li, Wei Lu, Mengxia Wang, Qing Chen, Ping Lin

PMC · DOI: 10.3389/fmed.2025.1726905 · Frontiers in Medicine · 2026-01-05

## TL;DR

This study developed a machine learning-based diagnostic tool for schizophrenia using routine blood tests and demographic data.

## Contribution

The novel contribution is a machine learning model for schizophrenia diagnosis using peripheral blood indicators and demographic data.

## Key findings

- Random Forest achieved an AUC of 1.00 in training and 0.877 in validation.
- Logistic regression was selected as the final model due to overfitting concerns.
- Arg, TP, ALP, HDL, UA, and LDL were identified as significant predictors.

## Abstract

This study aimed to develop an auxiliary diagnostic tool for schizophrenia based on multiple test variables, comparing several machine learning algorithms.

This retrospective study used routinely collected peripheral blood biochemical indicators, along with demographic data, to develop a diagnostic model for first-episode schizophrenia. The study included 180 patients with first-episode schizophrenia from January to August 2024 and 214 healthy controls drawn from a population undergoing routine medical examinations during the same period. Data on age, gender, and various blood test results were collected. The dataset was divided into a training set (70%; n = 275) and an internal validation set (30%; n = 119). First, univariate logistic regression was used to screen for significant indicators (p < 0.1), and feature selection was subsequently performed using the Boruta and LASSO algorithms. Models were then developed using seven machine learning algorithms, and each model was evaluated on area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), precision, recall, and F1 score. Finally, we constructed an easily interpretable prediction tool based on a multiple logistic regression model and validated it using an external validation set and a differential diagnosis set. A nomogram of the model was constructed, and its discrimination, calibration, and clinical decision curves were evaluated.
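The threshold-based metrics listed above all derive from the four cells of a binary confusion matrix. The Python sketch below shows the standard definitions on a small made-up label set. The function names and the toy labels are illustrative assumptions, not the authors' code or data. Note that precision is the same quantity as positive predictive value, and recall is the same quantity as sensitivity.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = patient, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(y_true, y_pred):
    """Compute the threshold-based metrics named in the abstract."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = safe_div(tp, tp + fn)   # identical to recall
    specificity = safe_div(tn, tn + fp)
    ppv = safe_div(tp, tp + fp)           # identical to precision
    npv = safe_div(tn, tn + fn)
    f1 = safe_div(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "precision": ppv,
        "recall": sensitivity,
        "f1": f1,
    }


# Toy labels for illustration only (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
example = diagnostic_metrics(y_true, y_pred)
```

Because precision and PPV coincide, as do recall and sensitivity, the eight reported metrics carry six distinct quantities.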

Arg, TP, ALP, HDL, UA, and LDL were ultimately identified as significant predictors through univariate logistic regression combined with the Boruta and LASSO algorithms. The Random Forest algorithm outperformed the other machine learning models, achieving an AUC of 1.00 on the training set and 0.877 on the validation set. However, because of the risk of overfitting, we selected the multivariate logistic regression model as the final model and constructed nomograms from it.
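The gap between a training AUC of 1.00 and a validation AUC of 0.877 is the signal behind the overfitting concern. As a minimal sketch, AUC can be computed as the fraction of patient-control pairs in which the patient receives the higher score, with ties counting half. The `auc` and `auc_gap` helpers and the toy scores are assumptions for illustration; only the two AUC values come from the abstract.

```python
def auc(y_true, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    y_true holds binary labels (1 = patient, 0 = control);
    scores holds the model's predicted scores, higher meaning more patient-like.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_gap(train_auc, validation_auc):
    """Difference between training and validation AUC; a large gap suggests overfitting."""
    return train_auc - validation_auc


# Toy scores for illustration only (not study data).
perfect = auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
imperfect = auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])

# AUC values reported in the abstract for Random Forest.
reported_gap = auc_gap(1.00, 0.877)
```

A training AUC of exactly 1.00 means every patient in the training set was scored above every control, which a flexible model such as Random Forest can achieve by memorizing the training data.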

In this study, an auxiliary diagnostic tool for schizophrenia was established using machine learning algorithms combined with routine blood indicators. The logistic regression model demonstrated good performance and can serve as a diagnostic aid for schizophrenia.
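To illustrate why a logistic regression model is easy to interpret and to render as a nomogram, the sketch below shows how such a model turns predictor values into a probability: a weighted sum passed through the logistic function. The abstract does not report coefficients, so the predictor names `x1` and `x2`, the weights, and the helper names are all hypothetical placeholders, not the authors' fitted model.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(intercept, coefficients, values):
    """Predicted probability from a logistic regression model.

    coefficients and values are dicts keyed by predictor name.
    """
    linear_predictor = intercept + sum(
        coefficients[name] * values[name] for name in coefficients
    )
    return sigmoid(linear_predictor)


# Hypothetical placeholder model: NOT the coefficients from the paper.
placeholder_intercept = 0.0
placeholder_coefficients = {"x1": 1.0, "x2": -1.0}
placeholder_values = {"x1": 2.0, "x2": 2.0}

probability = predict_probability(
    placeholder_intercept, placeholder_coefficients, placeholder_values
)
```

Each coefficient contributes additively to the linear predictor, which is what allows a nomogram to assign a fixed point scale to each predictor and map the total to a probability.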

## Linked entities

- **Diseases:** schizophrenia (MONDO:0005090)

## Full-text entities

- **Genes:** ATHS (atherosclerosis susceptibility (lipoprotein associated)) [NCBI Gene 470] {aka ALP}
- **Diseases:** schizophrenia (MESH:D012559)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12812932/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12812932/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC12812932/full.md

---
Source: https://tomesphere.com/paper/PMC12812932