# Sociodemographic Variables in Offender and Non-Offender Patients Diagnosed with Schizophrenia Spectrum Disorders—An Explorative Analysis Using Machine Learning

**Authors:** Andreas B. Hofmann, Marc Dörner, Lena Machetanz, Johannes Kirchebner

PMC · DOI: 10.3390/healthcare12171699 · 2024-08-26

## TL;DR

This study uses machine learning to test whether sociodemographic variables distinguish offender from non-offender patients with schizophrenia spectrum disorders. Discrimination was poor: the final gradient boosting model reached an AUC of 0.65.

## Contribution

The novel aspect is applying machine learning to a sample of 370 offender and 370 non-offender patients with schizophrenia spectrum disorders, testing 48 sociodemographic variables for their ability to discriminate between the two groups.

## Key findings

- Of the seven algorithms compared, gradient boosting was the most suitable for the dataset.
- Of the 48 variables tested, the final model retained three: country of birth, residence status, and educational status.
- The model reached an AUC of 0.65, which the authors describe as rather poor: sociodemographic variables alone do little to separate offender from non-offender patients.
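
To make the AUC figure concrete: the AUC can be read as the probability that a randomly chosen offender patient receives a higher model score than a randomly chosen non-offender patient. The following is a minimal sketch of that pairwise definition, not the authors' pipeline. The function name `auc_from_scores` and the scores are hypothetical and invented for illustration.

```python
from itertools import product


def auc_from_scores(pos_scores, neg_scores):
    """Return the AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correctly ranked pair.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("each group needs at least one score")
    correct = 0.0
    for pos, neg in product(pos_scores, neg_scores):
        if pos > neg:
            correct += 1.0
        elif pos == neg:
            correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores, invented for illustration (not data from the study).
offender_scores = [0.9, 0.6, 0.4, 0.3]
non_offender_scores = [0.7, 0.5, 0.2, 0.1]

example_auc = auc_from_scores(offender_scores, non_offender_scores)
print(f"AUC = {example_auc:.4f}")
```

With these made-up scores, 11 of the 16 offender/non-offender pairs are ranked correctly, giving 0.6875. An AUC of 0.5 corresponds to chance and 1.0 to perfect separation, so a value of 0.65 means the model ranks a random pair correctly only about 65% of the time.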

## Abstract

With the growing availability of medical data and the enhanced performance of computers, new opportunities for data analysis in research are emerging. One of these modern approaches is machine learning (ML), an advanced form of statistics broadly defined as the application of complex algorithms. ML provides innovative methods for detecting patterns in complex datasets. This enables the identification of correlations or the prediction of specific events. These capabilities are especially valuable for multifactorial phenomena, such as those found in mental health and forensic psychiatry. ML also allows for the quantification of the quality of the emerging statistical model. The present study aims to examine various sociodemographic variables in order to detect differences in a sample of 370 offender patients and 370 non-offender patients, all with schizophrenia spectrum disorders, through discriminative model building using ML. In total, 48 variables were tested. Out of seven algorithms, gradient boosting emerged as the most suitable for the dataset. The discriminative model finally included three variables (regarding country of birth, residence status, and educational status) and yielded an area under the curve (AUC) of 0.65, meaning that the statistical discrimination of offender and non-offender patients based purely on the sociodemographic variables is rather poor.

## Linked entities

- **Diseases:** schizophrenia (MONDO:0005090)

## Full-text entities

- **Diseases:** Schizophrenia Spectrum Disorders (MESH:D019967)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11394671/full.md

---
Source: https://tomesphere.com/paper/PMC11394671