# Speech analytics across the schizophrenia spectrum disorders: multimodal natural language processing and machine learning modelling in a Chinese-speaking population

**Authors:** Jiaqi Liu, Sumiao Zhou, Guangxing Deng, Meng Ji, Xufei Zhu, Xue He, Qijie Kuang, Shenglin She

PMC · DOI: 10.3389/fpsyt.2025.1725859 · 2026-01-06

## TL;DR

This study aims to develop speech-based biomarkers for diagnosing schizophrenia spectrum disorders using natural language processing and machine learning in a Chinese-speaking population.

## Contribution

The study plans to establish a new Chinese speech database and to use machine learning to identify discriminative speech features for schizophrenia spectrum disorders.

## Key findings

- A Chinese speech database will be established for multidimensional analysis of speech characteristics.
- Speech features will be quantified using natural language processing to develop objective biomarkers for SSD diagnosis.
- Machine learning will be used to identify highly discriminative speech patterns specific to schizophrenia spectrum disorders.

## Abstract

Formal thought disorder (FTD) is a core symptom of schizophrenia spectrum disorders (SSDs). Speech features are a key manifestation of FTD, and previous studies suggest they hold potential as diagnostic biomarkers for SSD. However, relevant research remains limited, and such speech features have not yet been applied clinically for SSD diagnosis.

The aim of this research is to establish a Chinese speech database for multidimensional analysis of speech characteristics, quantify these high-dimensional linguistic features using natural language processing (NLP), and ultimately develop objective biomarkers for diagnosing and assessing the severity of SSD.

This will be a single-center, prospective, observational study. The study plans to enroll a total of 300 inpatients or outpatients who meet the DSM-5 diagnostic criteria for SSD. Healthy controls with no history of intellectual disability will subsequently be matched. Each participant will undergo a 1-to-2-hour task-guided interview conducted by a psychiatrist, which includes an app-based assessment of the PANSS (Positive and Negative Syndrome Scale), short passage reading, an animal fluency test, a pseudosentence reading task, a symptom severity rating task, an inner-world expression task, and a picture description task. All interviews will be audio-recorded. After the interview, clinical rating scales will be used to assess psychiatric symptom severity, social functioning, and thought-language disorders. During the study, at an interval of 2 weeks.

By multidimensionally quantifying these speech characteristics and integrating machine learning, this study aims to screen highly discriminative speech feature combinations specific to SSD, thereby providing technical and theoretical support for the precise diagnosis and personalized intervention of SSD. These findings will deepen psychiatrists’ understanding of the linguistic pathological mechanisms underlying SSD and promote the development of diagnostic tools and intervention protocols based on novel biomarkers.
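The abstract does not specify how discriminative features will be screened. As a minimal sketch of one common approach, the snippet below ranks individual features by how far their ROC AUC (computed as the Mann-Whitney probability) lies from chance. The feature names `pause_ratio` and `fluency_count`, the functions `auc` and `rank_features`, and all numeric values are hypothetical and chosen only for illustration; they are not the study's features, method, or data.

```python
from itertools import product


def auc(pos, neg):
    """Probability that a random value from pos exceeds one from neg (ties count 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def rank_features(ssd, controls):
    """Rank feature names by the distance of their AUC from chance level (0.5)."""
    scores = {name: auc(ssd[name], controls[name]) for name in ssd}
    return sorted(scores.items(), key=lambda kv: abs(kv[1] - 0.5), reverse=True)


# Toy, made-up values for illustration only; not data from the study.
ssd = {
    "pause_ratio": [0.30, 0.35, 0.40, 0.28],
    "fluency_count": [12, 15, 11, 14],
}
controls = {
    "pause_ratio": [0.20, 0.22, 0.31, 0.25],
    "fluency_count": [18, 16, 20, 19],
}

ranking = rank_features(ssd, controls)
for name, score in ranking:
    print(f"{name}: AUC = {score:.3f}")
```

An AUC near 0 or 1 means the feature separates the two groups well in either direction, which is why the ranking uses the distance from 0.5. Univariate ranking of this kind ignores feature combinations, so in practice it would only be a first screening step before fitting a multivariate model with cross-validation.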

## Linked entities

- **Diseases:** intellectual disability (MONDO:0001071)

## Full-text entities

- **Diseases:** FTD (MESH:D009358), Positive (MESH:D000377), intellectual disability (MESH:D008607), SSD (MESH:C563928), SSDs (MESH:D019967), thought-language disorders (MESH:D007806), psychiatric (MESH:D001523)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12815835/full.md

---
Source: https://tomesphere.com/paper/PMC12815835