Speech-Based Estimation of Schizophrenia Severity Using Feature Fusion

Gowtham Premananth; Carol Espy-Wilson

arXiv:2411.06033 · eess.AS · November 21, 2024 · ICASSP

Open Access

TL;DR

This paper presents a deep learning framework that fuses articulatory features with self-supervised speech features to estimate schizophrenia severity from speech. Its best model reduces MAE by 9.18% and RMSE by 9.36% compared with previous models that combined speech and video inputs.

Contribution

The paper introduces a feature fusion approach that combines articulatory features with self-supervised speech features extracted from pre-trained audio models, along with an auto-encoder-based self-supervised representation learning framework that extracts compact articulatory embeddings from speech.

Findings

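01 The top-performing fusion model with Multi-Head Attention (MHA) reduced MAE by 9.18% compared with previous models that combined speech and video inputs

02 The same model reduced RMSE by 9.36% compared with those previous models

03 Fusing articulatory and self-supervised speech features improves severity estimation accuracy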

Abstract

Speech-based assessment of the schizophrenia spectrum has been widely researched in the recent past. In this study, we develop a deep learning framework to estimate schizophrenia severity scores from speech using a feature fusion approach that fuses articulatory features with different self-supervised speech features extracted from pre-trained audio models. We also propose an auto-encoder-based self-supervised representation learning framework to extract compact articulatory embeddings from speech. Our top-performing speech-based fusion model with Multi-Head Attention (MHA) reduces Mean Absolute Error (MAE) by 9.18% and Root Mean Squared Error (RMSE) by 9.36% for schizophrenia severity estimation when compared with the previous models that combined speech and video inputs.
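
The two error metrics named in the abstract, and a percentage reduction between two models, can be sketched as follows. This is an illustrative sketch only: the severity scores and predictions are made-up numbers, not data from the paper, and reading the reported percentages as relative reductions against a baseline is an assumption. The function names `mae`, `rmse`, and `relative_reduction` are hypothetical helpers.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |target - prediction|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared difference."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` error is lower than `baseline` error.

    Assumption: the percentages in the abstract are read as relative
    reductions of this form.
    """
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical severity scores and predictions (NOT from the paper).
y_true = [30.0, 42.0, 55.0, 38.0]
baseline_pred = [36.0, 36.0, 47.0, 46.0]  # stand-in for a previous model
fused_pred = [34.0, 38.0, 50.0, 43.0]     # stand-in for a fusion model

baseline_mae = mae(y_true, baseline_pred)
fused_mae = mae(y_true, fused_pred)
baseline_rmse = rmse(y_true, baseline_pred)
fused_rmse = rmse(y_true, fused_pred)

print(f"MAE reduction:  {relative_reduction(baseline_mae, fused_mae):.2f}%")
print(f"RMSE reduction: {relative_reduction(baseline_rmse, fused_rmse):.2f}%")
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two reductions generally differ even for the same pair of models.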

Taxonomy

Topics: Emotion and Mood Recognition

Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention