Self-supervised Multimodal Speech Representations for the Assessment of Schizophrenia Symptoms

Gowtham Premananth; Carol Espy-Wilson

arXiv:2409.09733 · eess.AS · November 19, 2024

TL;DR

This paper presents a self-supervised multimodal speech representation system using VQ-VAE for schizophrenia assessment, effectively classifying symptoms and predicting severity from vocal and facial cues.

Contribution

It introduces a novel VQ-VAE based multimodal representation learning framework for schizophrenia assessment, including severity prediction, outperforming prior methods.

Findings

01 Outperforms previous models on multi-class classification metrics

02 Accurately predicts schizophrenia severity scores

03 Effective multimodal speech representations for clinical assessment

Abstract

Multimodal schizophrenia assessment systems have gained traction over the last few years. This work introduces a schizophrenia assessment system to discern between prominent symptom classes of schizophrenia and predict an overall schizophrenia severity score. We develop a Vector Quantized Variational Auto-Encoder (VQ-VAE) based Multimodal Representation Learning (MRL) model to produce task-agnostic speech representations from vocal Tract Variables (TVs) and Facial Action Units (FAUs). These representations are then used in a Multi-Task Learning (MTL) based downstream prediction model to obtain class labels and an overall severity score. The proposed framework outperforms the previous works on the multi-class classification task across all evaluation metrics (Weighted F1 score, AUC-ROC score, and Weighted Accuracy). Additionally, it estimates the schizophrenia severity score, a task not…
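The vector quantization step that gives a VQ-VAE its name can be illustrated with a minimal sketch. This is not the authors' model: the codebook values, latent dimensions, and the function names `squared_distance` and `quantize` are hypothetical, chosen only to show how each continuous latent vector is replaced by its nearest codebook entry. Training details such as the straight-through gradient estimator and the commitment loss are omitted.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    latents: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Map each latent vector to the index and value of its nearest codebook entry.

    Generic VQ lookup (an illustrative assumption, not the paper's code).
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for z in latents:
        # Pick the codebook index with the smallest distance to z.
        idx = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        indices.append(idx)
        quantized.append(list(codebook[idx]))
    return indices, quantized


# Toy data (hypothetical): three 2-D codebook entries and three encoder outputs.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.9, 1.2], [0.1, -0.2], [-0.8, 1.7]]

indices, quantized = quantize(latents, codebook)
print(indices)    # discrete codes, one per latent vector
print(quantized)  # codebook vectors that would be passed to the decoder
```

In a full VQ-VAE, the discrete indices form the compressed representation, and the quantized vectors feed the decoder; a downstream model such as the multi-task predictor described above would consume representations derived from this stage.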

Taxonomy

Topics: Emotion and Mood Recognition · Voice and Speech Disorders · Phonetics and Phonology Research