Multimodal Approach for Assessing Neuromotor Coordination in Schizophrenia Using Convolutional Neural Networks
Yashish M. Siriwardena, Chris Kitchen, Deanna L. Kelly, Carol Espy-Wilson

TL;DR
This paper presents a multimodal CNN approach that combines video and audio data to detect schizophrenia with strong positive symptoms by analyzing speech and facial coordination patterns, outperforming traditional methods.
Contribution
The study introduces a novel multimodal CNN that fuses facial action units and vocal tract variables to improve schizophrenia detection accuracy.
Findings
The multimodal CNN achieves an 18% higher F1 score than the baseline.
Vocal tract variables outperform MFCCs when fused with facial data.
Speech and facial coordination patterns are distinct in schizophrenic patients.
Abstract
This study investigates speech articulatory coordination in schizophrenia subjects exhibiting strong positive symptoms (e.g., hallucinations and delusions), using two distinct channel-delay correlation methods. We show that schizophrenic subjects with strong positive symptoms who are markedly ill exhibit more complex articulatory coordination patterns in facial and speech gestures than those observed in healthy subjects. This distinction in speech coordination patterns is used to train a multimodal convolutional neural network (CNN) that uses video and audio data recorded during speech to distinguish schizophrenic patients with strong positive symptoms from healthy subjects. We also show that the vocal tract variables (TVs), which correspond to place of articulation and glottal source, outperform the Mel-frequency Cepstral Coefficients (MFCCs) when fused with Facial Action Units (FAUs) in the…
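The abstract does not spell out either of the two channel-delay correlation methods, so the following is only a generic sketch of the underlying idea: each feature channel (for example one TV or one FAU track) is shifted by several time delays, and the correlations between all shifted copies are collected in one matrix. The function names `pearson` and `channel_delay_correlation`, the delay values, and the toy signals are all assumptions made for illustration, not the authors' implementation.

```python
from math import sin, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def channel_delay_correlation(signals, delays):
    """Hypothetical helper: correlation matrix over every (channel, delay) pair.

    signals: list of equal-length channels (lists of floats).
    delays:  list of non-negative sample delays.
    Returns a square matrix of size len(signals) * len(delays), ordered
    channel by channel, with the delays cycling fastest.
    """
    max_delay = max(delays)
    n = len(signals[0]) - max_delay  # common length after shifting
    if n < 2:
        raise ValueError("signals are too short for the requested delays")
    # One row per (channel, delay): the channel advanced by `d` samples.
    shifted = [ch[d:d + n] for ch in signals for d in delays]
    return [[pearson(a, b) for b in shifted] for a in shifted]


# Toy data (assumed): chan_b is chan_a delayed by exactly 5 samples.
chan_a = [sin(0.1 * k) for k in range(200)]
chan_b = [sin(0.1 * (k - 5)) for k in range(200)]
matrix = channel_delay_correlation([chan_a, chan_b], delays=[0, 5])
```

In this toy case the entry pairing `chan_a` at delay 0 with `chan_b` at delay 5 is close to 1, because advancing `chan_b` by 5 samples realigns it with `chan_a`. A matrix of this kind could, in principle, be passed to a CNN as an image-like input, though how the paper builds and uses its matrices is not stated in the truncated abstract.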
