A multi-modal approach for identifying schizophrenia using cross-modal attention

Gowtham Premananth; Yashish M. Siriwardena; Philip Resnik; Carol Espy-Wilson

arXiv:2309.15136·eess.SP·April 22, 2024

Open Access

TL;DR

This paper presents a multi-modal system combining audio, video, and text data with cross-modal attention to improve schizophrenia detection, outperforming previous methods by 8.53% in F1 score.

Contribution

It introduces a novel multi-modal classification framework using cross-modal attention and hierarchical models for schizophrenia identification.

Findings

01 Outperforms previous state-of-the-art by 8.53% in F1 score.

02 Effectively combines audio, video, and text modalities.

03 Uses high-level coordination features and hierarchical attention for improved accuracy.

Abstract

This study focuses on how different modalities of human communication can be used to distinguish between healthy controls and subjects with schizophrenia who exhibit strong positive symptoms. We developed a multi-modal schizophrenia classification system using audio, video, and text. Facial action units and vocal tract variables were extracted as low-level features from video and audio respectively, which were then used to compute high-level coordination features that served as the inputs to the audio and video modalities. Context-independent text embeddings extracted from transcriptions of speech were used as the input for the text modality. The multi-modal system is developed by fusing a segment-to-session-level classifier for video and audio modalities with a text model based on a Hierarchical Attention Network (HAN) with cross-modal attention. The proposed multi-modal system…
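The abstract names cross-modal attention as the fusion mechanism but does not give its exact form. As a rough illustration only, the sketch below shows generic scaled dot-product attention in which the queries come from one modality (text, say) and the keys and values come from another (audio segments, say). The function names, vector sizes, and toy inputs are all assumptions made for this example; none of them are taken from the paper, and the authors' actual architecture may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Generic scaled dot-product attention across two modalities.

    queries: list of d-dimensional vectors from modality A (hypothetical: text)
    keys:    list of d-dimensional vectors from modality B (hypothetical: audio)
    values:  list of vectors from modality B, one per key
    Returns (attended vectors, attention weights), one row per query.
    """
    d = len(queries[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key in the other modality.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        # Weighted sum of the other modality's values.
        attended = [
            sum(wj * v[i] for wj, v in zip(w, values))
            for i in range(len(values[0]))
        ]
        outputs.append(attended)
        weights.append(w)
    return outputs, weights


# Toy inputs, invented purely for illustration.
text_steps = [[1.0, 0.0], [0.0, 1.0]]
audio_segments = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attn = cross_modal_attention(text_steps, audio_segments, audio_segments)
```

Each row of `attn` sums to 1 and indicates how strongly a given text step draws on each audio segment. In a learned model the queries, keys, and values would normally pass through trainable projections first; those are omitted here to keep the sketch short.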

Taxonomy

Topics: Emotion and Mood Recognition · Music and Audio Processing · Speech Recognition and Synthesis