A Novel Audio-Visual Information Fusion System for Mental Disorders Detection
Yichun Li, Shuanglin Li, Syed Mohsen Naqvi

TL;DR
This paper introduces a multimodal audio-visual system that uses attention networks to detect multiple mental disorders, achieving high accuracy at lower cost than traditional fMRI- and EEG-based methods.
Contribution
It presents a novel, general-purpose diagnosis system based on emotional expression features, combining audio and visual data with efficient deep learning techniques.
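The summary does not describe the fusion architecture beyond "attention networks" and softmax. Purely as an illustration of the general idea of attention-based audio-visual fusion, here is a minimal sketch of generic cross-modal scaled dot-product attention in NumPy. It is not the authors' model. All function names, the shared feature dimension `d`, the mean pooling over time, and the two-class output head are hypothetical choices. The inputs are random placeholders rather than real speech or facial features, and the classifier weights are untrained.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query, context):
    """Scaled dot-product attention: each query frame attends over all context frames.

    query:   (T_q, d) features from one modality (e.g. video frames)
    context: (T_c, d) features from the other modality (e.g. audio frames)
    returns: (T_q, d) convex combinations of context frames, one per query frame
    """
    d = query.shape[-1]
    scores = query @ context.T / np.sqrt(d)  # (T_q, T_c) similarity scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ context

def fuse_audio_visual(audio, visual):
    """Hypothetical fusion: attend in both directions, mean-pool over time, concatenate."""
    v2a = cross_modal_attention(visual, audio)  # video frames attend to audio
    a2v = cross_modal_attention(audio, visual)  # audio frames attend to video
    return np.concatenate([v2a.mean(axis=0), a2v.mean(axis=0)])  # shape (2d,)

def classify(fused, W, b):
    """Linear layer followed by softmax over hypothetical diagnostic classes."""
    return softmax(fused @ W + b)

rng = np.random.default_rng(0)
d = 16
audio = rng.standard_normal((50, d))   # 50 placeholder audio frames
visual = rng.standard_normal((30, d))  # 30 placeholder video frames
fused = fuse_audio_visual(audio, visual)
W = rng.standard_normal((2 * d, 2)) * 0.1  # untrained weights, 2 classes
b = np.zeros(2)
probs = classify(fused, W, b)
print(fused.shape, probs.round(3))
```

Attending in both directions lets each modality weight the frames of the other that are most relevant to it, and the two sequences may have different lengths. A trained system would typically learn separate query, key and value projections for each modality instead of comparing raw features directly, as this sketch does.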
Findings
Achieves over 80% accuracy on an ADHD dataset
Sets new state-of-the-art results on the AVEC 2014 depression dataset
Uses fewer computational resources than traditional fMRI- and EEG-based methods
Abstract
Mental disorders are among the foremost contributors to the global healthcare burden. Research indicates that timely diagnosis and intervention are vital in treating various mental disorders. However, the early somatization symptoms of certain mental disorders may not be immediately evident, often leading to their being overlooked or misdiagnosed. Additionally, traditional diagnostic methods are time-consuming and costly. Deep learning methods based on fMRI and EEG have improved the efficiency of mental disorder detection; however, the equipment and trained staff they require are generally very expensive. Moreover, most systems are trained for only one specific mental disorder and are not general-purpose. Recently, physiological studies have shown that some mental disorders (e.g., depression and ADHD) present speech- and facial-related symptoms. In this paper, we focus on the…
Taxonomy
Topics: Emotion and Mood Recognition
Methods: Softmax · Attention Is All You Need · Focus
