Mixture of Experts for Recognizing Depression from Interview and Reading Tasks
Loukas Ilias, Dimitris Askounis

TL;DR
This study introduces a novel deep learning approach using Mixture of Experts models to recognize depression from both spontaneous and read speech, leveraging multimodal fusion for improved accuracy.
Contribution
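It is the first study to combine representations of spontaneous and read speech with MoE models for depression recognition, addressing three limitations of earlier work: reliance on spontaneous speech alone, dependence on transcripts, and the absence of input-conditional computation.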
Findings
Achieved 87% accuracy on the Androids corpus.
Achieved an F1-score of 86.66%.
Fused audio features from the spontaneous-speech and read-speech tasks using multimodal fusion.
Abstract
Depression is a mental disorder that can cause a variety of psychological, physical, and social symptoms. Speech has been shown to be an objective marker for the early recognition of depression. For this reason, many studies have aimed to recognize depression through speech. However, existing methods rely only on spontaneous speech, neglecting information obtained via read speech; use transcripts, which are often difficult to obtain (manual) or come with high word-error rates (automatic); and do not focus on input-conditional computation methods. To resolve these limitations, this is the first study in the depression recognition task to obtain representations of both spontaneous and read speech, utilize multimodal fusion methods, and employ Mixture of Experts (MoE) models in a single deep neural network. Specifically, we use audio files corresponding…
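To make "input-conditional computation" concrete, the following is a minimal NumPy sketch of a top-k gated Mixture of Experts layer applied to a fused feature vector. It is an illustration under stated assumptions, not the authors' architecture: the concatenation fusion, the linear experts, the feature dimensions, and the names `moe_forward`, `gate_w`, and `expert_ws` are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax; entries set to -inf receive weight 0.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(x, gate_w, expert_ws, top_k=2):
    """Top-k gated MoE with linear experts (illustrative assumption).

    x:         (batch, d_in) fused input features
    gate_w:    (d_in, n_experts) gating weights
    expert_ws: (n_experts, d_in, d_out) one weight matrix per expert
    """
    logits = x @ gate_w                                # (batch, n_experts)
    top_idx = np.argsort(logits, axis=1)[:, -top_k:]   # experts chosen per input
    masked = np.full_like(logits, -np.inf)
    kept = np.take_along_axis(logits, top_idx, axis=1)
    np.put_along_axis(masked, top_idx, kept, axis=1)
    gates = softmax(masked, axis=1)                    # zero outside the top-k
    # For clarity every expert is evaluated; a sparse implementation
    # would skip the experts whose gate is zero.
    expert_out = np.einsum("bi,eio->beo", x, expert_ws)
    return np.einsum("be,beo->bo", gates, expert_out), gates

rng = np.random.default_rng(0)
d_spont, d_read, n_experts, d_out = 8, 8, 4, 2

# Hypothetical stand-ins for learned audio representations of each task.
spont = rng.normal(size=(3, d_spont))
read = rng.normal(size=(3, d_read))

# Assumption: fusion by simple concatenation.
fused = np.concatenate([spont, read], axis=1)

gate_w = rng.normal(size=(d_spont + d_read, n_experts))
expert_ws = rng.normal(size=(n_experts, d_spont + d_read, d_out))

out, gates = moe_forward(fused, gate_w, expert_ws, top_k=2)
```

Each row of `gates` has exactly `top_k` non-zero weights that sum to 1, so different inputs are routed to different experts. That per-input routing is the sense in which the computation is conditional on the input.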
