AudVowelConsNet: A Phoneme-Level Based Deep CNN Architecture for Clinical Depression Diagnosis
Muhammad Muzammel, Hanan Salam, Yann Hoffmann, Mohamed Chetouani, Alice Othmani

TL;DR
This paper introduces a deep learning system that analyzes phoneme-level speech features, specifically vowels and consonants, to improve automatic depression diagnosis from speech data.
Contribution
It proposes and compares three spectrogram-based deep neural network architectures focusing on phoneme units, demonstrating superior performance through fusion of vowel and consonant features.
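The summary does not say how the vowel and consonant features are fused, so the following is only a minimal sketch under one common assumption: each branch yields a fixed-length feature vector, and fusion is plain concatenation before a classifier. The names `fuse_features` and `linear_score`, the vector sizes, and all values are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def fuse_features(vowel_feats: Sequence[float],
                  consonant_feats: Sequence[float]) -> List[float]:
    """Feature-level fusion by concatenation (an assumed strategy)."""
    return list(vowel_feats) + list(consonant_feats)


def linear_score(features: Sequence[float],
                 weights: Sequence[float],
                 bias: float = 0.0) -> float:
    """Toy linear layer standing in for the classifier after fusion."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights)) + bias


# Hypothetical embeddings from a vowel branch and a consonant branch.
vowel_embedding = [0.2, -0.1, 0.7]
consonant_embedding = [0.5, 0.3]

fused = fuse_features(vowel_embedding, consonant_embedding)
score = linear_score(fused, [1.0] * len(fused))
```

Concatenation keeps the two branches independent up to the fusion point, so either one can also be evaluated alone, which is the kind of comparison the findings below report.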
Findings
Consonant-based features outperform vowel-based features in depression recognition.
Fusion of vowel and consonant features significantly improves accuracy.
The proposed approach outperforms existing deep learning methods on the DAIC-WOZ dataset.
Abstract
Depression is a common and serious mood disorder that impairs the patient's capacity to function normally in daily tasks. Speech has proven to be a powerful tool in depression diagnosis. Research in psychiatry has concentrated on fine-grained analysis of the word-level speech components that contribute to the manifestation of depression in speech, and has revealed significant variations at the phoneme level in depressed speech. Research in machine learning-based automatic recognition of depression from speech, on the other hand, has focused on exploring various acoustic features for detecting depression and its severity level. Few studies have incorporated phoneme-level speech components into automatic assessment systems. In this paper, we propose an Artificial Intelligence (AI) based application for clinical depression recognition and assessment from speech. We…
